The carriage return (CR) case
I encountered a strange behaviour while testing text input processing functions, and I thought I had found a vulnerability. Here is some feedback and history about character management.
The characters
The history with introduction to security
Representation
Try the experiment: open a file named textbook.txt and write this content:
This is my wonderful textbook!
Save it, then display it with cat textbook.txt. You should see your text, nothing exceptional.
Some explanations: as you may know, in most cases each character is stored as a single 8-bit byte, which can be written in hexadecimal.
You can display the hexadecimal representation of your file with xxd:
❯ xxd textbook.txt
00000000: 5468 6973 2069 7320 6d79 2077 6f6e 6465  This is my wonde
00000010: 7266 756c 2074 6578 7462 6f6f 6b21 0a    rful textbook!.
The T is represented by 54. The end-of-line (or LF, Line Feed) is represented by 0A.
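To make the byte-to-hexadecimal mapping concrete, here is a minimal C sketch of what a tool like xxd does for each byte. The helper name bytes_to_hex and its signature are assumptions made for this illustration; they do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format `len` bytes of `data` as space-separated
   two-digit hexadecimal values, e.g. "54 0a" for the bytes 'T' and LF.
   Returns 0 on success, -1 if `out` is too small. */
int bytes_to_hex(const void *data, size_t len, char *out, size_t out_size)
{
    const unsigned char *bytes = data;
    size_t pos = 0;

    assert(data != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        /* Conservative room check: separator + 2 digits + final NUL. */
        if (pos + 4 > out_size)
            return -1;
        if (i > 0)
            out[pos++] = ' ';
        snprintf(out + pos, out_size - pos, "%02x", bytes[i]);
        pos += 2;
    }
    return 0;
}
```

Called on the two bytes 'T' and LF, it fills the buffer with 54 0a, the same values xxd shows at the start and the end of the file.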
Yes, in the jungle of encoding schemes, you may encounter outliers such as characters encoded on 7 bits (original ASCII) or on two or more bytes (special characters, emoji, ...). Let's keep it simple today.
Here are the 128 characters of the ASCII table:
Hexadecimal | Character |
---|---|
00 | NUL |
01 | SOH |
02 | STX |
03 | ETX |
04 | EOT |
05 | ENQ |
06 | ACK |
07 | BEL |
08 | BS |
09 | HT |
0A | LF |
0B | VT |
0C | FF |
0D | CR |
0E | SO |
0F | SI |
10 | DLE |
11 | DC1 |
12 | DC2 |
13 | DC3 |
14 | DC4 |
15 | NAK |
16 | SYN |
17 | ETB |
18 | CAN |
19 | EM |
1A | SUB |
1B | ESC |
1C | FS |
1D | GS |
1E | RS |
1F | US |
20 | SP |
21 | ! |
22 | " |
23 | # |
24 | $ |
25 | % |
26 | & |
27 | ' |
28 | ( |
29 | ) |
2A | * |
2B | + |
2C | , |
2D | - |
2E | . |
2F | / |
30 | 0 |
31 | 1 |
32 | 2 |
33 | 3 |
34 | 4 |
35 | 5 |
36 | 6 |
37 | 7 |
38 | 8 |
39 | 9 |
3A | : |
3B | ; |
3C | < |
3D | = |
3E | > |
3F | ? |
40 | @ |
41 | A |
42 | B |
43 | C |
44 | D |
45 | E |
46 | F |
47 | G |
48 | H |
49 | I |
4A | J |
4B | K |
4C | L |
4D | M |
4E | N |
4F | O |
50 | P |
51 | Q |
52 | R |
53 | S |
54 | T |
55 | U |
56 | V |
57 | W |
58 | X |
59 | Y |
5A | Z |
5B | [ |
5C | \ |
5D | ] |
5E | ^ |
5F | _ |
60 | ` |
61 | a |
62 | b |
63 | c |
64 | d |
65 | e |
66 | f |
67 | g |
68 | h |
69 | i |
6A | j |
6B | k |
6C | l |
6D | m |
6E | n |
6F | o |
70 | p |
71 | q |
72 | r |
73 | s |
74 | t |
75 | u |
76 | v |
77 | w |
78 | x |
79 | y |
7A | z |
7B | { |
7C | \| |
7D | } |
7E | ~ |
7F | DEL |
The unprintable
You may notice two categories:
- Printable characters: a, B, 1, 2, !, etc.
- Unprintable characters: NUL, BEL, ACK, BS, HT (tab), ...
Where do the unprintable characters come from? Most of them date back to the time when early computers and teletypes (or teleprinters) were kings and queens.
These characters instruct the interpreter to perform a special action, as on a good old typewriter: "ring a bell" or "return to the beginning of the line". They are also called control characters. All of them can be written in caret notation, and some also have a C escape sequence:
Hexadecimal | Character | Caret notation | C escape sequence |
---|---|---|---|
00 | NUL | ^@ | \0 |
01 | SOH | ^A | |
02 | STX | ^B | |
03 | ETX | ^C | |
04 | EOT | ^D | |
05 | ENQ | ^E | |
06 | ACK | ^F | |
07 | BEL | ^G | \a |
08 | BS | ^H | \b |
09 | HT | ^I | \t |
0A | LF | ^J | \n |
0B | VT | ^K | \v |
0C | FF | ^L | \f |
0D | CR | ^M | \r |
0E | SO | ^N | |
0F | SI | ^O | |
10 | DLE | ^P | |
11 | DC1 | ^Q | |
12 | DC2 | ^R | |
13 | DC3 | ^S | |
14 | DC4 | ^T | |
15 | NAK | ^U | |
16 | SYN | ^V | |
17 | ETB | ^W | |
18 | CAN | ^X | |
19 | EM | ^Y | |
1A | SUB | ^Z | |
1B | ESC | ^[ | \e |
1C | FS | ^\ | |
1D | GS | ^] | |
1E | RS | ^^ | |
1F | US | ^_ | |
7F | DEL | ^? | |
The ASCII page on Wikipedia is a gold mine about this.
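The caret column of the table follows a simple rule: flip bit 6 (0x40) of the control character to get the letter or symbol shown after the caret. Here is a minimal C sketch of that rule; the function name caret_notation is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: write the caret notation of an ASCII control
   character (0x00-0x1F or 0x7F) into `out` as a NUL-terminated string.
   Returns false if `c` is not a control character. */
bool caret_notation(unsigned char c, char out[3])
{
    assert(out != NULL);
    if (c > 0x1F && c != 0x7F)
        return false;
    out[0] = '^';
    /* Flipping bit 6 maps 0x00 to '@', 0x0D to 'M' and 0x7F to '?'. */
    out[1] = (char)(c ^ 0x40);
    out[2] = '\0';
    return true;
}
```

For example, 0x0D (CR) gives ^M and 0x1B (ESC) gives ^[, as in the table.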
These characters have been in use for more than 50 years and are still used today in basic low-level I/O communication.
We may talk about (pseudo)TTY behaviors one day. In the meantime, you can refer to the excellent article by Guillaume Quéré, The oldest privesc: injecting careless administrators' terminals using TTY pushback, and to Linus Akesson's The TTY demystified.
Managing the unprintable characters
When text processing comes in, everything breaks. We need exceptions, awful if/else chains, whitelisting, regexes (they do not hurt, stay here), ... to manage control characters (block, replace, etc.). Beyond this, some characters even have a double meaning (e.g. control character 09, which is also used for tabulation in ordinary text).
Without countermeasures, here are some well-known winners in cybersecurity:
- Null Byte Injection
- CRLF injection
- Embedded tab evasion
- And a lot more.
What a nightmare, huh? Because these characters are not natural, developers sometimes ignore them, assuming that control characters are already handled at a lower level, which is not always the case (on purpose, by mistake, or out of ignorance).
A lot of security holes come from missing character management. We are only talking about 1 or 2 bytes. That is beautiful. That is also why I like cybersecurity.
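As an illustration of the kind of check that is often missing, here is a minimal C sketch that scans an input buffer for control bytes. The function name and the "reject everything below 0x20, plus 0x7F" policy are assumptions for this example; a real application may need to allow some of them (tab or LF, for instance) depending on the context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first ASCII control byte
   (0x00-0x1F or 0x7F) in `buf`, or -1 if there is none.
   The length is explicit so that an embedded NUL is detected too. */
long first_control_byte(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 0x20 || buf[i] == 0x7F)
            return (long)i;
    }
    return -1;
}
```

Working on a buffer and a length, rather than on a C string, matters here: a strlen-based check would stop at the first NUL byte and never see what follows it.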
About character testing
I will not cover specific injection types such as CRLF injection in detail. I think you got the idea from the first part: all control characters, and by extension all unprintable characters, are candidates for injection during text processing. Some of them are more special than others because they are more prevalent (NUL for strings, LF for lines, ...).
It is always interesting to inject these control characters during critical text processing (Authentication, user management, session management, text editor, etc.).
Some ideas and methods:
- For HTTP(S), I like to use the Intruder from Burp Pro with numbers or hex list.
- For serial or specific TCP, I like to run my own boofuzz template or specific library if necessary (scapy, ...).
- For RPC / IPC, I like implementing my own callers (C, shell, Python, ...), but you can also use standard fuzzers (AFLplusplus, syzkaller, ...). msfvenom combined with a badchars generator can also be used for payload generation, but it is very context-dependent.
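Whatever the transport, the core of a home-made caller is the same: take a valid value and insert one control byte into it. Here is a minimal C sketch of that step; the function name inject_byte is an assumption made for this illustration and is not taken from any of the tools above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a test payload by inserting the byte `ch`
   at offset `pos` of `base`. The result is length-delimited (not a C
   string) because `ch` may be NUL. Returns the payload length, or 0 if
   `out` is too small or `pos` is out of range. */
size_t inject_byte(const char *base, size_t pos, unsigned char ch,
                   unsigned char *out, size_t out_size)
{
    assert(base != NULL && out != NULL);
    size_t base_len = strlen(base);

    if (pos > base_len || base_len + 1 > out_size)
        return 0;
    memcpy(out, base, pos);                            /* bytes before */
    out[pos] = ch;                                     /* injected byte */
    memcpy(out + pos + 1, base + pos, base_len - pos); /* bytes after */
    return base_len + 1;
}
```

A test loop would then call it for every value from 0x00 to 0x1F, plus 0x7F, and send each payload to the target.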
The case
The security test
I was testing the security of a user management feature.
An HTTP endpoint permits adding a user to the system, which involves editing the /etc/passwd file (found after some reverse engineering).
Here are the general steps of the current implementation:
1. An API endpoint is callable through HTTP POST with parameters such as the username, the password, the groups and the user-friendly name (GECOS).
2. Its implementation calls an internal function which ends up invoking putpwent from glibc (see its man page) in order to edit /etc/passwd.
A vulnerability may be discovered in two ways:
- Try to find a vulnerability in the workflow between (1) and (2).
- Try to find an exploit that targets putpwent directly. This is where I identified some parameters "not well controlled" by the caller.
I prefer the second way, and I also wanted to understand how the putpwent function works.
The Results
In a prepared test environment, create a simple caller:
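#define _DEFAULT_SOURCE /* exposes putpwent() in glibc's <pwd.h> */
#include <stdio.h>
#include <pwd.h>

int main(void) {
    /* Work on a local file, never on the real /etc/passwd */
    FILE *passwdFile = fopen("./etc_passwd", "a");
    if (passwdFile == NULL) {
        perror("Error opening file");
        return 1;
    }

    struct passwd userEntry;
    userEntry.pw_name = "standard";
    userEntry.pw_passwd = "userpw";
    userEntry.pw_uid = 1001;
    userEntry.pw_gid = 1001;
    userEntry.pw_gecos = "Simple user";
    userEntry.pw_dir = "/home/user1";
    userEntry.pw_shell = "/bin/sh";

    /* putpwent() returns 0 on success and -1 on error */
    if (putpwent(&userEntry, passwdFile) != 0) {
        perror("putpwent");
        fclose(passwdFile);
        return 1;
    }
    fclose(passwdFile);
    return 0;
}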
After executing the previous code, the file etc_passwd will contain:
standard:userpw:1001:1001:Simple user:/home/user1:/bin/sh
Fine. Nominal case.
Now, let's try to inject some control characters, as well as the colon (:), since it acts as the field separator.
- With \n in the name, the entry is not written:
userEntry.pw_name = "stand\nard";
- A : in the GECOS field is replaced by a space in the final file:
userEntry.pw_gecos = "Simple:user";
- For \r (CR):
userEntry.pw_gecos = "Simple\ruser";
I got:
❯ cat etc_passwd
user:/home/user1:/bin/sh1:Simple
Hm. What? A possible vulnerability in the glibc function that writes password entries? I could not believe it (and I felt I shouldn't; this was too easy).
Let's dive into the glibc source code. In fact, there is just a "simple" filter on the colon and LF characters, in nss/valid_field.c:
#include <nss.h>
#include <string.h>

const char __nss_invalid_field_characters[] = NSS_INVALID_FIELD_CHARACTERS;

/* Check that VALUE is either NULL or a NUL-terminated string which
   does not contain characters not permitted in NSS database
   fields. */
_Bool
__nss_valid_field (const char *value)
{
  return value == NULL
    || strpbrk (value, __nss_invalid_field_characters) == NULL;
}
where NSS_INVALID_FIELD_CHARACTERS is defined as:
#define NSS_INVALID_FIELD_CHARACTERS ":\n"
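To see why \r gets through, here is a minimal C sketch that reproduces the logic of the check quoted above, next to a stricter variant a caller could apply before handing fields to putpwent. Both function names are assumptions made for this illustration, and the stricter policy (reject every ASCII control character) is only one possible choice, not something glibc provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same logic as the glibc check quoted above: NULL is accepted, and so
   is any string that contains neither ':' nor '\n'. */
bool nss_style_valid(const char *value)
{
    return value == NULL || strpbrk(value, ":\n") == NULL;
}

/* Hypothetical stricter check for callers: additionally reject every
   ASCII control character (0x01-0x1F and 0x7F), which includes '\r'.
   Unlike the check above, a NULL value is a programming error here. */
bool strict_field_valid(const char *value)
{
    assert(value != NULL);
    if (!nss_style_valid(value))
        return false;
    for (const unsigned char *p = (const unsigned char *)value; *p != '\0'; p++) {
        if (*p < 0x20 || *p == 0x7F)
            return false;
    }
    return true;
}
```

With these definitions, "Simple\ruser" passes the first check and fails the second one, while "Simple:user" and "stand\nard" fail both.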
Of course, the vulnerability could only be fully exploited if the functions that read password entries (like getpwent) accept that format.
Through a possible bad usage of the putpwent function, I managed to create a special passwd file resulting in abcd:mypwd:88:0:random.
It could potentially be used as a local privilege escalation vector.
If I create an /etc/passwd file by hand from this result, it works (sudo -u abcd id gives me the IDs of the user). But if I attempt the full exploit chain (actually editing the /etc/passwd file through putpwent, then running sudo -u abcd id), it does not: the user does not exist.
WHY?
The failure
I simply forgot not to mix up text-processing programs with pure I/O programs.
Let's go back to our last generated passwd file:
❯ cat etc_passwd
user:/home/user1:/bin/sh1:Simple
Now display it with cat -A:
❯ cat -A etc_passwd
standard:userpw:1001:1001:Simple^Muser:/home/user1:/bin/sh$
I was stunned. Just tricked by (my own) ^M character.
Keep in mind that the historical job of that control character is to do a carriage return.
So, with default options, cat behaves correctly: it outputs the bytes as they are, the printable characters are displayed, and the control characters are processed according to your tty. In this case:
- Write standard:userpw:1001:1001:Simple
- Process ^M: go back to the beginning of the line, like a typewriter.
- Write user:/home/user1:/bin/sh over the start of the existing text.
- We get the final string: user:/home/user1:/bin/sh1:Simple
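The steps above can be reproduced with a small model of the display. This is a deliberately simplified C sketch: it only understands printable bytes and \r, the function name render_line is an assumption made for this illustration, and a real terminal handles many more control characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of how a terminal shows one line:
   bytes are written at the cursor, '\r' moves the cursor back to
   column 0, and later bytes overwrite what is already there.
   Returns 0 on success, -1 if `out` is too small. */
int render_line(const char *in, char *out, size_t out_size)
{
    size_t cursor = 0; /* current column */
    size_t width = 0;  /* number of columns written so far */

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in == '\r') {
            cursor = 0;
            continue;
        }
        if (cursor + 1 >= out_size) /* keep room for the final NUL */
            return -1;
        out[cursor++] = *in;
        if (cursor > width)
            width = cursor;
    }
    if (width >= out_size)
        return -1;
    out[width] = '\0';
    return 0;
}
```

Fed with the bytes actually stored in the file, standard:userpw:1001:1001:Simple\ruser:/home/user1:/bin/sh, it yields user:/home/user1:/bin/sh1:Simple: the 24 characters written after the carriage return overwrite the first 24 of the 32 written before it.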
It is also a good reminder that printf should be preferred to echo for I/O. echo is dedicated to text (see its man page), and an LF control character is appended by default (this behavior can be disabled with the -n option). It could mess up some of your scripts. Example:
# Incorrect
❯ echo "wonderful" | base64
d29uZGVyZnVsCg==
# Correct
❯ printf "wonderful" | base64
d29uZGVyZnVs
Conclusion and learnings
Do not mix pure I/O and text-processing programs or functions, for the security of your programs and your peace of mind.
Ultimately, prefer checking and comparing function output in an agnostic format (hexadecimal, with xxd), combined with a checksum (sha256sum).
This little experience is a kind of human proof of concept: even with technical knowledge about character management, errors in text processing ((bad) usage, (mis)understanding) can lead to strange outputs, followed by hasty conclusions and the associated bugs. What a waste of time.
Now multiply that situation by N, where N is the number of people, languages, programs, fixes, errors and standards that have been around for 50 years.
Character management: an infinite source of work for developers and cybersecurity people.