The carriage return (CR) case

The characters

The history with introduction to security

Representation

Try the experiment: open a file named textbook.txt and write this content:

This is my wonderful textbook!

Save it, then display it with cat textbook.txt. You should see your text, nothing exceptional.

Some explanation: as you may know, in most cases each character is stored as a single 8-bit byte, which can be written in hexadecimal.
You can display the hexadecimal representation of your file with xxd:

❯ xxd textbook.txt 
00000000: 5468 6973 2069 7320 6d79 2077 6f6e 6465  This is my wonde
00000010: 7266 756c 2074 6578 7462 6f6f 6b21 0a    rful textbook!.

The T is represented by 54. The end-of-line character (LF, Line Feed) is represented by 0A.

Yes, in the jungle of encoding schemes, you could encounter some outliers, like characters encoded on 7 bits (original ASCII) or on two bytes or more (special characters, emoji, ...). Let's keep it simple today.
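To make the byte-to-hexadecimal mapping concrete, here is a minimal sketch in C of what a tool like xxd does for each byte. The function name bytes_to_hex is an assumption made for this illustration; it is not part of any library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes of `data` into `out` as
   space-separated uppercase hex pairs.
   `out` must hold at least 3 * len bytes (1 byte when len is 0).
   Returns the number of characters written, excluding the final NUL. */
size_t bytes_to_hex(const unsigned char *data, size_t len, char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        /* Each byte becomes two hex digits; a space separates the pairs. */
        pos += (size_t)sprintf(out + pos, i + 1 < len ? "%02X " : "%02X",
                               data[i]);
    }
    out[pos] = '\0';
    return pos;
}
```

Called on the three bytes T, ! and LF, it fills the buffer with 54 21 0A, which matches the values in the xxd output.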

Here are the 128 characters of the ASCII table:

Hexadecimal	Character
00	NUL
01	SOH
02	STX
03	ETX
04	EOT
05	ENQ
06	ACK
07	BEL
08	BS
09	HT
0A	LF
0B	VT
0C	FF
0D	CR
0E	SO
0F	SI
10	DLE
11	DC1
12	DC2
13	DC3
14	DC4
15	NAK
16	SYN
17	ETB
18	CAN
19	EM
1A	SUB
1B	ESC
1C	FS
1D	GS
1E	RS
1F	US
20	SP
21	!
22	"
23	#
24	$
25	%
26	&
27	'
28	(
29	)
2A	*
2B	+
2C	,
2D	-
2E	.
2F	/
30	0
31	1
32	2
33	3
34	4
35	5
36	6
37	7
38	8
39	9
3A	:
3B	;
3C	<
3D	=
3E	>
3F	?
40	@
41	A
42	B
43	C
44	D
45	E
46	F
47	G
48	H
49	I
4A	J
4B	K
4C	L
4D	M
4E	N
4F	O
50	P
51	Q
52	R
53	S
54	T
55	U
56	V
57	W
58	X
59	Y
5A	Z
5B	[
5C	\
5D	]
5E	^
5F	_
60	`
61	a
62	b
63	c
64	d
65	e
66	f
67	g
68	h
69	i
6A	j
6B	k
6C	l
6D	m
6E	n
6F	o
70	p
71	q
72	r
73	s
74	t
75	u
76	v
77	w
78	x
79	y
7A	z
7B	{
7C	|
7D	}
7E	~
7F	DEL

The unprintable

You may notice that there are two categories:

Printable characters: a, B, 1, 2, !, etc.
Unprintable characters: NUL, BEL, ACK, BS, HT (tab), ...

Where do the unprintable characters come from? Most of them come from older times, when old computers and teletypes (or teleprinters) were the kings and queens.

These characters instruct the interpreter to perform a special action, like the good old typewriter operations "ring a bell" or "return to the beginning of the line". They are also called control characters. All of them can be written in caret notation, and some can be written as a C escape sequence (\e for ESC is a widespread compiler extension, not standard C):

Hexadecimal	Character	Caret notation	C escape sequence
00	NUL	^@	\0
01	SOH	^A
02	STX	^B
03	ETX	^C
04	EOT	^D
05	ENQ	^E
06	ACK	^F
07	BEL	^G	\a
08	BS	^H	\b
09	HT	^I	\t
0A	LF	^J	\n
0B	VT	^K	\v
0C	FF	^L	\f
0D	CR	^M	\r
0E	SO	^N
0F	SI	^O
10	DLE	^P
11	DC1	^Q
12	DC2	^R
13	DC3	^S
14	DC4	^T
15	NAK	^U
16	SYN	^V
17	ETB	^W
18	CAN	^X
19	EM	^Y
1A	SUB	^Z
1B	ESC	^[	\e
1C	FS	^\
1D	GS	^]
1E	RS	^^
1F	US	^_
7F	DEL	^?
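
The caret column of this table follows a simple rule: the symbol after the ^ is the control character with bit 6 flipped (an XOR with 0x40). A minimal sketch in C, where caret_symbol is a name chosen for this illustration:

```c
/* Hypothetical helper: return the caret-notation symbol for an ASCII
   control character, or 0 if `c` is not a control character.
   Example: 0x0D (CR) gives 'M', displayed as ^M. */
char caret_symbol(unsigned char c)
{
    if (c < 0x20 || c == 0x7F) {
        /* Flipping bit 6 maps 0x00..0x1F to '@'..'_' and 0x7F to '?'. */
        return (char)(c ^ 0x40);
    }
    return 0;
}
```

This is also why pressing Ctrl+M in a terminal usually sends a carriage return.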

The ASCII page on Wikipedia is a gold mine about this.

These characters have been in use for more than 50 years, and they are still widely used today in basic low-level I/O.

We may talk about (pseudo-)TTY behaviors one day. In the meantime, you can refer to two excellent articles: The oldest privesc: injecting careless administrators' terminals using TTY pushback, by Guillaume Quéré, and The TTY demystified, by Linus Akesson.

Managing the unprintable characters

When text processing comes in, everything breaks. We need exceptions, awful if/else chains, whitelisting, regexes (they do not hurt, stay here), ... to manage control characters (block, replace, etc.). Beyond this, some characters even have a double meaning (e.g. control character 09, which is also the text tabulation).

Without countermeasures, here are some winners in cybersecurity:
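
As an illustration of the "replace" strategy, here is a minimal sketch in C. The function name replace_control_chars and the keep_tab switch (for the double-meaning tab case) are assumptions made for this example, not an established API.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: replace every control character of `s`, in place,
   with `replacement`. When `keep_tab` is true, HT (0x09) is left alone.
   Returns the number of characters that were replaced. */
size_t replace_control_chars(char *s, char replacement, bool keep_tab)
{
    size_t replaced = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == '\t' && keep_tab) {
            continue;               /* tab is tolerated as text layout */
        }
        if (iscntrl(c)) {           /* 0x00..0x1F and 0x7F in the C locale */
            *s = replacement;
            replaced++;
        }
    }
    return replaced;
}
```

Whether to replace, block, or escape depends on the context; the point is only that the decision has to be made explicitly somewhere.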

[Image: bomb]

What a nightmare, huh? Because these characters do not feel natural, developers sometimes ignore them, assuming that control characters are already handled at a lower level, which is not always the case (on purpose, by mistake, or through ignorance).

A lot of security holes come from missing character management. We are only talking about 1 or 2 bytes. That is beautiful. That is also why I like cybersecurity.

About character testing

I will not cover specific injection types such as CRLF injection. I think you got the idea from the first part: all control characters, and by extension all unprintable characters, are candidates for injection during text processing. Some of them are more special than others because they are more prevalent (NUL for strings, LF for lines, ...).

It is always interesting to inject these control characters during critical text processing (Authentication, user management, session management, text editor, etc.).

Some ideas and methods:

For HTTP(S), I like to use the Intruder from Burp Pro with a list of numbers or hex values.

For serial or specific TCP, I like to run my own boofuzz template, or a specific library if necessary (scapy, ...).

For RPC / IPC, I like implementing my own callers (C, shell, python, ...), but you can also use standard fuzzers (AFLplusplus, syzkaller, ...). msfvenom combined with a badchars generator can also be used for payload generation, but it is very context-dependent.
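
Whatever the tool, the input is usually a list of control bytes. Here is a minimal sketch in C of how such a list could be generated and percent-encoded for an HTTP parameter. The names control_bytes and percent_encode are hypothetical and are not tied to Burp, boofuzz, or any other tool mentioned above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fill `out` with the 33 ASCII control bytes
   (0x00..0x1F, then 0x7F). Returns the number of bytes written. */
size_t control_bytes(unsigned char out[33])
{
    size_t n = 0;
    for (unsigned int b = 0x00; b <= 0x1F; b++) {
        out[n++] = (unsigned char)b;
    }
    out[n++] = 0x7F;
    return n;
}

/* Hypothetical helper: percent-encode one byte as used in URLs,
   e.g. 0x0D becomes "%0D". `out` must hold 4 bytes. */
void percent_encode(unsigned char byte, char out[4])
{
    snprintf(out, 4, "%%%02X", (unsigned int)byte);
}
```

Looping over control_bytes and printing one percent_encode result per line gives a small wordlist that can be inserted at each position of a parameter under test.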

The case

The security test

I was testing the security of a user management feature.
An HTTP endpoint permits adding a user to the system, which involves editing the /etc/passwd file (found after some reverse engineering).
Here are the general steps of the current implementation:

1. An API endpoint is callable with HTTP POST, with parameters like the username, the password, the groups and the user-friendly name (GECOS).
2. Its implementation calls an internal function which ends up invoking putpwent from glibc (man), in order to edit /etc/passwd.

A vulnerability may be discovered in two ways:

Try to find a vulnerability in the workflow between (1) and (2).
Try to find an exploit targeting putpwent directly. This is where I identified some parameters "not well controlled" by the caller.

I prefer the second approach, and I also wanted to understand how the putpwent function works.

The Results

In a prepared test environment, create a simple caller:

#include <stdio.h>
#include <pwd.h>

int main(void) {
    /* Append to a local test file, never to the real /etc/passwd. */
    FILE *passwdFile = fopen("./etc_passwd", "a");
    if (passwdFile == NULL) {
        perror("Error opening file");
        return 1;
    }

    /* Zero-initialize so that no field is left uninitialized. */
    struct passwd userEntry = {0};

    userEntry.pw_name = "standard";
    userEntry.pw_passwd = "userpw";
    userEntry.pw_uid = 1001;
    userEntry.pw_gid = 1001;
    userEntry.pw_gecos = "Simple user";
    userEntry.pw_dir = "/home/user1";
    userEntry.pw_shell = "/bin/sh";

    /* putpwent returns 0 on success and -1 on error. */
    if (putpwent(&userEntry, passwdFile) != 0) {
        perror("putpwent");
        fclose(passwdFile);
        return 1;
    }

    fclose(passwdFile);

    return 0;
}

After executing the previous code, the file etc_passwd will contain:

standard:userpw:1001:1001:Simple user:/home/user1:/bin/sh

Fine. Nominal case.

Now, let's try to inject some control characters, as well as :, since it acts as the field separator.

With \n, putpwent rejects the entry (it returns -1) and nothing is written:

    userEntry.pw_name = "stand\nard";

In the GECOS field, : is replaced by a space in the final file:

    userEntry.pw_gecos = "Simple:user";

For \r (CR):

    userEntry.pw_gecos = "Simple\ruser";

I got:

❯ cat etc_passwd 
user:/home/user1:/bin/sh1:Simple

Hm. What? A possible vulnerability in the glibc function that writes password entries? I could not believe it (and I felt like I shouldn't; this was too easy).

Let's dive into the glibc source code. In fact, there is just a "simple" filter on the colon and LF characters, in nss/valid_field.c:

#include <nss.h>
#include <string.h>

const char __nss_invalid_field_characters[] = NSS_INVALID_FIELD_CHARACTERS;

/* Check that VALUE is either NULL or a NUL-terminated string which
   does not contain characters not permitted in NSS database
   fields.  */
_Bool
__nss_valid_field (const char *value)
{
  return value == NULL
    || strpbrk (value, __nss_invalid_field_characters) == NULL;
}

Here, NSS_INVALID_FIELD_CHARACTERS is defined as # define NSS_INVALID_FIELD_CHARACTERS ":\n".

Sure, the vulnerability could only be fully exploited if the function that reads password entries (such as getpwent) accepts that format.

Through a possible misuse of the putpwent function, I managed to create a special passwd file resulting in abcd:mypwd:88:0:random.
It could potentially be used as a local privilege escalation vector.

If I create an /etc/passwd file by hand from this result, it works (sudo -u abcd id gives me the IDs of the user). But when I attempt the full exploit chain (really editing the /etc/passwd file through putpwent, then running sudo -u abcd id), it does not: the user does not exist.
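
To see why CR gets through, here is a minimal sketch in C that mirrors the quoted check with the same character set, next to a stricter variant. Both function names are assumptions made for this illustration; the stricter check is only one possible hardening that a caller could apply, not something glibc does.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Same character set as the glibc filter quoted above. */
static const char invalid_chars[] = ":\n";

/* Mirrors the quoted check: NULL is accepted, ':' and LF are rejected. */
bool field_passes_glibc_filter(const char *value)
{
    return value == NULL || strpbrk(value, invalid_chars) == NULL;
}

/* Hypothetical stricter check a caller could add before putpwent:
   additionally reject every control character (CR, ESC, BEL, ...). */
bool field_is_strictly_valid(const char *value)
{
    if (!field_passes_glibc_filter(value)) {
        return false;
    }
    if (value == NULL) {
        return true;
    }
    for (const unsigned char *p = (const unsigned char *)value; *p; p++) {
        if (iscntrl(*p)) {
            return false;
        }
    }
    return true;
}
```

With these definitions, "Simple\ruser" passes the first check and fails the second one, while "Simple user" passes both.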

WHY?

The failure

I simply mixed up text processing programs with pure I/O programs.

Let’s go back to our last generated passwd file:

❯ cat etc_passwd 
user:/home/user1:/bin/sh1:Simple

Display with -A:

❯ cat -A etc_passwd
standard:userpw:1001:1001:Simple^Muser:/home/user1:/bin/sh$

I was stunned. Just tricked by (my own) ^M character.

Keep in mind that the historical job of that control character is to do a carriage return.

So, with default options, cat writes the bytes unchanged, and your terminal displays the "printable" characters and acts on the control characters. In this case:

Write standard:userpw:1001:1001:Simple
Process ^M: move the cursor back to the beginning of the line (like a typewriter).
Write user:/home/user1:/bin/sh over the existing text.
We get the displayed string: user:/home/user1:/bin/sh1:Simple.
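
The steps above can be sketched in C as a tiny terminal simulation. This is a simplified model written for this illustration (one line, no other control characters, no line wrapping), and render_line is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: simulate how a terminal displays a single line
   containing CR characters. `out` must hold at least strlen(in) + 1 bytes. */
void render_line(const char *in, char *out)
{
    size_t cursor = 0;  /* current column */
    size_t width = 0;   /* number of columns written so far */

    for (; *in != '\0'; in++) {
        if (*in == '\r') {
            cursor = 0;             /* carriage return: back to column 0 */
        } else {
            out[cursor++] = *in;    /* overwrite whatever was displayed */
            if (cursor > width) {
                width = cursor;
            }
        }
    }
    out[width] = '\0';
}
```

Feeding it the real content of the file, standard:userpw:1001:1001:Simple\ruser:/home/user1:/bin/sh, produces user:/home/user1:/bin/sh1:Simple: the 24 characters written after the CR overwrite the start of the line, and the last 8 characters of the first part (1:Simple) remain visible.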

It is also a good reminder that printf should be preferred to echo for I/O. echo is dedicated to text (man), and an LF control character is automatically appended by default (this behavior can be disabled if you pass the option -n). It could mess up some of your scripts. Example:
# Incorrect
❯ echo "wonderful" | base64
d29uZGVyZnVsCg==
# Correct
❯ printf "wonderful" | base64
d29uZGVyZnVs

Conclusion and learnings

Do not mix pure I/O and text processing programs or functions. For the security of your programs and your peace of mind.

Ultimately, prefer checking and comparing function output in an agnostic format such as hexadecimal (xxd), combined with a checksum (sha256sum).

This little experience is a kind of human proof of concept: even with technical knowledge of character management, errors in text processing (misuse, misunderstanding) can sometimes lead to strange outputs, followed by hasty conclusions and the associated bugs. What a waste of time. Now multiply that situation by N, where N is the number of people, languages, programs, fixes, errors, and standards that have been around for 50 years.

Character management: an infinite source of work for developers and cybersecurity.
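
In the same spirit, a test harness can compare the expected and actual bytes directly instead of looking at rendered text. A minimal sketch in C, where first_difference is a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first differing byte,
   or -1 if both buffers are identical. If one buffer is a prefix of the
   other, the offset is the length of the shorter one. */
long first_difference(const unsigned char *a, size_t a_len,
                      const unsigned char *b, size_t b_len)
{
    size_t common = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < common; i++) {
        if (a[i] != b[i]) {
            return (long)i;
        }
    }
    return a_len == b_len ? -1 : (long)common;
}
```

Comparing "Simple user" with "Simple\ruser" reports offset 6, exactly where the carriage return hides, no matter how a terminal would display the two strings.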