A Very Different Hello World

After a week in which I barely had time to scratch my head, and in which I had to move my fiancée’s website to a different hosting provider (and finally learn something about how DNS works), I am back to coding for myself. I decided to return to Pentester Academy’s shellcoding course and continue with the videos. This time the “hello world” program hit me again, but in a very different manner.

The Task

Writing a program that prints “Hello, world!” on the screen isn’t much of a challenge. In C you can do it very easily with the write and exit wrappers around the corresponding syscalls:

#include <unistd.h>
#include <stdlib.h>

int main() {
    char hello_world[] = "Hello, world!\n";
    size_t len = 14;
    write(1, hello_world, len);
    exit(0);
}

Define a character array, store its length, write it to standard output (file descriptor 1), and call exit with a status code to terminate. In assembly this translates to more or less the following:

global _start

section .text

_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, hello_world
    mov rdx, len
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

section .data
    hello_world: db "Hello, world!",0xa
    len: equ $-hello_world

This is trickier, since you have to know which value goes into which register (for example, rax holds the syscall number: 1 for write, 60 for exit; rdi, rsi and rdx hold the first three arguments). There is also a trick for determining the size of the string: $ is the current address, so $-hello_world is the number of bytes emitted since the hello_world label. Still, it is more or less readable.

The Problem

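When shellcoding, you can’t have any static data: the shellcode is injected as a single blob of bytes into another process, so there is no separate .data section and no fixed address to refer to. Instead, you have to embed the data in the code itself and find its address at run time.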

The Solution

Let’s take a look at the call instruction. According to a random page that showed up first in Google’s search results:

The CALL instruction performs two operations:
1. It pushes the return address (address immediately after the CALL instruction) on the stack.
2. It changes EIP to the call destination. This effectively transfers control to the call target and begins execution there.

The first point is very interesting: if we place data right after the call, the pushed “return address” is the address of that data, and we can later pop it from the stack and use it.

global _start

section .text

_start:
    jmp call_shellcode

shellcode:
    pop rsi
    mov rax, 1
    mov rdx, 14
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

call_shellcode:
    call shellcode
    hello_world: db 'Hello, world!', 0xa

What happens step by step is:

  1. From _start we jump to the call_shellcode label, where the call instruction is immediately followed by the hello_world label, which marks the string we want to print.
  2. The call instruction is executed: it pushes the address of hello_world onto the stack (as its return address) and jumps to shellcode.
  3. The address of hello_world is popped from the stack into the rsi register, which holds the buffer address for write.
  4. The other registers are filled with the proper values, including rdx, which holds the buffer length (now hard-coded to 14 instead of being computed by the assembler). Note that rdi, the file descriptor, is not set here, so this code relies on whatever value rdi happens to hold when it starts.

Another Problem

Let’s dump the output:

$ objdump -M intel -D shellcode

shellcode:     file format elf64-x86-64


Disassembly of section .text:

0000000000400080 <_start>:
  400080:	eb 19                	jmp    40009b <call_shellcode>

0000000000400082 <shellcode>:
  400082:	5e                   	pop    rsi
  400083:	b8 01 00 00 00       	mov    eax,0x1
  400088:	ba 0e 00 00 00       	mov    edx,0xe
  40008d:	0f 05                	syscall 
  40008f:	b8 3c 00 00 00       	mov    eax,0x3c
  400094:	bf 00 00 00 00       	mov    edi,0x0
  400099:	0f 05                	syscall 

000000000040009b <call_shellcode>:
  40009b:	e8 e2 ff ff ff       	call   400082 <shellcode>

00000000004000a0 <hello_world>:
  4000a0:	48                   	rex.W
  4000a1:	65 6c                	gs ins BYTE PTR es:[rdi],dx
  4000a3:	6c                   	ins    BYTE PTR es:[rdi],dx
  4000a4:	6f                   	outs   dx,DWORD PTR ds:[rsi]
  4000a5:	2c 20                	sub    al,0x20
  4000a7:	77 6f                	ja     400118 <hello_world+0x78>
  4000a9:	72 6c                	jb     400117 <hello_world+0x77>
  4000ab:	64 21 0a             	and    DWORD PTR fs:[rdx],ecx

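What’s wrong with it? Well, it contains null bytes (the 00 bytes in the mov encodings), which are bad because shellcode is often delivered through string-handling functions such as strcpy, and a null byte terminates a C string, so everything after the first one would be cut off. I have absolutely zero experience with it, but since Vivek says it’s true I must believe him, at least until I cease to be such a noob 😉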

How can it be fixed? For example, by using different instructions. mov is a good place to start: a mov of a constant into a full register is encoded with a 32-bit immediate, and since the upper bytes of a small value are zero, they show up as null bytes in the machine code. The way around it is to zero the whole register with xor reg, reg and then write only its lowest byte (al, dl and so on), which takes a one-byte immediate. You can observe it here:

  • xor rax, rax is used to zero the rax register,
  • mov al, 1 sets its value to 1, writing only the lowest byte of the register,
  • this value is copied to rdx, which is then increased by 13 with the add instruction to reach the desired value of 14,
  • and so on.

global _start

section .text

_start:
    jmp call_shellcode

shellcode:
    pop rsi
    xor rax, rax
    mov al, 1
    mov rdx, rax
    add rdx, 13
    syscall

    xor rax, rax
    mov rdi, rax
    mov al, 60
    syscall

call_shellcode:
    call shellcode
    hello_world: db 'Hello, world!', 0xa

Effect?

$ objdump -M intel -D shellcode

shellcode:     file format elf64-x86-64


Disassembly of section .text:

0000000000400080 <_start>:
  400080:       eb 19                   jmp    40009b <call_shellcode>

0000000000400082 <shellcode>:
  400082:       5e                      pop    rsi
  400083:       48 31 c0                xor    rax,rax
  400086:       b0 01                   mov    al,0x1
  400088:       48 89 c2                mov    rdx,rax
  40008b:       48 83 c2 0d             add    rdx,0xd
  40008f:       0f 05                   syscall 
  400091:       48 31 c0                xor    rax,rax
  400094:       48 89 c7                mov    rdi,rax
  400097:       b0 3c                   mov    al,0x3c
  400099:       0f 05                   syscall 

000000000040009b <call_shellcode>:
  40009b:       e8 e2 ff ff ff          call   400082 <shellcode>

00000000004000a0 <hello_world>:
  4000a0:       48                      rex.W
  4000a1:       65 6c                   gs ins BYTE PTR es:[rdi],dx
  4000a3:       6c                      ins    BYTE PTR es:[rdi],dx
  4000a4:       6f                      outs   dx,DWORD PTR ds:[rsi]
  4000a5:       2c 20                   sub    al,0x20
  4000a7:       77 6f                   ja     400118 <hello_world+0x78>
  4000a9:       72 6c                   jb     400117 <hello_world+0x77>
  4000ab:       64 21 0a                and    DWORD PTR fs:[rdx],ecx

Can we do better though? Sure. Let’s see how long this shellcode is.

$ ./test 
Shellcode length: 46
Hello, world!

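Let’s write the proper value to rdx using mov dl, 14 instead of copying and adding. Note that this version also drops the xor rax, rax before mov al, 1, so it relies on the upper bytes of rax already being zero when the shellcode starts.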

global _start

section .text

_start:
    jmp call_shellcode

shellcode:
    pop rsi
    mov al, 1
    xor rdx, rdx
    mov dl, 14
    syscall

    xor rax, rax
    mov rdi, rax
    mov al, 60
    syscall

call_shellcode:
    call shellcode
    hello_world: db 'Hello, world!', 0xa

This time when we test it, it turns out that the shellcode is 41 bytes long:

$ ./test 
Shellcode length: 41
Hello, world!

Cool. 5 bytes saved (3 from the dropped xor rax, rax and 2 from the shorter rdx setup). It is important to keep shellcode short because, when used as a payload, it sometimes has to fit into a very narrow space.

Summary

It turns out that writing “Hello, world!” properly is not always as simple as it seems. The number of tricks required to do it properly makes it really hard, but fun.

As always, you can find code samples on my Gitlab.

Also, visit Pentester Academy to learn more.

Happy hacking.