How to write shellcode in assembly to display “Hello, world!”
After last week, when I barely had time to scratch my head and had to move my fiancée’s website to a different host (and finally learn something about how DNS works), I am back to coding for myself. I decided to return to Pentester Academy’s shellcoding course and continue with the videos. This time the “hello world” program hit me again, but in a very different manner.
The Task
Writing a program that prints “Hello, world!” on the screen isn’t much of a challenge. In C you can achieve this very easily using syscalls:
    #include <unistd.h>
    #include <stdlib.h>

    int main() {
        char hello_world[] = "Hello, world!\n";
        size_t len = 14;
        write(1, hello_world, len);
        exit(0);
    }
Define a character array, get its length, write it to standard output (file descriptor 1), and call exit with a status code to terminate. This translates to more or less this in assembly:
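    global _start

    section .text

    _start:
        mov rax, 1              ; syscall number: write
        mov rdi, 1              ; file descriptor: stdout
        mov rsi, hello_world    ; buffer address
        mov rdx, len            ; buffer length
        syscall

        mov rax, 60             ; syscall number: exit
        mov rdi, 0              ; exit status
        syscall

    section .data
        hello_world: db "Hello, world!", 0xa
        len: equ $ - hello_world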
This is trickier, since you have to know which value goes into which register (for example, rax stores the number of the syscall: 1 for write, 60 for exit), and there’s a trick used to determine the size of the buffer that stores the string, but it’s still more or less readable.
The Problem
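When shellcoding you can’t have any static data: the shellcode is injected as one blob of bytes, so there is no separate .data section and no fixed address to refer to. Instead, you have to carry the data inside the code and find its address dynamically.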
The Solution
Let’s take a look at the call instruction. According to some random page that showed up first in a Google search:
The CALL instruction performs two operations:
1. It pushes the return address (address immediately after the CALL instruction) on the stack.
2. It changes EIP to the call destination. This effectively transfers control to the call target and begins execution there.
The first point is very interesting because we can define data at this address and later use pop to get the data’s address from the stack.
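    global _start

    section .text

    _start:
        jmp call_shellcode

    shellcode:
        pop rsi                 ; rsi = address of hello_world
        mov rax, 1              ; syscall number: write
        mov rdx, 14             ; buffer length
        syscall                 ; note: rdi (the fd) is not set explicitly here

        mov rax, 60             ; syscall number: exit
        mov rdi, 0              ; exit status
        syscall

    call_shellcode:
        call shellcode          ; pushes the address of hello_world
        hello_world: db 'Hello, world!', 0xa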
What happens step by step is:
- From _start we jump to the call_shellcode section, where the call instruction is followed by the hello_world label, which stores the string we want to print.
- The call instruction is executed, pushing the address of hello_world onto the stack and jumping to shellcode.
- The address of hello_world is popped from the stack and saved in the rsi register, which holds the buffer address for write.
- Other registers are filled with proper values, including rdx, which stores the buffer length (which we now hard-code instead of letting the assembler compute it).
Another problem
Let’s dump the output:
    $ objdump -M intel -D shellcode

    shellcode:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000400080 <_start>:
      400080:   eb 19                   jmp    40009b <call_shellcode>

    0000000000400082 <shellcode>:
      400082:   5e                      pop    rsi
      400083:   b8 01 00 00 00          mov    eax,0x1
      400088:   ba 0e 00 00 00          mov    edx,0xe
      40008d:   0f 05                   syscall
      40008f:   b8 3c 00 00 00          mov    eax,0x3c
      400094:   bf 00 00 00 00          mov    edi,0x0
      400099:   0f 05                   syscall

    000000000040009b <call_shellcode>:
      40009b:   e8 e2 ff ff ff          call   400082 <shellcode>

    00000000004000a0 <hello_world>:
      4000a0:   48                      rex.W
      4000a1:   65 6c                   gs ins BYTE PTR es:[rdi],dx
      4000a3:   6c                      ins    BYTE PTR es:[rdi],dx
      4000a4:   6f                      outs   dx,DWORD PTR ds:[rsi]
      4000a5:   2c 20                   sub    al,0x20
      4000a7:   77 6f                   ja     400118 <hello_world+0x78>
      4000a9:   72 6c                   jb     400117 <hello_world+0x77>
      4000ab:   64 21 0a                and    DWORD PTR fs:[rdx],ecx
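What’s wrong with it? Well, it contains null bytes, which are bad because a null byte terminates a string: if the shellcode is delivered through a string-handling function such as strcpy, everything after the first null byte is cut off. I have absolutely zero experience with it, but since Vivek says it’s true I must believe him, at least until I cease to be such a noob 😉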
How can it be fixed? For example, by using different instructions. mov is a good place to start: a mov with an immediate encodes the entire value, and since the upper bytes of a small number are unused, they are filled with zeros. You avoid that by zeroing the whole register with xor reg, reg and then writing only to its lower part. You can observe it here:
- xor rax, rax is used to empty the rax register,
- mov al, 1 is used to set its value to 1, but using only the lowest byte of the register,
- this value is copied to rdx, which is later incremented by 13 with the add instruction in order to set it to the desired value,
- and so on.
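    global _start

    section .text

    _start:
        jmp call_shellcode

    shellcode:
        pop rsi                 ; rsi = address of hello_world
        xor rax, rax            ; rax = 0
        mov al, 1               ; rax = 1 (write)
        mov rdx, rax            ; rdx = 1
        add rdx, 13             ; rdx = 14 (buffer length)
        syscall

        xor rax, rax            ; rax = 0
        mov rdi, rax            ; rdi = 0 (exit status)
        mov al, 60              ; rax = 60 (exit)
        syscall

    call_shellcode:
        call shellcode
        hello_world: db 'Hello, world!', 0xa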
Effect?
    $ objdump -M intel -D shellcode

    shellcode:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000400080 <_start>:
      400080:   eb 19                   jmp    40009b <call_shellcode>

    0000000000400082 <shellcode>:
      400082:   5e                      pop    rsi
      400083:   48 31 c0                xor    rax,rax
      400086:   b0 01                   mov    al,0x1
      400088:   48 89 c2                mov    rdx,rax
      40008b:   48 83 c2 0d             add    rdx,0xd
      40008f:   0f 05                   syscall
      400091:   48 31 c0                xor    rax,rax
      400094:   48 89 c7                mov    rdi,rax
      400097:   b0 3c                   mov    al,0x3c
      400099:   0f 05                   syscall

    000000000040009b <call_shellcode>:
      40009b:   e8 e2 ff ff ff          call   400082 <shellcode>

    00000000004000a0 <hello_world>:
      4000a0:   48                      rex.W
      4000a1:   65 6c                   gs ins BYTE PTR es:[rdi],dx
      4000a3:   6c                      ins    BYTE PTR es:[rdi],dx
      4000a4:   6f                      outs   dx,DWORD PTR ds:[rsi]
      4000a5:   2c 20                   sub    al,0x20
      4000a7:   77 6f                   ja     400118 <hello_world+0x78>
      4000a9:   72 6c                   jb     400117 <hello_world+0x77>
      4000ab:   64 21 0a                and    DWORD PTR fs:[rdx],ecx
Can we get better though? Sure. Let’s see how long this shellcode is.
    $ ./test
    Shellcode length: 46
    Hello, world!
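Let’s write the proper value to rdx using mov dl, 14 instead of adding. Note that the version below also drops the xor rax, rax before mov al, 1, which assumes the upper bytes of rax are already zero at that point.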
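    global _start

    section .text

    _start:
        jmp call_shellcode

    shellcode:
        pop rsi                 ; rsi = address of hello_world
        mov al, 1               ; al = 1 (write); upper bytes of rax left as they are
        xor rdx, rdx            ; rdx = 0
        mov dl, 14              ; rdx = 14 (buffer length)
        syscall

        xor rax, rax            ; rax = 0
        mov rdi, rax            ; rdi = 0 (exit status)
        mov al, 60              ; rax = 60 (exit)
        syscall

    call_shellcode:
        call shellcode
        hello_world: db 'Hello, world!', 0xa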
This time, when we test it, it turns out that the shellcode is 41 bytes long:
    $ ./test
    Shellcode length: 41
    Hello, world!
Cool. 5 bytes saved. It is important to keep shellcode short because, when used as a payload, it must sometimes fit into very narrow spaces.
Summary
It turns out that writing “Hello, world!” properly is not always as simple as it seems. The number of tricks required to do it properly makes it hard, but really fun.
As always, you can find code samples on my Gitlab, support me on Patreon and subscribe to the newsletter.
Also, visit Pentester Academy to learn more.
Happy hacking.