In this assignment, you will learn and demonstrate how buffer overflow vulnerabilities can be exploited.
Assignment Resources
Your submission for the assignment must work in a 64-bit Ubuntu 16.04 LTS environment like the one you installed on your VM. This is the environment where we will test your submitted code.
Ubuntu, like most current Linux environments, has ASLR (Address Space Layout Randomization) enabled as a mitigation against buffer overflow and similar exploits. For this assignment, you will work with this feature disabled. To turn it off, run the command:
setarch x86_64 -v -RL bash
This will run a shell (i.e. command prompt) with ASLR disabled. It does not affect any other shells. We have also provided you with a binary built without the other mitigations that would usually be used these days, including non-executable stacks and stack canaries. Later in the semester we will talk about these mitigations and how they can be defeated.
Examine the supplied dumbledore.exe file. It
contains an obvious buffer overrun vulnerability in the
GetGradeFromInput function, which calls the C standard
library function gets. gets, as its manpage
documents, does not check the length of the buffer supplied as an
argument and is unsafe.
Create a file named data.txt containing your name and run
dumbledore.exe:
./dumbledore.exe < data.txt
Thank you, Aaron Bloomfield.
I recommend that you get a grade of F on this assignment.
Your task is to provide input that causes the program to print this instead:
./dumbledore.exe <data.txt
Thank you, Aaron Bloomfield.
I recommend that you get a grade of A on this assignment.
To do this, you will use the stack smashing technique we discussed in class. There are several strategies for writing the machine code run by this attack, each of which has extensive hints below:
Call the PrintGradeAndExit
function in the supplied executable. To do this, you should be careful
to set the stack pointer so that it is less than the address of your machine code,
so this function does not corrupt your machine code/data when it
executes. This is probably the easiest solution.
Note that the location of the stack pointer can vary slightly when your environment changes. See the section “Variations in the location of the stack pointer” under hints below. Because of this, you should plan on using a NOP sled so you don’t have to precisely predict the address of the stack pointer.
Rather than submitting the input file alone, we’d like you to submit
a C program, attack.c, that will generate the input. This C
file can include comments that explain how the exploit works (which
might make partial credit, figuring out whether our test environment
disagrees with your environment, etc. possible). An example file which
produces normal (non-exploit) input is:
#include <stdio.h>
int main(void) {
    /* Just have the name */
    printf("Thomas Jefferson\n");
    return 0;
}
This would be used to run the program like
$ ./attack-program.exe > data.txt
$ ./dumbledore.exe < data.txt
Thank you, Thomas Jefferson.
I recommend that you get a grade of F on this assignment.
You will find it very confusing if you are
not running your commands from a shell started with
setarch x86_64 -RL bash. In particular, the stack will have
inconsistent addresses, and your program will just segfault every
time.
A useful starting point is using objdump to
disassemble the executable file.
Using the debugger gdb can be helpful for debugging
and refining your buffer overflow payload. See this
page of useful GDB commands. But see the warning below about the
debugger’s environment slightly changing the location of the stack
pointer.
In particular, after looking over objdump output, a
good second step is running the program in GDB to find the address of
the stack pointer at a relevant time.
Since we tell you the buffer overflow occurs in
gets, it is helpful to find the call to gets
and examine the state of the program at that time in the
debugger.
Drawing a picture of the state of the stack is helpful.
The stack can start at slightly different locations depending on how the program is run. One cause of this is that Linux stores program arguments and “environment variables” on the stack, so the location of the stack pointer on entry to main depends on how much space these take up.
Environment variables include things like information about the
terminal the program is being run in. You can see a list of environment
variables by running printenv. Note that the shell commonly sets
environment variables depending on what program is being run like
_=/usr/bin/printenv or OLDPWD.
For example, the program
#include <stdio.h>
int main(void) {
    int x;
    printf("%p\n", (void *)&x);
    return 0;
}
has different output on my system depending on the environment variables:
$ setarch x86_64 -RL bash
$ ./stackloc # run normally
0x7ffffffffe034
$ env - ./stackloc # run with no environment variables
0x7ffffffffed84
$ gdb ./stackloc
...
(gdb) run
0x7ffffffffe004
A particular case where this is a problem is running the program in the debugger versus not. The debugger may set a few environment variables itself, and when you run the program in the debugger, it may set
The best way to avoid problems with the stack starting in different locations is to use a NOP sled. Please place a large string of NOPs before your exploit code and try to “aim” the return address in the middle of this string. This will prevent you from being sensitive to small differences in the location of the stack. We’ve made the buffer that is overflowed particularly large to make a NOP sled more reliable.
An encoding for a 1-byte NOP instruction on x86 and x64 is 0x90.
You could also try to figure out how to keep the debugger from
changing the environment (likely with some unset env
commands), but this is less preferable, because it means your exploit is
less reliable.
You can run objdump on .o files. I
would recommend using objdump -dr file.o, which will show
disassembly and unresolved relocations, so you can tell if you
accidentally generated machine code which needs the linker to complete
it. (Recall that relocations are addresses the linker needs to fill in
later.)
On 64-bit x86, you can use RIP-relative addressing (that is,
program counter-relative addressing) to load addresses within your
machine code without worrying about the location at which your machine
code is placed in memory:
code:
    movq value(%rip), %rax
    leaq value(%rip), %rbx
    ...
value:
    .quad 42
will place the value 42 in %rax and the address of the value 42 in %rbx.
Unlike code that does not use (%rip), the resulting machine code will not have
any dependencies on the memory addresses eventually assigned to code and
value. It will only depend on how far apart code and value are in
memory.
Note that if you choose to do this, nasm will become difficult to use
(it doesn’t interact well with rip). You can program in AT&T syntax
(shown above) and use as, the GNU assembler, to assemble it.
Other techniques for finding the address of your code include using a sequence like:
call next
next:
popq %rax
to load the current program counter into %rax. The
call instruction uses an address relative to the current
program counter, so the resulting machine code does not include
hard-coded addresses.
Since gets reads until a newline, you need to make
sure your machine code does not contain newlines.
The objcopy utility can be used to extract a
particular section of an object file. For example
objcopy -O binary --only-section=.text compiled_code.o compiled_code.raw
will take the .text section of the object file
compiled_code.o and put it in
compiled_code.raw. (compiled_code.o might be a
file generated by gcc -c some_assembly_file.s.) You might
then look at the resulting file with a tool like ghex or
od to extract the machine code in a less cluttered way
than looking at the objdump output.
The executable contains a PrintGradeAndExit function.
To figure out what its arguments mean, figure out what the arguments of
its call to printf are.
A challenge with calling the PrintGradeAndExit
function is that our machine code and data are on the stack and could be
corrupted by our call to PrintGradeAndExit if we are not
careful. To avoid this, you can explicitly set the stack pointer. For
example, you might use
leaq label-0x100(%rip), %rsp
to set the stack pointer to point 0x100 bytes before a label in your
shellcode. (label-0x100 is assembly syntax for
0x100 bytes before label.)
Using pushq followed by ret allows you
to jump to a location from machine code without worrying about where
that machine code ends up in memory relative to the target.
If you don’t call PrintGradeAndExit, you could
instead print out the output you want directly, then exit. This is more
realistic but a little more challenging.
Instead of including a newline in your buffer overflow, you can, instead, include code to compute a newline (e.g., by adding or subtracting from another value) or to copy one from elsewhere in the application.
To print something out from your machine code, you could call the
printf@plt “stub” (hard-coding its address) or make a
write() system call directly. An example assembly snippet to make a
write system call is:
mov $1, %eax /* system call number 1 = write */
mov $1, %edi /* arg 1: file descriptor number 1 = "standard output" */
lea string, %rsi /* arg 2: pointer to string */
mov $length_of_string, %rdx /* arg 3: length of string */
syscall
To exit from your machine code, you could call the
exit@plt “stub” or make an
exit_group system call directly. An example assembly
snippet to make an exit_group system call is:
mov $231, %eax /* system call number 231 = exit_group */
xor %rdi, %rdi /* arg 1: exit code = 0 */
syscall
You can find an example of shellcode that runs the
execve system call to execute /bin/sh in this archive of shellcode.
Note that some of the shellcode you find may make assumptions about the
initial contents of registers or location of the stack pointer. If you
use prebuilt shellcode like this, you must clearly cite
its source.
On Linux, execve replaces
the current program with the executed program. The new program inherits
the same input and output as the prior program.
Standard I/O functions read ahead in their input. For example,
gets may read part of the next line, saving it in a buffer
for future calls to gets or other
<stdio.h> functions. These buffers are
not passed to the new program by execve.
To compensate for this, you may need to include padding in your
input.
You can print out a string from the shell using the echo command.
By default, the shell won’t print out a command-prompt when its input is not a terminal.
Submit a C file called attack.c, which writes to
stdout the contents of a data.txt that will cause the supplied program to output your
name and a recommendation for a grade of A. Make sure your C file
includes comments that describe how it works and any special resources
you used. Also submit a Makefile that will compile this
program. The executable that attack.c compiles to
MUST be named attack!
This assignment was adapted from Charles Reiss’ fall 2017 assignment, which was adapted from Jack Davidson’s fall 2016 assignment, which was adapted from one given previously by Andrew Appel in Princeton’s COS 217.