ICS: Programming Homework: Tricky Jump

Go up to the ICS HW page (md) | view tabbed version

Overview

In this assignment, you are going to read in a binary file, add in a tricky jump, and write the modified binary file. This is similar to what viruses do; however, you are going to do this in Python, whereas viruses do this in assembly. However, you have to read in the binary file, and cannot use any external tools (such as objdump) to help you.

The goal of this is to write out the string “Not a virus” (with the trailing newline) to the screen, and then having the program continue on as normal. Be sure to write that exact string, as that is what the auto-grader will check for! If you write anything different, the auto-grader will deduct points.

This assignment is organized into steps, each one more complicated than the previous. However, each one also builds upon the previous. You can get partial credit if you do the first few, but are not able to complete the last ones.

Changelog

Any changes to this page will be put here for easy reference. Typo fixes and minor clarifications are not listed here. So far there aren’t any significant changes to report.

Platform

This assignment will be reading in ELF binaries used in Linux operating systems. You can develop this on any platform that you want; however, the binary file will only be Linux ELF, and will only run on a Linux system. It should run on WSL (Windows Subsystem for Linux) as well, since that system can run ELF files. If you do not have access to a Linux machine, you can use the Virginia Cyber Range to test your modified ELF files to ensure that they work. See the Rootkits assignment for how to sign into the Virginia Cyber Range, if you have not already done so in previous assignments.

You can use any Ubuntu 22.04 system – both the Rootkits homework and the Buffer Overflow homework use that system. You can transfer the files back and forth by using a cloud file service, such as UVABox (which is free), Google Drive, etc.

WARNING: This site is free to use for class purposes for all students in the course. Using it for non-class purposes is an honor violation, and will be dealt with as such. Anything outside the reasonable bounds of an assignment in this course is considered a non-class purpose.

The Cyber Range is a great resource, but it is a finite resource. If you decide to wait to the last minute to start the assignment, and the rest of your class-mates do so as well, it’s going to be slooooooow. You cannot get an extension because you waited until the last minute along with everybody else, and the system was slow as a result.

setarch

All the commands herein should be done after running setarch $(uname -m) -L -R /bin/bash. Nothing will work right without running that command! You have to enter that command each time you log in.

Compiled targets

We provide a number of executable files for you try to modify. If you compile them yourself, then the executable files may be different due to differences in compiler versions, etc. So please use the ones we provide!

All of the programs just print “hello world”.

Step 1: Payload

For this assignment, we are just going to insert a small assembly routine that prints out “Not a virus”, with a newline at the end. You should create an assembly file, called print.s, that does this. This output should be done with the syscall opcode, as was done in the Buffer Overflow assigment and the sample code in the buffer overflow slides. However, this code has both a .data section and a .text section – you need to put it all in the .text section, as shown on this slide. You need to modify that code to print out the required string (“Not a virus” with a newline) – the 0x0a at the end of that string is the newline. Note that for this assignment, you don’t have to remove null bytes (0x00) or newlines (0x0a), as we are not having this machine code read in via a scanf() call.

As we want to return to the main program after our assembly routine runs, you need to end that print assembly subroutine with a ret.

Put that assembly code into a file called print.s, and create a part1.c file:

extern void print();
int main() {
    print();
    return 0;
}

You should put all this into a Makefile, which should look like:

all:
    nasm -g -f elf64 -o print.o print.s
    gcc -c part1.c
    gcc -o part1 part1.o print.o

Note: on some platforms, the flags -m64 -no-pie had to be added to the two gcc lines. See if the above works for you first; if not, then try adding those two flags.

Remember that the indentation is tabs, not spaces! The executable here is just part1. You are welcome to use the Makefile provided above, and add your lines to that. Just be sure to put your compilation lines above what is already in the Makefile. At this point, you should be able to type make it should compile, and it should print:

Not a virus

What to submit

The files to submit from this part are your print.s, part1.c, and Makefile.

Part 2: Hard-coded

We are going to make a modification to the test1 executable file. This step has you making modifications to specific hard-coded addresses – which means it will only work with that one executable file.

This executable was compiled from test1.c (src), nops.s (src), and this Makefile (src). Note that if you compile it, you might get a slightly different binary version (due to compiler differences, etc.), so you should use the provided test1 executable file. You can also see the objdump of test1, the hexdump of test1, and the results of readelf.

This part should be in a file called part2.py.

Your inserted code

You need to run objdump -d on your part1 executable that you created above (this has to be done on a Linux or WSL system, such as the Virginia Cyber Range). Look for the print() subroutine. This will have three parts: the jmp, the string itself, and the body of the subroutine (which ends with a ret (0xc3)). Yours will likely be around 40 bytes – it’s fine if it’s more or less, as long as it prints the desired output. These hex bytes are what you are going to write to the executable.

Inserted code target

If you look at the objdump of test1, you will see an entire subroutine of nop instructions; that subroutine is called nops(). While this is quite convenient for us, it’s not realistic – but a good start for our first binary executable modification. Pick any spot at the beginning of that set of nop instructions. Remember the address you chose, and remember that the addresses in the objdump file are in hex.

push

We are going to write a push command into the executable file, right before the ret. The format we are going to use is 5 bytes: 0x68 followed by the four bytes of the value we are pushing (in little-Endian!). The value you are writing is the absolute address of the jump target – what you chose above – in the file.

Modifying main()

If you look at the objdump, you can see the two assembly instructions right before the ret in main():

40113d: b8 00 00 00 00          mov    $0x0,%eax
401142: 48 83 c4 08             add    $0x8,%rsp

The address listed is 0x40113d, and if you remove the leading ‘0x40’, you get the address in the file: byte 0x113d to 0x1145. You can also see this in the hexdump:

00001130  05 cf 0e 00 00 48 89 c7  e8 f3 fe ff ff b8 00 00  |.....H..........|
00001140  00 00 48 83 c4 08 c3 66  0f 1f 84 00 00 00 00 00  |..H....f........|

These nine bytes are going to be replaced with the five byte push instruction, and four bytes of nop instructions (0x90). It doesn’t matter if you put the nop instructions before the push instruction or after.

Save those nine bytes that you are about to overwrite! Then overwrite them with four nop instructions and the push instruction. Remember that the address for the push instruction is in little-Endian, so you have to reverse the bytes.

Reading the binary file

As this is in Python, you can read the test1 file into an array of bytes via:

with open("test1","rb") as f:
    bin = list(f.read())

This opens the file in binary mode, reads in the entire file in one go as a bytes type, and then converts that to a list of bytes (which will be printed as base-10 integers if you print it out).

Writing your code

You need to make two modifications to the binary file.

Outputting the file

Your output file should be called test1mod – do NOT overwrite test1!

In Python, given a list of byte values, you can write the file as such:

binout = b''
for i in bin:
    binout += bytes.fromhex("0x{:02x}".format(i).replace('0x',''))
with open("test1mod","wb") as f:
    f.write(binout)

Hard-coded values

To make your life easier when you get to the next part, you should have the various hard-coded values in variables. You are welcome to do this before or after you get this part working. Those values are:

Ensuring it works

Run your program! It should look like this:

$ ./test1mod 
hello world
Not a virus
$

If it didn’t work, here are a few things to try:

What to submit

The file to submit from this part is part2.py.

Step 3: Variable target

The previous section had you hard-code addresses – both where in main() to replace the instructions, and also where in the file to put your payload. This part will have you determine those values. Once they are determined, the program will be much the same as the previous part.

Specifically, you need to determine the values for push_loc, inst_len, and write_loc. Once they are determined, you can modify the file using your code from the previous part.

There are a few different pieces of information that you will need to determine:

We will look at teach of these in turn. As you are working on determining those three or four parts, you can hard-code the values for the test binary so that you can work on the other parts.

Your code for this part must be in a file called part3.py.

Command-line parameter

The binary file to modify will be passed in as a command line parameter – in sys.argv[1]. Your output should just append mod (not .mod!) to the file name. So an input of test2 should result in test2mod. An example run:

$ ./test2
hello world
$ python3 part3.py test2
$ ./test2mod
hello world
not a virus
$

Where .text starts

This starts at 0x1040 in test1.

The best way to do this is to read through the section header table in an ELF file. But it turns out there is a shortcut for the programs we are dealing with in this assignment.

The entry point address is, for the programs in this assignment, the same as the start of the .text section. And that address is at bytes 0x18 to 0x1b in the executable file. You can see that in the hexdump of test1:

00000010  02 00 3e 00 01 00 00 00  40 10 40 00 00 00 00 00  |..>.....@.@.....|

That’s little-Endian for 0x00401040. We remove the leading ‘0x0040’, so we really only want bytes 0x18 and 0x19: 0x1040. That’s the start of .text.

Where in main() the ret is

This is at 0x1146 in test1. The entry point is 0x1040, which is what we just determined above. Note that there are a number of 0xc3 (ret) bytes before you get to main() itself – these will not work due to the next part (identifying the previous instructions) – you can start analyzing at byte 0x1120 in test1 until you get that part working – the first 0xc3 from that byte is the correct ret statement to work with.

For our purposes, we will just look for a ret opcode in binary (0xc3). It’s possible that a 0xc3 byte is present due to being part of another instruction, such as a data value. If we can identify the instructions before the ret (that’s next), then we can safely assume that we have found a ret and not a data value.

Identify previous instructions

For test1, these are the nine bytes of instructions that you dealt with in the previous part (bytes 0x113d to 0x1145).

Your program needs to analyze the bytes before the ret to determine which instructions they are, and how many bytes each instruction is. For our purposes, we will assume that the instructions before the ret will always be one of: add, mov, pop, and nop. This is a reasonable assumption in general, as those are the types of instructions that a compiler will put at the end of a subroutine. All the test files provided here, and the ones that will be used for grading, will follow this assumption.

You can see how these instructions translate into machine code in the next section. For each instruction, we are only using the 2 (or so) most common encodings. This means there are a total of 6 possible patterns to look for (as nop only has one encoding, and we are only considering one encoding for mov). Your program needs five bytes to write the push instruction, so you should continue to identify instructions until you get to five (or more) bytes. If you get to more than 5 bytes, you will use nop instructions, as you did in the previous part.

If your code does not find the correct instructions before a ret, then it should continue looking forward in the file for the next ret, and then analyze the bytes before that one. This could happen because you did not find an actual ret (a 0xc3 byte was a data value, for example), or if the instructions before the ret were not of the form that you need to identify. If you hit the end of the file, then output “Unable to find a suitable ret”.

Where to put the payload

You identified this location for test1 in the previous part – it was somewhere in the nops() subroutine.

The programs we are using have a large set of consecutive nop statements for you to put your payload in. However big your payload is (likely around 40 or so bytes), you should look for a sequence of nop bytes (0x90) that is of that length. That is where you should put your payload. You can assume that our executables will provide a nop section of at least 100 bytes.

Putting it all together

When you can determine the four previous data items, you can use your binary modification code from part 2. You can test this on the three executable files provided with this assignment:

You are welcome to create your own, but check (with objdump -d) that the assembly instructions before the ret are what you expect them to be.

What to submit

The file to submit from this part is part3.py.

Machine code

When a compiler compiles a function, there are three parts: the prologue, the function body, and the epilogue. The prologue sets up the function – allocation of local variables, and saving registers that it is going to overwrite. The function body is the compiled code from the C code. The epilogue will deallocate local variables, restore registers, and then calls ret.

If the epilogue is of a standard format (a safe assumption for this assignment), then there are only a few instructions that will appear immediately before the the ret: mov, pop, add, and nop. It is certainly possible to have other instructions before the ret – especially if it’s a simpler subroutine – but we are going to assume that it is only going to be one of those four. All the test files will follow this assumption.

Thus, we only have to scan for those four types of instructions before the ret.

The add instruction

Any add instruction in the epilogue is adding a value to rsp, and that is the only type of add we have to consider for this assignment. These instructions can be in one of two forms:

The mov instruction

There are many encodings of the mov instruction, but we will only study a few such forms:

The pop instruction

The pop instruction can take a few forms:

Instruction Hex encoding
pop %rax 58
pop %rbx 5b
pop %rcx 59
pop %rdx 5a
pop %rbp 5d
pop %rdi 5f
pop %rsi 5e
pop %r8 41 58
pop %r9 41 59
pop %r10 41 5a
pop %r11 41 5b
pop %r12 41 5c
pop %r13 41 5d
pop %r14 41 5e
pop %r15 41 5f

Nobody should pop into %rsp, so it is not listed above.

Thus, the pop instructions we have to look at are hex 0x58 to 0x5e, and hex 0x41 0x58 to 0x41 0x5f.

The nop instruction

This will occasionally appear. It’s a single byte of value 0x90.

Submission

You should submit to Gradescope the five files that you developed:

If you did not finish all the parts, submit a blank file of the same name for the parts you did not finish – otherwise Gradescope will fail on the file-check part, and never get to giving you partial credit.