In the last homework, we saw how to recognize a virus pattern. While the patterns we found were very common ones (a tricky jump and an interrupt hook), longer patterns can recognize a specific virus. In this homework you will write an obfuscator, which takes x64 code and modifies it so that it cannot be recognized in such a manner. Your program will read in an x64 assembly file, add obfuscation to the program, and print the output.
The program can be written in any language that you would like, with the following restrictions:
We provide a Makefile at the bottom that will ensure proper compilation. If you are not using C++, you will have to modify it, as described below. Specifically, there is a run target, so that calling make run will run your program.
Your program will read in an x64 assembly file from standard input and write the output to standard output. Thus, you can run your program as follows:
cat vecsum.s | make run > vecsum-obfuscated.s
While testing, you may want to replace make run with whatever command you use to run your program.
The assembly files will follow a very strict format, intended to make parsing the file viable without a full-fledged parser:
- Lines whose first (non-whitespace) token is global or section are to be reproduced exactly in the output (we don't care about white space, of course). There might be multiple section or global lines in a file.
- The only place a colon (:) will appear is after a label; no colons will appear in the comments or the x64 code. This should allow for easy identification of jump targets.
- An instruction has zero, one, or two targets: for example, ret, pop rsi, and cmp rcx, 0 have zero, one, and two targets, respectively. There will be no spaces within a target (so [rax + 4*rbx + rcx] will not appear; instead it would appear as [rax+4*rbx+rcx]). This means that there will never be white space before a comma, only after it. The opcode and any target(s) will be separated by one or more spaces.
These restrictions should allow for easy reading of the x64 assembly input. The intent here is not for you to have to spend all your time writing a complicated lexer and/or parser to read in the file. If there are other restrictions on the x64 assembly input that would make the parsing job easier, feel free to chat with me about it.
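Because targets never contain spaces, a line can be classified and tokenized with nothing more than white-space splitting. The sketch below is one way to do that in C++; the struct Line and the function parseLine are hypothetical names, and it assumes each line has already had its comment removed and been trimmed, and that a label sits alone on its own line (as in vecsum.s).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One cleaned line of input (comment removed, white space trimmed).
// The three kinds mirror the format rules above.
struct Line {
    enum Kind { Directive, Label, Instruction };
    Kind kind = Instruction;
    std::string text;                  // the cleaned line, for verbatim output
    std::string opcode;                // set only for instructions
    std::vector<std::string> targets;  // zero, one, or two entries
};

Line parseLine(const std::string& clean) {
    Line l;
    l.text = clean;
    std::istringstream iss(clean);
    std::string first;
    iss >> first;
    if (first == "global" || first == "section") {
        l.kind = Line::Directive;      // reproduce these lines exactly
        return l;
    }
    if (!clean.empty() && clean.back() == ':') {
        l.kind = Line::Label;          // assumes a label is alone on its line
        return l;
    }
    l.opcode = first;
    // Targets contain no spaces, so splitting on white space yields each
    // target whole; a non-final target keeps its comma, which we drop.
    for (std::string tok; iss >> tok;) {
        if (tok.back() == ',') tok.pop_back();
        l.targets.push_back(tok);
    }
    return l;
}
```

For example, parseLine("cmp rcx, 0") gives opcode cmp with targets rcx and 0, while parseLine("start:") is classified as a label.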
The output of your program should be an obfuscated x64 program THAT COMPUTES THE EXACT SAME RESULT. Comments should not be output, and you are free to output blank lines or not (it's probably easier not to output them). We are going to run the output of your code through NASM, so it needs to compile. Furthermore, your output should conform to the x64 formatting guidelines above, as we will run your output through your program again.
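A minimal sketch of the overall stdin-to-stdout structure in C++, assuming the obfuscator lives in obfuscator.cpp. The names stripComment and passThrough are hypothetical. It drops comments (everything from ; onward) and blank lines, and assumes ; only ever starts a comment (no string literals containing it). A real obfuscator would transform each cleaned line instead of just echoing it.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Remove a trailing comment and surrounding white space.
std::string stripComment(const std::string& line) {
    std::string s = line.substr(0, line.find(';'));  // npos keeps the whole line
    const char* ws = " \t\r\n";
    size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";       // blank or comment-only line
    size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Copy assembly from in to out, dropping comments and blank lines.
// Obfuscation steps would be applied to each cleaned line here.
void passThrough(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        std::string clean = stripComment(line);
        if (!clean.empty()) out << clean << '\n';
    }
}
```

The program's main can then be as small as a single call, passThrough(std::cin, std::cout), which works unchanged with cat vecsum.s | make run.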
You can start with the sample code provided in CS 2150 lab 8 (x64, part 1): Makefile, main.cpp, and vecsum.s. However, vecsum.s has to be modified to conform to the above guidelines (comments were reformatted and colons removed; all x64 opcodes stayed the same). Below is the vecsum.s file properly formatted, but without any comments:
global vecsum
section .text
vecsum:
xor rax, rax
xor r10, r10
start:
cmp r10, rsi
je done
add rax, [rdi+8*r10]
inc r10
jmp start
done:
ret
Running it through a VERY simple obfuscator might yield the following code. Note that the formatting (i.e., leading spaces) is optional and is included here only for ease of reading. The NOPs are marked with comments in the program below. This intentionally uses very simple obfuscations; obfuscation complexity is discussed below.
global vecsum
section .text
vecsum:
xor rax, rax
; the following line is a nop obfuscation
imul rdx, 1
xor r10, r10
start:
cmp r10, rsi
je done
; the following line is a nop obfuscation
add r11, 0
add rax, [rdi+8*r10]
inc r10
; the following line is a nop obfuscation
nop
jmp start
done:
ret
Note that in the above program the obfuscations are clearly labeled. Not only are you not expected to do that, but it will be impractical when you are doing more advanced obfuscations. We did it here for clarity in understanding the program that resulted.
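A few hints for reading and transforming the input:
- trim() (or the equivalent in your language of choice) will remove leading and trailing white space from a line.
- split() or explode() (or the equivalent in your language of choice) will break a line into tokens; the format restrictions above are intended to make using split() (or equivalent) easier.
- Do not insert anything between a cmp and its respective conditional jump, since the inserted code may change the flags that the jump depends on.

The program above uses obfuscations that are as simple as they can be. There are three types of NOPs: nop itself, adding zero, and multiplying by 1. You can imagine a bunch of other NOPs: subtracting 0, exchanging (xchg opcode) a register with itself, etc. In each one, a random register can be chosen, which could be any of the x64 registers. One option would be to have a percentage chance of putting such a NOP after each line (that is not a ret or cmp).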
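A sketch of the percentage-chance NOP insertion just described, in C++. The names kRegs, randomNop, and insertNops are hypothetical. One design choice differs from the example above: add reg, 0 and imul reg, 1 both modify the flags, so this sketch only picks NOPs that leave registers and flags unchanged (nop, mov reg, reg, xchg reg, reg, and lea reg, [reg+0]). It assumes cleaned, non-empty lines and skips directives, labels, cmp, and ret.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// 64-bit registers a NOP may name (rsp left out so the stack pointer
// is never touched, even harmlessly).
const std::vector<std::string> kRegs = {
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"};

// One random instruction that changes no register and no flag.
std::string randomNop(std::mt19937& rng) {
    std::uniform_int_distribution<size_t> pickReg(0, kRegs.size() - 1);
    const std::string& r = kRegs[pickReg(rng)];
    switch (std::uniform_int_distribution<int>(0, 3)(rng)) {
        case 0:  return "nop";
        case 1:  return "mov " + r + ", " + r;
        case 2:  return "xchg " + r + ", " + r;
        default: return "lea " + r + ", [" + r + "+0]";
    }
}

// After each instruction, insert a NOP with probability p, but never after
// a directive, a label, a cmp (its jump must follow), or a ret.
std::vector<std::string> insertNops(const std::vector<std::string>& lines,
                                    double p, std::mt19937& rng) {
    std::bernoulli_distribution coin(p);
    std::vector<std::string> out;
    for (const std::string& line : lines) {
        out.push_back(line);
        std::string op = line.substr(0, line.find(' '));
        bool skip = line.empty() || line.back() == ':' || op == "global" ||
                    op == "section" || op == "cmp" || op == "ret";
        if (!skip && coin(rng)) out.push_back(randomNop(rng));
    }
    return out;
}
```

As the text notes, this alone is easy to detect and strip; it is mainly a starting point for the more complicated transformations.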
Implementing this will get you 2 points (out of 10); it was done by a 40-line Java program. This type of obfuscation doesn't get us very far: the NOPs are easily detectable as NOPs, and can be easily removed by lex (or anything more powerful).
Your job is to implement more complicated obfuscation. In the program above, there is an inc opcode, which could be replaced by something less obvious, perhaps multiple operations that yield the same result. As these commands would not, individually, be NOPs, they are harder to detect and remove. Consider the various obfuscation techniques that we discussed in lecture.
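As one illustration, here is a C++ sketch of a simple control-flow obfuscation: before an instruction, insert a jmp to a fresh label, a decoy instruction that never executes, and then the label. jmp does not change the flags, so this is safe even between a cmp and its jump. The helper names and the obfN label scheme are hypothetical. Existing labels are collected first so that running the obfuscator on its own output never reuses a name. Only lines after a section .text directive are touched, so data is left alone.

```cpp
#include <cassert>
#include <random>
#include <set>
#include <string>
#include <vector>

// Return a label name not already in use, and reserve it.
std::string freshLabel(std::set<std::string>& used, int& counter) {
    std::string name;
    do { name = "obf" + std::to_string(counter++); } while (used.count(name));
    used.insert(name);
    return name;
}

std::vector<std::string> insertJumps(const std::vector<std::string>& lines,
                                     double p, std::mt19937& rng) {
    std::set<std::string> used;
    for (const std::string& l : lines)
        if (!l.empty() && l.back() == ':') used.insert(l.substr(0, l.size() - 1));
    // Decoys are jumped over, so they may do anything syntactically valid.
    const std::vector<std::string> decoys = {
        "mov rax, 0", "add rsp, 8", "xor rdi, rdi", "pop rbx"};
    std::uniform_int_distribution<size_t> pick(0, decoys.size() - 1);
    std::bernoulli_distribution coin(p);
    int counter = 0;
    bool inText = false;
    std::vector<std::string> out;
    for (const std::string& l : lines) {
        if (l.rfind("section", 0) == 0) inText = l.find(".text") != std::string::npos;
        bool isInstr = !l.empty() && l.back() != ':' &&
                       l.rfind("global", 0) != 0 && l.rfind("section", 0) != 0;
        if (inText && isInstr && coin(rng)) {
            std::string lbl = freshLabel(used, counter);
            out.push_back("jmp " + lbl);
            out.push_back(decoys[pick(rng)]);  // never executed
            out.push_back(lbl + ":");
        }
        out.push_back(l);
    }
    return out;
}
```

A fixed jmp/decoy/label triple is itself a matchable pattern, so varying the decoys (or combining this with other techniques) is worth considering.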
It is likely that you will need to generate more complicated assembly routines to demonstrate your code obfuscation - you will be submitting these as well.
There are three different platforms that people are using: Windows, Mac OS X, and Linux. As a result, there are differences in how to compile and run x64 assembly.
YOUR SUBMITTED PROGRAM MUST RUN ON A 64-BIT LINUX MACHINE! It must also be compiled using the -m64 flag (which tells the compiler to generate 64-bit code).
You can look at CS 2150 lab 8 (x64, part 1), which discusses the various ways to compile x64 for the various platforms.
Your obfuscations may need to use temporary registers for their computations. One way to do this is to trace the registers throughout the execution of the program and see which ones are not being used, but this is beyond the scope of this homework.
For this homework, you can safely assume that you may use the rcx, r8, and r9 registers, as those will not be used by the surrounding assembly code. Thus, you can use those three registers in your obfuscations (you don't have to, but you have that option). You may recall that these are registers that are used to pass in parameters 4-6 (from the register usage guidelines). Thus, we will not be providing you with subroutines that have more than three parameters.
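One way to use those scratch registers is constant splitting, sketched below in C++ with hypothetical names. mov dst, imm becomes three steps through a randomly chosen scratch register (rcx, r8, or r9): load imm - b, add b with lea, then copy to dst. Because mov and lea never change the flags, the rewrite is safe anywhere. The sketch assumes decimal immediates that fit in 32 bits and 64-bit destination registers; anything else is returned unchanged.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <random>
#include <set>
#include <sstream>
#include <string>
#include <vector>

const std::vector<std::string> kScratch = {"rcx", "r8", "r9"};
// 64-bit destinations we rewrite (rsp deliberately excluded).
const std::set<std::string> kDest = {
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"};

// Accept a plain decimal literal that fits in 32 bits; reject hex, labels, etc.
bool parseSmallInt(const std::string& s, long long& v) {
    size_t i = (!s.empty() && s[0] == '-') ? 1 : 0;
    if (i == s.size() || s.size() - i > 10) return false;
    for (size_t j = i; j < s.size(); ++j)
        if (!std::isdigit(static_cast<unsigned char>(s[j]))) return false;
    v = std::stoll(s);
    return v >= INT32_MIN && v <= INT32_MAX;
}

std::vector<std::string> splitConstant(const std::string& line, std::mt19937& rng) {
    std::istringstream iss(line);
    std::string op, dst, src, extra;
    long long imm;
    if (!(iss >> op >> dst >> src) || (iss >> extra) || op != "mov" ||
        dst.back() != ',' || !parseSmallInt(src, imm))
        return {line};
    dst.pop_back();  // drop the comma
    if (!kDest.count(dst)) return {line};
    const std::string& s =
        kScratch[std::uniform_int_distribution<size_t>(0, kScratch.size() - 1)(rng)];
    long long b = std::uniform_int_distribution<long long>(-1000, 1000)(rng);
    std::string disp = (b < 0 ? "-" : "+") + std::to_string(b < 0 ? -b : b);
    return {"mov " + s + ", " + std::to_string(imm - b),  // s = imm - b
            "lea " + s + ", [" + s + disp + "]",           // s = imm
            "mov " + dst + ", " + s};                      // dst = imm
}
```

For example, mov rax, 5 might become mov r8, 12 / lea r8, [r8-7] / mov rax, r8, with different numbers and scratch register on each run.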
Note that you will have to ensure that your provided assembly code (in x64.s and whatever you test with) also does not use these registers.
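A quick way to check your own test files is to scan each comment-free line for any width of those registers. The C++ sketch below is hypothetical tooling, not part of the required submission. It matches rcx, ecx, cx, cl, and ch, and r8/r9 with an optional d, w, or b suffix. Word boundaries keep it from matching inside names such as r10 or a label like cloop.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the line names rcx, r8, or r9 in any operand width.
bool usesScratchRegister(const std::string& line) {
    static const std::regex re(R"(\b(rcx|ecx|cx|cl|ch|r[89][dwb]?)\b)");
    return std::regex_search(line, re);
}
```

A label that happens to be spelled exactly like a register (say, cl:) would be a false positive, which is harmless for a sanity check.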
You should submit the following files. BE SURE TO NAME THEM PROPERLY, including capitalization; otherwise, we can't call our testing scripts on your code, and we'll just give you a zero. For example, we will assume that your assembly file is called x64.s and your C++ file is called main.cpp. Your sample C++/assembly files need to compile to an executable named x64 (not a.out!). The submission system will call make to compile everything.
- x64.s: assembly code that you write for us to obfuscate. In particular, you should generate some assembly routines that demonstrate the various obfuscations that your code can produce. However, this file should be the non-obfuscated version. You can have multiple assembly routines in a single file; just have multiple global lines, one for each. To start with, use the vecsum.s file (but rename it to x64.s). This file should compute something (what, we don't care), but it should need many opcodes to compute some numerical result. IT MUST CONFORM TO 64-BIT X64 ASSEMBLY. See above for details.
- main.cpp: the driver file that will call your sample assembly code. This is not the obfuscator, so it is not the program whose language you get to choose! You are welcome to use the CS 2150 lab 8 main.cpp file verbatim, if you would like. It should not take in any input.
- Makefile: this should compile BOTH the main.cpp/x64.s program (into an executable named x64) and, if necessary, your obfuscation program (only C, C++, and Java need this compilation step; Python, Ruby, and PHP do not). When you submit it, it MUST CALL g++ WITH THE -m64 FLAG. See above for details, and see below for a sample Makefile.
- readme.pdf: this file should describe the obfuscation techniques that you use, and where we would find them in the file. We realize that you can't specify exactly where (because your program will use randomization), but give us as good an idea as you can. And see [How to create a PDF file]. Note that we will not know about an obfuscation technique unless it is listed here!

Obfuscation types will yield the following points:
- NOP insertion, as described above: 2 points.
- Each more complicated obfuscation type (for example, replacing a dec command with multiple computations that yield the same result, and likewise with inc): 2 points for each type.

Note that these obfuscations need to be general. So implementing just a dec replacement is not worth 2 points by itself, but replacing dec and inc (and similar commands) with replacements can yield 2 points for that part. And replacing it with the same pattern of opcodes every time is not much of an obfuscation, as that becomes an easy pattern to match and thus remove. Obviously, each obfuscation will be graded on a scale up to that amount according to the quality of its implementation (so a poor implementation of dec/inc may only yield 1 of the 2 points for that one).
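A C++ sketch of a general inc/dec replacement that picks among several equivalent sequences, so no single pattern marks every rewrite. The names are hypothetical. Most of these sequences set the flags differently from inc/dec (which, for instance, leave the carry flag unchanged), so the sketch rewrites only when the next line does not look like it reads flags. Its mayReadFlags check is an approximation: it treats conditional jumps, cmov*, set*, adc, sbb, and any label as flag readers, and it does not catch flags read several instructions later. Only 64-bit register operands are handled.

```cpp
#include <cassert>
#include <random>
#include <set>
#include <sstream>
#include <string>
#include <vector>

const std::set<std::string> kReg64 = {
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"};

// A label counts as a reader, because a flag-reading instruction may follow it.
bool mayReadFlags(const std::string& next) {
    if (next.empty()) return false;
    if (next.back() == ':') return true;
    std::string op = next.substr(0, next.find(' '));
    return (op[0] == 'j' && op != "jmp") || op.rfind("cmov", 0) == 0 ||
           op.rfind("set", 0) == 0 || op == "adc" || op == "sbb";
}

std::vector<std::string> rewriteIncDec(const std::string& line,
                                       const std::string& next,
                                       std::mt19937& rng) {
    std::istringstream iss(line);
    std::string op, r, extra;
    if (!(iss >> op >> r) || (iss >> extra)) return {line};
    if ((op != "inc" && op != "dec") || !kReg64.count(r) || mayReadFlags(next))
        return {line};
    const long long d = (op == "inc") ? 1 : -1;
    auto lea = [&r](long long k) {  // "lea r, [r+k]" adds k without touching flags
        return "lea " + r + ", [" + r + (k < 0 ? "-" : "+") +
               std::to_string(k < 0 ? -k : k) + "]";
    };
    long long k = std::uniform_int_distribution<long long>(2, 500)(rng);
    switch (std::uniform_int_distribution<int>(0, 3)(rng)) {
        case 0:  return {lea(d)};
        case 1:  return {"sub " + r + ", " + std::to_string(-d)};  // r - (-d)
        case 2:  return {lea(d + k), "sub " + r + ", " + std::to_string(k)};
        default:  // not r yields -r-1, so not+neg gives r+1 and neg+not gives r-1
            if (d == 1) return {"not " + r, "neg " + r};
            return {"neg " + r, "not " + r};
    }
}
```

The same idea extends to the "similar commands" mentioned above (add, sub, and so on), and the random constant k keeps the emitted numbers from repeating.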
Restrictions:
This has the net effect of requiring at least two complicated algorithmic implementations for more than 6/10 on this homework.
NOTE: if your obfuscated code doesn't compile, then you will get a very low score. Anybody can scramble a program so that it doesn't compile. It will be far better to provide a small number of obfuscations that work properly than a lot that do not work.
We are going to run your obfuscator on your provided source code (x64.s), and compile the result along with your main.cpp, and make sure that it works the same way that your original (un-obfuscated) x64.s and main.cpp worked.
We are also going to obfuscate our own assembly code. In particular, we are going to obfuscate our code multiple times -- meaning we will take the output of our obfuscated assembly code and run it through the obfuscator again and again. It should produce the same result each time.
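Since the output is fed back in as input, it can help to check every emitted line against the input format before printing it. The C++ sketch below is a hypothetical self-check: it allows blank lines, rejects comments, requires a label to be alone on its line, allows at most two targets, requires exactly one trailing comma on a non-final target and none elsewhere, and checks that brackets balance within each target.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

bool conforms(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> tok;
    for (std::string t; iss >> t;) tok.push_back(t);
    if (tok.empty()) return true;                          // blank line
    if (line.find(';') != std::string::npos) return false; // comments are never output
    if (line.find(':') != std::string::npos)               // a label, alone on its line
        return tok.size() == 1 && tok[0].find(':') == tok[0].size() - 1;
    if (tok.size() > 3) return false;                      // opcode + at most two targets
    for (size_t i = 1; i < tok.size(); ++i) {
        const std::string& t = tok[i];
        bool last = (i + 1 == tok.size());
        // Only a non-final target ends in a comma; no other commas anywhere.
        auto commas = std::count(t.begin(), t.end(), ',');
        if (commas != (last ? 0 : 1) || (!last && t.back() != ',')) return false;
        if (std::count(t.begin(), t.end(), '[') != std::count(t.begin(), t.end(), ']'))
            return false;
    }
    return true;
}
```

Lines with spaces inside a target, such as add rax, [rdi + 8*r10], split into too many tokens and are rejected, which is exactly the case that would break a second pass.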
Below is a sample Makefile for an obfuscator written in C++. You are certainly welcome to use a more complicated Makefile; this is the minimum required for this assignment.
main:
g++ -Wall obfuscator.cpp
nasm -f elf64 -o x64.o x64.s
g++ -m64 -Wall -c -o main.o main.cpp
g++ -m64 -Wall -o x64 x64.o main.o
run:
@./a.out
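You have to put the @ before the execution line! Without it, make echoes the command itself to standard output, where it would end up in your obfuscated assembly. Your execution line will vary depending on your language: @./a.out for C/C++, @java obfuscator for Java, @python obfuscator.py for Python, etc.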
Note that if you cut-and-paste this into a Makefile for your own use, each command line (the lines under main: and run:) must begin with a single tab, not spaces. And the -Wall flag is there for your sanity (it turns on all warnings), but it is certainly not required.
This Makefile does a number of things:
- There are only two targets (main and run), and no macros, as this is a simple Makefile. You can define all the targets and macros that you would like. We are going to call make to compile it, and make run to run it.
- The first command compiles your obfuscator into a.out (since we don't specify the executable name, that's what it defaults to). We don't care what your obfuscator source code file is called (as long as it's something reasonable, follows the Program submission guidelines (md), and is not called main.cpp). This line is one of only two lines that change depending on your choice of implementation language (see below).
- The next command assembles x64.s into a 64-bit ELF object file (-f elf64). The assembly file MUST be called x64.s.
- The last two commands compile main.cpp and link it with x64.o into an executable named x64, and the executable MUST be called x64.
The second line of the Makefile (g++ -Wall obfuscator.cpp) will change depending on your choice of implementation language:
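- C: gcc -Wall obfuscator.c
- C++: g++ -Wall obfuscator.cpp
- Java: javac obfuscator.java
- Python, Ruby, and PHP: no compilation line is needed.

For this homework, please name the obfuscator source code file obfuscator.* for whatever language you are using.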
The second target is what will run your obfuscator. Here are some sample lines for various languages:
- ./a.out (compiled from obfuscator.c or obfuscator.cpp)
- java obfuscator (compiled from obfuscator.java)
- python obfuscator.py
- python3 obfuscator.py
- php obfuscator.php
- ruby obfuscator.rb