Go up to the main DADA homeworks page (md)
In this assignment, you will be examining some of the issues surrounding hashes and their security applications.
Note that due to the time issues related to task 2, there will be no extensions granted on this assignment. Task 2 will take some hours of the computer thinking, and if you wait to the last minute to start, you will end up with a late assignment.
Your job is to write a C/C++ program (necessary for speed reasons) that, when given an input file and a CRC checksum, will modify that message, and ensure that the modified version matches the CRC checksum.
For this program, you are to use the CRC library in Boost. Boost is a C++ library somewhat similar to Java's SDK. Details about the Boost library can be found here; that page includes a sample program. Also see here for another example as to how to use the CRC functions.
Note that all Boost files have an .hpp extension, and are #include'd like the STL library - meaning you don't have to compile them separately (there are some exceptions to this rule, but not ones we will see for this assignment). You will need three Boost files: crc.hpp, cstdint.hpp, and config.hpp. You can download the files from those links if you don't have them; if you have Boost installed, you can use those as well.
You should NOT submit the .hpp files (we have copies as well). Your program should include the files as follows:
#include <boost/crc.hpp>
#include <boost/cstdint.hpp>
Note that the Linux VirtualBox image does not have Boost installed, but you can install it via sudo apt-get install libboost-dev libboost-doc
. Alternatively, you can compile the code with the files linked to above (use #include "crc.hpp"
for development), and just change the #includes over before submission. You don't need any other files in the Boost library for this assignment, so just copying those files is fine for this.
Your program should be called crc.cpp
and be compiled as crc
(see the Makefile section, below, for details). It will be run with two command-line parameters:
A sample execution:
./crc input.txt abcd
The program should write its output to a file named output.txt
, which should contain the following:
Note that there are differences in text files between Linux and Windows platforms (see here for details); this program will be run in (and graded in) a Linux environment.
Adding the message of your own shows that you can "modify" the original message without having to deal with parsing of the original message.
To make this program reasonable to run in a short time frame, we are going to be using the CRC16 algorithm, NOT the CRC32 algorithm. Boost can do both. You can install the crc32
binary in Ubuntu by installing the libarchive-zip-perl package (sudo apt-get install libarchive-zip-perl
). First make sure you are computing the CRC32 hash properly with your code and the crc32
program (follow the directions linked to above). Once you can do that, change it over to CRC16 (Boost provides both CRC16 and CRC32 data types and functions).
A few hints:
crc_16_type
result EACH time you compute the CRC value; you can't re-use it very easily-O2
compilation flag, though.Your program source code must be in a crc.cpp
file, and the binary must be called crc
. See the Makefile section, below.
How easy is it to create a malicious program with a specific MD5 hash? In this part we'll find out.
For this task, we are going to follow the instructions found online here. This code is released under the Modified BSD and/or the GPL license, so I am allowed to use it here, as long as I don't claim credit for it (I'm not), and I include the license in the source code (it's included there).
While I have tried this on my computer, I can make no guarantees about the safety of the program. Thus, you should run it in the Linux VirtualBox image just to be safe. That being said, I didn't, and the world hasn't ended yet...
The instructions are repeated below, and you can download the source code from Collab; the file is called evilize-0.2.zip
.
Your task is to create two binary executables, good
and evil
, that have the same MD5 hash. Those executables should print something relevant (i.e., something "good" and something "evil") - it can be interesting quotations, good/evil instructions, etc. IT SHOULD NOT DO ANYTHING MALICIOUS, as that would be a violation of your Ethics honor pledge. It should print something good/evil, not do something good/evil. Find some interesting quotations to entertain us!
The binary executables should be Linux binary executables, as we will be running them on a 64 bit Linux system. As long as they are elf executables (32 bit or 64 bit) created in a Linux environment, you are fine. If you use any obscure libraries, you should check with me first to make sure that they will run.
For this part, you should submit four files:
good
, the first binary with a matching MD5 hashevil
, the second binary with a matching MD5 hashmultiple_personalities.c
, the source code file that contains main_good()
and main_evil()
, the main()
functions from the good
and evil
executables -- we aren't going to compile this, we just want to look over the source codemd5.pdf
, the report for this section (see below)As the binary files will already have been created, and we don't need to compile your source code file, there are no entries for this task in the Makefile.
The report (in the md5.pdf
) file should answer the following question: how does this whole thing work? We aren't looking for a mathematical analysis of how the MD5 collision algorithm works. Rather, an overview of how the entire evilize and md5coll programs work. How long this part takes is up to you, but we aren't expecting more than 1 page (and don't bother with double-spacing, we hate that). If you can explain it adequately in 1/2 page, that's fine. We are looking for quality of your description, not a specific quantity of words or lines.
Note that it took 90 minutes on my home computer (3.4 Ghz machine) to run; your mileage may vary. And it can only run on one core, so multi-core machines do not get much of a boost.
Here are a step-by-step instructions to get this part running:
Unpack the archive and build the library and tools:
unzip evilize-0.2.zip
cd evilize-0.2
make
This creates the programs evilize
, md5coll
, and the object file goodevil.o
.main()
, write two separate top-level functions main_good()
and main_evil()
. See the file hello-erase.c (included in the evilize-0.2.zip file) for a simple example.Compile your program and link against goodevil.o. For example:
gcc hello-erase.c goodevil.o -o hello-erase
Run the following command to create an initialization vector:
./evilize hello-erase -i
This will output something like the following:
Initial vector: 0xfda9af93 0xb20a5c10 0x653aa3af 0x8e2347d2
Create an MD5 collision by running the following command (but replace the vector on the command line with the one you found in step 4):
./md5coll 0xfda9af93 0xb20a5c10 0x653aa3af 0x8e2347d2 > init.txt
Note: this step can take several hours.Create a pair of good and evil programs by running:
./evilize hello-erase -c init.txt -g good -e evil
Here "good" and "evil" are the names of the two programs generated, and "hello-erase" is the name of the program you created in step 3.
NOTE: steps 4-6 can also be done in a single step, as follows:
./evilize hello-erase -g good -e evil
However, I prefer to do the steps separately, since step 5 takes so long.
Check the MD5 checksums of the files "good" and "evil"; they should be the same.
Run the programs "good" and "evil" - they should exhibit the two different behaviors that you programmed in step 2.
Although SHA-1 is not used as much in practice these days, we are going to implement it. This will give you the idea as to how a more secure hash function works, without having to deal with the complexity of the SHA-2 family of hash functions.
There is a UNIX command called sha1sum, which is installed by default on pretty much all Linux systems (and is installed on the VirtualBox image). This will allow you to check your program.
Your implementation should be in a sha1.cpp file, and the executable should be called sha1. It should take in one command-line parameter, which is the file to take the SHA-1 hash of. The output should be in the same format as the sha1sum command:
$ sha1sum message1.txt
9e7f39160d76f722ce4ff957e296533a5b077c0c message1.txt
You may use the following resources:
And no other resources! I'm sure there are many online sources of SHA-1 implementations in C/C++. Using them (even to look at for reference) is an honor violation. We can also use Google, and will be looking at those implementations.
A few useful tidbits to keep in mind:
~
, not !
unsigned
types are your friend; use unsigned int
for pretty much everythingNote that there will be partial credit for this part, so if you implement it, but it's somewhat off (meaning you think you have it correct, but the hash result is wrong), you will get a good amount of credit. The point here is not for you to spend 10 hours debugging your algorithm.
There are a few questions to answer, and they should be in a misc.pdf
file. Again, we are not looking for quantity, only quality; each one can reasonably be answered in 4 lines or less (assuming those 4 lines are correct, of course).
The files that you need to submit are:
crc.cpp
(from task 1)good
, evil
, multiple_personalities.c
, and md5.pdf
(from task 2); note that good
and evil
are binary executablessha1.cpp
(from task 3)misc.pdf
(from task 4)Makefile
(described below)The Makefile needs to do the following:
crc.cpp
(from task 1) into a binary executable called crc
sha1.cpp
(from task 3) into a binary executable called sha1
The Makefile does not need to do anything for tasks 2 and 4. Below is a sample Makefile. Recall that with Makefiles, you must replace the leading 5 spaces on each target task with a single tab.
main:
g++ -O2 -o crc crc.cpp
g++ -O2 -o sha1 sha1.cpp
We will, obviously, be compiling your submission with make
.