DADA: HW 2: x64 assembly


Introduction

This assignment will refresh your knowledge of x86 assembly language (which you will have to analyze in all the virus detection assignments later in this semester), expose you to the "tricky jump" style of code often used by virus writers, and familiarize you with a couple of key tools used in the analysis of programs in their binary form.

The reference platform for this project is the 64-bit Linux VirtualBox image. Although most viruses are for the Windows platform, for reasons described in class, we are using the Linux platform to make this assignment easier to perform. Similar binary analysis tools exist on Windows (and Mac OS X) platforms.

You are welcome to use Mac OS X -- or even Windows -- to write and compile your code. However, it needs to work under Linux. On those platforms, your C/C++ code may have to call _foo() instead of foo(); try each one to see which works. The version you submit must not have the underscore. The binary analysis must be done on the reference platform of 64-bit Linux.

Part 1: x86 code

Your Makefile MUST generate an executable called volume! Anything else will not be graded!

Use the following code for your main.cpp. Note that, for the parameters given, the volume is 960 and the density is 5.

#include <iostream>
using namespace std;

extern "C" int volume (int, int, int, int);

int main () {
  int len = 12, width = 8, height = 10, weight = 4800;
  int vol = volume(len, width, height, weight);
  cout << "volume: " << vol << endl;
  return 0;
}

Part 2: analysis

The questions to answer are below. There are a number of command-line tools that can be used to generate these answers, which are listed by each question. You can type man foo at the Linux command prompt to see the full manual page about the 'foo' program. Some of the commands below require you to use specific flags described in the manual page for each command.

See "Notes on interpreting the objdump file", below, for help interpreting the output.

Answer these questions in a PDF file called analysis.pdf. Copy the questions (you can copy them from the web page version of this assignment) and type your answer below each question. Below each answer, copy and paste the specific portions of the tool output that were used to answer the question.

Part 3: detecting tricky jumps

Write a separate C or C++ program that takes, as its input, an objdump of the above executable. It should read the dump from standard input; we are not dealing with file input and output. First, you may want to save the output of objdump to a file:

objdump -sRrd volume > input.txt

Your program must read in that file via standard input:

./tricky < input.txt
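As a starting point for the input handling only, the sketch below reads an objdump listing from a stream and keeps the disassembled instruction lines. It is not the reference solution and does not decide what counts as a tricky jump. The names Insn and read_instructions are hypothetical, and the parsing assumes the usual objdump -d layout in which the address, the raw bytes, and the instruction text are separated by tab characters.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// One disassembled instruction (hypothetical helper type for this sketch).
struct Insn {
    unsigned long long address;  // parsed from the hex prefix before ':'
    std::string text;            // mnemonic and operands, e.g. "jmpq   400480 <_init+0x20>"
};

// Collect every instruction line from an objdump listing.
// Assumes lines of the form "  ADDRESS:<TAB>RAW BYTES<TAB>INSTRUCTION".
std::vector<Insn> read_instructions(std::istream& in) {
    std::vector<Insn> result;
    std::string line;
    while (std::getline(in, line)) {
        // The first tab must directly follow the ':' that ends the address.
        const auto tab1 = line.find('\t');
        if (tab1 == std::string::npos || tab1 == 0 || line[tab1 - 1] != ':') continue;

        // Lines with no second tab only continue the raw bytes of a long instruction.
        const auto tab2 = line.find('\t', tab1 + 1);
        if (tab2 == std::string::npos) continue;

        // Strip leading spaces and check that the address is plain hex.
        std::string addr = line.substr(0, tab1 - 1);
        const auto first = addr.find_first_not_of(' ');
        if (first == std::string::npos) continue;
        addr = addr.substr(first);
        if (addr.find_first_not_of("0123456789abcdef") != std::string::npos) continue;

        result.push_back({std::stoull(addr, nullptr, 16), line.substr(tab2 + 1)});
    }
    return result;
}
```

In main, a call such as read_instructions(std::cin) would consume the redirected input.txt; section headers, symbol labels, and the hex dump produced by -s are skipped because they do not match the assumed layout.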

Your program is to detect if there is a tricky jump in that code. You can assume a few things to make this easier:

If a tricky jump is found, output "tricky jump found!" once for each occurrence. If there are no tricky jumps, output "tricky jump not found".

We are not looking for error-proof or long-winded code! The reference solution was about 20 lines.

Your Makefile MUST generate an executable called tricky! Anything else will not be graded!

Part 4: Makefile

You will need to create a Makefile that will compile the programs. There are three required aspects to the Makefile.

You may want to start with the Makefile for the vecsum program provided in CS 2150 (see here). That example uses clang++ as the compiler, but you can use either clang++ or g++.

It's fine if your source code files are named something slightly different, as long as your Makefile can compile both of them. We just have to be able to reasonably tell which source code file name is which. But the executables need to be named exactly the same as specified herein.

Notes on interpreting the objdump file

This section (and some of the questions for analysis.pdf) was taken with permission from Charles Reiss (original was here).

General format

The objdump output we provide contains several parts corresponding to several parts of the executable, which are described in more detail below:

Note that this is not all the information in the executable and not all the information that objdump is capable of providing.

On dynamic linking

This executable is dynamically linked, so it doesn't include code for functions in the C standard library like printf(). These are loaded at runtime by the dynamic linker, which is contained in /lib64/ld-linux-x86-64.so.2. Linux implements dynamic linking by having this program act as an interpreter that loads every dynamically linked executable.

As part of Linux's implementation of dynamic linking, there is a Procedure Linkage Table (PLT). This contains "stubs" for each function the executable expects to find in a dynamically linked library, like the C standard library. One of the "stubs" looks like:

00000000004004c0 <__printf_chk@plt>:
  4004c0:       ff 25 6a 0b 20 00       jmpq   *0x200b6a(%rip)        # 601030 <_GLOBAL_OFFSET_TABLE_+0x30>
  4004c6:       68 03 00 00 00          pushq  $0x3
  4004cb:       e9 b0 ff ff ff          jmpq   400480 <_init+0x20>

This stub is called __printf_chk@plt and is loaded into the program's memory at address 0x4004c0. The first instruction in this function reads the address of a function from memory at 0x601030, then jumps to that function. As indicated by the comment added by objdump, this address is part of the "global offset table". This is an array of pointers used to find functions like printf(), which are loaded every time the executable runs. Using this table allows the same program to work with different implementations of printf(), where printf() may end up at different locations in memory. For example, in this case the global offset table will eventually contain the address at which __printf_chk, part of the Linux C library's implementation of printf(), is loaded into memory.

By default, each entry in the global offset table is initialized to point to the instruction following the jump in its stub; for example, 0x601030 initially contains 0x4004c6. This means that the first time the "stub" is called, it will "fall through" to the code after the global offset table jump. This code pushes an indicator of which function was called onto the stack, then jumps to part of the dynamic linker. (The dynamic linker's code is not included in the executable file, and is therefore not present in the objdump output.) The dynamic linker will then locate the actual routine (the implementation of __printf_chk in the standard library, in this case) and update the global offset table to contain its address.
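The lazy lookup described above can be imitated in plain C++ with a function pointer standing in for one global offset table entry. This is only an analogy, not how the dynamic linker is implemented; every name in it (got_slot, resolver_stub, square_plt, real_square, resolve_calls) is made up for illustration.

```cpp
// Signature shared by the "library routine" and the stubs in this sketch.
using Fn = int (*)(int);

// Stands in for the real library routine that lives in a shared library.
int real_square(int x) { return x * x; }

// Counts how many times the slow "ask the dynamic linker" path runs.
int resolve_calls = 0;

int resolver_stub(int x);  // forward declaration so the slot can point at it

// One "global offset table" entry. It starts out pointing at the fall-through
// path, just as 0x601030 initially holds 0x4004c6.
Fn got_slot = resolver_stub;

// Plays the role of the push/jmp fall-through code plus the dynamic linker:
// locate the real routine, patch the table entry, then finish the call.
int resolver_stub(int x) {
    ++resolve_calls;
    got_slot = real_square;
    return got_slot(x);
}

// Plays the role of the PLT stub: an indirect jump through the table entry.
int square_plt(int x) { return got_slot(x); }
```

Calling square_plt twice goes through resolver_stub only the first time; afterwards the patched got_slot sends every call straight to real_square.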

On _start

Execution of the program does not actually start in main but starts in a function called _start that is provided by the compiler -- this is the start address specified in the program header. This function calls a special function in the C standard library called __libc_start_main. It is this function that actually calls main() and takes care of exiting when main() returns.

On %fs

x86 has a feature called "segmentation". As part of this feature, the processor has several "segment registers" which specify a region of memory -- essentially the segment register acts as a pointer. %fs:0x28 specifies to use segment register %fs and access a value 0x28 bytes from the beginning of the memory region it identifies.

On Linux, the %fs segment register is used for "thread-local storage" -- to point to a block of data particular to a thread, even in a multithreaded process.

On Windows, the %gs segment register is used for something similar.

The use of a segment register for this purpose instead of a normal register is just to make sure as many registers are available to the program as possible.

Segmentation was originally intended to provide functionality similar to virtual memory. These days, it is rarely used for this purpose, and its primary use is to support thread-local storage, as occurs briefly in the assembly in this assignment. It is, however, still universally present on x86 and is entangled with x86's implementation of kernel mode and exceptions.
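To see what "thread-local storage" means at the source level, the sketch below uses the C++11 thread_local keyword, which compilers on x86-64 Linux typically implement with %fs-relative accesses. The names counter, bump, and bump_on_new_thread are hypothetical and exist only for this example; it assumes a compiler with C++11 support and, on some toolchains, linking with -pthread.

```cpp
#include <thread>

// Each thread gets its own independent copy of this variable.
thread_local int counter = 0;

// Increment the calling thread's copy and return its current value.
int bump(int times) {
    for (int i = 0; i < times; ++i) ++counter;
    return counter;
}

// Run bump() on a freshly created thread and report what that thread saw.
int bump_on_new_thread(int times) {
    int seen = 0;
    std::thread worker([&seen, times] { seen = bump(times); });
    worker.join();
    return seen;
}
```

A new thread always starts counting from zero, and its increments never show up in the copy that belongs to the thread that created it. Disassembling a program like this with objdump should show counter being reached through %fs rather than through an ordinary absolute address.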

Items to Submit

These files need to be submitted to the HW 2 assignment via the course tools submission.