CS 2150 Roadmap

Data Representation (highest level to lowest)

  • Objects: string
  • Arrays: int x[3]
  • Primitive types: char x
  • Addresses: 0x9cd0f0ad
  • bits: 01101011

Program Representation (highest level to lowest)

  • Levels: High-level language, Low-level language, Assembly language, Machine code
  • Examples: Java code, C++ code, C code, x86 code, IBCM, hexadecimal

History of x86

Intel 4004

  • 1971: 4004, 4-bit words
  • 1972: 8008, 8-bit words
  • 1978: 8086, 16-bit words
  • 1982: 80286
  • 1985: 80386, 32-bit words
  • 1989: 80486
  • 1993: Pentium
  • 1995: Pentium Pro
  • 1997: Pentium II
  • 1999: Pentium III
  • 2000-2008: Pentium 4
  • 2005-2008: Pentium D
  • 2006-2011: Core 2
  • 2008-present: Core i3, i5, i7

Declaring Variables in x86

Directives

  • byte: 1 byte (DB, "declare byte")
  • word: 2 bytes (DW)
  • doubleword: 4 bytes (DD)
  • quadword: 8 bytes (DQ)

The directive TIMES x DB 0 creates x bytes, each with value zero.

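section .data
a       DB      23              ; one byte, initialized to 23
b       DW      ?               ; one word (2 bytes), uninitialized
c       DD      3000            ; one doubleword (4 bytes)
d       DQ      -800            ; one quadword (8 bytes)
x       DD      1, 2, 3         ; three consecutive doublewords
y       TIMES 8 DB 0            ; eight bytes, all zero
str     DB      'hello', 0      ; five characters plus a terminating zero byte
z       TIMES 50 DD ?           ; fifty uninitialized doublewords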
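As a rough comparison, the declarations above can be mirrored with fixed-width C++ types. This is only a sketch: the choice of signed versus unsigned types and the namespace name data_section are assumptions, and C++ zero-initializes globals, whereas ? leaves the contents unspecified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace data_section {              // hypothetical name, for illustration only
std::uint8_t  a = 23;                 // a   DB 23
std::uint16_t b;                      // b   DW ?
std::int32_t  c = 3000;               // c   DD 3000
std::int64_t  d = -800;               // d   DQ -800
std::int32_t  x[3] = {1, 2, 3};       // x   DD 1, 2, 3
std::uint8_t  y[8] = {0};             // y   TIMES 8 DB 0
char          str[] = "hello";        // str DB 'hello', 0
std::int32_t  z[50];                  // z   TIMES 50 DD ?
}  // namespace data_section

// Adds up the bytes reserved by the eight declarations.
std::size_t dataSectionBytes() {
    using namespace data_section;
    std::size_t total = sizeof(a) + sizeof(b) + sizeof(c) + sizeof(d) +
                        sizeof(x) + sizeof(y) + sizeof(str) + sizeof(z);
    // 1 + 2 + 4 + 8 + 3*4 + 8*1 + 6 + 50*4
    assert(total == 241);
    return total;
}
```

Note that str occupies 6 bytes, not 5, because of the terminating zero byte.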

Addressing Memory

  • Up to two registers and one 32-bit signed constant can be added together to compute a memory address
  • Furthermore, one of the registers can be pre-multiplied by 2, 4, or 8
    • 2: word-align
    • 4: doubleword-align
    • 8: quadword-align
mov eax, ebx
mov eax, [ebx]
mov [var], ebx
mov eax, [esi - 4]
mov [esi + eax], cl
mov edx, [esi + 4*ebx]

 

Incorrect: (why?)

mov eax, [ebx - ecx]
mov [eax + esi + edi], ebx
mov [4*eax + 2*ebx], ecx
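The addressing rule can be summarized as base + scale * index + constant. The sketch below computes that address in C++; the function name effectiveAddress and its parameter names are assumptions made for illustration, not part of any assembler.

```cpp
#include <cassert>
#include <cstdint>

// Computes base + scale * index + disp with 32-bit wraparound,
// which is the general shape of an x86 memory operand.
std::uint32_t effectiveAddress(std::uint32_t base, std::uint32_t index,
                               std::uint32_t scale, std::int32_t disp) {
    // Only one register may be scaled, and only by these factors.
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + static_cast<std::uint32_t>(disp);
}
```

For instance, with esi = 200, [esi - 4] corresponds to effectiveAddress(200, 0, 1, -4), and [esi + 4*ebx] with ebx = 3 corresponds to effectiveAddress(200, 3, 4, 0). None of the three incorrect instructions fits this shape: one subtracts a register, one adds three registers, and one scales two registers.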

Example

Source code:

mov ecx, eax
mov edx, [ebx]
mov esi, [edx+eax+4]
mov dword [esi], 45     ; dword: the size must be stated when neither operand is a register
mov dword [a], 15
lea edi, [a]

Registers:

eax   100
ebx   104
ecx   100
edx   8
esi   200
edi   300

Memory:

address   value
100
104       8
108
112       200
116
...
200       45
204
...
300 (a)   15
304
...
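The trace can be checked by replaying the six instructions in a small C++ model. This is a sketch under stated assumptions: the struct Machine and the function runExample are hypothetical names, and the starting state (eax = 100, ebx = 104, address 104 holding 8, address 112 holding 200, a at address 300) is taken to be what was in place before the code ran.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Machine {
    std::uint32_t eax, ebx, ecx, edx, esi, edi;
    std::map<std::uint32_t, std::uint32_t> mem;  // address -> 32-bit value
};

Machine runExample() {
    const std::uint32_t a = 300;                 // address of variable a (assumed)
    Machine m{100, 104, 0, 0, 0, 0, {}};         // assumed starting registers
    m.mem[104] = 8;                              // assumed starting memory
    m.mem[112] = 200;

    m.ecx = m.eax;                               // mov ecx, eax
    assert(m.mem.count(m.ebx) == 1);
    m.edx = m.mem[m.ebx];                        // mov edx, [ebx]
    assert(m.mem.count(m.edx + m.eax + 4) == 1);
    m.esi = m.mem[m.edx + m.eax + 4];            // mov esi, [edx+eax+4]
    m.mem[m.esi] = 45;                           // mov [esi], 45
    m.mem[a] = 15;                               // mov [a], 15
    m.edi = a;                                   // lea edi, [a]
    return m;
}
```

The last line shows the difference between mov and lea: lea stores the address 300 itself in edi, not the value 15 found at that address.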

A code block in both C/C++ and Assembly

C/C++ code:

int n = 5;
int i = 1;
int sum = 0;
while (i <= n) {
    sum += i;
    i++;
}

Assembly code:

section .data
n	DD 5
i	DD 1
sum	DD 0

section .text
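loopStart:	mov ecx, [i]	; label renamed: loop is also an x86 instruction mnemonic
	cmp ecx, [n]
	jg endOfLoop	; leave the loop once i > n
	add [sum], ecx
	inc dword [i]	; a lone memory operand needs an explicit size
	jmp loopStart
endOfLoop: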
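To see how the while loop maps onto compare-and-jump instructions, the same logic can be written in C++ with goto, one statement per assembly instruction. This is only an illustrative sketch; the function name sumTo is an assumption, and the upper bound on n is added here so the sum fits in a 32-bit int.

```cpp
#include <cassert>

int sumTo(int n) {
    assert(n <= 65535);              // keeps n*(n+1)/2 within a 32-bit int
    int i = 1;
    int sum = 0;
loopStart:
    if (i > n) goto endOfLoop;       // cmp ecx, [n] / jg endOfLoop
    sum += i;                        // add [sum], ecx
    i++;                             // inc [i]
    goto loopStart;                  // jmp loopStart
endOfLoop:
    return sum;
}
```

Note that the test is inverted: the while loop continues while i <= n, so the assembly jumps out when i > n.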

Stack Memory Visualization for myFunc

This is just before the call opcode is invoked.


(higher addresses, toward 0xffffffff)
    value of edx
    copy of var z
    123
    value of eax (var x)    ← esp
(lower addresses, toward 0x00000000)
  

Stack Memory Visualization for myFunc

This is just after the call opcode is invoked.


(higher addresses, toward 0xffffffff)
    value of edx
    copy of var z
    123
    value of eax (var x)
    return address          ← esp
(lower addresses, toward 0x00000000)
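The two pictures differ by a single push: call pushes the return address, and every 32-bit push moves esp down by 4 bytes. A minimal C++ model of that behavior is sketched below. The names Stack and callMyFunc are hypothetical, and the pushed values other than 123 are placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Stack {
    std::uint32_t esp;                            // address of the top of the stack
    std::map<std::uint32_t, std::uint32_t> mem;   // address -> 32-bit value

    void push(std::uint32_t value) {
        esp -= 4;                                 // the stack grows toward lower addresses
        mem[esp] = value;
    }
    std::uint32_t pop() {
        assert(mem.count(esp) == 1);              // nothing to pop otherwise
        std::uint32_t value = mem[esp];
        esp += 4;
        return value;
    }
};

// Replays the caller's pushes followed by the call instruction.
Stack callMyFunc(std::uint32_t startEsp) {
    Stack s{startEsp, {}};
    s.push(0xDDDD);   // value of edx (placeholder)
    s.push(0x2222);   // copy of var z (placeholder)
    s.push(123);
    s.push(0xAAAA);   // value of eax, var x (placeholder)
    s.push(0x1000);   // return address, pushed by call (placeholder)
    return s;
}
```

After the five pushes esp is 20 bytes below where it started, and the constant 123 sits at esp + 8.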
  

Callee Rules (Prologue)

  1. Save callee-save registers
    • ebx, edi, esi (push them onto the stack)
    • this is only needed for the registers the callee intends to use; the others need not be saved

 

THEN, perform body of the function

Stack Memory Visualization for myFunc

This is just after the caller invokes the call opcode.


(higher addresses, toward 0xffffffff)
    value of edx            ↖ ebp
    copy of var z
    123
    value of eax (var x)
    return address          ← esp
(lower addresses, toward 0x00000000)
  

Stack Memory Visualization for myFunc

This is just after the callee invokes the push ebp opcode.


(higher addresses, toward 0xffffffff)
    value of edx            ↖ ebp
    copy of var z
    123
    value of eax (var x)
    return address
    ebp backup              ← esp
(lower addresses, toward 0x00000000)
  

Stack Memory Visualization for myFunc

This is after the myFunc() prologue is completed.


(higher addresses, toward 0xffffffff)
    value of edx
    copy of var z           [ebp+16]
    123                     [ebp+12]
    value of eax (var x)    [ebp+8]
    return address
    ebp backup              ← ebp
    local variable          [ebp-4]
    saved value of ebx
    saved value of esi      ← esp
(lower addresses, toward 0x00000000)
Code
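push ebp                ; prologue: back up the caller's base pointer
mov ebp, esp            ; ebp now marks this function's frame
sub esp, 4              ; reserve 4 bytes for one local variable
push ebx                ; save the callee-save registers the body uses
push esi
mov eax, [ebp+8]        ; body: eax = first parameter (var x)
mov esi, [ebp+12]       ; esi = second parameter (123)
mov ebx, [ebp+16]       ; ebx = third parameter (copy of var z)
mov [ebp-4], ebx        ; local = third parameter
add [ebp-4], esi        ; local += second parameter
mov eax, [ebp-4]        ; the return value goes in eax
pop esi                 ; epilogue: restore registers in reverse order
pop ebx
mov esp, ebp            ; release the local variable
pop ebp                 ; restore the caller's base pointer
ret                     ; pop the return address and jump to it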
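Reading the body instruction by instruction suggests roughly the following C++ function. This is a reconstruction, not the original source: the parameter names y and z, the name local, and the helper checkMyFunc are assumptions, and the types are assumed to be 32-bit int.

```cpp
#include <cassert>

int myFunc(int x, int y, int z) {
    (void)x;           // loaded into eax, then overwritten by the return value
    int local = z;     // mov [ebp-4], ebx
    local += y;        // add [ebp-4], esi
    return local;      // mov eax, [ebp-4]
}

// Usage check: with the arguments from the diagram the second argument is 123,
// and the first argument does not affect the result.
void checkMyFunc() {
    assert(myFunc(7, 123, 10) == 133);
    assert(myFunc(0, 123, 10) == 133);
}
```

Only the body maps onto C++ statements; the prologue and epilogue are generated by the compiler for every function that follows this calling convention.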

Callee Animation

   
[Animation: stack and register contents as the callee code executes]

Activation Records

  • Every time a sub-routine is called, a number of things are pushed onto the stack:
    • Registers
    • Parameters
    • Old base/stack pointers
    • Local variables
    • Return address
  • All of this is called the activation record
    • (note that caller-saved registers are not shown in this diagram)
[Figure: x86 activation record]

Consider this subroutine

void security_hole() {
    char buffer[12];
    scanf ("%s", buffer); // how C handles input
}

The stack looks like this (with sizes in parentheses):

 esi (4) | edi (4) | buffer (12) | ebp (4) | ret addr (4)
  • Addresses increase to the right (the stack grows to the left)
  • What happens if the value stored into buffer is 13 bytes long?
  • What happens if the value stored into buffer is 16 bytes long?
  • What if it is exactly 20 bytes long?
    • We overwrite the return address!
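The three questions come down to offset arithmetic from the start of buffer. The sketch below maps the number of bytes stored (counting the terminating zero byte that scanf appends) to the stack slot the last byte lands in. The function names are hypothetical, and the layout is the one assumed in the diagram above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Which slot holds the byte at this offset from the start of buffer?
std::string slotAtOffset(std::size_t offset) {
    if (offset < 12) return "buffer";           // offsets 0..11
    if (offset < 16) return "saved ebp";        // offsets 12..15
    if (offset < 20) return "return address";   // offsets 16..19
    return "beyond the return address";
}

// Which slot does the last of bytesStored bytes land in?
std::string slotOfLastByte(std::size_t bytesStored) {
    assert(bytesStored >= 1);                   // at least the terminating zero byte
    return slotAtOffset(bytesStored - 1);
}
```

Under these assumptions, 13 bytes corrupt the first byte of the saved ebp, 16 bytes replace it entirely, and 20 bytes replace the whole return address as well.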

Do we need to even use ebp?

  • If we know that the number of local variables will be fixed...
    • ... then we can offset everything from esp instead
  • This saves the push/pop ebp operations
  • clang++ really likes to do this...
  • But this is not always possible if a program uses dynamic memory on the stack