CS 2150 Roadmap

Data Representation

  • Objects: string
  • Arrays: int x[3]
  • Primitive types: char x
  • Addresses: 0x9cd0f0ad
  • bits: 01101011

Program Representation

  • Examples, from highest level to lowest: Java code, C++ code, C code, x86 code, IBCM, hexadecimal
  • Levels, from highest to lowest: High-level language, Low-level language, Assembly language, Machine code

History of x86

Intel 4004

  • 1971: 4004, 4-bit words
  • 1972: 8008, 8-bit words
  • 1978: 8086, 16-bit words
  • 1982: 80286
  • 1985: 80386, 32-bit words
  • 1989: 80486
  • 1993: Pentium
  • 1995: Pentium Pro
  • 1997: Pentium II
  • 1999: Pentium III
  • 2000-2008: Pentium 4
  • 2003: AMD64 Opteron
  • 2004: Intel 64-bit chips
  • 2005-2008: Pentium D
  • 2006-2011: Core 2
  • 2008-present: Core i3, i5, i7

IBCM vs. x86: Registers

Declaring Variables in x86

Directives

  • byte: 1 byte (DB, "declare byte")
  • word: 2 bytes (DW)
  • double word: 4 bytes (DD)
  • quadword: 8 bytes (DQ)

The directive TIMES x DB 0 creates x bytes, each with the value zero.

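section .data
a	DB	23		; one byte, initialized to 23
b	DW	?		; one 2-byte word, uninitialized
c	DD	3000		; one 4-byte double word
d	DQ	-800		; one 8-byte quadword
x	DD	1, 2, 3		; three consecutive double words
y	TIMES 8 DB	0	; eight bytes, all zero
str	DB	'hello', 0	; five characters plus a terminating zero byte
z	TIMES 50 DD	?	; fifty uninitialized double words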
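
For comparison, the declarations above can be approximated in C with fixed-width types. This is only a rough sketch: the mapping of each directive to a `<stdint.h>` type is an assumption made for illustration, and C zero-initializes globals where the assembly leaves `?` values unspecified.

```c
#include <stdint.h>

/* Rough C counterparts of the .data declarations (illustrative mapping only). */
uint8_t  a = 23;            /* a   DB 23            : 1 byte              */
uint16_t b;                 /* b   DW ?             : 2 bytes             */
uint32_t c = 3000;          /* c   DD 3000          : 4 bytes             */
int64_t  d = -800;          /* d   DQ -800          : 8 bytes             */
uint32_t x[3] = {1, 2, 3};  /* x   DD 1, 2, 3       : 3 * 4 = 12 bytes    */
uint8_t  y[8] = {0};        /* y   TIMES 8 DB 0     : 8 bytes of zero     */
char     str[] = "hello";   /* str DB 'hello', 0    : 5 chars + zero byte */
uint32_t z[50];             /* z   TIMES 50 DD ?    : 50 * 4 = 200 bytes  */
```

Checking `sizeof` on each name gives the byte counts listed in the comments, which match the sizes the directives reserve.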

Addressing Memory

  • Up to 2 registers and one signed constant (the displacement, at most 32 bits in x86-64) can be added together to compute a memory address
  • Furthermore, one register can be pre-multiplied by 2, 4, or 8, which matches the element size when indexing an array:
    • 2: words
    • 4: double words
    • 8: quadwords
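mov rax, rbx            ; register to register: rax = rbx
mov rax, [rbx]          ; load from the address held in rbx
mov [var], rbx          ; store rbx at the address of var
mov rax, [r13 - 4]      ; register plus a (negative) constant
mov [rsi + rax], cl     ; two registers; stores one byte
mov rdx, [rsi + 4*rbx]  ; base register plus a scaled index register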
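
The address arithmetic behind these operands can be sketched in C. The helper names `scale_is_valid` and `effective_address` are hypothetical, chosen for this illustration; the sketch assumes the general form base + index * scale + constant described above.

```c
#include <stdint.h>

/* Hypothetical helper: only these multipliers may be applied to a register. */
int scale_is_valid(unsigned scale) {
    return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

/* Hypothetical helper: address = base + index * scale + signed constant.
   Unsigned arithmetic wraps modulo 2^64, so a negative constant subtracts. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t disp) {
    return base + index * scale + (uint64_t)disp;
}
```

For instance, with rsi = 200 and rbx = 3, `[rsi + 4*rbx]` corresponds to `effective_address(200, 3, 4, 0)`, and `[r13 - 4]` with r13 = 100 corresponds to `effective_address(100, 0, 1, -4)`.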

 

Incorrect: (why?)

mov rax, [r11 - rcx]
mov [rax + r5 + rdi], rbx
mov [4*rax + 2*rbx], rcx

Example

Source code:

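mov rcx, rax            ; rcx = 100
mov rdx, [rbx]          ; rdx = value at address 108 = 8
mov rsi, [rdx+rax+16]   ; address 8 + 100 + 16 = 124, so rsi = 200
mov qword [rsi], 45     ; store 45 at address 200
mov qword [a], 15       ; store 15 at a (address 300)
lea rdi, [a]            ; rdi = address of a = 300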

Registers:

Register   Value
rax        100
rbx        108
rcx        100
rdx        8
rsi        200
rdi        300
r8
...

Memory:

Address    Value
100
108        8
116
124        200
132
...
200        45
208
...
a: 300     15
308
...
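
The trace can be double-checked with a small C simulation. This is a sketch under stated assumptions: the byte array `mem`, the helpers `load64` and `store64`, and the `Regs` struct are all made up for this illustration, and the starting state (rax = 100, rbx = 108, 8 stored at address 108, 200 stored at address 124, a at address 300) is read off the tables above.

```c
#include <stdint.h>
#include <string.h>

uint8_t mem[512];   /* toy byte-addressable memory (hypothetical) */

/* Read or write one 8-byte value at a byte address. */
int64_t load64(uint64_t addr) {
    int64_t v;
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}
void store64(uint64_t addr, int64_t v) {
    memcpy(&mem[addr], &v, sizeof v);
}

typedef struct { int64_t rax, rbx, rcx, rdx, rsi, rdi; } Regs;

Regs run_example(void) {
    const uint64_t a = 300;               /* address of a, from the memory table */
    Regs r = { .rax = 100, .rbx = 108 };  /* assumed starting registers */
    memset(mem, 0, sizeof mem);
    store64(108, 8);                      /* assumed starting memory */
    store64(124, 200);

    r.rcx = r.rax;                                   /* mov rcx, rax          */
    r.rdx = load64((uint64_t)r.rbx);                 /* mov rdx, [rbx]        */
    r.rsi = load64((uint64_t)(r.rdx + r.rax + 16));  /* mov rsi, [rdx+rax+16] */
    store64((uint64_t)r.rsi, 45);                    /* mov [rsi], 45         */
    store64(a, 15);                                  /* mov [a], 15           */
    r.rdi = (int64_t)a;                              /* lea rdi, [a]          */
    return r;
}
```

Calling `run_example()` and inspecting the returned registers, along with `load64(200)` and `load64(300)`, reproduces the register and memory values shown above.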

A code block in both C/C++ and Assembly

C/C++ code:

int n = 5;
int i = 1;
int sum = 0;
while (i <= n) {
    sum += i;
    i++;
}

Assembly code:

section .data
n	DQ 5
i	DQ 1
sum	DQ 0

section .text
loop:  	mov rcx, [i]		; rcx = i
	cmp rcx, [n]		; compare i with n
	jg endOfLoop		; leave the loop when i > n
	add [sum], rcx		; sum += i
	inc qword [i]		; i++
	jmp loop		; back to the loop test
endOfLoop:
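
To make the correspondence between the two versions easier to see, the loop can also be written in C with explicit jumps. This is only an illustrative sketch: the function names `sum_while` and `sum_goto` are made up here, and `long` is used so the variables are 8 bytes on typical 64-bit Linux systems, like the DQ declarations.

```c
/* The structured version, wrapped in a function. */
long sum_while(long n) {
    long i = 1;
    long sum = 0;
    while (i <= n) {
        sum += i;
        i++;
    }
    return sum;
}

/* The same loop with the control flow spelled out as in the assembly. */
long sum_goto(long n) {
    long i = 1;
    long sum = 0;
loop_top:
    if (i > n) goto end_of_loop;  /* cmp rcx, [n] / jg endOfLoop */
    sum += i;                     /* add [sum], rcx              */
    i++;                          /* inc qword [i]               */
    goto loop_top;                /* jmp loop                    */
end_of_loop:
    return sum;
}
```

Both functions return 15 for n = 5. Note how the `while` condition `i <= n` becomes its negation, `i > n`, in the jump that exits the loop.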

Stack Memory Visualization for myFunc

This is just before the call opcode is invoked.


To higher addresses (to 0xffffffff)
    value of rdi            ← rsp
To lower addresses (to 0x00000000)

Stack Memory Visualization for myFunc

This is just after the call opcode is invoked.


To higher addresses (to 0xffffffff)
    value of rdi
    return address          ← rsp
To lower addresses (to 0x00000000)

Callee Rules (Prologue)

  1. Save callee-saved registers
    • rbx, rbp, r12-r15
    • only needed for the registers the callee intends to use; the rest can be left untouched

THEN, perform the body of the function

Stack Memory Visualization for myFunc

This is just after the caller invokes the call opcode.


To higher addresses (to 0xffffffff)
    value of rdi
    return address          ← rsp
To lower addresses (to 0x00000000)

Stack Memory Visualization for myFunc

This is just after the callee invokes the sub rsp, 8 opcode.


To higher addresses (to 0xffffffff)
    value of rdi
    return address
    local var (result)      ← rsp
To lower addresses (to 0x00000000)

Stack Memory Visualization for myFunc

This is after the myFunc() prologue is completed.


To higher addresses (to 0xffffffff)
    value of rdi
    return address
    local var (result)      ← [rsp+16]
    value of rbx            ← [rsp+8]
    value of rbp            ← [rsp]
To lower addresses (to 0x00000000)
Code
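sub rsp, 8          ; reserve 8 bytes for the local variable (result)
push rbx            ; save callee-saved rbx
push rbp            ; save callee-saved rbp
mov rax, rdi        ; rax = first parameter
mov rbp, rsi        ; rbp = second parameter
mov rbx, rdx        ; rbx = third parameter
mov [rsp+16], rbx   ; result = third parameter
add [rsp+16], rbp   ; result += second parameter
mov rax, [rsp+16]   ; return value = result
pop rbp             ; restore rbp
pop rbx             ; restore rbx
add rsp, 8          ; release the local variable
ret                 ; pop the return address and jump to it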
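
The stack offsets used in this code can be sanity-checked with a toy model in C. Everything here is an assumption made for illustration: the array `stack_mem`, the helpers `sim_push`, `sim_pop` and `at_rsp`, and the marker values 111 to 555 stand in for real register contents and the return address.

```c
#include <stdint.h>

enum { STACK_WORDS = 16 };
int64_t stack_mem[STACK_WORDS];  /* toy stack made of 8-byte slots          */
int rsp_index = STACK_WORDS;     /* slot rsp points at; STACK_WORDS = empty */

/* The stack grows toward lower addresses, so push decrements first. */
void sim_push(int64_t v) { stack_mem[--rsp_index] = v; }
int64_t sim_pop(void)    { return stack_mem[rsp_index++]; }

/* Slot at [rsp + byte_offset]; offsets are multiples of 8. */
int64_t *at_rsp(int byte_offset) {
    return &stack_mem[rsp_index + byte_offset / 8];
}

/* Replays the steps from the diagrams and checks the final layout. */
int prologue_layout_ok(void) {
    rsp_index = STACK_WORDS;
    sim_push(111);     /* caller: value of rdi                   */
    sim_push(222);     /* call: return address (marker value)    */
    rsp_index -= 1;    /* sub rsp, 8: room for local var (result) */
    *at_rsp(0) = 333;  /* mark the local variable's slot         */
    sim_push(444);     /* push rbx                               */
    sim_push(555);     /* push rbp                               */
    return *at_rsp(0) == 555 && *at_rsp(8) == 444 && *at_rsp(16) == 333
        && *at_rsp(24) == 222 && *at_rsp(32) == 111;
}

/* Replays the epilogue: rsp must end up at the return address for ret. */
int epilogue_restores_rsp(void) {
    prologue_layout_ok();
    sim_pop();         /* pop rbp    */
    sim_pop();         /* pop rbx    */
    rsp_index += 1;    /* add rsp, 8 */
    return *at_rsp(0) == 222;
}
```

Both checks return 1 in this model. The pops undo the pushes in reverse order, which is why rbp is popped before rbx.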

Callee Animation

   
[Animation: stack and register contents as the callee executes]

Activation Records

  • Every time a sub-routine is called, a number of things are (potentially) pushed onto the stack:
    • Registers
    • Parameters
    • Local variables
    • Return address
  • All of this is called the activation record
    • (note that caller-saved registers are not shown in this diagram)
[Figure: x86 activation record]

Consider this subroutine

void security_hole() {
    char buffer[12];
    scanf ("%s", buffer); // how C handles input
}

The stack looks like this (with sizes in bytes in parentheses):

  | rsi (8) | rdi (8) | buffer (12) | ret addr (8) |
  • Addresses increase to the right (the stack grows to the left)
  • What happens if the value stored into buffer is 13 bytes long?
  • What happens if the value stored into buffer is 20 bytes long?
    • We overwrite the return address!
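
The overflow can be explored safely with a simulation instead of a real out-of-bounds write. This is a sketch under stated assumptions: it models only the buffer and the return address, laid out back to back with no padding as in the diagram above (a real compiler may arrange the frame differently), and the function name `return_address_clobbered` is made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 12, RET_SIZE = 8, FRAME_SIZE = BUF_SIZE + RET_SIZE };

/* Writes `len` characters plus a terminating zero byte into a simulated
   frame, as scanf("%s") would, and reports whether any byte of the
   simulated return address changed. Returns 1 if it did, 0 otherwise. */
int return_address_clobbered(size_t len) {
    unsigned char frame[FRAME_SIZE];
    unsigned char original_ret[RET_SIZE];

    memset(frame, 0, BUF_SIZE);                /* buffer: bytes 0..11        */
    memset(frame + BUF_SIZE, 0xAB, RET_SIZE);  /* return address: bytes 12.. */
    memcpy(original_ret, frame + BUF_SIZE, RET_SIZE);

    if (len + 1 > FRAME_SIZE) {
        len = FRAME_SIZE - 1;   /* keep the simulation itself in bounds */
    }
    memset(frame, 'A', len);    /* the characters that were read        */
    frame[len] = '\0';          /* the terminating zero byte            */

    return memcmp(original_ret, frame + BUF_SIZE, RET_SIZE) != 0;
}
```

In this model an 11-character input still fits, because the zero byte lands in the last buffer slot, while 12 or more characters spill into the return address. One common mitigation is to give scanf a field width, as in `scanf("%11s", buffer)`, which reads at most 11 characters before adding the terminating zero byte.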