CS 2150 Roadmap

Data Representation

  • Objects: string
  • Arrays: int x[3]
  • Primitive types: char x
  • Addresses: 0x9cd0f0ad
  • Bits: 01101011

Program Representation

  • Code forms, from top to bottom: Java code, C++ code, C code, x86 code, IBCM, hexadecimal
  • Levels, from top to bottom: high-level language, low-level language, assembly language, machine code

Heap Structure Property

A binary heap is an almost complete binary tree, which is a binary tree that is completely filled, with the possible exception of the bottom level, which is filled left to right. Examples:

[Figures: heap 4, heap 3]

Almost complete binary tree of height h

  • For h = 0, the tree is just a single node
[Figure: heap 5]
  • For h = 1, the root has either only a left child or two children
[Figures: heap 6, heap 7]
  • For h ≥ 2, either:
    • the left subtree of the root is complete with height h-1 and the right is almost complete with height h-1, OR
    • the left is almost complete with height h-1 and the right is complete with height h-2

Complete Binary Trees in Arrays

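[Figure: heap 8]

From node i (with the root stored at index 1 and index 0 left unused):

  • left child: 2*i
  • right child: (2*i)+1
  • parent: floor(i/2)

Implicit (array) representation:

Index:  0   1   2   3   4   5   6   7   8   9   10  11  12  13
Value:      A   B   C   D   E   F   G   H   I   J   K   L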
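
The index arithmetic above can be sketched in C++ as follows. This is a minimal sketch that assumes the root sits at index 1, as the formulas imply; the function names are made up for illustration and are not taken from the course code.

```cpp
#include <cstddef>

// Index helpers for a binary heap stored in an array with the root at index 1.
// The names are hypothetical; only the arithmetic comes from the slide.
std::size_t leftChild(std::size_t i)  { return 2 * i; }
std::size_t rightChild(std::size_t i) { return 2 * i + 1; }
std::size_t parent(std::size_t i)     { return i / 2; }  // integer division gives floor(i/2)
```

For example, with A through L stored at indices 1 through 12, node D (index 4) has children H (index 8) and I (index 9), and its parent is B (index 2).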

Heap Ordering Property

Heap ordering property: For every non-root node X, the key in the parent of X is less than (or equal to) the key in X. Thus, the tree is partially ordered.

[Figure: heap 9, not a heap] [Figure: heap 10, a min-heap]

Insert: percolate up

[Figures: heap 12, heap 13]
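
A minimal sketch of insert with percolate up, assuming an array-based min-heap of ints with the root at index 1. The class and member names are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Minimal min-heap sketch: root at index 1, index 0 unused.
// Class and member names are hypothetical.
class MinHeap {
public:
    MinHeap() : data(1) {}             // placeholder element at index 0

    void insert(int key) {
        data.push_back(key);           // add at the next open position on the bottom level
        percolateUp(data.size() - 1);
    }

    int findMin() const { return data[1]; }   // caller must ensure the heap is non-empty
    std::size_t size() const { return data.size() - 1; }

private:
    std::vector<int> data;

    // Swap the key at index i with its parent until the parent is not larger.
    void percolateUp(std::size_t i) {
        while (i > 1 && data[i] < data[i / 2]) {
            std::swap(data[i], data[i / 2]);
            i /= 2;
        }
    }
};
```

Appending at the end of the array keeps the structure property; the loop then restores the ordering property along a single root-to-leaf path.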

Insert expected running time

  • How far does a new key move up?
    • About half of the nodes are leaves, so about half of the inserts will only move up one level
    • A quarter of the nodes are one level above the leaves, so one quarter of the inserts will move up two levels
    • One eighth will require moving up 3 levels
    • One sixteenth will require moving up 4 levels
    • Etc.
  • Expected running time:
    • \( \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \ldots = \sum_{i=1}^{\infty} \frac{i}{2^i} = 2 \)
    • A real heap has finitely many levels, so its sum is slightly less than 2: on average, an insert moves up a constant number of levels
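
One way to see why the series comes out to 2 is to treat it as an infinite series (an idealization, since a real heap has only about log n levels, which makes its sum slightly smaller):

\( S = \sum_{i=1}^{\infty} \frac{i}{2^i} \)

\( \frac{S}{2} = \sum_{i=1}^{\infty} \frac{i}{2^{i+1}} = \sum_{i=2}^{\infty} \frac{i-1}{2^i} \)

\( S - \frac{S}{2} = \frac{1}{2} + \sum_{i=2}^{\infty} \frac{i-(i-1)}{2^i} = \sum_{i=1}^{\infty} \frac{1}{2^i} = 1 \)

So \( \frac{S}{2} = 1 \), which gives \( S = 2 \).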

Which child to swap with

  • Consider this min-heap:
    • 25 needs percolating down!
    • But which way?
[Figure: heap 14]
  • If we swap 25 with the smaller child:
    • All's good!
[Figure: heap 15]
  • If we swap 25 with the larger child:
    • No longer a min-heap!
[Figure: heap 16]

DeleteMin: percolate down

[Figures: heap 17, heap 18]
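
A minimal sketch of deleteMin with percolate down, assuming the heap is a std::vector<int> with the root at index 1 and index 0 unused. The function names are hypothetical. Note that each step swaps with the smaller child, as the previous slide requires.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Sketch of deleteMin on an array-based min-heap (root at index 1, index 0 unused).
// Function names are hypothetical. `heap` must already satisfy both heap properties.
void percolateDown(std::vector<int>& heap, std::size_t hole) {
    const std::size_t last = heap.size() - 1;
    while (2 * hole <= last) {
        std::size_t child = 2 * hole;                       // start with the left child
        if (child + 1 <= last && heap[child + 1] < heap[child]) {
            ++child;                                        // right child is smaller
        }
        if (heap[hole] <= heap[child]) break;               // ordering restored
        std::swap(heap[hole], heap[child]);                 // swap with the smaller child
        hole = child;
    }
}

int deleteMin(std::vector<int>& heap) {
    if (heap.size() <= 1) throw std::out_of_range("deleteMin on empty heap");
    int minKey = heap[1];
    heap[1] = heap.back();     // move the last leaf to the root
    heap.pop_back();
    if (heap.size() > 1) percolateDown(heap, 1);
    return minKey;
}
```

Moving the last leaf to the root keeps the tree almost complete; percolating it down then repairs the ordering property.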

An xkcd about heaps...

[Figure: xkcd #835, "Tree"]

 
JPEG image quality comparison

  • Quality = 100; image size: 83,261 (100%)
  • Quality = 50; image size: 15,138 (18%)
  • Quality = 25; image size: 9,553 (11%)
  • Quality = 10; image size: 4,787 (6%)
  • Quality = 1; image size: 1,523 (2%)
[Figures: jpeg @ 100%, jpeg @ 50%, jpeg @ 25%, jpeg @ 10%, jpeg @ 1%]

Huffman Coding

  • Uses frequencies of symbols in a string to build a prefix code
  • The more frequent a character is, the fewer bits we'll use to represent it
  • Prefix code: no code in our encoding is a prefix of another code
Letter  Code
a       0
b       100
c       101
d       11

Decode: 1110001010011

[Figure: huffman 13] This is a full binary tree!

11 100 0 101 0 0 11 = dbacaad
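
The decoding above can be sketched in C++ by growing a candidate code one bit at a time until it matches a table entry. This is only a sketch: a real decoder would more likely walk the Huffman tree, and the function name and the use of a '0'/'1' character string for the bits are assumptions made for readability.

```cpp
#include <map>
#include <string>

// Decode a bit string with a prefix code by growing a candidate code one bit
// at a time until it matches a table entry. The function name is hypothetical.
std::string decodePrefixCode(const std::string& bits,
                             const std::map<std::string, char>& codeToLetter) {
    std::string result;
    std::string current;
    for (char bit : bits) {
        current += bit;
        auto it = codeToLetter.find(current);
        if (it != codeToLetter.end()) {   // prefix property: the first match is the only match
            result += it->second;
            current.clear();
        }
    }
    return result;   // leftover bits in `current` (if any) are ignored in this sketch
}
```

Because no code is a prefix of another, the decoder never has to look ahead or backtrack.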

Huffman Trees

Cost of a file encoded via a Huffman Tree containing n symbols:

 

\( C(T) = p_1 * r_1 + p_2 * r_2 + p_3 * r_3 + \ldots + p_n * r_n \)

 

Where:

  • \( p_i \) = the frequency (or probability) with which symbol i occurs
  • \( r_i \) = the length of the path from the root to the leaf for symbol i, which is the number of bits in its code

Huffman encoding costs

This is the example from the Huffman Coding slide above.

Letter  Frequency  Code
a       3/7        0
b       1/7        100
c       1/7        101
d       2/7        11

  • a: 3/7 * 1 = 3/7
  • b: 1/7 * 3 = 3/7
  • c: 1/7 * 3 = 3/7
  • d: 2/7 * 2 = 4/7

  • Cost is 3/7 + 3/7 + 3/7 + 4/7 = 13/7 ≈ 1.86 bits per character
  • ASCII is 8 bits per char
  • "Straight" (fixed-length) encoding of these four letters is 2 bits per char
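
The cost formula can be checked with a short C++ sketch. The struct and function names are hypothetical; each symbol's code length stands in for the path length from the root to its leaf.

```cpp
#include <string>
#include <vector>

// One row of the table: a symbol, its frequency (as a probability), and its code.
// The struct and function names are hypothetical.
struct SymbolCode {
    char symbol;
    double frequency;
    std::string code;
};

// C(T) = sum over all symbols of p_i * r_i, where r_i is the code length.
double huffmanCost(const std::vector<SymbolCode>& table) {
    double cost = 0.0;
    for (const SymbolCode& row : table) {
        cost += row.frequency * static_cast<double>(row.code.size());
    }
    return cost;
}
```

Called on the four-letter table above, this returns 13/7, up to floating-point rounding.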

Compression step 1 (a)

Determine frequencies of letters

Character   Frequency
b           1
e           2
f           1
i           5
m           1
o           2
p           1
s           2
t           4
u           1
, (comma)   1
(space)     9
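
Counting the frequencies can be sketched in C++ with a std::map. The function name is hypothetical, and the input is assumed to be already in memory as a string rather than read from a file.

```cpp
#include <map>
#include <string>

// Count how often each character occurs in the input text.
// The function name is hypothetical.
std::map<char, int> countFrequencies(const std::string& text) {
    std::map<char, int> freq;
    for (char c : text) {
        ++freq[c];      // operator[] value-initializes a missing entry to 0
    }
    return freq;
}
```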

Compression step 1 (b)

Build a min-heap, sorted by frequency

[Figure: huffman-14]
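
A minimal sketch of a min-heap ordered by frequency, using std::priority_queue as a stand-in for whichever heap implementation is actually used. The alias and function names are hypothetical, and entries with equal frequency are ordered by character only because that is how std::pair compares.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// (frequency, character) pairs; std::greater turns priority_queue into a min-heap.
// The alias and function names are hypothetical.
using Entry = std::pair<int, char>;
using FrequencyHeap = std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

FrequencyHeap buildFrequencyHeap(const std::vector<Entry>& entries) {
    FrequencyHeap heap;
    for (const Entry& e : entries) {
        heap.push(e);   // push restores the heap ordering after each addition
    }
    return heap;
}
```

After building, top() is always an entry with the lowest frequency.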

The Prefix codes

[Figure: huffman-12]

Character   Prefix code
b           00000
e           0011
f           00001
i           11
m           00010
o           1000
p           00011
s           1001
t           101
u           00100
, (comma)   00101
(space)     01

Resulting Encoding Table

Character   Frequency   Prefix code   Total bits
b           1           00000         5
e           2           0011          8
f           1           00001         5
i           5           11            10
m           1           00010         5
o           2           1000          8
p           1           00011         5
s           2           1001          8
t           4           101           12
u           1           00100         5
, (comma)   1           00101         5
(space)     9           01            18

Total is 94 bits
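
The 94-bit total can be cross-checked with a short C++ sketch. It assumes the standard Huffman construction, which repeatedly merges the two lowest-frequency nodes and which this section does not spell out. The function name is hypothetical. The sketch computes only the total length and does not build the tree or the individual codes.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Total encoded length in bits under the standard Huffman construction:
// repeatedly merge the two smallest weights. Each merge adds one bit to every
// symbol beneath it, so the total is the sum of the merged weights.
// The function name is hypothetical.
int huffmanTotalBits(const std::vector<int>& frequencies) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(
        frequencies.begin(), frequencies.end());
    int totalBits = 0;
    while (heap.size() > 1) {
        int a = heap.top(); heap.pop();
        int b = heap.top(); heap.pop();
        totalBits += a + b;
        heap.push(a + b);
    }
    return totalBits;
}
```

Ties between equal frequencies may produce different trees and different individual codes, but the total length is the same for every tree the construction can produce.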

The Prefix codes

  • Write the text to the file using the Huffman encoding
     
  • "be" becomes 00000 0011
  • "set" becomes 1001 0011 101
  • "stumps" becomes 1001 101 00100 00010 00011 1001
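
Encoding text with the prefix codes can be sketched in C++ as a table lookup per character. The function name is hypothetical, and the bits are kept as a string of '0' and '1' characters for readability; a real compressor would pack them into bytes before writing the file.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Encode text by concatenating each character's prefix code.
// Throws if a character has no entry in the table.
// The function name is hypothetical.
std::string encodeText(const std::string& text,
                       const std::map<char, std::string>& codes) {
    std::string bits;
    for (char c : text) {
        auto it = codes.find(c);
        if (it == codes.end()) throw std::invalid_argument("no code for character");
        bits += it->second;
    }
    return bits;
}
```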

ASCII Character Codes in Hexadecimal

[Figure: ascii]

For the lab, you only need to account for the printable characters (0x20 to 0x7e)

 

Character codes:

  • \( 32_{10} \) (0x20) = space
  • \( 33_{10} \) (0x21) = !
  • \( 126_{10} \) (0x7e) = ~
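
A one-line C++ sketch of the printable-range check, using the bounds stated above. The function name is hypothetical.

```cpp
// True for the printable ASCII range named above: 0x20 (space) through 0x7e (~).
// The function name is hypothetical.
bool isPrintableAscii(unsigned char c) {
    return c >= 0x20 && c <= 0x7e;
}
```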