CS 2150 Roadmap

Data Representation (high level to low level)

  • Objects: string
  • Arrays: int x[3]
  • Primitive types: char x
  • Addresses: 0x9cd0f0ad
  • bits: 01101011

Program Representation (high level to low level)

  • Levels: High-level language, Low-level language, Assembly language, Machine code
  • Examples: Java code, C++ code, C code, x86 code, IBCM, hexadecimal

Hash Tables

  • Hash table
    • A fixed-size array whose size is usually a prime number
      • Should be larger than the number of elements
  • Given a key space:
    • The hash function maps a key k to hash(k), an index into the hash table (slots 0, 1, 2, ..., tablesize‑1)

Hash functions KLA

  • I'm going to hash all of you into 10 buckets (0-9) by your birthday
    • (you are welcome to make up another birthday, as long as you are consistent)
  • The hash functions:
    • By the decade of your birth year
      • hash(birthday) = (year/10) % 10
    • By the last digit of your birth year
      • hash(birthday) = year % 10
    • By the last digit of your birth month
      • hash(birthday) = month % 10
    • By the last digit of your birth day
      • hash(birthday) = day % 10
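
The four hash functions above can be sketched in C++ as follows. The `Birthday` struct and the function names are hypothetical, introduced only for this illustration, and `year/10` is assumed to be integer division.

```cpp
#include <cassert>

// Hypothetical type for this sketch: a birthday as three integers.
struct Birthday {
    int year;   // e.g., 2004
    int month;  // 1-12
    int day;    // 1-31
};

// Each function maps a birthday to one of the 10 buckets (0-9).

// Decade of the birth year: integer division drops the last digit first.
int hashByDecade(const Birthday& b) { return (b.year / 10) % 10; }

// Last digit of the birth year.
int hashByYearDigit(const Birthday& b) { return b.year % 10; }

// Last digit of the birth month.
int hashByMonth(const Birthday& b) {
    assert(b.month >= 1 && b.month <= 12);
    return b.month % 10;
}

// Last digit of the birth day.
int hashByDay(const Birthday& b) {
    assert(b.day >= 1 && b.day <= 31);
    return b.day % 10;
}
```

For example, a birthday of 23 November 2004 lands in buckets 0, 4, 1, and 3 under the four functions, in that order.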

Keys

  • How can we hash the keys if the keys can be anything?
  • The best a single binary comparison can do is eliminate half of the elements, which gives Θ(log n)
  • We want Θ(1)
  • Every key is ultimately stored as bits, so we can do better!
  • Data representation, from high level to low level:
    • Objects: "Hello"
    • Arrays: ['H','i',\0]
    • Primitive types: 3.14, 'x'
    • Addresses: 0x42381a
    • bits: 01001010
         

Example

  • Key space: integers
     
  • Table size: 10
     
  • hash(k) = k mod 10
    • Technically, hash(k) = k, which is then mod'ed by the table size of 10
       
  • Insert: 7, 18, 41, 34
     
  • How do we find them?
Table (slot: key):
0: (empty)
1: 41
2: (empty)
3: (empty)
4: 34
5: (empty)
6: (empty)
7: 7
8: 18
9: (empty)
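
As a rough sketch of how the keys are found again: a lookup recomputes hash(k) and checks that one slot. The names `EMPTY`, `hashKey`, `insertKey`, `findKey`, and `buildExample` are hypothetical, and the sketch assumes non-negative keys and no collisions, which holds for the four keys in this example.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // assumed sentinel: keys in this sketch are non-negative

// hash(k) = k, then mod'ed by the table size.
int hashKey(int k, int tableSize) { return k % tableSize; }

// Store k at index hash(k). This sketch assumes no two keys collide.
void insertKey(std::vector<int>& table, int k) {
    int idx = hashKey(k, static_cast<int>(table.size()));
    assert(table[idx] == EMPTY);
    table[idx] = k;
}

// Finding a key recomputes the same index and looks only there.
bool findKey(const std::vector<int>& table, int k) {
    return table[hashKey(k, static_cast<int>(table.size()))] == k;
}

// Builds the table from the example: size 10, keys 7, 18, 41, 34.
std::vector<int> buildExample() {
    std::vector<int> table(10, EMPTY);
    for (int k : {7, 18, 41, 34}) {
        insertKey(table, k);
    }
    return table;
}
```

With the table from `buildExample()`, `findKey(table, 18)` checks only slot 8 and succeeds, while `findKey(table, 28)` checks the same slot and fails because it holds 18.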

Another Example

  • Key space: integers
     
  • Table size: 6
     
  • hash(k) = k mod 6
     
  • Insert: 7, 18, 41, 34, 12
     
  • How do we find them?
Table (slot: key):
0: 18, 12
1: 7
2: (empty)
3: (empty)
4: 34
5: 41
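
In this example 18 and 12 both hash to slot 0, which is a collision. One way to see this is to count how many keys land in each slot; the helper name `slotCounts` is hypothetical and the sketch assumes non-negative keys.

```cpp
#include <cassert>
#include <vector>

// Count how many of the given keys hash to each slot
// under hash(k) = k mod tableSize.
std::vector<int> slotCounts(const std::vector<int>& keys, int tableSize) {
    assert(tableSize > 0);
    std::vector<int> counts(tableSize, 0);
    for (int k : keys) {
        ++counts[k % tableSize];  // assumes non-negative keys
    }
    return counts;
}
```

For the keys 7, 18, 41, 34, 12 and a table size of 6, slot 0 gets a count of 2; any count above 1 marks a collision that the table has to resolve somehow.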

Sample String Hash Functions

  • Key space: strings
  • A string s is made up of characters \( s_i \)
  • \( s = s_0s_1s_2s_3\ldots s_{k-1} \)

 

  1. \( hash(s) = s_0 \mod table\_size \)
     
  2. \( hash(s) = \left( \sum_{i=0}^{k-1}s_i \right) \mod table\_size \)
     
  3. \( hash(s) = \left( \sum_{i=0}^{k-1}s_i*37^i \right) \mod table\_size \)
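
The three sample functions could be written in C++ roughly as below. The function names are hypothetical. The sketch uses `unsigned int` arithmetic, so for long strings the sum in function 3 wraps around modulo 2^32 and the result can differ from the exact mathematical formula; that is an assumption of this sketch, not something the formulas specify.

```cpp
#include <cassert>
#include <string>

// 1. Only the first character. Assumes s is non-empty.
unsigned int hashFirstChar(const std::string& s, unsigned int tableSize) {
    assert(!s.empty() && tableSize > 0);
    return static_cast<unsigned char>(s[0]) % tableSize;
}

// 2. Sum of all character codes.
unsigned int hashCharSum(const std::string& s, unsigned int tableSize) {
    assert(tableSize > 0);
    unsigned int sum = 0;
    for (unsigned char c : s) {
        sum += c;
    }
    return sum % tableSize;
}

// 3. Characters weighted by powers of 37: sum of s_i * 37^i.
unsigned int hashPoly37(const std::string& s, unsigned int tableSize) {
    assert(tableSize > 0);
    unsigned int sum = 0;
    unsigned int power = 1;  // 37^i, starting at 37^0
    for (unsigned char c : s) {
        sum += c * power;    // wraps on overflow (unsigned arithmetic)
        power *= 37;
    }
    return sum % tableSize;
}
```

Function 2 gives every rearrangement of the same letters the same value, so "ab" and "ba" collide; with ASCII codes and a table size of 101, function 3 maps them to 87 and 51 respectively.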
     

Separate Chaining

[Figure: hash table with slots 0 through 9]
  • All keys that map to the same hash value are kept in a "bucket"
    • This "bucket" is another data structure, typically a linked list

     
  • hash(k) = k mod 10
     
  • Insert: 10, 22, 107, 12, 42
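
A minimal sketch of separate chaining, using `std::list` as the bucket. The class name `ChainedHashTable` and its member names are hypothetical, keys are assumed non-negative, and appending new keys to the end of a bucket is an arbitrary choice for this sketch.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Sketch of separate chaining: each slot holds a linked list ("bucket") of keys.
class ChainedHashTable {
public:
    explicit ChainedHashTable(int tableSize) : buckets(tableSize) {}

    // Append k to the bucket chosen by hash(k).
    void insert(int k) { buckets[index(k)].push_back(k); }

    // Search only the one bucket that k hashes to.
    bool contains(int k) const {
        for (int stored : buckets[index(k)]) {
            if (stored == k) return true;
        }
        return false;
    }

    // Read-only access to bucket i, for inspecting the table.
    const std::list<int>& bucket(int i) const {
        assert(i >= 0 && i < static_cast<int>(buckets.size()));
        return buckets[i];
    }

private:
    // hash(k) = k mod table size; assumes non-negative keys.
    int index(int k) const { return k % static_cast<int>(buckets.size()); }

    std::vector<std::list<int>> buckets;
};
```

Inserting 10, 22, 107, 12, 42 into a table of size 10 leaves bucket 0 holding 10, bucket 2 holding 22, 12, 42, and bucket 7 holding 107; all other buckets stay empty.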

Linear Probing

Table (slot: key):
0: 37
1: 14
2: 21
3: (empty)
4: (empty)
5: (empty)
6: (empty)
7: (empty)
8: 27
9: 4
  • Check spots in this order (each taken mod the table size):
    • hash(k)
    • hash(k)+1
    • hash(k)+2
    • hash(k)+3
    • etc.
     
  • hash(k) = 3k+7
    • Which is then mod'ed by the table size (10)
    • Result: hash(k) = (3k+7) mod 10
     
  • Insert: 4, 27, 37, 14, 21
    • hash(k) values: 19, 88, 118, 49, 70, respectively
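
The probe order can be sketched in C++ as below. The names `EMPTY`, `hashKey`, `insertLinear`, and `findLinear` are hypothetical; the sketch assumes non-negative keys, no deletions, and that probe positions wrap around the end of the table.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // assumed sentinel: keys in this sketch are non-negative

// hash(k) = 3k + 7, then mod'ed by the table size.
int hashKey(int k, int tableSize) { return (3 * k + 7) % tableSize; }

// Insert k with linear probing.
// Returns the slot used, or -1 if the table is full.
int insertLinear(std::vector<int>& table, int k) {
    int size = static_cast<int>(table.size());
    assert(size > 0);
    int home = hashKey(k, size);
    for (int i = 0; i < size; ++i) {
        int slot = (home + i) % size;  // hash(k), hash(k)+1, hash(k)+2, ...
        if (table[slot] == EMPTY) {
            table[slot] = k;
            return slot;
        }
    }
    return -1;
}

// Lookup follows the same probe sequence and stops at the key or at an empty slot.
bool findLinear(const std::vector<int>& table, int k) {
    int size = static_cast<int>(table.size());
    assert(size > 0);
    int home = hashKey(k, size);
    for (int i = 0; i < size; ++i) {
        int slot = (home + i) % size;
        if (table[slot] == k) return true;
        if (table[slot] == EMPTY) return false;  // k would have been placed here
    }
    return false;
}
```

Inserting 4, 27, 37, 14, 21 into an empty table of size 10 places them in slots 9, 8, 0, 1, and 2: 37 collides at 8 and 9 before wrapping to 0, 14 collides at 9 and 0, and 21 collides at 0 and 1.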

Quadratic Probing

Table (slot: key):
0: 14
1: (empty)
2: 37
3: 22
4: (empty)
5: 34
6: (empty)
7: (empty)
8: 27
9: 4
  • Check spots in this order (each taken mod the table size):
    • hash(k)
    • hash(k) + 1^2 = hash(k) + 1
    • hash(k) + 2^2 = hash(k) + 4
    • hash(k) + 3^2 = hash(k) + 9
    • etc.
     
  • hash(k) = 3k+7
    • Which is then mod'ed by the table size (10)
    • Result: hash(k) = (3k+7) mod 10
     
  • Insert: 4, 27, 14, 37, 22, 34
    • hash(k) values: 19, 88, 49, 118, 73, 109, respectively
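
A sketch of insertion with quadratic probing, under the same assumptions as before: non-negative keys and probe positions taken mod the table size. The names `EMPTY`, `hashKey`, and `insertQuadratic` are hypothetical, and giving up after as many probes as there are slots is a choice made for this sketch.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // assumed sentinel: keys in this sketch are non-negative

// hash(k) = 3k + 7, then mod'ed by the table size.
int hashKey(int k, int tableSize) { return (3 * k + 7) % tableSize; }

// Insert k with quadratic probing.
// Returns the slot used, or -1 if no free slot was found within tableSize probes.
int insertQuadratic(std::vector<int>& table, int k) {
    int size = static_cast<int>(table.size());
    assert(size > 0);
    int home = hashKey(k, size);
    for (int i = 0; i < size; ++i) {
        int slot = (home + i * i) % size;  // hash(k), +1, +4, +9, ...
        if (table[slot] == EMPTY) {
            table[slot] = k;
            return slot;
        }
    }
    return -1;  // quadratic probing can miss free slots
}
```

Inserting 4, 27, 14, 37, 22, 34 into an empty table of size 10 places them in slots 9, 8, 0, 2, 3, and 5. The last key, 34, probes 9, 0, 3, and 8 before finding slot 5 free at offset 16.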

Double Hashing

Table (slot: key):
0: 69
1: (empty)
2: 60
3: 58
4: (empty)
5: (empty)
6: 49
7: (empty)
8: 18
9: 89
  • Check spots in this order (each taken mod the table size):
    • hash(k)
    • hash(k) + 1 * hash2(k)
    • hash(k) + 2 * hash2(k)
    • hash(k) + 3 * hash2(k)
    • etc.
     
  • hash(k) = k
    • The hash function was made simpler for this example...
    • Which is then mod'ed by the table size (10)
    • Result: hash(k) = k mod 10
  • hash2(k) = 7 - (k mod 7)
     
  • Insert: 89, 18, 58, 49, 69, 60
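
A sketch of insertion with double hashing, using the two functions from this slide. The names `EMPTY`, `hash1`, and `insertDouble` are hypothetical; the sketch assumes non-negative keys and gives up after as many probes as there are slots.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // assumed sentinel: keys in this sketch are non-negative

// Primary hash: hash(k) = k, then mod'ed by the table size.
int hash1(int k, int tableSize) { return k % tableSize; }

// Secondary hash: the step between probes, always in the range 1-7.
int hash2(int k) { return 7 - (k % 7); }

// Insert k with double hashing.
// Returns the slot used, or -1 if no free slot was found within tableSize probes.
int insertDouble(std::vector<int>& table, int k) {
    int size = static_cast<int>(table.size());
    assert(size > 0);
    int home = hash1(k, size);
    int step = hash2(k);
    for (int i = 0; i < size; ++i) {
        int slot = (home + i * step) % size;  // hash(k) + i * hash2(k)
        if (table[slot] == EMPTY) {
            table[slot] = k;
            return slot;
        }
    }
    return -1;
}
```

Inserting 89, 18, 58, 49, 69, 60 into an empty table of size 10 places them in slots 9, 8, 3, 6, 0, and 2. The key 60 has hash2(60) = 3 and probes 0, 3, 6, and 9 before landing in slot 2.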

Double Hashing Thrashing

Table (slot: key):
0: 10
1: (empty)
2: 12
3: (empty)
4: 14
5: (empty)
6: 16
7: (empty)
8: 18
9: (empty)
  • hash(k) = k mod 10 
    • Same as the previous slide
    • Result: hash(k) = k mod 10
     
  • hash2(k) = (k mod 5) +1
     
  • Insert: 10, 12, 14, 16, 18, 36
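
The thrashing can be reproduced with a short sketch. For 36, hash(36) = 6 and hash2(36) = 2, so the probes visit only the even slots 6, 8, 0, 2, 4, all of which are taken, even though half the table is empty. The names `EMPTY`, `hash1`, `insertDouble`, and `slotOfLastKey` are hypothetical, and the comparison with a table size of 11 is an added illustration rather than part of the slide.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // assumed sentinel: keys in this sketch are non-negative

int hash1(int k, int tableSize) { return k % tableSize; }
int hash2(int k) { return (k % 5) + 1; }  // step between probes, in the range 1-5

// Insert k with double hashing.
// Returns the slot used, or -1 if every probe hit an occupied slot.
int insertDouble(std::vector<int>& table, int k) {
    int size = static_cast<int>(table.size());
    assert(size > 0);
    int home = hash1(k, size);
    int step = hash2(k);
    for (int i = 0; i < size; ++i) {
        int slot = (home + i * step) % size;
        if (table[slot] == EMPTY) {
            table[slot] = k;
            return slot;
        }
    }
    return -1;
}

// Insert the keys in order into a fresh table; report where the last key ended up.
int slotOfLastKey(int tableSize, const std::vector<int>& keys) {
    std::vector<int> table(tableSize, EMPTY);
    int slot = -1;
    for (int k : keys) {
        slot = insertDouble(table, k);
    }
    return slot;
}
```

With the keys 10, 12, 14, 16, 18, 36, `slotOfLastKey(10, keys)` returns -1 because the step 2 shares a factor with the table size 10. With a prime table size of 11, the same keys all find a slot and 36 ends up in slot 9, which is one reason a prime table size is the usual choice.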