Article

Hash Tables

Comes in Paper 2, not paper 1.

A hash table is a form of key, value table with the goal to immediately find an item. A hashing function is used to calculate the position of an item in a hash table.

Big O Notation: O(1), however worse case may be higher depending on the collision handling

Implementation

class HashTable:
	def __init__(self, size: int):
		self.table = [None for x in range(size)]
		self.size = size
	
	def get(self, key: str):
		return self.table[self.__hash__(key)]
	
	def put(self, key: str, data: str):
		self.table[self.__hash__(key)] = data
	
	def __hash__(self, data: str):
		tol = 0
		for char in data:
			 tol += ord(char)
		return tol % self.size

ht = HashTable(100)
ht.put("gurt", "yo")
ht.put("darragh", "hi")
ht.put("so", "tuff")
ht.put("tuff", "tuff x2")
print(ht.get("gurt"))
print(ht.get("darragh"))
print(ht.get("so"))
print(ht.get("tuff"))

Linear Probing

class HashTable:
    def __init__(self, size: int):
        self.table = [None for _ in range(size)]
        self.size = size

    def get(self, key: str):
        index = self.__hash__(key)
        start_index = index

        while self.table[index] is not None:
            stored_key, stored_value = self.table[index]
            if stored_key == key:
                return stored_value
            index = (index + 1) % self.size
            if index == start_index:
                break # full loop, key not found
        return None

    def put(self, key: str, data: str):
        index = self.__hash__(key)
        start_index = index

        while self.table[index] is not None:
            stored_key, _ = self.table[index]
            if stored_key == key:
                # update existing key
                self.table[index] = (key, data)
                return
            index = (index + 1) % self.size
            if index == start_index:
                raise Exception("Hash table is full")

        self.table[index] = (key, data)

    def __hash__(self, key: str):
        total = 0
        for char in key:
            total += ord(char)
        return total % self.size


ht = HashTable(100)
ht.put("gurt", "yo")
ht.put("darragh", "hi")
ht.put("so", "tuff")
ht.put("tuff", "tuff x2")

print(ht.get("gurt"))
print(ht.get("darragh"))
print(ht.get("so"))
print(ht.get("tuff"))

Hashing Function

One-way algorithm which produces a hashed result. The hashing function is applied to an item to determine a hash value.

A good hashing function should:

  • Be calculated quickly
  • Result in minimal collisions
  • Use minimal memory

Examples:

  • MD5
  • SHA512, SHA256, SHA128

Resolving Collisions

Collisions: When an input gets the same hash value.

Linear Probing

Repeatedly checking the next space in a hash table until an empty position is found, iterating in a given interval.

Requires a linear search to be performed when inserting but also searching for the item.

Chaining

When it is possible for a multitude of items to be placed at the same position.

Examples:

  • Overflow table
  • Linked list
  • Two-dimensional table

Multiple Hash Functions

Iterating through various hash functions to find an empty position for an item.

Exam Example: Having multiple functions will reduce the likelihood of two or more inputs having the same hash value.

Usage

Applications

  • File systems (file name → file path)
  • Identifying keywords (token → token class, data type, etc.)

Operations

  • Add
  • Delete
  • Retrieve