Hash Maps
Why is this important?
This workshop is important because:
-
Associative arrays are used in a huge variety of languages. For example:
- objects in JavaScript
- hashes in Ruby
- dictionaries in Python
- HashMaps in Java ...
-
Key-value stores are a common kind of noSQL database. Redis is a popular example.
What are the objectives?
After this workshop, developers will be able to:
- Describe and draw the structure of a hash map (with separate chaining).
- Explain properties of a good hash function.
- Perform runtime analysis on the hash map data structure.
Hash Map Operations
Hash maps let us add key-value pairs, look up the value for a key, and delete a key-value pair.
Structure
Hash maps are based on arrays. Each key is mapped to a slot in the array with a "hash" function.
Note that these hash functions can be reversible, unlike the "cryptographic hash functions" used to encrypt things.
Choosing a Hash Function
Let's say we've decided to use an array size of 16.
Here's a hash function:
function hash(key){
return Math.floor(Math.random()*16);
}
It won't work for us. Why?
click for explanation
We need to be able to look up values by their keys, so the function has to send us to the same bucket every time we give it a particular key.Collisions
What if we want to be able to handle more inputs than we have slots in the array? We're going to have to figure out how to deal with collisions!
We can handle collisions in a few ways. We could find a different slot in the array to put each value pair in. We might still end up with two values mapped to the same slot, in which case we'd have to resize the array or start removing elements.
Our approach will be to add more key-value pairs to the same slot in the array. This approach has its own limitations.
Here's another hash function:
def hash(key)
0
end
It's not very good. Why?
click for explanation
This hash function sends all keys to the first bucket. We basically end up with a linked list!Check for understanding
Let's try to store information about many developers in a hash map with each developer's name as the key for their information.
What happens if we choose an array length of 26 and a hash function that maps each name to the bucket that corresponds to the first letter?
function hash(name){
return (name.charCodeAt(0) - 65) % 26;
}
Is this a good hash function? How could we improve it?
Speed
spoiler
You might think of improving this hash function by using more than the first letter of the string so that we have fewer collisions.function hash(key, arrayLength) {
arrayLength = arrayLength || 13;
return key
.split('').map(function (letter){
return letter.charCodeAt();
})
.reduce(function(prev, curr) {
return prev + curr;
}) % arrayLength;
};
Hash maps promise O(1)
lookup, insert, and delete. In order to get fast performance from the data structure, we need fast (O(1)
) performance from the hash function.
-
What is the Big O time complexity of the function above (in the spoiler) in terms of the length of the
key
string? -
How can we change the hash function so that it still takes
O(1)
time and still distributes keys into buckets more evenly than our previous hash function that just used the first letter?
click for an idea
Looking at some [data](http://home.uchicago.edu/~jsfalk/misc/baby_names/), we can see there are pretty skewed percentages for just the first or last letter of a name, so using multiple letters seems like a decent idea. But using a variable number of letters based on the length of the string takes our time above `O(1)`. A potential middle ground is choosing to use the first 2 or 3 letters. We can also very reduce the time it takes to run `%` by choosing an array length that's a power of 2.Of course, if we knew ahead of time exactly which developers we'd need to hash, we could make a perfect hashing function that's O(1)
, distributes names evenly, and always gives the same results for the same input:
def hash(key)
if key == "Alicia"
0
elsif key == "Andrew C"
1
elsif key == "Andrew V"
2
#....
Coding Challenge
In the javascript/starter-code
directory, you'll find a basic hash map implementation that's based on a JavaScript singly linked list. Note that this singly linked list implementation stores a key
and some data
in each node.
Working in that code:
- update the
put
method so that it will not store duplicate keys - implement a
set
method to change the value associated with a key
Closing Thoughts
-
What is a hash map?
-
What are the Big O time complexities of inserting something into a hash map, deleting something from it, and searching for something (or looking something up) by key?