Chapter 15 – Hashing – Programming by Design

(Updated September 2, 2024)

Overview

We have seen arrays and their power. They can store vast amounts of knowledge, perform complex operations like sorting, and be relatively fast using tools like binary search.

The next step after that, from an OOP perspective, was the Vector (and ArrayList!) with many operations bolted onto the data itself.

Then, we branched out to the linked linear list. This novel concept allowed us to overcome some array limitations, like managing the array contents, which complicates adding and removing data while maintaining array continuity.

But both arrays and linked lists still suffer from scalability issues. The more data we put into them, the slower they can become without additional steps. In other words, unless we keep sorting the array, we cannot use the O(log₂n) binary search, and we are stuck with the O(n) linear search.

The same is true for the linked list. Sorting that data structure would be mind-numbing, so we are left with the O(n)

Hashing

Hashing is a pretty old concept authored by H. P. Luhn at IBM in 1953. The original idea was to find new ways of rapidly searching large piles of data to determine if they contained a specific value. The access speed approached O(1) (constant) by storing the data in memory using clever calculations. Remember that sequential search is O(n) and binary search is O(log₂n). Sorting arrays helped make things faster, but linked lists ruined that with their disconnected nature of construction.

Of course, this was not the only value to hashing. It has become the basis of how we can validate something as authentic or take two huge items and determine if they are identical. You may have heard of MD5, SHA1, SHA256, etc. These are all forms of hashing by creating a reproducible and unique value that can be used to validate the contents of something as genuine.

These specific hashing algorithms were designed in such a way as to (potentially) never have a collision. In other words, no two distinct files should ever produce the same hash. This is quite a mathematical challenge and MD5 has already been proven broken, SHA1 has been shown to be vulnerable.

So collisions are bad, right?

Well, it depends. Collisions are bad if you are looking for something that provides reasonable assurance that the contents of an entity are genuine and unaltered or if your goals involve cryptography. Otherwise, collisions are perfectly fine and welcome if your requirements are not that strict. We can handle them.

Hash Table

A hash table is a dictionary where keys are mapped to array positions. This is done by taking the key and applying a mathematical hash function. The function is always designed to return the same hash value for the given data, which is then used as an index into the array or massaged further into a value that can represent a bucket.

The array is a group of buckets. The buckets are exactly what you think – containers capable of holding things—plural, meaning multiple. Again, we do not generally care if collisions occur, but it would be best to keep their number low.

Hash Function

So, how does hashing work? First, imagine a mathematical function that can take data as input and produce a numeric value. Then, imagine an array as a line of buckets. We can always find a specific bucket since we know the array’s length. This is part of the overall algorithm – no matter what happens in the math, the final result will always be in the range 0 to n-1, inclusive, for an array of size n.

How does this happen? Well, we take the hash value modulo the number of buckets.

Word is "PICKLE".
Number of buckets is 181.

     "PICKLE" --> [Hash Function] --> 37682376 % 181 --> 167

So, "PICKLE" lives in bucket 167.

The math is not the critical piece here (yet). We send the word to the hash function, producing an unsigned value. Then we took that value modulo the number of buckets, and we have the bucket that word will live in!

We could send both the word and the number of buckets to the hash function if we wanted.

     ("PICKLE", 181) --> [Hash Function] --> 167

Now the hash value is the bucket! It is really up to you, the programmer, to decide what is best for implementation.

Here is a version of a hashing function based on Kernighan and Ritchie’s C Programming Language 2nd edition (q.v.):

  public static int hashValue(String s) {

    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 5) - h + s.charAt(x); // h * 31 + c
    
    if ( h < 0 )
      h = -h;
    
    return h;
  }

A simple hash function for strings.

Now in the hashValue() method presented about, make no assumptions about the data and focus on the application of the procedure:

Start by setting the hash value, h, to zero.
For each char in the string, multiply h by 31 and add the next char value. (This version uses a bit rotation, q.v.)
When we're done, if h is negative, make it positive.
Return the hash value.

How does the hash value become negative? Remember that a large enough calculation will eventually exceed the space provided and overflow. The long type could be chosen to try reducing the collision space due to overflows.

Is it proper to ensure the value is not negative? In this case, it was a design choice. This would not be an issue in C implementations since we could use unsigned storage.

Now we will see how this works:

HashingAround.java

public class HashingAround {
  public static int hashValue(String s) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 5) - h + s.charAt(x);
    
    if ( h < 0 )
      h = -h;
    
    return h;
  }
  
  public static void main(String[] args) {
    
    final int BUCKETS = 181;
    String[] words = {"XYLOPHONE", "POTATO", "BOOTPRINT", "RING", "Ring", "ring", "COMMIT"};
    
    for ( String w : words ){
      int hv = hashValue(w);
      int b = hv % BUCKETS;
      
      System.out.println(w + " has a hash value of " + hv + " and exists in bucket " + b + ".");
    }
  }
}

Example 1: Testing the hash function.

The output is shown below:

XYLOPHONE has a hash value of 1417705354 and exists in bucket 48.
POTATO has a hash value of 1929109465 and exists in bucket 62.
BOOTPRINT has hash value of 789745371 and exists in bucket 17.
RING has a hash value of 2515504 and exists in bucket 147.
Ring has a hash value of 2547280 and exists in bucket 67.
ring has a hash value of 3500592 and exists in bucket 52.
COMMIT has a hash value of 1993481527 and exists in bucket 17.

There are specific takeaways in the output:

The case of the letters in the word matters. Note the differences in RING, Ring and ring.
Collisions are difficult to "predict". Would you have guessed that BOOTPRINT and COMMIT would be in the same bucket?
How does the collision rate change with the change in the number of buckets?

Even K&R indicates that this is not a perfect hash model but is easy to remember. It is suitable for many needs, is much better than the first edition, and is far from the only solution available. Additional hashing models are presented in the Improving Hashes section below.

Important Note!

The value of 31 is a prime number near the size of the alphabet to be used (A-Z). If we expected upper and lower cases, 53 might be better.

Also, the number of buckets is prime. This is an attempt to reduce the collision rate by choosing a number less likely to be a factor.

Implementation

Up to this point, the only solid code we have seen is a hash function and some hashed words. Now we will begin the process of storing data into the array!

We will initially assume a collision rate of 0 and use a method known as open addressing. This means we use the resulting value as the index in the array. Consider the following:

  static class HashTable {
    
    private String[] buckets;
    private int numBuckets;
    private int count = 0;
    
    public HashTable(int b) {
      numBuckets = b;
      buckets = new String[numBuckets];
    }
    
    // The KnR_2nd() method is a hash function described later.
    public void put(String s) {
      int b = KnR_2nd(s, numBuckets);
      buckets[b] = s;
      System.out.println(s + " added to bucket " + b);
      count++;
    }
    
    public boolean get(String s) {
      int b = KnR_2nd(s, numBuckets);
      if (buckets[b] != null)
        return true;
      return false;
    }
    
    public void remove(String s) {
      int b = KnR_2nd(s, numBuckets);
      buckets[b] = null;
    }
    
    public String toString() {
      String s = "Hash Table: size " + count + ", buckets " + numBuckets + ":\n";
      for (int x = 0; x < numBuckets; x++)
      s += "bucket[" + x + "] -> " + buckets[x] + "\n";
      return s;
    }
    
  }

A very primitive hash table implementation.

There are many assumptions in this code. Not the least of which is that every value will map to a unique location. Hence the simplicity of the implementation.The operation put() adds a value to the table, get() returns whether the value is in the table and remove() is responsible for deleting the value if found.

Consider some code that uses this hash table:

  public static void main(String[] args) {
        
    final int BUCKETS = 11;
    String[] words = {"192.168.1.2", "192.168.2.1", "192.168.1.3","192.168.3.1"};
    HashTable ht = new HashTable(BUCKETS);
      
    for ( String w : words )
      ht.put(w);
            
    System.out.println(ht);
  }

Put some IP addresses in the table.

This code will store some IP addresses in the hash table. A sample run produces the following:

192.168.1.2 added to bucket 9
192.168.2.1 added to bucket 1
192.168.1.3 added to bucket 10
192.168.3.1 added to bucket 5
Hash Table: size 4, buckets 11:
bucket[0] -> null
bucket[1] -> 192.168.2.1
bucket[2] -> null
bucket[3] -> null
bucket[4] -> null
bucket[5] -> 192.168.3.1
bucket[6] -> null
bucket[7] -> null
bucket[8] -> null
bucket[9] -> 192.168.1.2
bucket[10] -> 192.168.1.3

As you can see, we managed to get absolutely no collisions. Another sample run with the same data but a different hash function produces:

192.168.1.2 added to bucket 2
192.168.2.1 added to bucket 2
192.168.1.3 added to bucket 3
192.168.3.1 added to bucket 3
Hash Table: size 4, buckets 11:
bucket[0] -> null
bucket[1] -> null
bucket[2] -> 192.168.2.1
bucket[3] -> 192.168.3.1
bucket[4] -> null
bucket[5] -> null
bucket[6] -> null
bucket[7] -> null
bucket[8] -> null
bucket[9] -> null
bucket[10] -> null

Well, that could have gone better. We recognize that four values should be in the table, but we ended up with a corrupted table because of collisions and no collision management.

Important Note!

Yes, we intentionally created this mess. It was necessary to show that collision resolution is real, but we can handle it!

Now, we will consider some basic ways to address this.

Collision Resolution Schemes

Collision resolution can be made by computing another location within the table (open addressing), maintaining some form of a list (chaining), or some other unique form designed by the application writer.

Open Addressing

In the open-addressing model, there are several ways to address collisions. These typically involve finding another location in the hash table array. We will only look at linear probing and double-hashing as possible solutions. A drawback of open addressing is clustering. This happens when multiple collisions look for new places in the hash table. This can rapidly move our performance from O(1) to O(n).

Linear Probing involves performing a linear search from the collision point to find the next empty slot. Since we can check for null, a straightforward modification allows us to address this rapidly.

    public void put(String s) {
      int b = KnR_1st(s, numBuckets);
      System.out.print(s + " hashed to bucket " + b + ". ");
      while (b < numBuckets && buckets[b] != null)
        b++;
      
      if (b >= numBuckets)
        throw new ArrayIndexOutOfBoundsException();
      
      buckets[b] = s;
      System.out.println(s + " added to bucket " + b);
      count++;
    }
    
    public boolean get(String s) {
      int b = KnR_1st(s, numBuckets);
      while (b < numBuckets && buckets[b] != null) {
        if ( ! buckets[b].equals(s))
          return true;
        b++;
      }
      return false;
    }
    
    public void remove(String s) {
      int b = KnR_1st(s, numBuckets);
      
      while (b < numBuckets && buckets[b] != null) {
        if ( buckets[b].equals(s)) {
          buckets[b] = null;
          count--;
          return;
        }
        b++;
      }
    }

Code changes that apply linear probing to find the next free slot in the event of a collision.

The above code change results in the following:

192.168.10.2 hashed to bucket 6. 192.168.10.2 added to bucket 6
192.168.2.10 hashed to bucket 6. 192.168.2.10 added to bucket 7
192.168.1.3 hashed to bucket 3. 192.168.1.3 added to bucket 3
192.168.3.1 hashed to bucket 3. 192.168.3.1 added to bucket 4
192.168.1.6 hashed to bucket 6. 192.168.1.6 added to bucket 8
192.168.6.1 hashed to bucket 6. 192.168.6.1 added to bucket 9
Hash Table: size 6, buckets 11:
bucket[0] -> null
bucket[1] -> null
bucket[2] -> null
bucket[3] -> 192.168.1.3
bucket[4] -> 192.168.3.1
bucket[5] -> null
bucket[6] -> 192.168.10.2
bucket[7] -> 192.168.2.10
bucket[8] -> 192.168.1.6
bucket[9] -> 192.168.6.1
bucket[10] -> null

As you can see in the debug statements, the probe found a new place for the data after detecting the collision. Multiple entries landed in bucket 6, and then we had to walk the array to find the next free slot. This is an example of a high collision rate (or poorly chosen hash function) leading to a less efficient implementation.

Double-Hashing involves a secondary hashing algorithm to find another empty slot. Keep in mind that the second hash may also collide. Combined with linear probing, this may help reduce clustering. The problem with this model is that the put(), get(), and remove() methods become more complex.

Chaining

Chaining allows adding a data structure to the mix - the linked list. Adding a list to the bucket will enable us to walk the chain within the same bucket rather than use linear probing to find another bucket and possibly run out of buckets to check!

Direct Chaining

Direct chaining means that each bucket is the head of a linked list. The goal here is to be as fast as possible. When we add a value to the bucket, we will take an approach similar to pushing onto a stack. So, the new node always goes into the head of the chain.

Our HashTable class can be modified like the following:

  static class HashTable {
        
    private Node[] buckets;
    private int numBuckets;
    private int count = 0;
        
    static boolean DEBUG = false;
        
    private static class Node {
      String data;
      Node next;
            
      public Node(String s) {
        data = s;
        next = null;
      }
    }
        
    public HashTable(int b) {
      numBuckets = b;
      buckets = new Node[numBuckets];
    }

    /* ... */
  }

Code to augment the hash table by encapsulating the data into nodes and then creating primitive lists through an array of nodes.

Separate Chaining

Separate chaining involves a slight modification to the direct chaining model. In this case, we have a value and a chain rather than the bucket being the head of the linked list.

  static class HashTable {
        
    private Bucket[] buckets;
    private int numBuckets;
    private int count = 0;
        
    static boolean DEBUG = false;
        
    private static class Bucket {
      String value;
      Node chain;
    }
        
    private static class Node {
      String data;
      Node next;
            
      public Node(String s) {
        data = s;
        next = null;
      }
    }
        
    public HashTable(int b) {
      numBuckets = b;
      buckets = new Bucket[numBuckets];
    }
        
    /* ... */
  }

Abstracted further to have a Bucket be a single value plus a chain.

The takeaway on separate chaining is the additional testing needed to see if we have filled value before creating a new node to place on the head of the chain.

Load Factor

The whole point of the hash table is to provide a relatively fast data structure for storing and searching data. The goal is to try and achieve O(1) time complexity (see Big-O Notation). Remember that arrays and linked lists have linear time complexity or O(n), and linear does not scale well.

Hash tables can be implemented poorly: horrible hashing algorithm, too few buckets, too much data, improper hash application.

Or they can be implemented very well but degrade over time because the data set has grown, but the number of buckets remains constant.

The load factor is the ratio of data to buckets. If you have 45 items and 75 buckets, the load factor is .60. Many implementations try to keep this at or below .75.

What do you do when the load factor is exceeded? You build another hash table with more (often double) buckets and move the data to the new one, re-hashing all the data as you go - because the data will likely be in a different bucket.

From the perspective of the hash table itself, this can all happen behind the scenes. We can add code to the put() method to inspect the current load factor. This is why we have kept track of the number of data values in the count instance variable.

Here is a view of the changes we will make to the code:

  static class HashTable {
    
    private Node[] buckets;
    private int numBuckets;
    private int count = 0;
    private boolean REHASH = false;
    
    static boolean DEBUG = false;

    /* ... */

    public void put(String s) {
      
      if ( !REHASH && ((double)count / numBuckets) > .75)
        rehash();
      
      int b = KnR_2nd(s, numBuckets);
      Node n = new Node(s);
      
      if (get(s))
        return;
      n.next = buckets[b];
      buckets[b] = n;
      System.out.println(s + " added to bucket " + b);
      count++;
    }

    /* ... */

    private void rehash() {
      REHASH = true;
      
      System.out.println(this); // show the current state before the rehashing begins.
      
      Node[] oldBuckets = buckets;
      numBuckets *= 2;
      buckets = new Node[numBuckets];
      count = 0;
      
      for ( int x = 0; x < oldBuckets.length; x++) {
        Node p = oldBuckets[x];
        while (p != null ) {
          put(p.data);
          p = p.next;
        }
      }
      
      REHASH = false;
    }

    /* ... */

  }

An implementation of rehash().

We have added the REHASH variable as minimal protection against accidentally starting a new rehash. Once we set that flag, we will not check the load factor until the flag has cleared.

Important Note!

None of this code is thread-safe! It can be easily converted to being thread-safe, and the REHASH flag indicates that this sort of data structure will need additional protections to prevent other threads from attempting to start another rehash or put data in the old buckets in the middle of things moving around. This is known as a race condition and results from critical section code that must execute perfectly controlled way, or else intermittent bugs will creep in!

Hash Map

There is also the concept of a hash map. The hash map concept allows recording (key, value) pairs. The key is used by the application to record the data to get that value back with the known key. This is a little different yet similar to the hash table. The key is still used to select a place within the table, but the key is also uniquely identifying the value.

In the hash table model, the key is typically also the value. Or you could say that the real key is the hash value produced by the hash function that determines how to map the key into the data structure.

You are using a hash table when you have single values to be stored and rapidly sought and a hash map when you have grouped components such as key-value pairs.

Improving Hashes

There are so many ways to improve hashing algorithms. The mathematics of which is well beyond the scope of this introductory piece.

This final bit of code provides several hashing schemes at once. These are all viable solutions for string and numeric hashing.

SeriouslyHashingAround.java

public class SeriouslyHashingAround {
  
  /* K&R 1st edition Hash */
  public static int KnR_1st(String s) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = h + s.charAt(x);
    
    if ( h < 0 )
      h = -h;
    
    return h;
  }
  
  /* K&R 2nd edition Hash */
  public static int KnR_2nd(String s) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 5) - h + s.charAt(x); // h * 31 + c
    
    if ( h < 0 )
      h = -h;
    
    return h;
  }
  
  /* sdbm Hash - Historically from Sleepycat's Berkeley DB. */
  public static int sdbm(String s) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 6) + (h << 16) - h + s.charAt(x); // h * 65599 + c
    
    if ( h < 0 )
      h = -h;
    
    return h;
  }
  
  /* Knuth's Multiplicative Hash - good for numeric key distribution. */
  public static int Knuth_mult(int i) {
    
    // 1001 1110 0011 0111 0111 1001 1011 0001
    return i * 0x9E3779B1;
  }
  
  /* Dan Bernstein's djb2 Hash starting at 5381 with 33 multiplier. */
  public static int djb2_Bernstein(String s) {
    
    int h=5381;
    int x, l = s.length();
    
    h = 0;
    for ( x = 0; x < l; x++ )
      h = ((h << 5) + h) + s.charAt(x); // h * 33 + c
    
    if ( h < 0 )
      h = -h;
    
    return h;
  }
  
  public static void main(String[] args) {
    
    final int BUCKETS = 181;
    String[] words = {"XYLOPHONE", "POTATO", "BOOTPRINT", "RING", "Ring", "ring", "COMMIT"};
    
    for ( String w : words ) {
      // change the hash function to see how things move around.
      int h = sdbm(w);
      int b = h % BUCKETS;
      
      System.out.println(w + " has hash value of " + h + " and exists in bucket " + b + ".");
    }
  }
}

Example 2: Many hash functions to experiment with.

In Example 2, there are many different hash functions to choose from. Change the function used inside the for loop to see a different distribution. Be aware the Knuth_mult() is for numerical values.

Complete Examples

All of the bits of code from this chapter are now presented as complete examples so you can experiment with many of the concepts presented.

This first example demonstrates what can happen when you use different hashing methods. Remember that when switching from KnR_2nd() to KnR_1st(), there are collisions, and data is lost since there is no collision handling.

HashingNoCollisionCheck.java

public class HashingNoCollisionCheck {
  
  /* K&R 1st edition Hash */
  public static int KnR_1st(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = h + s.charAt(x);
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  /* K&R 2nd edition Hash */
  public static int KnR_2nd(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 5) - h + s.charAt(x); // h * 31 + c
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  static class HashTable {
    
    private String[] buckets;
    private int numBuckets;
    private int count = 0;
    
    public HashTable(int b) {
      numBuckets = b;
      buckets = new String[numBuckets];
    }
    
    // The KnR_2nd() method is a hash function described later.
    public void put(String s) {
      int b = KnR_2nd(s, numBuckets);
      buckets[b] = s;
      System.out.println(s + " added to bucket " + b);
      count++;
    }
    
    public boolean get(String s) {
      int b = KnR_2nd(s, numBuckets);
      if (buckets[b] != null)
        return true;
      return false;
    }
    
    public void remove(String s) {
      int b = KnR_2nd(s, numBuckets);
      buckets[b] = null;
    }
    
    public String toString() {
      String s = "Hash Table: size " + count + ", buckets " + numBuckets + ":\n";
      for (int x = 0; x < numBuckets; x++)
        s += "bucket[" + x + "] -> " + buckets[x] + "\n";
      return s;
    }
    
  }
  
  public static void main(String[] args) {
    
    final int BUCKETS = 11;
    String[] words = {"192.168.10.2", "192.168.2.10", "192.168.1.3","192.168.3.1",
      "192.168.1.6", "192.168.6.1"};
    HashTable ht = new HashTable(BUCKETS);
    
    for ( String w : words )
      ht.put(w);
    
    System.out.println(ht);
  }
}

Example 3: Hashing with no collision check.

The following example has additional code for simple linear probing to resolve collisions. This throws an exception if we exceed the length of the table.

LinearProbing.java

public class LinearProbing {
  
  /* K&R 1st edition Hash */
  public static int KnR_1st(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = h + s.charAt(x);
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  /* K&R 2nd edition Hash */
  public static int KnR_2nd(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 5) - h + s.charAt(x); // h * 31 + c
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  static class HashTable {
    
    private String[] buckets;
    private int numBuckets;
    private int count = 0;
    
    public HashTable(int b) {
      numBuckets = b;
      buckets = new String[numBuckets];
    }
    
    public void put(String s) {
      int b = KnR_1st(s, numBuckets);
      System.out.print(s + " hashed to bucket " + b + ". ");
      while (b < numBuckets && buckets[b] != null)
        b++;
      
      if (b >= numBuckets)
        throw new ArrayIndexOutOfBoundsException();
      
      buckets[b] = s;
      System.out.println(s + " added to bucket " + b);
      count++;
    }
    
    public boolean get(String s) {
      int b = KnR_1st(s, numBuckets);
      while (b < numBuckets && buckets[b] != null) {
        if ( ! buckets[b].equals(s))
          return true;
        b++;
      }
      return false;
    }
    
    public void remove(String s) {
      int b = KnR_1st(s, numBuckets);
      
      while (b < numBuckets && buckets[b] != null) {
        if ( buckets[b].equals(s)) {
          buckets[b] = null;
          count--;
          return;
        }
        b++;
      }
    }
    
    public String toString() {
      String s = "Hash Table: size " + count + ", buckets " + numBuckets + ":\n";
      for (int x = 0; x < numBuckets; x++)
        s += "bucket[" + x + "] -> " + buckets[x] + "\n";
      return s;
    }
  }
  
  public static void main(String[] args) {
    
    final int BUCKETS = 11;
    String[] words = {"192.168.10.2", "192.168.2.10", "192.168.1.3","192.168.3.1",
      "192.168.1.6", "192.168.6.1"};
    HashTable ht = new HashTable(BUCKETS);
    
    for ( String w : words )
      ht.put(w);
    
    System.out.println(ht);
  }
}

Example 4: Hash table with linear probing to resolve collisions.

Now we have code that implements direct chaining.

DirectChaining.java

public class DirectChaining {
  
  /* K&R 1st edition Hash */
  public static int KnR_1st(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = h + s.charAt(x);
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  /* K&R 2nd edition Hash */
  public static int KnR_2nd(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 5) - h + s.charAt(x); // h * 31 + c
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  static class HashTable {
    
    private Node[] buckets;
    private int numBuckets;
    private int count = 0;
    
    static boolean DEBUG = false;
    
    private static class Node {
      String data;
      Node next;
      
      public Node(String s) {
        data = s;
        next = null;
      }
    }
    
    public HashTable(int b) {
      numBuckets = b;
      buckets = new Node[numBuckets];
    }
    
    // The KnR_2nd() method is a hash function described later.
    public void put(String s) {
      int b = KnR_1st(s, numBuckets);
      Node n = new Node(s);
      
      if (get(s))
        return;
      n.next = buckets[b];
      buckets[b] = n;
      System.out.println(s + " added to bucket " + b);
      count++;
    }
    
    public boolean get(String s) {
      int b = KnR_1st(s, numBuckets);
      Node p = buckets[b];
      
      while (p != null) {
        if ( p.data.equals(s) )
        return true;
        p = p.next;
      }
      return false;
    }
    
    public void remove(String s) {
      int b = KnR_1st(s, numBuckets);
      Node p = buckets[b];
      
      // empty
      if ( p == null )
        return;
      // head of chain
      if ( p.data.equals(s) )
        buckets[b] = p.next;
      else {
        // seek the value
        Node cur = p.next, back=p;
        while ( cur != null ) {
          if ( cur.data.equals(s) ) {
            back.next = cur.next;
            break;
          }
          back = cur;
          cur = cur.next;
        }
      }
      count--;
    }
    
    private static String dumpList(Node l) {
      Node p = l;
      String s = "";
      while (p != null) {
        s = s + p.data + " ";
        p = p.next;
      }
      return s;
    }
    
    public String toString() {
      String s = "Hash Table: size " + count + ", buckets " + numBuckets + ":\n";
      for (int x = 0; x < numBuckets; x++)
        s += "bucket[" + x + "] -> " + dumpList(buckets[x]) + "\n";
      return s;
    }
    
  }
  
  public static void main(String[] args) {
    
    final int BUCKETS = 11;
    String[] words = {"192.168.10.2", "192.168.2.10", "192.168.1.3","192.168.3.1",
      "192.168.1.6", "192.168.6.1"};
    HashTable ht = new HashTable(BUCKETS);
    
    for ( String w : words )
      ht.put(w);
    
    System.out.println(ht);
  }
}

Example 5: Hashing using direct chaining.

Finally, direct chaining with an implemented load factor detection and reditribution.

DirectChainLoadFactor.java

public class DirectChainLoadFactor {
  
  /* K&R 1st edition Hash */
  public static int KnR_1st(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = h + s.charAt(x);
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  /* K&R 2nd edition Hash */
  public static int KnR_2nd(String s, int b) {
    
    int h = 0;
    int x, l = s.length();
    
    for ( x = 0; x < l; x++ )
      h = (h << 5) - h + s.charAt(x); // h * 31 + c
    
    if ( h < 0 )
      h = -h;
    
    return h % b;
  }
  
  static class HashTable {
    
    private Node[] buckets;
    private int numBuckets;
    private int count = 0;
    private boolean REHASH = false;
    
    static boolean DEBUG = false;
    
    private static class Node {
      String data;
      Node next;
      
      public Node(String s) {
        data = s;
        next = null;
      }
    }
    
    public HashTable(int b) {
      numBuckets = b;
      buckets = new Node[numBuckets];
    }
    
    // The KnR_2nd() method is a hash function described later.
    public void put(String s) {
      
      if ( !REHASH && ((double)count / numBuckets) > .75)
        rehash();
      
      int b = KnR_2nd(s, numBuckets);
      Node n = new Node(s);
      
      if (get(s))
        return;
      n.next = buckets[b];
      buckets[b] = n;
      System.out.println(s + " added to bucket " + b);
      count++;
    }
    
    public boolean get(String s) {
      int b = KnR_2nd(s, numBuckets);
      Node p = buckets[b];
      
      while (p != null) {
        if ( p.data.equals(s) )
        return true;
        p = p.next;
      }
      return false;
    }
    
    public void remove(String s) {
      int b = KnR_2nd(s, numBuckets);
      Node p = buckets[b];
      
      // empty
      if ( p == null )
        return;
      // head of chain
      if ( p.data.equals(s) )
        buckets[b] = p.next;
      else {
        // seek the value
        Node cur = p.next, back=p;
        while ( cur != null ) {
          if ( cur.data.equals(s) ) {
            back.next = cur.next;
            break;
          }
          back = cur;
          cur = cur.next;
        }
      }
      count--;
    }
    
    private void rehash() {
      REHASH = true;
      
      System.out.println(this); // show the current state before the rehashing begins.
      
      Node[] oldBuckets = buckets;
      numBuckets *= 2;
      buckets = new Node[numBuckets];
      count = 0;
      
      for ( int x = 0; x < oldBuckets.length; x++) {
        Node p = oldBuckets[x];
        while (p != null ) {
          put(p.data);
          p = p.next;
        }
      }
      
      REHASH = false;
    }
    
    private static String dumpList(Node l) {
      Node p = l;
      String s = "";
      while (p != null) {
        s = s + p.data + " ";
        p = p.next;
      }
      return s;
    }
    
    public String toString() {
      String s = "Hash Table: size " + count + ", buckets " + numBuckets + ":\n";
      for (int x = 0; x < numBuckets; x++)
        s += "bucket[" + x + "] -> " + dumpList(buckets[x]) + "\n";
      return s;
    }
    
  }
  
  public static void main(String[] args) {
    
    final int BUCKETS = 8;
    String[] words = {"192.168.10.2", "192.168.2.10", "192.168.1.3","192.168.3.1",
      "192.168.1.6", "192.168.6.1", "192.168.1.2", "192.168.2.1"};
    HashTable ht = new HashTable(BUCKETS);
    
    for ( String w : words )
      ht.put(w);
    
    System.out.println(ht);
  }
}

Example 6: Hashing using direct chaining with rehash on high load factor.

Table of contents

Overview

Hashing

Hash Table

Hash Function

Implementation

Collision Resolution Schemes

Load Factor

Hash Map

Improving Hashes

Complete Examples

Exercises