Refactor fhash is done

Table of Contents

Overview

Aha, After some days, I have finished the refactor task for fhash, it’s really really big improvement for flibs, since many many components depend on it, such as:

  • thread pool
  • event framework
  • pcap conversion lib
  • log

Why need to do refactor

The old fhash have some defects, such as:

  • hard to iterate
  • hard to extend
  • hard to modify fhash when iteration
  • some performance issues during iteration and fhash_set interface

so finally, I decided to rewrite it. Before we go through the new design, let’s take a look on the old design at below section.

Graph of Old/New design

mind mapping software

The new design

The new design fix all the issues in old design, which much clean and user friendly.

Let’s take a look the core APIs:

fhash* fhash_create(uint32_t init_size, fhash_opt opt, uint32_t flags);
void fhash_delete(fhash* table);

void fhash_set(fhash* table,
               const void* key, key_sz_t key_sz,
               const void* value, value_sz_t value_sz);
void* fhash_get(fhash* table, const void* key, key_sz_t key_sz,
               value_sz_t* value_sz);
void* fhash_fetch_and_del(fhash* table,
               const void* key, key_sz_t key_sz,
               void* value, value_sz_t value_sz);

fhash_iter fhash_iter_new(fhash* table);
void fhash_iter_release(fhash_iter* iter);
void* fhash_next(fhash_iter* iter);
void fhash_foreach(fhash* table, fhash_each_cb cb, void* ud);

int fhash_rehash(fhash* table, uint32_t new_size);
void fhash_profile(fhash* table, int flags, fhash_profile_data* data);

Full API and Example Documents

Please refer to wiki

The benchmark

And for this time, I just add a benchmark tool for fhash, so that I can review the performance when I need it, let’s take a look one result I ran it on my virtual box:

========= fhash testing without auto rehash =========
fhash_set x100000 spend time: 9189450 usec
fhash_get x100000 spend time: 6535296 usec
fhash_next x100000 spend time: 1111 usec
fhash_rehash (index double), ret: 0, spend time: 42825 usec
[index]: used: 63226, total: 100000, usage rate: 0.632260
[slots]: used: 100000, total: 107241, usage rate: 0.932479
fhash_del x100000 spend time: 32075 usec
========= fhash testing with auto rehash =========
fhash_set x100000 spend time: 112542 usec
fhash_get x100000 spend time: 35542 usec
fhash_next x100000 spend time: 3333 usec
fhash_rehash (index double), ret: 0, spend time: 57153 usec
[index]: used: 63226, total: 100000, usage rate: 0.632260
[slots]: used: 100000, total: 107241, usage rate: 0.932479
fhash_del x100000 spend time: 37410 usec

So, from above, we can see the performance comparison of disabling/enabling auto rehash, the result with auto rehash is winner, that means in a normal case(if user cannot sure how many items will be putted into hash table), the best solution is enable auto rehash feature, it will avoid the hash table convert to a list.

The End

After rewrite fhash, I realized that:

  • To create a user friendly, extendable program is more important than the performance, since the most important thing is that how can we get start with a new library? If the library is hard to use, user will give up and try to find a another library.
  • Another hand, the documentation is also a very important part in the project, since that’s the easiest way to tell user: what it is and how to use it.

Next Step

In the future, I have some actions to optimize it:

  • I can add a counter in every hash node, to record the get/set actions frequency, so that fhash can have ability to optimize the hash node list in every rehash or some other trigger point, so the highest hotspot node will be putted at the front of list.
  • Use a list instead of array, and compare the performance impact.

Ok, let’s hacking begin!~