Sunday, May 27, 2012

Counting bits

How would you count the number of bits set in a 32-bit integer? This question is amongst the favorite interviewing questions. The naïve approach is to loop through the bits aggregating the sum in a variable. It's already fast enough but you can make the interviewer happy with a much better "lookup table" solution: just use a static 256-entry table that contains pre-evaluated bits count for any 8-bit integer. With such table you need only 4 lookups and 4 summations to get the answer. "Lookup table" algorithm is even implemented as a GCC builtin function. Take a look at the code:

#include <stdint.h>

const uint32_t tbl[] = {
4181239173/*,lots of random data, I used 16384 integers for tests*/,2866015773 } ;

#define TEST_COUNT 100000
static const uint32_t ARRAY_SIZE = sizeof(tbl)/sizeof(tbl[0]);

int main()
 uint32_t i = 0;
 volatile register uint32_t c;

 for (; i< TEST_COUNT; ++i)
  uint32_t j = 0;
  for (; j < ARRAY_SIZE; ++j)
   c = __builtin_popcount(tbl[j]);

Here __builtin_popcount is a function that calculates the population count. If compiled with
gcc -O3 t.c -o bitcount
one can get such results:
$ time ./bitcount 
real 0m17.465s
user 0m17.425s
sys 0m0.000s
Not too fast, huh? Some of the time is wasted calling the function and inlining it can speed up things. But! There is even a better way. Compiling it with
gcc -O3 -mpopcnt t.c -o bitcount
results in:
$ time ./bitcount 
real 0m1.738s
user 0m1.728s
sys 0m0.004s
That's simply because modern processors have POPCNT instruction that can be used to count the number of bits and GCC is nice enough to support it. As you can see, in this case hardware optimization beats smart algorithm design with ease.


Algorithms are designed on paper but executed on a real hardware. Knowing what your hardware is capable of is a must for a software engineer. And please don't tell me about all these patterns and object-oriented stuff: unless you invent a machine that can run concepts instead of machine instructions, you must be aware of all the low-level stuff because it will save lives because I don't want to buy a new laptop capable of running new version of nano/Notepad.

I also highly recommend an awesome article "Benchmarking CRC32 and PopCnt instructions" for further reading.