Timing bit manipulation
  |   Source

Timing bit manipulation

I ran across a surprising result today (surprising to me, at least), and I thought I'd share.

TL;DR; When in doubt, double check your assumptions!

I need a function that counts how many 1-bits are set in an unsigned integer. My initial thought was to not even consider Python's bin function because my assumption was that anything in string-land would be inherently slower. I don't necessarily need this to be blazingly fast, but there is no sense writing extra code if I don't need it. Thankfully, Python has a wonderful module called timeit that is great for these sorts of tests.

I picked up Hacker's Delight by Henry S. Warren, Jr. a few months ago, and it is full of bit-twiddling greatness. I have transposed the C code to Python below, and I really thought it would blow the doors off the other two. It is an order of magnitude faster than my initial creation, but the simple one-liner is the fastest!

In [12]:
from timeit import timeit

# My quick-and-dirty attempt, works on unsized +ints
def count_one_bits(positive_integer_or_zero: int) -> int:
    count = 0
    while positive_integer_or_zero:
        count += positive_integer_or_zero & 1
        positive_integer_or_zero >>= 1
    return count

# divide-and-conquer from Hacker's Delight (Warren), works on 32-bit unsigned ints
def pop32(x: int) -> int:
    x -= ((x >> 1) & 0x55555555)
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x + (x >> 4)) & 0x0f0f0f0f
    x += (x >> 8)
    x += (x >> 16)
    return x & 0x0000003f

# Surely not, strings are slow, right?!
def count_str_bits(x):
    return bin(x).count("1")

print(count_one_bits(4127189174))
print(pop32(4127189174))
print(count_str_bits(4127189174))

print("-"*80)

%timeit count_one_bits(4127189174)
%timeit pop32(4127189174)
%timeit count_str_bits(4127189174)
23
23
23
--------------------------------------------------------------------------------
3.17 µs ± 9.21 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
478 ns ± 7.02 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
352 ns ± 5.58 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
References

Warren, H. (2012). Hacker’s Delight (2nd ed.). Addison-Wesley Professional.