Python Benchmark
Using Python's timeit
module to measure the performance of operations in Python quickly.
Measuring Lock Performance in Python on Linux
Here’s a quick look at the cost of acquiring and releasing an uncontended lock using Python’s threading.Lock
on a Linux system:
$ ./python -m timeit \
-s "from threading import Lock; l=Lock(); a=l.acquire; r=l.release" \
"a(); r()"
10000000 loops, best of 3: 0.127 usec per loop
Now, let’s compare that with the cost of calling a dummy Python function:
$ ./python -m timeit -s "def a(): pass" "a(); a()"
1000000 loops, best of 3: 0.221 usec per loop
And a trivial C function call (returning the False
singleton via bool()
):
$ ./python -m timeit -s "a=bool" "a(); a()"
10000000 loops, best of 3: 0.164 usec per loop
Interestingly, using a Lock
as a context manager is actually slower, not faster, despite what you might expect:
$ ./python -m timeit -s "from threading import Lock; l=Lock()" \
"with l: pass"
1000000 loops, best of 3: 0.242 usec per loop
So at least on Linux, there doesn't seem to be much low-hanging fruit left when it comes to optimizing lock performance.
Bonus: As of recent Python versions, RLock
is now just as fast as Lock
in uncontended cases:
$ ./python -m timeit \
-s "from threading import RLock; l=RLock(); a=l.acquire; r=l.release" \
"a(); r()"
10000000 loops, best of 3: 0.114 usec per loop