Here the xlib result has improved from 194 seconds to 81
seconds. That's a 2.4x improvement, and fast enough to now play the
movie without skipping. It's very satisfying to validate performance
-patches with real-world application code like this. (Of course,
+patches with real-world application code like this. This commit is in
+the recent 2.7.99.901 of the Intel driver, by the way. (Of course,
there's still a 1.8x slowdown of the xlib backend compared to the
image backend, so there's still more to be fixed here.)