[[!meta title="Correcting bugs in the rectangles test"]]
[[!tag cairo exa performance xorg]]
Owen Taylor was kind enough to take a close look at my [[!recent
post|understanding_rectangles]] comparing the performance of EXA and
NoAccel rectangle fills on an r100. He was also careful enough to
notice that the results looked really fishy.

Here are some of the problems he noted from looking at the graphs:

1. The EXA line looks to have an impossibly large fill rate.
2. The NoAccel line looks asymptotically linear rather than quadratic
   as expected.
3. No chart of numbers was provided to allow for any closer
   examination.

I went back to the code for my test case and did find a bug that
explains some of the problems he saw. The random positioning of
rectangles wasn't correctly accounting for their size to keep them
within the visible portion of the window. So, as the rectangle gets
larger the region that is likely to be clipped by the destination
window also gets larger. And that explains the linear rather than
quadratic growth.
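
As a rough illustration of the fix (this is not the actual test code, and
the names `random_position`, `window_width`, and `window_height` are
hypothetical), here is a minimal Python sketch of picking a random
position that keeps the whole rectangle inside the window. It assumes
square rectangles no larger than the window:

```python
import random

def random_position(size, window_width, window_height, rng=random):
    """Pick a top-left corner so a size x size rectangle fits in the window.

    Assumes size <= window_width and size <= window_height.
    """
    # Shrink the allowed range by the rectangle size so that no part of
    # the rectangle falls outside the window and gets clipped.
    x = rng.randint(0, window_width - size)   # randint includes both ends
    y = rng.randint(0, window_height - size)
    return x, y
```

The point is only that the range of allowed positions has to shrink as
the rectangle grows. Otherwise larger rectangles lose more and more of
their area to clipping.
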
So here's a corrected version of the original graphs:
[[rectangles-corrected-512.png]]
And, again, a closer look at the small rectangles:
[[rectangles-corrected-64.png]]
And, this time I'll provide a chart of numbers as well:

Time to render 10000 rectangles with XRenderFillRectangles:

| Rectangle size | NoAccel (ms) | EXA (ms) |
|---|---:|---:|
| 1x1 | 1.456 | 2.356 |
| 2x2 | 1.529 | 2.288 |
| 4x4 | 1.884 | 2.352 |
| 8x8 | 3.039 | 2.356 |
| 16x16 | 3.255 | 2.357 |
| 32x32 | 7.608 | 2.377 |
| 64x64 | 26.479 | 2.430 |
| 128x128 | 101.325 | 5.376 |
| 256x256 | 1295.105 | 22.549 |
| 512x512 | 15354.022 | 89.744 |

So that addresses the second and third of Owen's issues. But what
about that fill rate? First, how can I know my card's maximum fill
rate? I'm told that the standard approach is to use `x11perf
-rect500`. Let's see what that gives for NoAccel:

    $ x11perf -rect500
    ...
    900 reps @ 6.1247 msec ( 163.0/sec): 500x500 rectangle

And then for EXA:

    $ x11perf -rect500
    3000 reps @ 1.9951 msec ( 501.0/sec): 500x500 rectangle

So that shows fill rates of about 41M pixels/sec for NoAccel
and about 125M pixels/sec for EXA, (`500*500*163 = 40750000`
and `500*500*501 = 125250000`).
Meanwhile, my results above for the 10000 512x512 rectangles, with the
times converted from milliseconds to seconds, give fill rates of 171M
pixels/sec for NoAccel and 29210M pixels/sec for EXA
(`512*512*10000/15.354022 =~ 170733114` and
`512*512*10000/.089744 =~ 29210197896`).
So my test is reporting a NoAccel fill rate that is 4x faster than
what x11perf reports, and an EXA fill rate that is 233x (!) faster
than what x11perf reports. So, something is definitely still fishy
here. A fill rate of close to 30 billion pixels/sec. from an old r100
just cannot be possible, (as another datapoint, I just got a new Intel
965 and with x11perf I measure a fill rate of 843 million
pixels/sec. on it).
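
For anyone who wants to check the arithmetic, here is a short Python
sketch that recomputes the four fill rates and the two ratios from the
numbers quoted above. The helper name `fill_rate` is mine, not part of
x11perf or cairo-perf:

```python
def fill_rate(width, height, count, seconds):
    """Pixels filled per second for `count` width x height rectangles."""
    return width * height * count / seconds

# x11perf reports 500x500 rectangles per second directly.
x11perf_noaccel = fill_rate(500, 500, 163.0, 1.0)
x11perf_exa = fill_rate(500, 500, 501.0, 1.0)

# The cairo-perf times are in milliseconds for 10000 512x512 rectangles.
cairo_noaccel = fill_rate(512, 512, 10000, 15354.022 / 1000.0)
cairo_exa = fill_rate(512, 512, 10000, 89.744 / 1000.0)

for name, rate in [
    ("x11perf NoAccel", x11perf_noaccel),
    ("x11perf EXA", x11perf_exa),
    ("cairo-perf NoAccel", cairo_noaccel),
    ("cairo-perf EXA", cairo_exa),
]:
    print(f"{name:20s} {rate / 1e6:10.1f} Mpixels/sec")

print(f"NoAccel ratio: {cairo_noaccel / x11perf_noaccel:.1f}x")
print(f"EXA ratio:     {cairo_exa / x11perf_exa:.1f}x")
```
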
So what could be happening here? It could be that my cairo-perf
measurement framework is totally broken. It does at least seem to be
returning consistent numbers from one run to the next, though. And the
results do appear to have the correct trend as can be seen from these
two graphs showing the measured fill rates:
[[!img fill-rates-cairo-perf.png]]
[[!img fill-rates-x11perf.png]]
But again, notice from the Y-axis values of the cairo-perf plot that
the numbers are just plain too large to be believed.
I don't yet have a good answer for what could explain the difference
here. I did notice that exaPolyFillRect converts the list of
rectangles into a region which should prevent areas overlapped by
multiple rectangles from being filled multiple times. For x11perf
there is no overlap at 100x100 or smaller, but a lot of overlap at
500x500. Similarly, the overlap gets more probable at larger sizes
with the cairo-perf test. The existence of optimizations like that
suggests that these tests might legitimately be able to report numbers
larger than the actual fill rate of the video card.
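
To make the overlap point concrete, here is a small Python sketch
comparing the summed area of a set of rectangles with the area of their
union. It is only an illustration of the idea and says nothing about how
exaPolyFillRect actually builds its region. The function names, the 20
rectangles, and the 600x600 window are all assumptions made for the
example:

```python
import random

def union_area(rects):
    """Area covered by at least one rectangle; rects are (x, y, w, h)."""
    xs = sorted({x for x, _, w, _ in rects} | {x + w for x, _, w, _ in rects})
    ys = sorted({y for _, y, _, h in rects} | {y + h for _, y, _, h in rects})
    area = 0
    # Coordinate compression: each cell of the grid formed by the
    # rectangle edges is either fully covered or fully uncovered.
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(x <= x0 and x1 <= x + w and y <= y0 and y1 <= y + h
                   for x, y, w, h in rects):
                area += (x1 - x0) * (y1 - y0)
    return area

def total_area(rects):
    """Sum of the individual rectangle areas, counting overlaps repeatedly."""
    return sum(w * h for _, _, w, h in rects)

def random_rects(count, size, window, rng):
    """Random size x size rectangles kept fully inside a square window."""
    return [(rng.randint(0, window - size), rng.randint(0, window - size),
             size, size) for _ in range(count)]

rng = random.Random(1)
for size in (10, 100, 500):
    rects = random_rects(20, size, 600, rng)
    print(size, total_area(rects), union_area(rects))
```

A fill-rate calculation like mine uses the summed area, while a server
that merges the rectangles first only has to touch the union, which can
never exceed the window area.
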
But that code should also be common whether calling
XRenderFillRectangles like my cairo-perf test does, or XFillRectangles
like the x11perf test does. So that optimization doesn't explain what
I'm seeing here. (I also reran my cairo-perf test with
XRenderFillRectangles changed to XFillRectangles and saw no
difference.)
Anybody have any ideas what might be going on here? Email me or the
xorg list
([subscription required](http://lists.freedesktop.org/mailman/listinfo/xorg),
of course).