X-Git-Url: https://git.cworth.org/git?a=blobdiff_plain;f=src%2Fexa%2Fcorrected_rectangles.mdwn;fp=src%2Fexa%2Fcorrected_rectangles.mdwn;h=d57462f81e8d8306354a09c844fb506008f07b0c;hb=0ba56140ea9fea1c4838e9e0a0ec13a480113f70;hp=0000000000000000000000000000000000000000;hpb=7de4fb3ee2ef3df1eb635e79826a12be497640a3;p=cworth.org
diff --git a/src/exa/corrected_rectangles.mdwn b/src/exa/corrected_rectangles.mdwn
new file mode 100644
index 0000000..d57462f
--- /dev/null
+++ b/src/exa/corrected_rectangles.mdwn
@@ -0,0 +1,117 @@
+[[meta title="Correcting bugs in the rectangles test"]]
+
+[[tag cairo exa performance xorg]]
+
+Owen Taylor was kind enough to take a close look at my [[recent
+post|understanding_rectangles]] comparing the performance of EXA and
+NoAccel rectangle fills on an r100. He was also careful enough to
+notice that the results looked really fishy.
+
+Here are some of the problems he noted from looking at the graphs:
+
+1. The EXA line looks to have an impossibly large fill rate.
+
+2. The NoAccel line looks asymptotically linear rather than quadratic
+ as expected.
+
+3. No chart of numbers was provided to allow for any closer
+ examination.
+
+I went back to the code for my test case and did find a bug that
+explains some of the problems he saw. The random positioning of
+rectangles wasn't correctly accounting for their size to keep them
+within the visible portion of the window. So, as the rectangles get
+larger, the portion of each one likely to be clipped by the destination
+window also gets larger, and clipped pixels are never filled. That
+explains the linear rather than quadratic growth.
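+
+The fix amounts to shrinking the range of random origins by the
+rectangle size. Here is a minimal Python sketch of the idea, not the
+actual cairo-perf test code: the 512-pixel window, the helper names,
+and the exact form of the buggy variant are all assumptions made for
+illustration.
+
```python
import random

WINDOW = 512  # assumed window size in pixels, not taken from the real test


def buggy_origin(size, rng):
    # Hypothetical bug: ignores the rectangle size, so large rectangles
    # often hang off the right or bottom edge and get clipped.
    return rng.randint(0, WINDOW - 1), rng.randint(0, WINDOW - 1)


def corrected_origin(size, rng):
    # Leave room for the whole rectangle: the origin is at most WINDOW - size.
    limit = WINDOW - size
    return rng.randint(0, limit), rng.randint(0, limit)


def visible_area(x, y, size):
    # Area of the rectangle that survives clipping to the window.
    w = max(0, min(x + size, WINDOW) - x)
    h = max(0, min(y + size, WINDOW) - y)
    return w * h


def mean_visible_area(pick_origin, size, samples=2000, seed=0):
    # Average number of pixels actually filled per rectangle.
    rng = random.Random(seed)
    total = sum(visible_area(*pick_origin(size, rng), size) for _ in range(samples))
    return total / samples


for size in (64, 256, 512):
    print(size,
          mean_visible_area(buggy_origin, size),
          mean_visible_area(corrected_origin, size))
```
+
+With the corrected origin every rectangle is fully visible, so the
+number of pixels actually filled grows with the square of the size.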
+
+So here's a corrected version of the original graphs:
+
+[[rectangles-corrected-512.png]]
+
+And, again, a closer look at the small rectangles:
+
+[[rectangles-corrected-64.png]]
+
+And, this time I'll provide a chart of numbers as well:
+
+
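+<table>
+<caption>Time to render 10000 rectangles with XRenderFillRectangles</caption>
+<tr><th>Rectangle size</th><th>NoAccel (ms)</th><th>EXA (ms)</th></tr>
+<tr><td>1x1</td><td>1.456</td><td>2.356</td></tr>
+<tr><td>2x2</td><td>1.529</td><td>2.288</td></tr>
+<tr><td>4x4</td><td>1.884</td><td>2.352</td></tr>
+<tr><td>8x8</td><td>3.039</td><td>2.356</td></tr>
+<tr><td>16x16</td><td>3.255</td><td>2.357</td></tr>
+<tr><td>32x32</td><td>7.608</td><td>2.377</td></tr>
+<tr><td>64x64</td><td>26.479</td><td>2.430</td></tr>
+<tr><td>128x128</td><td>101.325</td><td>5.376</td></tr>
+<tr><td>256x256</td><td>1295.105</td><td>22.549</td></tr>
+<tr><td>512x512</td><td>15354.022</td><td>89.744</td></tr>
+</table>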
+
+So that addresses the second and third of Owen's issues. But what
+about that fill rate? First, how can I know my card's maximum fill
+rate? I'm told that the standard approach is to use `x11perf
+-rect500`. Let's see what that gives for NoAccel:
+
+    NoAccel $ x11perf -rect500
+    ...
+    900 reps @ 6.1247 msec ( 163.0/sec): 500x500 rectangle
+
+And then for EXA:
+
+    $ x11perf -rect500
+    3000 reps @ 1.9951 msec ( 501.0/sec): 500x500 rectangle
+
+So that shows fill rates of about 41M pixels/sec for NoAccel
+and about 125M pixels/sec for EXA, (`500*500*163 = 40750000`
+and `500*500*501 = 125250000`).
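+
+The same arithmetic as a small Python sketch, in case anyone wants to
+plug in numbers from their own card. The helper name `fill_rate` is
+hypothetical; the rates are the x11perf figures quoted above.
+
```python
def fill_rate(width, height, rects_per_sec):
    """Pixels filled per second, given rectangle size and x11perf's rate."""
    return width * height * rects_per_sec


noaccel = fill_rate(500, 500, 163.0)
exa = fill_rate(500, 500, 501.0)

print(f"NoAccel: {noaccel / 1e6:.2f}M pixels/sec")  # NoAccel: 40.75M pixels/sec
print(f"EXA:     {exa / 1e6:.2f}M pixels/sec")      # EXA:     125.25M pixels/sec
```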
+
+Meanwhile, my results above for the 10000 512x512 rectangles give fill
+rates of 171M pixels/sec for NoAccel and 29210M pixels/sec for EXA,
+(`512*512*10000/15.354022 =~ 170733114` and `512*512*10000/.089744 =~
+29210197896`).
+
+So my test is reporting a NoAccel fill rate that is 4x faster than
+what x11perf reports, and an EXA fill rate that is 233x (!) faster
+than what x11perf reports. So, something is definitely still fishy
+here. A fill rate of close to 30 billion pixels/sec. from an old r100
+just cannot be possible, (as another datapoint, I just got a new Intel
+965 and with x11perf I measure a fill rate of 843 million
+pixels/sec. on it).
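+
+For reference, the comparison can be reproduced with a few lines of
+Python. This is only a sketch of the arithmetic: the helper name
+`batch_fill_rate` is hypothetical, and the inputs are the 512x512 row
+of the table and the x11perf rates quoted above.
+
```python
# Fill rates derived from the x11perf runs quoted in the text (pixels/sec).
X11PERF_NOACCEL = 500 * 500 * 163.0
X11PERF_EXA = 500 * 500 * 501.0


def batch_fill_rate(size, count, elapsed_ms):
    """Pixels per second for `count` size-by-size rectangles drawn in elapsed_ms."""
    return size * size * count / (elapsed_ms / 1000.0)


# 512x512 row of the table: 10000 rectangles, times in milliseconds.
noaccel = batch_fill_rate(512, 10000, 15354.022)
exa = batch_fill_rate(512, 10000, 89.744)

noaccel_ratio = noaccel / X11PERF_NOACCEL
exa_ratio = exa / X11PERF_EXA

print(f"NoAccel: {noaccel / 1e6:.0f}M pixels/sec, {noaccel_ratio:.1f}x x11perf")
print(f"EXA:     {exa / 1e6:.0f}M pixels/sec, {exa_ratio:.1f}x x11perf")
```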
+
+So what could be happening here? It could be that my cairo-perf
+measurement framework is totally broken. It does at least seem to be
+returning consistent numbers from one run to the next, though. And the
+results do appear to have the correct trend as can be seen from these
+two graphs showing the measured fill rates:
+
+[[img fill-rates-cairo-perf.png]]
+
+[[img fill-rates-x11perf.png]]
+
+But again, notice from the Y-axis values of the cairo-perf plot that
+the numbers are just plain too large to be believed.
+
+I don't yet have a good answer for what could explain the difference
+here. I did notice that exaPolyFillRect converts the list of
+rectangles into a region which should prevent areas overlapped by
+multiple rectangles from being filled multiple times. For x11perf
+there is no overlap at 100x100 or smaller, but a lot of overlap at
+500x500. Similarly, the overlap gets more probable at larger sizes
+with the cairo-perf test. The existence of optimizations like that
+suggests that these tests might legitimately be able to report numbers
+larger than the actual fill rate of the video card.
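+
+To get a feel for how much such an optimization could inflate the
+reported number, here is a toy Python model that compares the summed
+area of a batch of random rectangles with the area of their union.
+It is not a model of what EXA actually does: the window size, batch
+size, and helper name are assumptions chosen only to illustrate the
+effect.
+
```python
import random


def union_and_total_area(rects, window):
    """Return (pixels covered at least once, sum of rectangle areas).

    Each rect is (x, y, w, h) and is assumed to lie fully inside the
    window-by-window square.
    """
    rows = [bytearray(window) for _ in range(window)]
    total = 0
    for x, y, w, h in rects:
        total += w * h
        for row in rows[y:y + h]:
            row[x:x + w] = b"\x01" * w  # mark these pixels as covered
    union = sum(row.count(1) for row in rows)
    return union, total


# Assumed setup: 100 rectangles of 256x256 placed at random in a 512x512 window.
WINDOW, SIZE, COUNT = 512, 256, 100
rng = random.Random(0)
rects = [(rng.randint(0, WINDOW - SIZE), rng.randint(0, WINDOW - SIZE), SIZE, SIZE)
         for _ in range(COUNT)]

union, total = union_and_total_area(rects, WINDOW)
print(f"requested pixels: {total}, distinct pixels: {union}, "
      f"apparent speedup if only the union is filled: {total / union:.1f}x")
```
+
+A test that divides the requested pixels by the elapsed time would
+overstate the real fill rate by that factor whenever the server only
+fills the union, which is what the region conversion in
+exaPolyFillRect should allow.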
+
+But that code should also be common whether calling
+XRenderFillRectangles like my cairo-perf test does, or XFillRectangles
+like the x11perf test does. So that optimization doesn't explain what
+I'm seeing here. (I also reran my cairo-perf test with
+XRenderFillRectangles changed to XFillRectangles and saw no
+difference.)
+
+Anybody have any ideas what might be going on here? Email me or the
+xorg list, ([subscription required](http://lists.freedesktop.org/mailman/listinfo/xorg)
+of course).