[[meta title="Rectangles mystery solved"]]

[[tag exa performance xorg]]

So I found the answer to my [[fill rate
confusion|rectangles-corrected]], and it turned out not to be all that
interesting in the end (no pretty graphs this time). And it should
have been obvious to me, though admittedly the EXA runs were too fast
for me to _see_ what was happening.

What I did was eliminate all the variables of the cairo-perf test
suite by writing a tiny [[standalone test case|rectangles.c]]. I
happened to be running an XAA server at the time, and when I ran the
test it gave exactly the same result as `x11perf -rect500`: both
rendered 501 500x500 rectangles per second. But there was an obvious
difference: x11perf flashed wildly while my test stayed a constant
black.

So I took a quick look with xtrace. (By the way, this is the
[[long-sought-after|understanding_rectangles]] X protocol tracer that
actually decodes Render requests, and it's much easier to use than
xmon, xscope, or wireshark. Hurrah! Many thanks to Behdad for pointing
it out to me.) xtrace showed immediately that my test was sending
rectangles in batches of 256 per request, while x11perf was sending
only 1 per request.

And how could I have missed the obvious fact that x11perf alternates
between black and white when drawing rectangles, while my program was
sending only white? (Cue forehead-smacking sounds.)
I changed my program to do the same (the file linked above contains
this change), and it now behaves exactly like x11perf.

At least I was right that the speedup I saw is due to EXA's
optimization that avoids redundantly filling overlapping rectangles.
So with that optimization, guess what happens when you send 256
rectangles that overlap almost entirely? Wow, it goes about 200 times
faster.

OK, so that's actually a pretty boring result. I can't see that it's
all that useful to send lots of overlapping rectangles to the X
server (but if your application does this for some reason, use EXA
and it will go faster).

Oh, and to leave on another note of mystery: after I had seen many
runs of both x11perf and my test agree on 501 rectangles/sec., I
restarted the server and started getting 772 rectangles/sec. At first
I thought this was due to a different X server build and configuration
file that I had switched to, but when I switched back, the original
one also gave me 772 rectangles/sec.

Incidentally, the 501 rectangles/sec. rate corresponds to the 125M
pixels/sec. fill rate I reported in my previous post. So now I'm
getting a 193M pixels/sec. fill rate, and I have no idea what
changed. (I'm also wondering what the expected maximum fill rate is
for an r100. Does anyone know? I'd guess it depends on how fast the
memory on my card is, and I'm not sure how fast that might be.)