From: Carl Worth
Date: Fri, 1 Jun 2007 23:38:22 +0000 (-0700)
Subject: Add mystery_solved blog entry
X-Git-Url: https://git.cworth.org/git?p=cworth.org;a=commitdiff_plain;h=22d083a02435a0c0c57734d1fe463a9abb981baf

Add mystery_solved blog entry
---

diff --git a/src/exa/mystery_solved.mdwn b/src/exa/mystery_solved.mdwn
new file mode 100644
index 0000000..fbff0c4
--- /dev/null
+++ b/src/exa/mystery_solved.mdwn
@@ -0,0 +1,59 @@
+[[meta title="Rectangles mystery solved"]]
+
+[[tag exa performance xorg]]
+
+So I found the answer to my [[fill rate
+confusion|rectangles-corrected]] and it turned out to not be all that
+interesting in the end---no pretty graphs this time. And it should
+have been obvious to me---though admittedly the EXA run time was too
+fast for me to _see_ what was happening.
+
+What I did was eliminate all the variables of the cairo-perf test
+suite by writing a tiny [[standalone test case|rectangles.c]]. I
+happened to be running an XAA server at the time, and when I ran the
+test it gave exactly the same results as `x11perf -rect500`. They both
+rendered 501 500x500 rectangles per second. But there was an obvious
+difference, x11perf flashed wildly while my test stayed a constant
+black.
+
+So a quick glance with xtrace---by the way this is the long [[sought
+after|understanding_rectangles]] X protocol tracer that actually
+decodes Render requests, and it's much easier to use than any of xmon,
+xscope, or wireshark. Hurrah! (And many thanks to Behdad for pointing
+it out to me). Anyway, xtrace showed immediately that my test was
+sending rectangles in batches of 256 per request while x11perf was
+sending only 1 per request.
+
+And how could I have missed the obvious fact that x11perf is
+alternating black and white colors when drawing rectangles while my
+program was sending only white? (Cue forehead smacking sounds.) I
+changed my program to do the same, (the file linked to above contains
+this change), and it now behaves exactly like x11perf.
+
+At least I was correct that the speedup I saw is due to the
+optimization in EXA to avoid doing any redundant filling of rectangles
+that overlap. So with that optimization guess what happens when you
+send 256 rectangles that overlap almost entirely? Wow, it goes about
+200 times faster.
+
+OK, so that's actually a pretty boring result. I can't see that it's
+all that useful to send lots of overlapping rectangles to the X
+server, (but if your application is doing this for any reason, use EXA
+and it will go faster).
+
+Oh, and just to leave on another note of mystery. After I saw many
+runs of both x11perf and my test agreeing on 501 rectangles/sec., after
+a server restart I started getting 772 rectangles/sec. At first I
+thought this was due to a different X server build and configuration
+file that I had switched to, but when I switched back, the original
+one also gave me 772 rectangles/sec.
+
+Incidentally, the 501 rectangles/sec. rate corresponds to the 125M
+pixels/sec. fill rate I reported in my previous post. So now I'm
+getting a 193M pixels/sec. fill rate and I have no idea what
+changed. (And I'm also wondering what the expected maximum fill rate
+is for an r100. Anyone know? I guess it depends on how fast the memory
+is on my card, and I'm not exactly sure how fast it might be.)
+
+
+
diff --git a/src/exa/rectangles.c b/src/exa/rectangles.c
new file mode 100644
index 0000000..df403e5
--- /dev/null
+++ b/src/exa/rectangles.c
@@ -0,0 +1,74 @@
+/* gcc `pkg-config --cflags --libs x11` -o rectangles rectangles.c */
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/time.h>
+#include <X11/Xlib.h>
+#include <X11/Xutil.h>
+
+#define MEASUREMENTS 5
+#define ITERATIONS 4000
+
+int
+main (int argc, char **argv)
+{
+    Display *dpy;
+    Window win;
+    XSetWindowAttributes attr;
+    XEvent xev;
+    GC gc, gc_white, gc_black;
+    unsigned long gcmask = 0l;
+    XGCValues gcvalues;
+    struct timeval tv_start, tv_stop;
+    int i, j;
+    double elapsed;
+    XImage *ximage;
+
+    dpy = XOpenDisplay (NULL);
+    if (dpy == NULL) {
+        fprintf (stderr, "Failed to open display %s\n", XDisplayName (NULL));
+        exit (1);
+    }
+
+    attr.override_redirect = True;
+    win = XCreateWindow (dpy, DefaultRootWindow (dpy), 0, 0,
+                         600, 600, 0, DefaultDepth (dpy, DefaultScreen (dpy)),
+                         InputOutput, DefaultVisual (dpy, DefaultScreen (dpy)),
+                         CWOverrideRedirect, &attr);
+
+    gcmask |= GCFunction;
+    gcvalues.function = GXcopy;
+    gcmask |= GCForeground;
+    gcvalues.foreground = WhitePixel (dpy, DefaultScreen (dpy));
+    gc_white = XCreateGC (dpy, win, gcmask, &gcvalues);
+
+    gcvalues.foreground = BlackPixel (dpy, DefaultScreen (dpy));
+    gc_black = XCreateGC (dpy, win, gcmask, &gcvalues);
+
+    XMapWindow (dpy, win);
+
+    gc = gc_black;
+    for (i = 0; i < MEASUREMENTS; i++) {
+        /* Nothing fancy here, just a gettimeofday system call. */
+        gettimeofday (&tv_start, NULL);
+        for (j = 0; j < ITERATIONS; j++) {
+            XFillRectangle (dpy, win, gc, 0, 0, 500, 500);
+            if (gc == gc_white)
+                gc = gc_black;
+            else
+                gc = gc_white;
+        }
+        /* Standard 1x1 XGetImage to wait for rendering to complete. */
+        ximage = XGetImage (dpy, win, 0, 0, 1, 1, AllPlanes, ZPixmap);
+        gettimeofday (&tv_stop, NULL);
+        XDestroyImage (ximage);
+
+        elapsed = tv_stop.tv_sec - tv_start.tv_sec +
+            (tv_stop.tv_usec - tv_start.tv_usec) / 1000000.0;
+        printf ("%d iterations @ %.4f msec ( %.1f/sec): 500x500 XFillRectangle\n",
+                ITERATIONS, elapsed * 1000 / ITERATIONS, ITERATIONS / elapsed);
+    }
+
+    XCloseDisplay (dpy);
+
+    return 0;
+}