[[meta title="Eliminating glyph fallbacks"]]

[[tag exa performance xorg i965]]

Sometimes things get worse before they get better.

A few days ago, I presented a patch for [storing glyphs as
pixmaps](http://cworth.org/exa/storing_glyphs_as_pixmaps/), which
improved performance, but not as dramatically as I had hoped.

I profiled the result and found that there were still a lot of
software fallbacks going on. Tracking things down (hints: enable
DEBUG\_TRACE\_FALL in xserver/exa/exa_priv.h and I830DEBUG in
xf86-video-intel/src/i830.h), I found a simple case statement that was
falling back to software for any compositing operation targeting an A8
buffer. Fortunately, it looks like this fallback was due to a
limitation in older graphics cards that doesn't exist on the i965. So a
very simple [patch](Allow-i965-compositing-to-target-A8-buffers.patch)
eliminates the software fallback.
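
For readers who want the shape of that change without opening the patch, here is a toy model in Python. The real driver is C, and none of these names come from it: `Chipset`, `DestFormat`, and `needs_software_fallback` are all hypothetical, and the set of formats treated as always accelerated is an assumption made only for illustration.

```python
from enum import Enum, auto


class Chipset(Enum):
    OLDER = auto()  # hypothetical stand-in for hardware with the A8 limitation
    I965 = auto()


class DestFormat(Enum):
    A8R8G8B8 = auto()
    X8R8G8B8 = auto()
    A8 = auto()


def needs_software_fallback(chipset, dest_format, patched):
    """Return True when compositing to dest_format must be done in software."""
    # Assumption: 32-bit destinations are accelerated on every chipset.
    if dest_format in (DestFormat.A8R8G8B8, DestFormat.X8R8G8B8):
        return False
    if dest_format is DestFormat.A8:
        # Unpatched: every A8 destination falls back, whatever the chipset.
        # Patched: only hardware with the old limitation still falls back.
        return not (patched and chipset is Chipset.I965)
    return True  # anything unrecognised: stay conservative


for patched in (False, True):
    result = needs_software_fallback(Chipset.I965, DestFormat.A8, patched)
    print(f"patched={patched}: i965 A8 destination falls back = {result}")
```

The point of the sketch is only that the check becomes chipset-aware: the A8 case stops being an unconditional fallback.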

So let's take a look at before-and-after profiles:

<dl class="chart barchart">
<dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.oprofile">aa10text-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//timing">144000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.symbols">symbols profile</a></dt>
<dd style="width:65.9722%;">
<ul>
<li class="libexa" style="width:27.1577%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libexa.oprofile">libexa</a><span>27%</span></li>
<li class="libpixman" style="width:26.8338%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libpixman.oprofile">libpixman</a><span>27%</span></li>
<li class="vmlinux" style="width:24.6667%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//vmlinux.oprofile">vmlinux</a><span>25%</span></li>
<li class="Xorg" style="width:4.9222%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//Xorg.oprofile">Xorg</a><span>5%</span></li>
<li class="libc-2_6" style="width:4.4754%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>4%</span></li>
<li class="oprofiled" style="width:3.4928%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
<li class="intel_drv" style="width:2.5876%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//intel_drv.oprofile">intel_drv</a><span>3%</span></li>
<li class="other" style="width:5.8638%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//other.oprofile">other</a><span>6%</span></li>
</ul>
</dd>
<dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.oprofile">aa10text-no-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//timing">95000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.symbols">symbols profile</a></dt>
<dd style="width:100%;">
<ul>
<li class="vmlinux" style="width:42.1575%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//vmlinux.oprofile">vmlinux</a><span>42%</span></li>
<li class="intel_drv" style="width:26.7106%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//intel_drv.oprofile">intel_drv</a><span>27%</span></li>
<li class="libexa" style="width:7.9861%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libexa.oprofile">libexa</a><span>8%</span></li>
<li class="librt-2_6" style="width:7.7359%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//librt-2.6.oprofile">librt-2.6</a><span>8%</span></li>
<li class="libc-2_6" style="width:5.3533%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>5%</span></li>
<li class="Xorg" style="width:3.9202%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//Xorg.oprofile">Xorg</a><span>4%</span></li>
<li class="oprofiled" style="width:3.2670%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
<li class="other" style="width:2.8694%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//other.oprofile">other</a><span>3%</span></li>
</ul>
</dd>
</dl>

Yikes! The patch takes us from 144k chars/sec. to only 95k
chars/sec. I'm regressing performance! But look again, and see that
the libexa share has been cut dramatically (from 27% to 8%), and
libpixman no longer shows up as its own entry at all. That's exactly
what we would hope to see from eliminating software fallbacks. So I've
finally gotten this text-rendering benchmark to involve no software
fallbacks. Hurrah!
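
One caveat when comparing the two bars: each percentage is a share of that run's own total, and the totals differ. The sketch below puts both runs on a common scale of time per character. It assumes the outer bar widths are proportional to time per character, which fits the numbers (95000/144000 matches the 65.9722% width of the first bar). The helper name `per_char_cost` is mine, not something from the charting scripts.

```python
# Throughput figures from the chart captions, in chars/sec.
RATE_BEFORE = 144000
RATE_AFTER = 95000

# Time per character, scaled so that the slower (patched) run is 100.
total_before = 100.0 * RATE_AFTER / RATE_BEFORE
total_after = 100.0

# Percentage shares copied from the chart (only modules present in both runs).
share_before = {"libexa": 27.1577, "vmlinux": 24.6667, "intel_drv": 2.5876}
share_after = {"libexa": 7.9861, "vmlinux": 42.1575, "intel_drv": 26.7106}


def per_char_cost(shares, total):
    """Convert per-run percentage shares into a cost on the common scale."""
    return {name: pct / 100.0 * total for name, pct in shares.items()}


cost_before = per_char_cost(share_before, total_before)
cost_after = per_char_cost(share_after, total_after)

for name in ("libexa", "vmlinux", "intel_drv"):
    ratio = cost_after[name] / cost_before[name]
    print(f"{name:10s} {cost_before[name]:6.2f} -> {cost_after[name]:6.2f}  (x{ratio:.2f})")
```

On that scale, libexa's cost per character drops to a bit under half of what it was, while vmlinux grows by roughly 2.6 times and intel_drv by roughly 15 times.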

Meanwhile, the intel_drv and vmlinux times have increased
dramatically. Take a look at how hot those hotspots are in their
symbol profiles (samples, percentage, symbol):

intel_drv:

	29614 41.2170 i965_prepare_composite
	26641 37.0792 I830WaitLpRing
	9143 12.7253 i965_composite

vmlinux:

	28775 25.3748 delay_tsc
	21956 19.3616 system_call
	7535 6.6446 getnstimeofday

So this is just the same old [synchronous
compositing](http://cworth.org/exa/i965/synchronous_composite/) bug I
identified earlier. Performance has gotten worse because, with the
software fallbacks gone, more of the work now goes through the driver,
so this bug gets hit that much harder.
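
To see why waiting on the hardware after every operation hurts so much, here is a deliberately crude timing model. It is an assumption-laden sketch, not a description of the driver: it supposes each composite costs a fixed amount of CPU time to prepare and a fixed amount of GPU time to execute, and the function names and cost units are invented for the example.

```python
def synchronous_time(n_ops, cpu_cost, gpu_cost):
    """CPU prepares an operation, then sits idle until the GPU finishes it."""
    return n_ops * (cpu_cost + gpu_cost)


def pipelined_time(n_ops, cpu_cost, gpu_cost):
    """CPU prepares operation i+1 while the GPU is still executing operation i.

    Assumes an unbounded queue between the two, which real hardware lacks.
    """
    cpu_done = 0  # time at which the CPU has finished preparing the latest op
    gpu_done = 0  # time at which the GPU has finished executing the latest op
    for _ in range(n_ops):
        cpu_done += cpu_cost
        # The GPU starts an op once it is both prepared and the GPU is free.
        gpu_done = max(gpu_done, cpu_done) + gpu_cost
    return gpu_done


# Made-up costs in arbitrary time units.
ops, cpu, gpu = 1000, 2, 3
print("synchronous:", synchronous_time(ops, cpu, gpu))
print("pipelined:  ", pipelined_time(ops, cpu, gpu))
```

In this model the synchronous case pays the sum of both costs for every operation, while the pipelined case is limited by whichever side is slower. That is the kind of gap one would hope to close once the driver no longer has to wait after each composite.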

Dave Airlie has been doing some recent work that should let us fix
that bug once and for all. Hopefully it won't be too long before I can
actually post some positive progress here.

PS. I've also gotten one report that my patch for storing glyphs as
pixmaps speeds glyph rendering up initially, but after the X server
has been running for about an hour or so, things get *really*
slow. Shame on me for not doing any testing more extensive than
starting the X server and then running a single client for a few
minutes (either firefox or x11perf). The report is that most of the
time is disappearing into ExaOffscreenMarkUsed. Well, the good news is
that Dave's work eliminates that function entirely (along with lots
of migration code in EXA), so hopefully there's not any big problem to
fix there. I'll have to test more thoroughly after synching up with