[[!meta title="Eliminating glyph fallbacks"]]

[[!tag exa performance xorg i965]]

Sometimes things get worse before they get better.

A few days ago, I presented a patch for [storing glyphs as
pixmaps](http://cworth.org/exa/storing_glyphs_as_pixmaps/) which
improved performance, but not as dramatically as one would have hoped.

I profiled the result and found that there were still a lot of
software fallbacks going on. Tracking things down (hints: enable
DEBUG\_TRACE\_FALL in xserver/exa/exa_priv.h and I830DEBUG in
xf86-video-intel/src/i830.h), I found a simple case statement that was
falling back to software for any compositing operation targeting an A8
buffer. Fortunately, it looks like this fallback was due to a
limitation in older graphics cards that doesn't exist on the i965. So a
very simple [patch](Allow-i965-compositing-to-target-A8-buffers.patch)
eliminates the software fallback.

So let's take a look at before-and-after profiles:

<dl class="chart barchart">
    <dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.oprofile">aa10text-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//timing">144000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.symbols">symbols profile</a></dt>
    <dd style="width:65.9722%;">
        <ul>
            <li class="libexa" style="width:27.1577%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libexa.oprofile">libexa</a><span>27%</span></li>
            <li class="libpixman" style="width:26.8338%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libpixman.oprofile">libpixman</a><span>27%</span></li>
            <li class="vmlinux" style="width:24.6667%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//vmlinux.oprofile">vmlinux</a><span>25%</span></li>
            <li class="Xorg" style="width:4.9222%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//Xorg.oprofile">Xorg</a><span>5%</span></li>
            <li class="libc-2_6" style="width:4.4754%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>4%</span></li>
            <li class="oprofiled" style="width:3.4928%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
            <li class="intel_drv" style="width:2.5876%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//intel_drv.oprofile">intel_drv</a><span>3%</span></li>
            <li class="other" style="width:5.8638%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//other.oprofile">other</a><span>6%</span></li>
        </ul>
    </dd>
    <dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.oprofile">aa10text-no-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//timing">95000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.symbols">symbols profile</a></dt>
    <dd style="width:100%;">
        <ul>
            <li class="vmlinux" style="width:42.1575%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//vmlinux.oprofile">vmlinux</a><span>42%</span></li>
            <li class="intel_drv" style="width:26.7106%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//intel_drv.oprofile">intel_drv</a><span>27%</span></li>
            <li class="libexa" style="width:7.9861%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libexa.oprofile">libexa</a><span>8%</span></li>
            <li class="librt-2_6" style="width:7.7359%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//librt-2.6.oprofile">librt-2.6</a><span>8%</span></li>
            <li class="libc-2_6" style="width:5.3533%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>5%</span></li>
            <li class="Xorg" style="width:3.9202%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//Xorg.oprofile">Xorg</a><span>4%</span></li>
            <li class="oprofiled" style="width:3.2670%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
            <li class="other" style="width:2.8694%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//other.oprofile">other</a><span>3%</span></li>
        </ul>
    </dd>
</dl>

Yikes! The patch takes us from 144k chars/sec. to only 95k
chars/sec. I'm regressing performance! But look again, and see that
the libexa time has been cut dramatically, and the libpixman time has
been eliminated altogether. That's exactly what we would hope to see
from eliminating software fallbacks. So I've finally gotten this
text-rendering benchmark to involve no software fallbacks. Hurrah!
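
One rough way to check that reading is to convert the chart's percentages into time per character. The sketch below assumes that a module's share of profile samples approximates its share of the time spent per character, and that libpixman (which is not listed separately in the second chart) can be treated as zero. The function and variable names are made up for illustration.

```python
# Throughput reported by the two timing files, in characters per second.
RATE_FALLBACKS = 144_000
RATE_NO_FALLBACKS = 95_000

# Share of profile samples per module, copied from the chart above.
SHARE_FALLBACKS = {
    "libexa": 0.271577,
    "libpixman": 0.268338,
    "vmlinux": 0.246667,
    "intel_drv": 0.025876,
}
SHARE_NO_FALLBACKS = {
    "libexa": 0.079861,
    "libpixman": 0.0,  # assumption: not listed separately, treated as zero
    "vmlinux": 0.421575,
    "intel_drv": 0.267106,
}


def usec_per_char(rate, share):
    """Approximate microseconds spent in one module per rendered character."""
    return share / rate * 1e6


def compare():
    """Return {module: (usec before the patch, usec after the patch)}."""
    rows = {}
    for module in SHARE_FALLBACKS:
        before = usec_per_char(RATE_FALLBACKS, SHARE_FALLBACKS[module])
        after = usec_per_char(RATE_NO_FALLBACKS, SHARE_NO_FALLBACKS[module])
        rows[module] = (before, after)
    return rows


if __name__ == "__main__":
    for module, (before, after) in compare().items():
        print(f"{module:10s} {before:6.2f} -> {after:6.2f} usec/char")
```

Under these assumptions libexa drops from roughly 1.9 to 0.8 microseconds per character, while vmlinux and intel_drv together grow from roughly 1.9 to 7.2. The 65.9722% width of the first bar is simply 95000/144000, since time per character is the inverse of the rate.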

Meanwhile, the intel_drv and vmlinux times have increased
dramatically. Take a look at how hot those hotspots are in their
profiles:

intel_drv:

        samples  %        symbol name
        29614    41.2170  i965_prepare_composite
        26641    37.0792  I830WaitLpRing
        9143     12.7253  i965_composite
        1618      2.2519  I830Sync

vmlinux:

        samples  %        symbol name
        28775    25.3748  delay_tsc
        21956    19.3616  system_call
        7535      6.6446  getnstimeofday
        5109      4.5053  schedule

So this is just the same old [synchronous
compositing](http://cworth.org/exa/i965/synchronous_composite/) bug I
identified earlier. Performance has gotten worse because, with the
fallbacks gone, I'm pushing more work through the driver and hitting
this bug more often.
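
To put the two symbol tables on a common scale, the per-module percentages can be multiplied by each module's share of the whole no-fallbacks profile. This is only a sketch: it assumes the symbol percentages are relative to each module's own sample total (the sample counts in the tables appear consistent with that), and the names used are illustrative.

```python
# Module shares of all samples in the no-fallbacks run, from the chart.
MODULE_SHARE = {"intel_drv": 0.267106, "vmlinux": 0.421575}

# Per-module symbol percentages, from the two tables above.
SYMBOL_SHARE = {
    ("intel_drv", "i965_prepare_composite"): 0.412170,
    ("intel_drv", "I830WaitLpRing"): 0.370792,
    ("vmlinux", "delay_tsc"): 0.253748,
    ("vmlinux", "system_call"): 0.193616,
}


def system_wide_share():
    """Return each symbol's approximate share of all profile samples."""
    return {
        symbol: MODULE_SHARE[module] * share
        for (module, symbol), share in SYMBOL_SHARE.items()
    }


if __name__ == "__main__":
    shares = system_wide_share()
    for symbol, share in shares.items():
        print(f"{symbol:24s} {share:6.1%}")
    print(f"{'total':24s} {sum(shares.values()):6.1%}")
```

With these inputs, the four symbols together account for roughly 40% of all samples in the no-fallbacks run.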

Dave Airlie has been doing some recent work that should let us fix
that bug once and for all. Hopefully it won't be too long before I can
actually post some positive progress here.

PS. I've also gotten one report that my patch for storing glyphs as
pixmaps speeds glyph rendering up initially, but after the X server
has been running for about an hour or so, things get *really*
slow. Shame on me for not doing any testing more extensive than
starting the X server and then running a single client for a few
minutes (either firefox or x11perf). The report is that most of the
time is disappearing into ExaOffscreenMarkUsed. Well, the good news is
that Dave's work eliminates that function entirely (along with lots
of migration code in EXA), so hopefully there's not any big problem to
fix there. I'll have to test more thoroughly after syncing up with
Dave.