Carl Worth [Tue, 12 Nov 2013 22:03:40 +0000 (14:03 -0800)]
Don't print a frame-time report for the first frame after glXMakeCurrent
Fips was already dropping the first frame's report (waiting for
another frame to go by to ensure that all the queries are available in
order to compute deltas).
But we were getting wildly out-of-range frame timings for later frames
when the application switched contexts, (since fips was subtracting
query values obtained in one context from values obtained in another
context). We fix this by dropping one frame-time report at every
context change.
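Here is a minimal sketch of the dropping rule. It assumes one GL_TIMESTAMP value per frame end, and `frame_timer_t` and its functions are hypothetical names, not fips code. A frame time is reported only when both endpoints were captured in the same context. As a result, neither the very first frame nor the first frame after each make-current produces a report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-context frame-timing state (not fips's real structure). */
typedef struct {
    bool have_previous;      /* true once a timestamp from THIS context exists */
    uint64_t previous_ts_ns; /* GL_TIMESTAMP value at the previous frame end */
} frame_timer_t;

/* Call from the make-current wrapper: timestamps from the old context
 * must never be subtracted from timestamps taken in the new one. */
static void frame_timer_context_changed(frame_timer_t *t)
{
    t->have_previous = false;
}

/* Call with each frame-end timestamp. Returns true and stores the frame
 * time only when both endpoints came from the same context. */
static bool frame_timer_end_frame(frame_timer_t *t, uint64_t ts_ns,
                                  uint64_t *frame_ns)
{
    bool valid = t->have_previous;

    if (valid)
        *frame_ns = ts_ns - t->previous_ts_ns;
    t->previous_ts_ns = ts_ns;
    t->have_previous = true;
    return valid;
}
```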
We may very well want to use a different time source to obtain frame
times, since (for DOTA 2 at least) the reports from the dropped frames
would be particularly interesting to see (such as large frames
compiling many shaders or uploading a lot of texture data).
At least we're no longer getting the wild numbers that throw off the
graph scale so badly.
Carl Worth [Tue, 12 Nov 2013 18:30:18 +0000 (10:30 -0800)]
Use consistent argument order for subtract functions and macros.
We were calling subtract_timestamp() on the line just before
TIMESPEC_DIFF() but with reversed argument order in the two. This
could obviously lead to confusion. So we adopt the (a - b)
convention for both, (as well as the name "subtract" for both).
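For illustration, here is a hypothetical function/macro pair that follows the (a - b) convention. Neither is fips's actual subtract_timestamp() or TIMESPEC_DIFF(), and the nanosecond int64_t result type is an assumption.

```c
#include <stdint.h>
#include <time.h>

/* Return (a - b) in nanoseconds. Doing the arithmetic in int64_t lets a
 * negative tv_nsec difference borrow from the seconds term automatically. */
static int64_t subtract_timespec_ns(struct timespec a, struct timespec b)
{
    return (int64_t)(a.tv_sec - b.tv_sec) * 1000000000LL
         + (int64_t)(a.tv_nsec - b.tv_nsec);
}

/* Macro form with the same (a - b) argument order as the function. */
#define SUBTRACT_TIMESPEC_NS(a, b) subtract_timespec_ns((a), (b))
```

With both spelled "subtract" and both taking (a, b) as "a minus b", swapping the arguments by habit between two adjacent lines is much less likely.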
Carl Worth [Tue, 12 Nov 2013 18:20:10 +0000 (10:20 -0800)]
Fix CPU load reported on initial frame.
We were seeing a result like this:
frame: 0 21.0526 20.9131 -4.54586e+17 0
while the frame obviously didn't take negative time.
The bug was subtracting an uninitialized
metrics->previous_cpu_time_ts. In order to have this in place by the
end of the first frame, we need to do an initial query of the CPU time
when first creating our metrics object, (before starting the first
frame).
Carl Worth [Sat, 9 Nov 2013 20:23:53 +0000 (12:23 -0800)]
Capture and print CPU load on a per-frame basis.
We use clock_gettime to measure the amount of CPU time accrued by the
active process. Then, by subtracting the value between successive frame
times and dividing by the total wall-clock time for the frame, we
determine a measure of the per-frame CPU load from 0.0 to 1.0.
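Here is a sketch of that measurement. It assumes CLOCK_PROCESS_CPUTIME_ID for CPU time and CLOCK_MONOTONIC for wall-clock time; the message does not say which wall clock fips uses. `cpu_load_t` and its functions are hypothetical. The init function also does the up-front query that keeps the first frame from subtracting an uninitialized value.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical per-frame CPU-load state, not fips's actual metrics code. */
typedef struct {
    struct timespec prev_cpu;  /* process CPU time at previous frame end */
    struct timespec prev_wall; /* monotonic time at previous frame end */
} cpu_load_t;

/* (a - b) in seconds, subtracting before converting to double. */
static double timespec_diff_s(struct timespec a, struct timespec b)
{
    return (double)(a.tv_sec - b.tv_sec)
         + (double)(a.tv_nsec - b.tv_nsec) / 1e9;
}

/* Query both clocks once at creation, so the first frame has valid
 * "previous" values to subtract from. */
static void cpu_load_init(cpu_load_t *m)
{
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &m->prev_cpu);
    clock_gettime(CLOCK_MONOTONIC, &m->prev_wall);
}

/* CPU time accrued during this frame divided by the frame's wall time. */
static double cpu_load_end_frame(cpu_load_t *m)
{
    struct timespec cpu, wall;
    double cpu_s, wall_s;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu);
    clock_gettime(CLOCK_MONOTONIC, &wall);

    cpu_s = timespec_diff_s(cpu, m->prev_cpu);
    wall_s = timespec_diff_s(wall, m->prev_wall);

    m->prev_cpu = cpu;
    m->prev_wall = wall;

    return wall_s > 0.0 ? cpu_s / wall_s : 0.0;
}
```

CLOCK_PROCESS_CPUTIME_ID sums the CPU time of every thread in the process. As a result, a multi-threaded application can push this ratio above 1.0.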
We now print the frame number, frame time, frame latency, and CPU load
in a single-line format for simpler parsing, (with a header comment
describing the names and units for each field).
Carl Worth [Tue, 5 Nov 2013 17:56:07 +0000 (09:56 -0800)]
Add per-frame time and latency measurements
Timing is measured with a glQueryCounter(GL_TIMESTAMP) on every frame.
Latency is measured by comparing glGetInteger64v(GL_TIMESTAMP) which gets
the timestamp synchronously to a glQueryCounter(GL_TIMESTAMP) which
gets the timestamp asynchronously by inserting a command into the
OpenGL queue.
Carl Worth [Fri, 1 Nov 2013 23:12:12 +0000 (16:12 -0700)]
fips-dispatch: Add functions necessary for GL_TIMESTAMP queries
We're about to start making GL_TIMESTAMP queries in fips, so we need
the GL_TIMESTAMP enum, as well as glQueryCounter, glGetInteger64v, and
glGetQueryObjecti64v.
While at it, add a couple of related functions such as
glGetQueryObjectiv and glGetQueryObjectui64v.
Carl Worth [Fri, 1 Nov 2013 17:42:38 +0000 (10:42 -0700)]
Tiny refactoring of metrics_end_frame interface
Previously, the callers were stopping and starting the counter outside
of the call to metrics_end_frame. There's less code duplication and
more robustness by moving the counter stop and counter start into the
metrics_end_frame function itself.
Carl Worth [Sat, 9 Nov 2013 16:26:30 +0000 (08:26 -0800)]
Remove unused parameter from RESTORE_METRICS_OP macro.
Oddly enough, this wasn't actually having any effect. Apparently it's
legitimate to pass empty arguments to function-like macros. But this
wasn't the intention here. I had intended for this macro to never
accept any argument.
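The preprocessor behavior described above can be reproduced in a few lines of standard C (C99 or later). The macro names here are hypothetical stand-ins, not the real RESTORE_METRICS_OP.

```c
/* A one-parameter macro invoked with empty parentheses receives a single
 * empty argument (valid since C99), so the mistake compiles silently. */
#define RESTORE_OP_OLD(op) (restore_count++)

/* The intended form: a function-like macro with no parameters at all. */
#define RESTORE_OP_NEW() (restore_count++)

/* Stringizing shows that the empty argument really is empty. */
#define STRINGIFY(x) #x

static int restore_count = 0;

static void restore_twice(void)
{
    RESTORE_OP_OLD();  /* legal: the parameter "op" is simply empty */
    RESTORE_OP_NEW();
}
```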
Carl Worth [Fri, 8 Nov 2013 19:16:27 +0000 (11:16 -0800)]
Allow fips to compile with OpenGL without GLfixed
The Mesa 9.1 releases don't include the GLfixed datatype, so fips was
failing to compile against those versions of Mesa, (since it was trying
to add wrappers using the GLfixed datatype).
But since the underlying functions don't exist in libGL.so, fips
doesn't need the wrappers anyway.
In this commit, we key off of the GL_GLEXT_VERSION macro from glext.h
to decide whether or not to include these newer wrappers. This fixes
compilation of fips on Mesa 9.1.
Carl Worth [Mon, 4 Nov 2013 23:44:23 +0000 (15:44 -0800)]
Fix fips to work even without the GL_AMD_performance_monitor extension.
Even without the extension, we can still do timer queries and print
the time spent in each operation. So detect that the extension is
available, and then use that information to avoid calling into any
functions that are only made available with that extension.
The first report generated with a context that does not have the
extension will include a warning that the extension was not available.
Carl Worth [Sat, 2 Nov 2013 21:36:47 +0000 (14:36 -0700)]
Fix to still print metrics when there are no per-stage cycle-count metrics
There are at least two cases where fips will have no per-stage cycle counts:
1. The AMD_performance_monitor extension is not available
2. The extension is there, but fips cannot find any per-stage
cycle counters
In either case, the previous code would print no timing
information. The problem is that fips was using the number of detected
per-stage cycle counters (in num_shader_stages) as the basis for
allocation of the results array. So with this value being 0, nothing
was allocated, nothing was stored, and nothing was printed.
Here, we fix this by instead allocating space for one result per
operation, and ensuring that all of the measured time is reported for
that operation.
Carl Worth [Mon, 4 Nov 2013 22:48:01 +0000 (14:48 -0800)]
Remove the context_get_current function
This completes the recent re-factoring begun a few commits ago. Now
the layer violations are eliminated, with the metrics.c code always
accepting a metrics_t* and never reaching up into context.c to find a
global data structure.
Carl Worth [Mon, 4 Nov 2013 22:27:31 +0000 (14:27 -0800)]
Add a pointer to metrics_info_t from metrics_t
This removes one common use of context_t from the metrics.c code, so
we're one step closer to having clean interfaces here, (rather than
everybody reaching into a global context_t structure).
Carl Worth [Mon, 4 Nov 2013 22:17:07 +0000 (14:17 -0800)]
Hide the metrics_t data structure in metrics.c
Previously, this structure was exposed in metrics.h, which was
obviously not very clean. With this structure now private to
metrics.c, the code should be easier to both read and maintain.
Carl Worth [Mon, 4 Nov 2013 22:06:06 +0000 (14:06 -0800)]
Push outstanding-counter data down from context.c into metrics.c
Here, we introduce a new metrics_t structure which is responsible for
keeping track of lists of outstanding counters as well as the array of
collected counter results.
Now, context_t is much cleaner, containing only the ID of the system
context, the metrics_info_t and the metrics_t.
Carl Worth [Mon, 4 Nov 2013 21:21:13 +0000 (13:21 -0800)]
Begin re-factoring metrics.c into separate context.c and metrics-info.c
The code in metrics.c was getting a bit unwieldy. Some of it is pushed
up into the existing context.c file. Other portions (specifically, the
code which queries the names of all available performance monitors) are
pushed down into a new metrics-info.c.
There aren't very hard boundaries between these files yet, (they
are all sharing their internals in header files), but this gives some
structure for future cleanups.
Carl Worth [Wed, 30 Oct 2013 21:39:47 +0000 (14:39 -0700)]
Collect any available results before switching contexts.
Rather than missing out on any measured results by just throwing them
away, we collect anything that is ready.
This way, if any queries don't actually have results ready, we will
throw those away, since we're not going to be able to get a meaningful
result from them with the current context going away.
Carl Worth [Wed, 30 Oct 2013 21:35:21 +0000 (14:35 -0700)]
Fix resource leaks when switching contexts.
Previously, fips was already freeing memory that it had allocated for
its own linked lists of outstanding queries when switching contexts.
In addition, in this commit, we also now call End on any active
timer-query/performance-monitor and then call Delete on all queries
for which we have not previously collected results.
This avoids leaks within the OpenGL implementation as it holds on to
results that fips will never ask for.
Carl Worth [Wed, 30 Oct 2013 21:24:52 +0000 (14:24 -0700)]
Collect timer/monitor results whenever there are >1000 outstanding
Previously, fips always waited for a frame boundary before collecting
timer and monitor results. Now, whenever more than a maximum (set to
1000 here) number of monitors have been fired off, but no results
collected, fips will check and collect results for all timers/monitors
that have results available.
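Here is a sketch of the threshold check. It uses a hypothetical singly linked list in which a `result_available` flag stands in for asking OpenGL whether a result is ready: GL_QUERY_RESULT_AVAILABLE for timer queries, GL_PERFMON_RESULT_AVAILABLE_AMD for performance monitors. Only the 1000 threshold comes from the message above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OUTSTANDING 1000  /* threshold quoted in the message above */

/* Hypothetical outstanding-query record; real code would hold GL IDs. */
typedef struct query {
    bool result_available;    /* stand-in for the GL "result available" check */
    struct query *next;
} query_t;

typedef struct {
    query_t *head;
    size_t count;
} outstanding_t;

/* Unlink every query whose result is ready and keep the rest queued.
 * Real code would read and accumulate each result before dropping it. */
static size_t collect_available(outstanding_t *list)
{
    size_t collected = 0;
    query_t **link = &list->head;

    while (*link) {
        query_t *q = *link;
        if (q->result_available) {
            *link = q->next;
            list->count--;
            collected++;
        } else {
            link = &q->next;
        }
    }
    return collected;
}

/* Called each time a new query is issued. */
static void add_outstanding(outstanding_t *list, query_t *q)
{
    q->next = list->head;
    list->head = q;
    if (++list->count > MAX_OUTSTANDING)
        collect_available(list);
}
```

Queries whose results are not yet ready stay on the list, so nothing is lost by collecting early; they are simply picked up by a later collection.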
Here's some background on the debugging that led to this change:
With an apitrace collected from "DOTA 2" we ran into crashes, always
on the first frame of the game proper (after the opening menus,
etc.). This frame is unusually large, (roughly half a million OpenGL
calls).
With that large frame, and the resulting large number of outstanding
queries waiting to be collected, we were running into a resource
limit and Mesa's performance-monitor code was crashing on an
unexpectedly NULL bo->virtual pointer.
A little digging determined that a DRM map ioctl was failing due to
the map_count resource in the kernel being larger than the
configured default (roughly 65530).
After checking that neither fips nor Mesa was leaking any large
number of buffer objects, (nor keeping many mapped), we decided to
attempt this more aggressive collection of results in fips.
As far as resource consumption in general, this does seem like a
reasonable thing to do. If we have hundreds of outstanding queries,
surely the oldest of them have completed, and we can free some
resources by collecting those.
On the other hand, it still seems wrong that the kernel is imposing
an arbitrary limit on how many outstanding queries an application
can have. The AMD_performance_monitor specification and
implementation are not intended to have any such limitation. So,
there's still some investigation to be done on what resource is
causing the kernel's map_count to grow so large and to see if
there's a bug there to be fixed.
Carl Worth [Thu, 31 Oct 2013 22:35:39 +0000 (15:35 -0700)]
Fix to print metrics for operations with no per-stage cycle counts
Operations like glTexImage* get a valid time from the timer query, but
get performance counter numbers of zero, (since the operation is
performed in a blit batch which cannot have performance-monitor
operations in it).
We had code in place to protect against any divide-by-zero in this case, but
that case was mistakenly setting the resulting time to 0, so any
operations like this were not having their time reported.
Since no per-stage time can be computed for such an operation, we fix
this by arbitrarily using stage 0 as the place to store 100% of the
time spent, and we set that per-stage metric's stage name to NULL so
that the report doesn't misattribute the time to a real shader stage.
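Here is a sketch of that fallback, together with the cycle-proportional split it falls back from. The types and names are hypothetical, and the sketch assumes one combined cycle count (active plus stall) per stage.

```c
#include <stddef.h>

/* Hypothetical per-stage result; stage_name is NULL when the time
 * cannot honestly be attributed to any particular stage. */
typedef struct {
    const char *stage_name;
    double time_ns;
} per_stage_t;

/* Split op_time_ns across stages in proportion to their cycle counts.
 * If every count is zero (e.g. a blit such as glTexImage*), put 100% of
 * the time in stage 0 and clear its name. */
static void apportion(per_stage_t *out, const char *const *names,
                      const double *stage_cycles, size_t num_stages,
                      double op_time_ns)
{
    double total_cycles = 0.0;
    size_t i;

    for (i = 0; i < num_stages; i++)
        total_cycles += stage_cycles[i];

    for (i = 0; i < num_stages; i++) {
        out[i].stage_name = names[i];
        out[i].time_ns = total_cycles > 0.0
            ? op_time_ns * (stage_cycles[i] / total_cycles)
            : 0.0;
    }

    if (total_cycles == 0.0 && num_stages > 0) {
        out[0].time_ns = op_time_ns;  /* no time is silently dropped */
        out[0].stage_name = NULL;     /* ...but no stage is blamed either */
    }
}
```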
Carl Worth [Mon, 28 Oct 2013 21:34:26 +0000 (14:34 -0700)]
Add a new context.c file with context_enter and context_leave functions
So far, this just factors out some duplicated code from glxwrap.c and
eglwrap.c into the new context_enter/leave functions.
Eventually, some of the code currently living in metrics.c should
migrate up into context.c, (such as the global current_context
variable in metrics.c).
Additionally, the context.c layer will give us a natural place to
query things such as "is the AMD_performance_monitor extension
available?".
Carl Worth [Fri, 25 Oct 2013 22:12:25 +0000 (15:12 -0700)]
dispatch: Fix dispatcher to perform lookup for the GetProcAddress functions
Previously, the fips dispatch code was directly calling
glXGetProcAddressARB and eglGetProcAddress. This meant that the
dispatch code was calling into fips's own version of these functions.
Up until now, that has worked fine, since fips was not implementing
wrappers for any of the functions supported by fips-dispatch, so
fips's GetProcAddress functions were successfully calling the "real"
GetProcAddress functions and the dispatch code was calling the real
functions.
However, we're about to start adding wrappers for functions that are
also dispatched, (such as glBeginQuery). At this point, it would be
incorrect for the dispatch code to return the fips-wrapped
versions. The whole point of wrapping these functions is to make the
application calls into these functions different than the fips calls
into the real functions (through the dispatch).
To fix this, we ensure that the dispatch code calls glwrap_lookup or
eglwrap_lookup to locate the "real" GetProcAddress functions which in
turn ensures that the dispatch code will never resolve to a wrapped
function.
Carl Worth [Fri, 25 Oct 2013 22:34:17 +0000 (15:34 -0700)]
Fix buffer overrun in accumulate_program_metrics
The convention for the op_metrics array in the context is that callers
do not index it directly, but instead call ctx_get_op_metrics (which
will first grow the array if needed).
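Here is a sketch of that grow-on-access convention. `ctx_t`, the trimmed `op_metrics_t`, and `get_op_metrics()` are hypothetical stand-ins; the real ctx_get_op_metrics signature may differ. Any slot past the current end is created and zero-filled before a pointer to it is returned, so callers never index past the allocation.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { double time_ns; } op_metrics_t;  /* trimmed stand-in */

typedef struct {
    op_metrics_t *op_metrics;
    unsigned num_op_metrics;
} ctx_t;  /* hypothetical; fips's context holds more than this */

/* Return the slot for op, growing and zero-filling the array first when
 * op is past the end. Returns NULL if the allocation fails. */
static op_metrics_t *get_op_metrics(ctx_t *ctx, unsigned op)
{
    if (op >= ctx->num_op_metrics) {
        unsigned new_len = op + 1;
        op_metrics_t *grown = realloc(ctx->op_metrics,
                                      (size_t)new_len * sizeof(*grown));
        if (grown == NULL)
            return NULL;
        memset(grown + ctx->num_op_metrics, 0,
               (size_t)(new_len - ctx->num_op_metrics) * sizeof(*grown));
        ctx->op_metrics = grown;
        ctx->num_op_metrics = new_len;
    }
    return &ctx->op_metrics[op];
}
```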
Carl Worth [Thu, 24 Oct 2013 02:47:20 +0000 (19:47 -0700)]
Cleanup outstanding counters at context change.
Without this, and given an application that calls glXMakeCurrent (or
similar) the implementation gets quite confused as fips starts
requesting query results for counter IDs that were only valid for the
previous context.
Carl Worth [Wed, 23 Oct 2013 16:23:44 +0000 (09:23 -0700)]
Add support for performance counters of types other than uint32_t
The AMD_performance_monitor extension also allows counters of type
uint64_t, float, and percentage (which is the same data-type as float).
Fips was already storing the expected type in the group's
counter_types array, so it's a simple matter to look at that and read
a value of the expected type.
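Here is a sketch of reading one counter value according to its stored type. The enum is a local stand-in for the extension's type tokens (GL_UNSIGNED_INT, GL_UNSIGNED_INT64_AMD, GL_FLOAT, GL_PERCENTAGE_AMD). The sketch handles only the value field and leaves the surrounding result layout to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the counter types named by the extension. */
typedef enum { CT_UINT32, CT_UINT64, CT_FLOAT, CT_PERCENTAGE } counter_type_t;

/* Read one value of the given type from buf into *value (as a double) and
 * return the number of bytes consumed, or 0 for an unknown type.
 * memcpy avoids unaligned and type-punned loads from the raw buffer. */
static size_t read_counter_value(const unsigned char *buf,
                                 counter_type_t type, double *value)
{
    switch (type) {
    case CT_UINT32: {
        uint32_t v;
        memcpy(&v, buf, sizeof v);
        *value = v;
        return sizeof v;
    }
    case CT_UINT64: {
        uint64_t v;
        memcpy(&v, buf, sizeof v);
        *value = (double)v;
        return sizeof v;
    }
    case CT_FLOAT:
    case CT_PERCENTAGE: {   /* percentage uses the same float data type */
        float v;
        memcpy(&v, buf, sizeof v);
        *value = v;
        return sizeof v;
    }
    }
    return 0;
}
```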
Carl Worth [Wed, 23 Oct 2013 21:06:22 +0000 (14:06 -0700)]
Track glext.h ABI changes
I cannot fathom why some internalFormat values changed from GLenum to
GLint while others changed from GLint to GLenum. But, fortunately,
glext.h includes a version field so that we can track this.
Kenneth Graunke [Wed, 23 Oct 2013 19:38:37 +0000 (12:38 -0700)]
Fix conversion from group IDs to group array indices.
The loop that found the array index for a particular group based on the
group ID had a subtle bug: it compared against "i" instead of group_id.
In the i965 implementation, the first group happens to have ID 0, which
meant that the loop would always select the first group (since the ID
equals the array index). This led to assertion failures about the
number of counters in each group.
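Here is a sketch of the corrected lookup, using a hypothetical trimmed `group_info_t`. The comparison is against each group's stored ID, never against the loop index. The buggy version only appeared to work because, on i965, the ID of the first group happens to equal its index.

```c
#include <stddef.h>

typedef struct {
    unsigned id;            /* group ID as reported by the implementation */
    unsigned num_counters;
} group_info_t;             /* hypothetical trimmed stand-in */

/* Map a group ID to its array index, or return -1 if it is unknown.
 * IDs need not be 0..num_groups-1, so only groups[i].id is meaningful. */
static int group_index_for_id(const group_info_t *groups, size_t num_groups,
                              unsigned group_id)
{
    for (size_t i = 0; i < num_groups; i++) {
        if (groups[i].id == group_id)
            return (int)i;
    }
    return -1;
}
```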
Carl Worth [Wed, 23 Oct 2013 03:25:56 +0000 (20:25 -0700)]
Perform reporting on a per-shader-stage basis
We use the per-shader-stage performance counters to determine a
relative portion of time that each operation spends in each shader
stage. These portions are then used to multiply the time measured (via
timer query) for each operation to determine a per-shader time. Then,
all the per-shader-stage operations are sorted by these computed times
and printed in the report.
We also print a "% active" value for each shader stage.
The remaining performance counters (other than per-stage active and
stall) are now no longer printed by default. If these are desired,
they can be obtained by passing the --verbose option to the fips
binary or by setting the FIPS_VERBOSE environment variable to a value
of 1.
- /* Since we sparsely fill the array based on program
- * id, many "programs" have no time.
- */
- if (metric->time_ns == 0.0)
+ /* Don't print anything for stages with no allotted time. */
+ if (per_stage->time_ns == 0.0)
return;
- printf ("\t%7.2f ms (% 2.1f%%)",
- metric->time_ns / 1e6,
- metric->time_ns / total * 100);
+ printf ("\t%7.2f ms (%4.1f%%)",
+ per_stage->time_ns / 1e6,
+ per_stage->time_ns / total * 100);
+
+ if (per_stage->active)
+ printf (", %4.1f%% active", per_stage->active * 100);
+
+ printf ("\n");
+
+ /* I'm not seeing a lot of value printing the rest of these
+ * performance counters by default yet. Use --verbose to get
+ * them for now. */
+ if (! verbose)
+ return;
printf ("[");
for (group_index = 0; group_index < info->num_groups; group_index++) {
group = &info->groups[group_index];
for (counter = 0; counter < group->num_counters; counter++) {
+
+ /* Don't print this counter value if it's a
+ * per-stage cycle counter, (which we have
+ * already accounted for). */
+ if (_is_shader_stage_counter (info, group_index, counter))
+ continue;
+
value = metric->counters[group_index][counter];
if (value == 0.0)
continue;
@@ -511,27 +659,97 @@ print_op_metrics (context_t *ctx, op_metrics_t *metric, double total)
printf ("]\n");
}
+static int
+time_compare(const void *in_a, const void *in_b, void *arg unused)
+{
+ const per_stage_metrics_t *a = in_a;
+ const per_stage_metrics_t *b = in_b;
+
+
+ if (a->time_ns < b->time_ns)
+ return -1;
+ if (a->time_ns > b->time_ns)
+ return 1;
+ return 0;
+}
+
static void
print_program_metrics (void)
{
context_t *ctx = &current_context;
- int *sorted; /* Sorted indices into the ctx->op_metrics */
- double total = 0;
- unsigned i;
-
- /* Make a sorted list of the operations by time used, and figure
- * out the total so we can print percentages.
+ metrics_info_t *info = &ctx->metrics_info;
+ unsigned num_shader_stages = info->num_shader_stages;
+ per_stage_metrics_t *sorted, *per_stage;
+ double total_time, op_cycles;
+ op_metrics_t *op;
+ unsigned group_index, counter_index;
+ unsigned i, j, num_sorted;
+
+ /* Make a sorted list of the per-stage operations by time
+ * used, and figure out the total so we can print percentages.
*/
- sorted = calloc(ctx->num_op_metrics, sizeof(*sorted));
+ num_sorted = ctx->num_op_metrics * num_shader_stages;
+
+ sorted = xmalloc (sizeof (*sorted) * num_sorted);
+
+ total_time = 0.0;
+
for (i = 0; i < ctx->num_op_metrics; i++) {
- sorted[i] = i;
- total += ctx->op_metrics[i].time_ns;
+
+ op = &ctx->op_metrics[i];
+
+ /* Accumulate total time across all ops. */
+ total_time += op->time_ns;
+
+ /* Also, find total cycles in all stages of this op. */
+ op_cycles = 0.0;
+
+ for (j = 0; j < num_shader_stages; j++) {
+ /* Active cycles */
+ group_index = info->stages[j].active_group_index;
+ counter_index = info->stages[j].active_counter_index;
+ op_cycles += op->counters[group_index][counter_index];
+
+ /* Stall cycles */
+ group_index = info->stages[j].stall_group_index;
+ counter_index = info->stages[j].stall_counter_index;
+ op_cycles += op->counters[group_index][counter_index];
+ }
+
+ for (j = 0; j < num_shader_stages; j++) {
+ double active_cycles, stall_cycles, stage_cycles;
+
+ /* Active cycles */
+ group_index = info->stages[j].active_group_index;
+ counter_index = info->stages[j].active_counter_index;
+ active_cycles = op->counters[group_index][counter_index];
+
+ /* Stall cycles */
+ group_index = info->stages[j].stall_group_index;
+ counter_index = info->stages[j].stall_counter_index;
+ stall_cycles = op->counters[group_index][counter_index];
+
+ stage_cycles = active_cycles + stall_cycles;
+
+ per_stage = &sorted[i * num_shader_stages + j];
+ per_stage->metrics = op;
+ per_stage->stage = &info->stages[j];
+ if (op_cycles)
+ per_stage->time_ns = op->time_ns * (stage_cycles / op_cycles);
+ else
+ per_stage->time_ns = 0.0;
+ if (stage_cycles)
+ per_stage->active = active_cycles / stage_cycles;
+ else
+ per_stage->active = 0.0;
+ }
}
- qsort_r(sorted, ctx->num_op_metrics, sizeof(*sorted),
- time_compare, ctx->op_metrics);
- for (i = 0; i < ctx->num_op_metrics; i++)
- print_op_metrics (ctx, &ctx->op_metrics[sorted[i]], total);
+ qsort_r (sorted, num_sorted, sizeof (*sorted),
+ time_compare, ctx->op_metrics);
+
+ for (i = 0; i < num_sorted; i++)
+ print_per_stage_metrics (ctx, &sorted[i], total_time);
Carl Worth [Wed, 23 Oct 2013 01:47:40 +0000 (18:47 -0700)]
Fix for an implementation with non-contiguous group ID values.
It seems crazy to me that group IDs (being integers) can be anything
other than [0 .. num_groups - 1], but the specification is written
with full generality here.
The code was already querying the group ID values originally, and
assigning those to each group->id slot in the metric_group_info_t
structure. But after that, the code had been assuming it could just
use values from 0 .. num_groups-1.
Fix this by carefully using group_index values ([0..num_groups-1])
when indexing into the various arrays and group->id values when
passing ID values to the various performance-monitor API functions.
Carl Worth [Wed, 23 Oct 2013 00:44:04 +0000 (17:44 -0700)]
Free all fips-allocated data when the program exits
This isn't strictly necessary since the operating system is about to
reclaim all of this data anyway.
The only real advantage of doing this is that it enables us to see in
a valgrind report that there aren't any memory leaks due to direct
allocation by code within fips.
Carl Worth [Wed, 23 Oct 2013 00:23:46 +0000 (17:23 -0700)]
Print performance-counter names in report
This isn't necessarily a very useful way to see the numbers. The
important part of the code here is that fips is now querying the names
so that it can do some useful interpretation of the values based on
the names.
Carl Worth [Tue, 22 Oct 2013 23:30:34 +0000 (16:30 -0700)]
Un-nest an inner loop while printing program metrics
Before we add more code to complicate the way we print performance
counters, it helps to have this code in its own function, (where we
can safely use 'i' instead of 'j' for loop-control variable, etc.).
There was an extra 's' in here before (GroupsString instead of
GroupString), preventing these functions from being used
correctly. Fix this, (since fips will soon be using this function).
Carl Worth [Tue, 22 Oct 2013 19:19:37 +0000 (12:19 -0700)]
Respect GLAZE_LIBGL environment variable (if FIPS_LIBGL is unset)
Since the LD_PRELOAD mechanism of fips may not work with some
programs, users may want to run fips within glaze instead, (which uses
LD_LIBRARY_PATH instead of LD_PRELOAD). In order to make this
convenient, fips can recognize that glaze has identified the
appropriate libGL.so library by examining the GLAZE_LIBGL environment
variable.
So, if the user has not specifically set the FIPS_LIBGL variable, and
the GLAZE_LIBGL variable is set, use it to find the libGL.so to load.
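Here is a sketch of that precedence rule; `choose_libgl_path()` is a hypothetical name. FIPS_LIBGL wins whenever it is set. Otherwise GLAZE_LIBGL is used, and a NULL result leaves the caller free to use its default search.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Return the libGL path to load, or NULL to fall back to the default. */
static const char *choose_libgl_path(void)
{
    const char *path = getenv("FIPS_LIBGL");   /* explicit setting wins */

    if (path == NULL)
        path = getenv("GLAZE_LIBGL");          /* glaze's choice, if any */
    return path;
}
```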
Carl Worth [Tue, 22 Oct 2013 19:18:23 +0000 (12:18 -0700)]
Ensure that the name "fips" appears in all error messages.
With the number of wrappers potentially involved, (fips, glaze,
apitrace, etc.), sometimes it can be ambiguous which error messages
belong to which wrappers.
Ensure that fips, at least, always advertises its own name in its
error messages.
Carl Worth [Tue, 22 Oct 2013 18:04:25 +0000 (11:04 -0700)]
Add collection of (AMD_performance_monitor) performance counters to fips
The implementation involves a linked-list of outstanding
performance-monitor queries next to the existing list of outstanding
timer queries.
The results from the performance counters are stored (without any
interpretation) in an array of values next to the existing time values
within each op_metrics_t value for each operation.
The numbers are currently printed with simple counter numbers (no
names and no units) and with the values divided by 1e6. Counters with
values of zero are not printed.
Next steps from here that will make things useful:
1. Use the relative number of cycles in each stage to apportion measured
shader time among the various stages, (so that per-stage time
numbers are sorted in the final report).
2. Print percentage active, (by looking at per-stage active and stall times)
3. Print names for counters (other than per-stage active and stall
which will be used in the above two calculations).
4. Fix to silently ignore performance counters if the
AMD_performance_monitor extension is not available.
Carl Worth [Tue, 22 Oct 2013 17:41:16 +0000 (10:41 -0700)]
fips-dispatch: Simplify dispatch code by abstracting resolve functions
All of the resolve functions were structured identically, so rather
than repeating the function bodies over and over, we can use a simple
"resolve" macro to implement this code. This gives a net reduction in
source code for better readability and maintainability.
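Here is a sketch of the idea: one macro expands to the identical resolve-on-first-use body that would otherwise be written out once per function. Everything here is hypothetical. `lookup()` and `fake_glClear()` stand in for the real GetProcAddress path and a real GL entry point, and the actual fips-dispatch macro may look quite different.

```c
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn_t)(void);

/* Hypothetical stand-ins so the sketch runs without a GL library. */
static int fake_glClear_calls = 0;
static void fake_glClear(void) { fake_glClear_calls++; }

static generic_fn_t lookup(const char *name)
{
    if (strcmp(name, "glClear") == 0)
        return fake_glClear;
    return NULL;
}

/* Generate a cached function pointer plus a resolver for it. */
#define RESOLVE(ptr, name)                      \
    static generic_fn_t ptr;                    \
    static generic_fn_t resolve_##ptr(void)     \
    {                                           \
        if (ptr == NULL)                        \
            ptr = lookup(name);                 \
        return ptr;                             \
    }

RESOLVE(dispatch_glClear, "glClear")
RESOLVE(dispatch_glFinish, "glFinish")
```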
Carl Worth [Wed, 16 Oct 2013 20:03:15 +0000 (13:03 -0700)]
Restore metrics op after temporarily changing for non-shader operation
This fixes the bug where an operation such as glClear would
incorrectly accrue all subsequent time until the next call to
glUseProgram would change the op away from glClear.
Now, each non-shader operation that changes the metrics operation
restores it to its previous value immediately afterward.
Carl Worth [Wed, 16 Oct 2013 03:23:08 +0000 (20:23 -0700)]
Aggregate non-shader GPU operations into their own operations
These operations are named after representative functions, (glClear,
glReadPixels, etc.). Aggregate time spent in any of these operations
is sorted together with the existing reports for time spent in
particular shader programs.
The shader program times should now be more accurate since time spent
in operations such as glClear will now no longer be accumulated into
the most recently-used shader.
Carl Worth [Tue, 15 Oct 2013 20:45:06 +0000 (13:45 -0700)]
Rework timer queries to run continuously.
Previously, we ran timer queries only around gl calls that we
determined were "drawing operations". This had the following
drawbacks:
1. Lots of timer queries, so more overhead than desired
2. Misses accumulating time from non "drawing operations"
3. Misses accumulating time from drawing operations we failed to
identify.
Here, instead, we keep the timer running continuously, solving all
three of the above problems. We first start the timer at make-current time
(glXMakeCurrent, glXMakeContextCurrent, or eglMakeCurrent), and only
stop it, (and immediately restart it), in one of the following
circumstances:
1. Current program changed (glUseProgram or glUseProgramObjectARB)
2. At each frame end (glXSwapBuffers or eglSwapBuffers)
Carl Worth [Tue, 15 Oct 2013 20:20:33 +0000 (13:20 -0700)]
Simplify metrics interface by dropping metrics_counter_new
None of the callers of this function were doing anything with the
returned value other than passing it directly to
metrics_counter_start. So it's simpler to just fold the contents of
the metrics_counter_new function into the body of
metrics_counter_start itself.
Carl Worth [Mon, 7 Oct 2013 22:57:34 +0000 (15:57 -0700)]
metrics: Use a more meaningful field name.
I had "ticks" here before I knew the units of the timer-query
result. Since then, Eric dug up the documentation saying that this
timer reports time in nanoseconds. So use a field name of "time_ns"
rather than "ticks".
Eric Anholt [Tue, 27 Aug 2013 18:43:40 +0000 (11:43 -0700)]
Report what the actual units are.
"When the timer query timer is finally stopped, the elapsed time
(in nanoseconds) is written to the corresponding query object as
the query result value"
Carl Worth [Mon, 7 Oct 2013 22:39:12 +0000 (15:39 -0700)]
configure: Fix generated comment for BINDIR_TO_LIBFIPSDIR
The variable names were misspelled before, (incorrect case), so the
comment was being generated with empty values for the two variables,
(making it less than useful).
Carl Worth [Tue, 27 Aug 2013 20:14:53 +0000 (13:14 -0700)]
glxwrap: Initialize fips_dispatch when glXMakeContextCurrent is called
Previously, we only initialized fips_dispatch if glXMakeCurrent was
called. This caused fips to fail for programs that called
glXMakeContextCurrent instead. Both functions are now handled
identically, (giving fips a clear indication that GLX is being used,
not EGL).
This fixes the failure of fips with Lightsmark 2008.
Carl Worth [Mon, 5 Aug 2013 17:23:10 +0000 (10:23 -0700)]
dlwrap: Add new dlwrap_dlopen_libfips function
Previously, two different pieces of fips code (one for dlopen and one
for glXGetProcAddress) each needed to dlopen the fips library
itself. However, the two pieces were implemented differently, (one
passed a symbol to dladdr to find a filename to dlopen, the other just
passed NULL to dlsym and hoped for the best).
Make things consistent by having a single, shared implementation in
the new function dlwrap_dlopen_libfips, (and implement it with the
more reliable approach of calling dladdr and then the real dlopen).
Carl Worth [Wed, 3 Jul 2013 00:38:28 +0000 (17:38 -0700)]
dlwrap: Add "libGLESv2.so" to the list of supported wrapped libraries
This hooks up libGLESv2 to all of our dlsym machinery. It ensures that
we can intercept any dlsym calls into libGLESv2. It also ensures that
when glwrap looks for underlying, "real", GL functions it will look
into libGLESv2.so if that's the library that the application has
previously dlopened.
This commit fixes the egl-glesv2-dlopen-dlsym and
egl-glesv2-dlopen-gpa tests in the test suite.
Carl Worth [Tue, 2 Jul 2013 20:07:40 +0000 (13:07 -0700)]
test: Add 4 tests using EGL and OpenGLESv2
These are similar variants to the four existing tests using EGL and OpenGL.
To add these tests we add a new configure-time check to find the
compilation flags for GLESv2. We also drop the set_2d_projection code
which was using glLoadIdentity, glMatrixMode, and glOrtho functions
which apparently don't exist in GLESv2. So common.c and all tests with
custom wrappers are modified to drop these calls.
As with the egl-opengl tests, all new tests except for the dlsym-based
test pass. That's not too surprising since there are so many twisty
paths in trying to get all the dlopen/dlsym stuff to work correctly.
Carl Worth [Wed, 3 Jul 2013 00:33:12 +0000 (17:33 -0700)]
glwrap: Don't hardcode "libGL.so.1" for looking up real OpenGL symbols
As preparation for testing using GLESv2 we need to ensure that our GL
wrappers are prepared for an OpenGL implementation in either the
libGL.so.1 library or in libGLESv2.so.2.
When the application is directly linked to an OpenGL implementation,
we don't care about the name at all. In this case, we can simply call
dlsym with RTLD_NEXT to find the real, underlying OpenGL symbols.
But when an application uses dlopen to load the OpenGL library, we
want to carefully call dlsym with a handle for the same library that
the application uses. Previously, the glwrap code was unconditionally
calling dlopen for "libGL.so" and that's not what we want.
Instead, we now have our dlopen wrapper watch for any dlopen of a
library whose name begins with "libGL" and then stash the returned
handle via a new glwrap_set_gl_handle call. The stashed handle will
then be used by dlsym calls within glwrap.
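Here is a sketch of the name test such a dlopen wrapper might apply; `is_gl_library()` is a hypothetical name. Because "libGLESv2.so.2" also begins with "libGL", a single prefix test covers both libGL.so.1 and libGLESv2.so.2. Stripping any leading directory is our own assumption; the message only says the name begins with "libGL".

```c
#include <stdbool.h>
#include <string.h>

/* True if a dlopen()ed filename looks like an OpenGL library. */
static bool is_gl_library(const char *filename)
{
    const char *base;

    if (filename == NULL)      /* dlopen(NULL, ...) opens the main program */
        return false;
    base = strrchr(filename, '/');
    base = base ? base + 1 : filename;   /* ignore any directory part */
    return strncmp(base, "libGL", 5) == 0;
}
```

A prefix this loose also matches libraries such as libGLU.so. Whether that matters depends on how the stashed handle is later used.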
Carl Worth [Tue, 2 Jul 2013 19:45:18 +0000 (12:45 -0700)]
configure: Drop broken workarounds for missing pkg-config
Any reasonably modern system should have its OpenGL libraries (and
similar) installed along with pkg-config files.
Regardless, the checks we had in place here for missing gl.pc files
were untested and obviously not very useful, (they didn't actually
look around anywhere for GL headers nor for GL libraries).
We're better off not even pretending to be able to find things without
pkg-config.
Carl Worth [Tue, 2 Jul 2013 19:28:03 +0000 (12:28 -0700)]
fips: Fix dlsym wrapper for egl symbols
Previously, fips was failing to provide its own wrapped versions for
functions such as eglMakeCurrent if the application was using dlsym to
locate the symbol. This led to the failure of the egl-opengl-dlopen-*
tests in the test suite.
The root of the problem was that the fips wrapper for dlopen was only
returning the libfips_handle if dlopen was requested for
"libGL.so". But here, we need to also intercept a dlopen for
"libEGL.so" as well.
However, the fix is not as simple as updating dlopen.
Previously, if dlsym failed to find a libfips-specific version of the
symbol of interest it would defer unconditionally to a call to the
real dlsym with a handle dlopened from "libGL.so". That's obviously
the wrong thing for symbols sought from "libEGL.so". So now our
dlopen caches the originally dlopen'ed handles and encodes an index
into its return value so that the final dlsym can reference the
correct handle in order to find its symbol.
This commit fixes the egl-opengl-dlopen-dlsym and
egl-opengl-dlopen-gpa test cases.
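One way to encode an index into the returned handle is sketched below. The names wrapped_libs, fake_handles, encode_handle, and decode_handle are hypothetical, and fips's actual encoding may differ. In this sketch, the address of the i-th element of a small static array serves as the fake handle, so decoding is a simple equality comparison.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of libraries whose dlopen() the wrapper intercepts. */
static const char *wrapped_libs[] = { "libGL.so", "libEGL.so" };
#define NUM_WRAPPED_LIBS (sizeof(wrapped_libs) / sizeof(wrapped_libs[0]))

/* Real handles from the underlying dlopen(), indexed like wrapped_libs. */
static void *orig_handles[NUM_WRAPPED_LIBS];

/* Dummy objects whose addresses are handed back to the application.
 * The address of fake_handles[i] encodes index i. */
static char fake_handles[NUM_WRAPPED_LIBS];

/* Cache the real handle for a wrapped library and return a fake handle
 * encoding its index, or NULL if the library is not one we wrap. */
static void *
encode_handle(const char *filename, void *real_handle)
{
    size_t i;

    for (i = 0; i < NUM_WRAPPED_LIBS; i++) {
        if (filename != NULL && strcmp(filename, wrapped_libs[i]) == 0) {
            orig_handles[i] = real_handle;
            return &fake_handles[i];
        }
    }
    return NULL;
}

/* Map a handle passed to dlsym() back to the cached real handle,
 * or NULL if it is not one of our fake handles. */
static void *
decode_handle(void *handle)
{
    size_t i;

    for (i = 0; i < NUM_WRAPPED_LIBS; i++) {
        if (handle == (void *) &fake_handles[i])
            return orig_handles[i];
    }
    return NULL;
}
```

In a wrapper built this way, dlsym would first look for a libfips version of the symbol. Failing that, it would pass decode_handle(handle) to the real dlsym, so an EGL symbol is looked up in the libEGL handle rather than in libGL.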
Carl Worth [Tue, 2 Jul 2013 18:53:52 +0000 (11:53 -0700)]
test: Add remaining three egl-opengl tests
In a previous commit message, I had suggested we would be adding five
additional tests here. But unlike GLX, EGL provides only
eglGetProcAddress, (and no eglGetProcAddressARB), so two of the GLX
variants don't apply.
The two dlopen-based tests are currently failing when run under fips,
so once again the test suite has come through and found another bug.
Carl Worth [Tue, 2 Jul 2013 00:44:14 +0000 (17:44 -0700)]
test: Add support for EGL-based test, (and one EGL-based test)
For this, common.c now supports a new macro COMMON_USE_EGL which can
optionally be defined by the test before including common.c.
Some aspects of the common.c interface have changed slightly, (the
create_context call is now either create_glx_context or
create_egl_context, and the caller must explicitly call the new
common_make_current function).
This commit adds a single egl-based test, (egl-opengl-link-call),
which is similar to the existing gl-link-call test. This is basically
to ensure that the new code in common.c is functional.
We plan to follow up with egl-opengl variants for the remaining 5
existing gl tests, (and then egl-glesv2 variants for all 6 as well).
Carl Worth [Mon, 1 Jul 2013 18:22:17 +0000 (11:22 -0700)]
util-x11: Rework init_window interface to accept XVisualInfo
This is a much more correct way of doing things. Previously, we would select
a visual when creating the OpenGL context, but then use a default visual when
creating a window. This was fragile and would fail if the default visual was
not identical to the one we had selected.
Now, instead, we pass the selected XVisualInfo to our init_window interface
and call XCreateWindow instead of XCreateSimpleWindow. This guarantees that
the visuals match as required.
Carl Worth [Mon, 1 Jul 2013 17:43:50 +0000 (10:43 -0700)]
test: Rename util.c and util.h to util-x11.c and util-x11.h
These utility functions are all specific to the libX11 interface already,
and since we're planning to add some other utility functions soon, (such
as for EGL), it helps not to have an overly generic name already in use.
While doing this, also split up the interfaces for Display and Window
creation. This will allow us to create the GL context in between to
guarantee that the Window is created with the same visual as the GL
context.
Carl Worth [Thu, 27 Jun 2013 22:20:45 +0000 (15:20 -0700)]
test: Reduce code duplication in test-suite programs
All of the test suite programs previously had their own copies of
common drawing code. Now, this code is put into a shared
handle-events.c. Each test program includes handle-events.c and can
provide its own prefix for called OpenGL functions by first defining
HANDLE_EVENTS_GL_PREFIX.
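The prefix mechanism can be sketched with preprocessor token pasting. This assumes the prefix is pasted directly onto each OpenGL function name. The GL() and GL_PASTE() macros, the my_ prefix, and the counting stub are hypothetical and only illustrate the idea.

```c
/* A test would define the prefix before including the shared code;
 * "my_" is a hypothetical choice for illustration. */
#ifndef HANDLE_EVENTS_GL_PREFIX
#define HANDLE_EVENTS_GL_PREFIX my_
#endif

/* Two-level expansion so HANDLE_EVENTS_GL_PREFIX is expanded before
 * the ## operator pastes it onto the function name. */
#define GL_PASTE_(prefix, name) prefix ## name
#define GL_PASTE(prefix, name) GL_PASTE_(prefix, name)
#define GL(name) GL_PASTE(HANDLE_EVENTS_GL_PREFIX, name)

/* Stand-in for a function pointer that a real test would fill in via
 * dlsym() or glXGetProcAddressARB(); here it only counts calls. */
static int clear_calls = 0;

static void
stub_glClear(unsigned int mask)
{
    (void) mask;
    clear_calls++;
}

static void (*my_glClear)(unsigned int mask) = stub_glClear;

/* Shared drawing code, written once; GL(glClear) becomes my_glClear. */
static void
draw_frame(void)
{
    GL(glClear)(0x00004000); /* GL_COLOR_BUFFER_BIT */
}
```

A test that links directly against libGL could instead define HANDLE_EVENTS_GL_PREFIX as empty. In that case, GL(glClear) expands to plain glClear.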
Carl Worth [Thu, 27 Jun 2013 20:26:59 +0000 (13:26 -0700)]
fips: Add the beginning of a test suite
So far, there's just one test program. It links against libGL.so and
uses GLX to render a few solid frames. The test suite ensures that it
can be run and that "fips --verbose" actually prints a message.
Carl Worth [Thu, 27 Jun 2013 20:23:55 +0000 (13:23 -0700)]
fips: Add a -v/--verbose flag.
The only real purpose imagined for this for now is to be able to
verify that fips is actually doing something, (for example, if a
program renders fewer than 60 frames and exits, then previous versions
of fips would exit silently).
The --verbose flag will be useful with the upcoming test suite and its
short-lived programs.
Carl Worth [Thu, 27 Jun 2013 03:46:40 +0000 (20:46 -0700)]
configure: Set GL_LDFLAGS and EGL_LDFLAGS in configure script
The libfips library doesn't link directly to either libGL or libEGL,
so it didn't need these flags. But we're adding test programs that do
link to these, so the test Makefile needs access to these flags.
Carl Worth [Thu, 27 Jun 2013 01:18:52 +0000 (18:18 -0700)]
Push final collection of CFLAGS/LDFLAGS from Makefile.config to Makefile.local
This makes the final decision more explicit and moves it closer to
where the flags are actually used. This will be helpful as we add
other programs, which can now easily mimic the style of flag
collection used for fips.
This also eliminates any potential confusion of FIPS_FLAGS
vs. FINAL_FIPS_FLAGS, etc. The use of "FINAL_" has now been entirely
eliminated.
Carl Worth [Mon, 24 Jun 2013 22:49:44 +0000 (15:49 -0700)]
eglwrap: Add comment describing why we don't lookup into libGLESv2.so
A user recently asked me why we didn't perform lookups in
libGLESv2.so, (instead of just libEGL.so). I actually made the mistake
of writing code to do that before I realized the answer.
Adding the answer in a comment here should help me avoid making that
mistake again.
Carl Worth [Mon, 24 Jun 2013 22:44:47 +0000 (15:44 -0700)]
EGL: Add wrapper for eglGetProcAddress
If an EGL-using program uses eglGetProcAddress to locate functions, we
want to intercept that to return our own versions of the functions,
(to add our metrics timings, etc.).
If the requested function is not implemented in our library, just
defer to the real, underlying eglGetProcAddress function to find the
symbol.
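The lookup order can be sketched as a table lookup with a fallback. This is a minimal sketch under stated assumptions. The wrapped table, the simplified void (void) signatures, and underlying_get_proc_address (a demonstration stand-in for the real eglGetProcAddress) are all hypothetical. fips may locate its own versions differently, for instance with dlsym.

```c
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type, like the return type of eglGetProcAddress. */
typedef void (*proc_t)(void);

/* Records which function ran last, so each stub has a distinct body. */
static int last_called = 0;

/* Hypothetical wrapped implementations (signatures simplified). */
static void wrapped_eglSwapBuffers(void) { last_called = 1; }
static void wrapped_glClear(void) { last_called = 2; }

static const struct {
    const char *name;
    proc_t fn;
} wrapped[] = {
    { "eglSwapBuffers", wrapped_eglSwapBuffers },
    { "glClear", wrapped_glClear },
};

/* Demonstration-only stand-in for the real eglGetProcAddress, which a
 * real wrapper would locate with dlsym() on the underlying libEGL. */
static void underlying_glFlush(void) { last_called = 3; }

static proc_t
underlying_get_proc_address(const char *name)
{
    return strcmp(name, "glFlush") == 0 ? underlying_glFlush : NULL;
}

/* The wrapper: prefer our own version, otherwise defer to the real one. */
static proc_t
wrapped_get_proc_address(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(wrapped) / sizeof(wrapped[0]); i++) {
        if (strcmp(name, wrapped[i].name) == 0)
            return wrapped[i].fn;
    }
    return underlying_get_proc_address(name);
}
```

Checking the wrapper's own table first is what lets timing code run even when the application never calls the wrapped symbols by name.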
Carl Worth [Mon, 24 Jun 2013 22:27:50 +0000 (15:27 -0700)]
configure: Fix configure check to look for egl.h in the correct directory
This configure check was broken: it looked for GL/egl.h instead of
EGL/egl.h as it should have. This failure was masked on any system with an
EGL implementation providing a pkg-config file (egl.pc).
Carl Worth [Mon, 24 Jun 2013 20:22:59 +0000 (13:22 -0700)]
fips-dispatch: Completely separate fips-dispatch GL prototypes from GL/gl.h
Move the OpenGL prototypes previously in fips-dispatch.h to a new
fips-dispatch-gl.h. The idea here is that any given file should
include only one of GL/gl.h or fips-dispatch-gl.h.
Files that implement libfips wrappers for OpenGL functions include
GL/gl.h to ensure that they implement functions with the correct
prototypes.
Meanwhile, files that call into OpenGL functions at run-time, (such as
metrics.c which calls the various OpenGL query-related functions),
instead include fips-dispatch-gl.h and do not include GL/gl.h. With
this approach, any new calls to OpenGL functions will cause
compilation warnings if the stubs are not also added to
fips-dispatch-gl.h.
Carl Worth [Sat, 22 Jun 2013 00:10:03 +0000 (17:10 -0700)]
Add dynamic dispatch for any calls to OpenGL functions.
Previously, fips code was making direct calls to OpenGL functions,
(such as glGenQueries, glBeginQuery, etc. within metrics.c). Some
OpenGL implementations do not export these symbols directly,
(expecting the application to instead look the symbols up via a call
to glXGetProcAddressARB or eglGetProcAddress).
The new fips-dispatch code added here does precisely that, (and adds
wrappers for both glXMakeCurrent and eglMakeCurrent in order to know
which GetProcAddress function should be called).
The dispatch code follows the model of piglit-dispatch, (available
under the same license as fips). Thanks to Eric Anholt for suggesting
following the same approach as piglit.
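A lazily resolving dispatch stub, in the spirit of piglit-dispatch, can be sketched as follows. The names dispatch_glGenQueries, stub_glGenQueries, and get_proc_address are hypothetical, and the GL types are simplified to int and unsigned int. fake_glGenQueries and fake_get_proc_address are demonstration-only stand-ins for a driver and for glXGetProcAddressARB or eglGetProcAddress.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*proc_t)(void);
typedef void (*gen_queries_fn)(int n, unsigned int *ids);

/* Which GetProcAddress to use; per the commit message, fips makes this
 * choice in its glXMakeCurrent/eglMakeCurrent wrappers. */
static proc_t (*get_proc_address)(const char *name);

static void stub_glGenQueries(int n, unsigned int *ids);

/* Callers use glGenQueries, which goes through this pointer. It starts
 * out pointing at a stub that resolves the real function on first use. */
static gen_queries_fn dispatch_glGenQueries = stub_glGenQueries;
#define glGenQueries dispatch_glGenQueries

static void
stub_glGenQueries(int n, unsigned int *ids)
{
    proc_t fn = get_proc_address("glGenQueries");

    if (fn == NULL) {
        fprintf(stderr, "dispatch: failed to resolve glGenQueries\n");
        exit(1);
    }
    /* Replace the stub so later calls go straight to the real function. */
    dispatch_glGenQueries = (gen_queries_fn) fn;
    dispatch_glGenQueries(n, ids);
}

/* Demonstration-only fake driver: hands out increasing query IDs. */
static unsigned int next_query_id = 1;
static int lookups = 0;

static void
fake_glGenQueries(int n, unsigned int *ids)
{
    int i;

    for (i = 0; i < n; i++)
        ids[i] = next_query_id++;
}

static proc_t
fake_get_proc_address(const char *name)
{
    lookups++;
    return strcmp(name, "glGenQueries") == 0 ? (proc_t) fake_glGenQueries : NULL;
}
```

The stub overwrites the pointer, so only the first call pays for the lookup. After that, calls go directly to the resolved function.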
Carl Worth [Mon, 24 Jun 2013 20:19:41 +0000 (13:19 -0700)]
configure: Fix to have compiler warnings enabled while building libfips
In commit e42d9f224a4ef2784f8fd43f9f4f5c593a7ddd57, when the flags
were split between fips and libfips, the warnings flags were
mistakenly applied to both CFLAGS and LDFLAGS of fips. (What was
actually intended was to have the warnings applied to the CFLAGS of
both fips and libfips).
Carl Worth [Fri, 14 Jun 2013 18:29:33 +0000 (11:29 -0700)]
TODO: Add some additional items suggested by Eero.
Again, simply trying to ensure that good ideas that come in via email
don't get dropped on the floor.
Report shader compilation time.
+Report elapsed time per frame.
+
+Add options to control which metrics should be collected.
+
Add Eric's tiny hash table for collecting per-shader statistics