30.3. Optimizing Rendering

The main goal of performance tuning is to make the application look and feel faster. However, just because the goal is to make the application render faster, don’t assume that rendering is the bottleneck.

Determining Whether Rendering Is the Problem

To find out whether rendering is the problem, modify your application so that it does everything it normally does except render, and then measure its performance. An easy way of getting your application to do everything but rendering is to insert an SoSwitch node with its whichChild field set to SO_SWITCH_NONE (the default) above your scene. So, for example, modify your application’s code from:

C++

myViewer->setSceneGraph(root);

.NET

myViewer.SetSceneGraph(root);

Java

myViewer.setSceneGraph(root);

to:

C++

SoSwitch *renderOff = new SoSwitch;
renderOff->ref();
renderOff->addChild(root);
myViewer->setSceneGraph(renderOff);

.NET

SoSwitch renderOff = new SoSwitch();
renderOff.AddChild(root);
myViewer.SetSceneGraph(renderOff);

Java

SoSwitch renderOff = new SoSwitch();
renderOff.addChild(root);
myViewer.setSceneGraph(renderOff);

This experiment gives an upper limit on how much you can improve your application’s performance by increasing rendering performance. If your application doesn’t run much faster after this change, then rendering is not your bottleneck. See Section 30.4, “Optimizing Everything Else”, for information on optimizing the rest of your application.
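The upper limit can be read directly from two frame-rate measurements, one with the normal scene graph and one with the SoSwitch inserted. The following is a minimal sketch of that arithmetic; renderingShare is a hypothetical helper, not an Open Inventor function, and it assumes both frame rates were measured under otherwise identical conditions.

```cpp
#include <cassert>

// Hypothetical helper; not part of Open Inventor.
// fpsWithRendering:    frame rate measured with the normal scene graph.
// fpsWithoutRendering: frame rate measured with the SoSwitch inserted.
// Returns the fraction of each frame's time that is attributable to rendering.
double renderingShare(double fpsWithRendering, double fpsWithoutRendering)
{
    assert(fpsWithRendering > 0.0);
    assert(fpsWithoutRendering >= fpsWithRendering);
    // Frame time is 1/fps, so the share is (1/a - 1/b) / (1/a) = 1 - a/b.
    return 1.0 - fpsWithRendering / fpsWithoutRendering;
}
```

The frame rate measured without rendering is the ceiling that no amount of rendering optimization can exceed; a share close to zero means the bottleneck is elsewhere.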

Isolating Rendering

If you have determined that your application is spending a significant amount of time rendering the scene, the next step is to isolate rendering from the rest of the things your application does. This makes it easier to find out where the bottleneck in rendering occurs. The easiest way to isolate rendering is to write your scene to a file and then use the ivperf program to perform a series of rendering experiments. The code for writing your scene may look like the following:

C++

SoOutput out;
if (!out.openFile("myScene.iv")) { ... error ... };
SoWriteAction wa(&out);
wa.apply(root);

.NET

SoOutput @out = new SoOutput();
if (!@out.OpenFile("myScene.iv")) { ... error ... };
SoWriteAction wa = new SoWriteAction(@out);
wa.Apply(root);

Java

SoOutput out = new SoOutput();
if (!out.openFile("myScene.iv")) { ... error ... };
SoWriteAction wa = new SoWriteAction(out);
wa.apply(root);

Using the ivperf Utility to Analyze Rendering Performance

The ivperf utility reads in a scene graph and analyzes its rendering performance. It estimates the time spent in each stage of the rendering process while rendering the scene graph.

The process of rendering a single frame can be decomposed into five main stages:

Clearing the graphics window
Traversing the Open Inventor scene graph
Changing the graphics state (including materials, transformations, and textures)
Transforming vertices in the graphics pipeline
Filling polygons

The sum of the times spent in these stages does not, in general, equal the total time it takes to render the scene. Depending on the underlying hardware platform and graphics pipeline, some or all of the above can overlap with each other. Thus, completely eliminating one of the stages does not necessarily speed up the application by the time taken by that stage. ivperf takes this into account; it answers questions of the type “if I could completely eliminate xxx from my scene, how much faster would rendering be?” For example, if ivperf indicates that 50% of your time is spent changing the material graphics state, then making your entire scene a single material would make it render twice as fast. Knowing that materials are taking up a significant part of your rendering time, you can then concentrate on minimizing the number of material changes made by your scene.
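The relationship between the fraction of time ivperf says could be saved and the resulting speedup can be sketched as follows. This is only the arithmetic behind the 50% example; speedupFromEliminating is a hypothetical helper, not part of ivperf.

```cpp
#include <cassert>

// Hypothetical helper; not part of ivperf or Open Inventor.
// fractionSaved: share of the current frame time (0 <= f < 1) that would
// disappear if one stage were eliminated entirely.
// Returns the factor by which rendering would speed up.
double speedupFromEliminating(double fractionSaved)
{
    assert(fractionSaved >= 0.0 && fractionSaved < 1.0);
    return 1.0 / (1.0 - fractionSaved);
}
```

With fractionSaved set to 0.5 the function returns 2.0, matching the "twice as fast" figure for the material example.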

If you have created your own node classes, call their initClass() methods just after the call to SoInteraction::init() in the ivperf source and link their .o files into ivperf.

The camera control used by ivperf is simplistic: it calls viewAll() for the scene and just spins the scene around in front of the camera when benchmarking. If you have a sophisticated walk-through or fly-through application that uses level of detail and/or render culling, modify ivperf so that its camera motion is more appropriate for your application. For example, have ivperf use the following little scene instead of just an SoPerspectiveCamera:

TransformSeparator {
  Rotor { rotation 0 1 0 .1 speed .1 }
  Translation { translation 100 0 0 }
  PerspectiveCamera { nearDistance .1 farDistance 600 }
}

ivperf correctly reports the performance of changing scenes, as long as you give it enough information. It automatically deals with scenes containing engines and animation nodes, but if you are using an SoSensor to modify the scene, you should mark nodes that your application frequently changes by giving them the special name “NoCache”. For example, if your application is frequently changing a transformation in the scene, the transformation should appear in the file given to ivperf as:

DEF NoCache Transform { }

To measure how much time transformations might be taking, ivperf temporarily removes all transformations from your scene and then measures how much faster it runs. Beware! This sometimes gives unreliable results; for example, if all your objects become very large or very small without the transformations, then more (or less) time may be spent filling in pixels. If your scene uses render culling, removing the transformations causes more (or fewer) of the objects to be culled, distorting the results reported by ivperf.

If transformations do turn out to be expensive, use SoRotation, SoRotationXYZ, SoScale, or SoTranslation nodes instead of the general SoTransform node. However, don’t bother doing this if you have to replace the SoTransform node with more than one of the simpler nodes to get the same transformation.

Performance Tip for Face Sets

For best performance when creating SoFaceSet and SoIndexedFaceSet shapes, arrange all the triangles first, then quads, and then other faces.

Optimizing Textures

If your scene contains textures, ivperf reports two numbers: the time you would save if you could turn off textures completely, and the time you would save if you could make your scene use only one texture. On systems with texturing hardware, the number of textures used can dramatically affect performance; see the section called “Optimizing Texture Management” for hints on optimizing texture management. On systems without texture mapping hardware, the bottleneck is probably filling in the textured polygons.

Open Inventor automatically does two things to speed up rendering on systems without texture mapping hardware:

Open Inventor’s viewers display the scene untextured during interaction by default.
Open Inventor uses lower-quality filters for minifying or magnifying textures.

Optimizing Texture Management

If ivperf reports that a lot of time is spent in texture management, then you are probably running out of hardware texture memory. Try the following:

Use smaller textures. Scale down the images you are using; inadvertently using one big image can easily fill up texture memory on many systems.
Make textures a power of 2 wide and high. Textures of those dimensions (for example 128 x 64 instead of 129 x 70) make startup faster.
Reuse nodes. Open Inventor allows you to modify a texture once it has been read into your application (using the image field of SoTexture2), and to change the search path for textures (using methods on SoInput). It therefore does not use the same texture memory for two different SoTexture2 nodes with the same filename field. Be sure to reuse the same SoTexture2 node instead of creating another node with the same filename.
For example, this scene is inefficient:
```
Separator {
  Texture2 { filename foo.rgb }
  Cube { }
}
Sphere { }
Separator {
  Texture2 { filename foo.rgb }
  Text3 { string "Hello" }
}
```
This scene uses texture memory efficiently:
```
Separator {
  DEF foo Texture2 { filename foo.rgb }
  Cube { }
}
Sphere { }
Separator {
  USE foo
  Text3 { string "Hello" }
}
```
Use SoLOD nodes to create simpler versions of your objects that are not textured or use smaller texture images when the objects are far away.
Use render culling so the textures for textured objects outside the view volume are not used. For example, imagine a scene that contains 100 textured objects (each with a unique texture), but only 10 of them are in the view volume at any given time. When the scene is rendered, only 10 of the textures need to be in texture memory at any given time, resulting in much better texture management performance.
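The power-of-2 recommendation in the list above can be checked before images are handed to Open Inventor. The following is a minimal sketch; isPowerOfTwo and largestPowerOfTwoAtMost are hypothetical helpers, and scaling an image down to the returned size is left to whatever image library the application already uses.

```cpp
#include <cassert>

// Hypothetical helpers; not part of Open Inventor.

// True if n is 1, 2, 4, 8, ...
bool isPowerOfTwo(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Largest power of two that does not exceed n (n must be at least 1).
// Useful for picking a scaled-down texture dimension.
unsigned int largestPowerOfTwoAtMost(unsigned int n)
{
    assert(n >= 1);
    unsigned int p = 1;
    while (p <= n / 2) {  // doubling p would still not exceed n
        p *= 2;
    }
    return p;
}
```

For the 129 x 70 image mentioned above, the helpers suggest 128 x 64, which also reduces the texture memory the image needs.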

Using Lights Efficiently

If the scene given to ivperf contains light sources, ivperf informs you how expensive they are compared to rendering your scene with just a single directional light. If ivperf reports that lights are a significant performance bottleneck, try to use fewer light sources, and use simpler lights (a DirectionalLight is simpler than a PointLight, which is simpler than a SpotLight). If possible, put lights inside separators so that they affect only part of the scene, increasing performance for the rest of the scene.

Optimizing Vertex Transformations

If ivperf reports that vertex transformations (which include per-vertex lighting calculations) take up a significant portion of the time it takes to render a frame, you can do the following to optimize per-vertex operations:

Use fewer vertices in your objects. Use SoComplexity to turn down complexity for Open Inventor’s primitive objects. If you are using a system with hardware-accelerated texturing, texturing can be used to add visual complexity with very few vertices.
Create less detailed versions of your objects and use SoLOD nodes so that fewer vertices are drawn when objects are small. Use an empty SoInfo node as the lowest level of detail so that objects disappear when they get very small. A good rule of thumb for choosing levels of detail is that the switch between levels of detail should be fairly obvious if you are concentrating on the object; for most applications, the user concentrates on objects in the foreground and does not notice background objects “popping” between levels of detail. Beware that SoLOD nodes cause smaller caches to be built, which may slow down traversal. See the section called “Correcting Level of Detail bottlenecks” for more information on efficient use of level of detail.
Make your vertices simpler. Try to use OVERALL rather than PER_VERTEX material binding. Turn off fog. Note that these suggestions are system-specific; on systems with a lot of hardware for accelerated rendering, fogged vertices may be no slower than plain vertices. Be sure to do a quick ivperf test before spending time modifying your application.
Make sure you are not turning on two-sided lighting unnecessarily; avoid SoShapeHints nodes that set the vertexOrdering field to COUNTERCLOCKWISE or CLOCKWISE and set the shapeType field to UNKNOWN_SHAPE_TYPE.
If parts of your scene do not require lighting, use an SoLightModel node set to model BASE_COLOR to turn off lighting for those parts of the scene. However, be aware that turning lighting on and off can itself become a bottleneck if done too often.
If you are using SoFaceSet or SoIndexedFaceSet, try using ivfix to convert them into SoIndexedTriangleStripSet, which draws more triangles with fewer vertices. Note that ivfix cannot create a mesh if your objects have sharp facets or PER_FACE material or normal bindings.
Watch out for expensive primitives with lots of vertices, like SoText3 and SoSphere. ivperf reports the number of triangles in your scene; make sure the number is reasonable for your desired performance.
Organize your scene graph so that objects that are close to each other spatially are under the same SoSeparator, and turn on render culling so that Open Inventor won’t send those objects’ vertices when the objects are not in view. See the section called “Culling Part of the Scene” in The Inventor Mentor, Chapter 9, for more information on render culling.

See the section called “Making Open Inventor produce efficient OpenGL” for hints on making Open Inventor produce more efficient OpenGL calls.

Optimizing Pixel Fill Operations

A common bottleneck on low-end systems is drawing the pixels in filled polygons. This is especially common for applications that have just a few large polygons, as opposed to applications that have lots of little polygons.

If ivperf reports that a large percentage of each frame is spent filling in pixels, try to optimize your scene as follows:

Render your scene, or parts of your scene, in wireframe or as points when possible. Viewers have “move wireframe” and “move points” modes built in for exactly this case.
Some systems can fill flat-shaded polygons faster than Gouraud-shaded polygons. Triangle strips and quad meshes set shademodel (FLAT) if they have PER_FACE normals and don’t have PER_VERTEX materials (and vice versa).
SCREEN_DOOR transparency (the default) is faster than blended transparency on some systems (it is slower on other systems). Use the setTransparencyType() method on either SoXtRenderArea (C++ only) or SoGLRenderAction to change the transparency type.

Correcting Problems ivperf Does Not Measure

There are several performance problems that ivperf doesn’t catch. The following sections describe them, and give hints on how to improve them.

Making Open Inventor produce efficient OpenGL

If your application is rendering only 10 frames per second with 1,000 triangles per frame, and you know that your graphics hardware is capable of rendering 100,000 triangles per second (10,000 triangles per frame at 10 frames/second), and ivperf reports that your bottleneck is vertex transformations, then your problem might be that Open Inventor is not making efficient OpenGL calls.
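The comparison in this example is simple throughput arithmetic, sketched below. Both functions are hypothetical helpers, and the peak figure is whatever number you trust for your own graphics hardware.

```cpp
#include <cassert>

// Hypothetical helpers; not part of Open Inventor or ivperf.

// Triangle throughput the application actually achieves.
double trianglesPerSecond(double framesPerSecond, double trianglesPerFrame)
{
    assert(framesPerSecond >= 0.0 && trianglesPerFrame >= 0.0);
    return framesPerSecond * trianglesPerFrame;
}

// Fraction of the hardware's peak triangle rate that is being used.
double hardwareUtilization(double achievedPerSecond, double peakPerSecond)
{
    assert(peakPerSecond > 0.0);
    return achievedPerSecond / peakPerSecond;
}
```

For the numbers above, the application achieves 10,000 triangles per second against a peak of 100,000, which is about one tenth of what the hardware can do. A gap of that size, combined with a vertex-transformation bottleneck, is the symptom described here.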

Open Inventor is much more efficient at rendering multiple triangles if they are all part of one node. For example, you can create a multifaceted polygonal shape using a number of different coordinate and face set nodes. However, a much better technique is to put all the coordinates for the polygonal shape into one SoCoordinate3 or SoVertexProperty node, and the description of all the faces into a second node, an SoFaceSet.


The ivfix utility program collapses multiple shapes into single triangle strip sets. Using fewer nodes to get the same picture reduces traversal overhead for scenes that cannot be cached. Note also that Open Inventor optimizes on a node-by-node basis and generally can’t optimize across nodes.

An SoFaceSet or SoIndexedFaceSet has special code for drawing 3- and 4-vertex polygons. To take advantage of that, you must arrange the polygons so that the 3-vertex polygons (if any) are first in the coordIndex array, followed by the 4-vertex polygons, followed by the polygons with more than 4 vertices.
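The ordering can be produced before the index array is handed to the face set. The following sketch uses only the standard library; Face and buildOrderedCoordIndex are hypothetical names, and the code assumes no PER_FACE materials or normals are bound (if they are, those arrays would have to be reordered in the same way).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Each face is a list of indices into the coordinate array.
using Face = std::vector<std::int32_t>;

// Rank used for ordering: triangles (0), quads (1), everything else (2).
static int faceRank(const Face& face)
{
    assert(face.size() >= 3);  // a polygon needs at least three vertices
    if (face.size() == 3) return 0;
    if (face.size() == 4) return 1;
    return 2;
}

// Builds a coordIndex-style array: triangles first, then quads, then larger
// polygons, each face terminated by -1 (the end-of-face marker used by
// indexed face sets).
std::vector<std::int32_t> buildOrderedCoordIndex(std::vector<Face> faces)
{
    // stable_sort keeps the original relative order within each group.
    std::stable_sort(faces.begin(), faces.end(),
                     [](const Face& a, const Face& b) {
                         return faceRank(a) < faceRank(b);
                     });

    std::vector<std::int32_t> coordIndex;
    for (const Face& face : faces) {
        coordIndex.insert(coordIndex.end(), face.begin(), face.end());
        coordIndex.push_back(-1);
    }
    return coordIndex;
}
```

The resulting vector can then be copied into the coordIndex field in one call, for example with setValues().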

For some applications, consider implementing your own nodes that implement the functionality of a subgraph of your scene. For example, a molecular modeling application might implement a BallAndStick node with fields specifying the atoms and bonds in a molecule, instead of using the more general SoSphere, SoCylinder, SoMaterial, SoTransform, and SoGroup nodes. If the molecular modeling application changes the molecule frequently so Open Inventor cannot cache the scene, using a specialized node could make traversal orders of magnitude faster (for example, a simple water molecule scene graph with three atoms and two bonds might consist of 20 nodes; replacing this with a single BallAndStick node would make traversal 20 times faster). The BallAndStick node could also perform application-specific optimizations not done by Open Inventor, such as not drawing bonds between spheres whose radii were large enough that they intersected, sorting the spheres and cylinders by color, and so on. See The Open Inventor Toolmaker for complete information on implementing your own nodes.

Correcting culling bottlenecks

If your application uses render culling, it may be spending most of its time deciding whether or not objects should be culled. ivperf lumps this in with bad caching behavior. To find out whether this is the case, use a profiler to look for a lot of CPU time being spent in the SoSeparator::cullTest() or SoGetBoundingBoxAction::apply() routines.

If a large percentage of the rendering time is spent doing cull tests, try to reorganize your scene so that more triangles are culled for each culling SoSeparator. For example, if you have a city scene with thousands of buildings, it may be better to perform one cull test for each city block rather than the thousands of cull tests needed to decide whether or not each individual building is visible. Doing this also allows Open Inventor to build larger render caches, which may increase traversal speed.

Correcting Level of Detail bottlenecks

If your application uses SoLOD nodes, it might be spending a significant amount of time deciding which level of detail should be drawn. One way of testing to see if this is the case is to temporarily replace all of the SoLOD nodes in your scene with SoSwitch nodes set to traverse the highest level of detail. Then run ivperf again and compare the results. If the scene with SoSwitch nodes is much faster, try doing the following:

Try to group objects so that one level of detail test determines the level of detail for several objects. For example, if you have a group of 10 buildings that are near each other, use one level of detail node instead of 10 level of detail nodes. Doing this also makes it easier for Open Inventor to build larger render caches, which may increase performance by increasing traversal speed.
Make sure you use the SoLOD node introduced in Open Inventor 2.1 instead of the SoLevelOfDetail node. The SoLOD node is more efficient because it uses the distance to a point as the switching criterion. See the reference page for more detail.

Making your application feel faster

Sometimes it is worthwhile to sacrifice features temporarily to make your application seem faster to the user. Open Inventor has several features that make this easier:

Use the SoGLRenderAction::setAbortCallback() method to interrupt rendering before the entire scene has been drawn. For this to be most effective, you must organize your scene so that the most important objects are drawn first, and you should abort only when it is important that rendering happen quickly, even if the rendering is not complete, such as when the user is interactively manipulating the scene.
Use one of the “Move...” draw styles if you are using a viewer, so that a simpler version of the scene is drawn when the user is interacting with the viewer.

Use the start and finish callbacks of manipulators and components to temporarily modify the scene to make it simpler while the user is interacting with it.

Interactive Field Settings

When manipulating a complex scene, it is often useful to temporarily change or decrease the value of some parameters in order to maintain interactive performance. The SoInteractiveComplexity node allows the application to define different parameter values for certain fields of other nodes, depending on whether a user interaction, for example moving the camera, is occurring. This means that while the camera is moving these fields will use a specified "interaction" parameter value, but when interactive manipulation stops these fields will automatically change to a specified "still" parameter value. Optionally, for scalar fields, the transition from interaction value to still value can be automatically animated using a specified increment. This is a powerful technique for maintaining an interactive frame rate when interacting with GPU-intensive datasets or rendering effects, while still getting a final image of very high quality and giving the user a "progressive refinement" effect during the transition from interaction back to "still".

The values specified in SoInteractiveComplexity override the values in the fields during rendering, but calling getValue() on the fields still returns the value set directly into the field (or the default value if none was set). These settings are applied to all instances of the node containing the field and are declared with a specially formatted string set in the fieldSettings field. For scalar fields like SoSFInt32, the string looks like this:

"ClassName FieldName InteractionValue StillValue [IncrementPerSecond]"

If IncrementPerSecond is omitted, then StillValue is applied as soon as interaction stops. Otherwise, the transition from InteractionValue to StillValue is automatically animated. Because incrementing is actually done at each redraw, and redraw happens many times per second, IncrementPerSecond is allowed to be greater than StillValue. In the following code, the field named numSlices belonging to the class SoVolumeRender will be set to 500 during an interaction. When the interaction stops, numSlices will be increased by 2000 every second until its value reaches 1000. Effectively this means that the StillValue (1000) will be reached in (1000-500)/2000 = 0.25 seconds.

SoInteractiveComplexity* interactiveComplexity = new SoInteractiveComplexity;
interactiveComplexity->fieldSettings.set1Value(
0, "SoVolumeRender numSlices 500 1000 2000" );
root->addChild( interactiveComplexity );
root->addChild( volumeRender );
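The transition time for a scalar setting can be estimated from the settings string itself. The sketch below is an illustration only: ScalarFieldSetting, parseScalarFieldSetting, and transitionSeconds are hypothetical names, the parser is not the one Open Inventor uses, and it ignores any refinement delay.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical representation of one scalar entry of fieldSettings:
// "ClassName FieldName InteractionValue StillValue [IncrementPerSecond]"
struct ScalarFieldSetting {
    std::string className;
    std::string fieldName;
    double interactionValue = 0.0;
    double stillValue = 0.0;
    double incrementPerSecond = 0.0;  // 0 means "not given": no animation
};

// Splits the whitespace-separated string into its parts.
ScalarFieldSetting parseScalarFieldSetting(const std::string& text)
{
    ScalarFieldSetting s;
    std::istringstream in(text);
    in >> s.className >> s.fieldName >> s.interactionValue >> s.stillValue;
    assert(!in.fail());  // the first four parts are mandatory
    if (!(in >> s.incrementPerSecond)) {
        s.incrementPerSecond = 0.0;  // optional part omitted
    }
    return s;
}

// Seconds needed to animate from the interaction value to the still value.
double transitionSeconds(const ScalarFieldSetting& s)
{
    if (s.incrementPerSecond <= 0.0) {
        return 0.0;  // still value is applied as soon as interaction stops
    }
    return std::fabs(s.stillValue - s.interactionValue) / s.incrementPerSecond;
}
```

For the string "SoVolumeRender numSlices 500 1000 2000" this gives 0.25 seconds, the same figure as the calculation above.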

A time delay before changing the value, or starting the animation, can be set using the refinementDelay field.

Note that only a limited number of fields are supported by this node, including those listed below. See the reference manual for the current list.

SoComplexity::value
SoComplexity::textureQuality
SoShadowGroup::quality
SoShadowGroup::isActive
SoVolumeRender::numSlices
SoVolumeRender::lowScreenResolutionScale
SoVolumeRenderingQuality::gradientQuality
SoVolumeRenderingQuality::lighting
SoVolumeRenderingQuality::edgeDetect2D
SoVolumeRenderingQuality::boundaryOpacity
SoVolumeRenderingQuality::edgeColoring
SoVolumeSkin::largeSliceSupport
SoOrthoSlice::largeSliceSupport

You can disable notification on these nodes using the SoNode::enableNotify() method to keep changes to them from destroying caches.

In Open Inventor, single-line, left-justified SoText2 nodes do not break render caches.

When the ivfix utility rearranges scene graphs, it groups objects by material.