glBufferSubData vs glBufferData

Author: thothonegan
Tags: opengl macosx graphics

Or why the article is completely wrong.

Quick summary of the article: FPS was low on OSX because of the glBufferSubData call that uploads uniforms to the shaders. The author then reads the spec, sees that ‘it can block waiting on the GPU’, and concludes that OSX’s driver is the problem.

This is complete bullshit.

So a quick background. The GPU is like a second CPU that’s running beside the normal CPU, and you want both running as often as possible. If one has to wait on the other, it stalls: either the GPU needs more commands/data from the CPU, or the CPU is waiting for the GPU to finish.

Now to render something there are two major things you need: state (such as what shader to use, what buffers to use) and data (uniforms, textures, etc). In this case, we’re uploading uniforms (such as position, rotation, etc) to the GPU. So his code is roughly:






To upload uniforms, you basically tell GL ‘upload this data I’ve got here to the GPU’. However, we want the GPU to run in parallel with the CPU as much as possible, right? So the upload starts asynchronously (or is even delayed) so that the GPU can finish whatever commands it currently has.

Here lies his problem.

glBufferSubData() uploads data to a given buffer, overwriting whatever part of the buffer you specify. This BLOCKS the GPU from doing any work using that buffer. Say you’re drawing using position (0, 0): you don’t want it to shift to (1, 1) when you’re halfway through drawing the scene. So if the GPU is in the middle of the previous draw elements call and you try to change the data underneath it, glBufferSubData() has to stall and wait for the GPU to finish using the buffer.

For comparison, glBufferData() instead replaces the buffer’s whole data store with the given data. While on the GL side it looks like the same buffer, it can run in parallel since internally a new buffer is created (it’s similar to register renaming in CPUs). So with the GPU in the middle of the previous call, the driver can just queue up the copy into a different internal buffer and tell later calls to use the new one, and your program runs as fast as always.
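To make the difference concrete, here is a toy model in C. It does not call OpenGL at all: time is counted in made-up ticks, and every name in it (Timeline, queue_draw, update_in_place, update_renamed, simulate) is invented for this sketch. It assumes each frame does "update the uniforms, then queue a draw", that an in-place update has to wait for the queued draw, and that a renamed update does not. Real drivers are a lot messier than this.

```c
#include <assert.h>

/* Toy model only: abstract ticks, no GL or driver API involved. */
typedef struct {
    long cpu_time;        /* tick at which the CPU issues its next call */
    long gpu_busy_until;  /* tick at which the GPU finishes its queued draws */
    long stalled;         /* total ticks the CPU spent waiting on the GPU */
} Timeline;

/* Queue a draw costing `cost` GPU ticks. The call itself returns at once. */
void queue_draw(Timeline *t, long cost) {
    long start = t->gpu_busy_until > t->cpu_time ? t->gpu_busy_until
                                                 : t->cpu_time;
    t->gpu_busy_until = start + cost;
}

/* In-place update (the glBufferSubData case in the text): the CPU waits
   until the GPU has finished reading the old contents. */
void update_in_place(Timeline *t) {
    if (t->gpu_busy_until > t->cpu_time) {
        t->stalled += t->gpu_busy_until - t->cpu_time;
        t->cpu_time = t->gpu_busy_until;
    }
}

/* Renamed update (the glBufferData case in the text): the data goes into
   fresh storage, so in this model there is nothing to wait for. */
void update_renamed(Timeline *t) {
    (void)t;
}

/* Run `frames` rounds of "update uniforms, then draw".
   Returns the total number of ticks the CPU was stalled. */
long simulate(int frames, long draw_cost, long cpu_cost, int rename) {
    Timeline t = {0, 0, 0};
    assert(frames >= 0 && draw_cost >= 0 && cpu_cost >= 0);
    for (int i = 0; i < frames; ++i) {
        if (rename) update_renamed(&t);
        else        update_in_place(&t);
        t.cpu_time += cpu_cost;    /* CPU work to build and submit the frame */
        queue_draw(&t, draw_cost);
    }
    return t.stalled;
}
```

With three frames, a draw cost of 10 ticks and 1 tick of CPU work per frame, simulate(3, 10, 1, 0) reports 20 ticks of CPU stall for the in-place path, while simulate(3, 10, 1, 1) reports 0 for the renamed path.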

So why is his code working fast under Windows? “Clever” drivers.

Under Windows, drivers have to deal with terrible developers.

For example, a lot of developers use glBufferSubData when they should be using glBufferData. So most of the drivers do extra work to see if you’re replacing the entire buffer with glBufferSubData. If you are, they silently switch to the glBufferData path, which does buffer renaming as mentioned. If he had tried the glMap code under Windows, for example, it’s unlikely the driver could have done this optimization (since with glMap/glUnmap, it can’t tell what parts you’ve changed or not).
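The driver-side check described above might look roughly like this. It is a guess at the shape of the heuristic, not code from any real driver: UpdatePath, choose_update_path and the gpu_is_reading flag are all hypothetical, and the "only rename when the GPU is still reading" condition is my own assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the two ways a driver could service a SubData-style update. */
typedef enum { UPDATE_IN_PLACE, UPDATE_RENAME } UpdatePath;

/* Decide how to handle writing `size` bytes at `offset` into a buffer of
   `buffer_size` bytes. `gpu_is_reading` is nonzero if a queued draw still
   uses the buffer (an assumption of this sketch). */
UpdatePath choose_update_path(size_t buffer_size, size_t offset,
                              size_t size, int gpu_is_reading) {
    /* The updated range must lie inside the buffer. */
    assert(offset <= buffer_size && size <= buffer_size - offset);

    if (!gpu_is_reading)
        return UPDATE_IN_PLACE;   /* nobody to wait for, just overwrite */

    if (offset == 0 && size == buffer_size)
        return UPDATE_RENAME;     /* whole store replaced: old bytes not needed */

    return UPDATE_IN_PLACE;       /* partial update: old bytes must survive, so wait */
}
```

A full overwrite of a busy 64-byte buffer, choose_update_path(64, 0, 64, 1), takes the rename path, while a half overwrite, choose_update_path(64, 0, 32, 1), falls back to the in-place path and its stall.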

Now you could argue that OSX could be doing this same workaround, or that GL shouldn’t make the choice between the two confusing. But in my mind, this is not a flaw in the driver: you’re picking a function that purposely tries to reuse a buffer, then complaining that it stalled because you’re already using the buffer, which is exactly what it’s designed to do.


~ thothonegan

[edit: A few more technical things]

The docs are fun for SubData. Basically there are two conflicting parts, since it’s a tradeoff.

‘When replacing the entire data store, consider using glBufferSubData rather than completely recreating the data store with glBufferData. This avoids the cost of reallocating the data store.’

True, but it’s a tradeoff like everything. If you want to save memory, glBufferSubData is the way to go (as long as the buffer isn’t still in use, which is what makes it stall). So for big objects that are used once, glBufferSubData is good.

‘Consider using multiple buffer objects to avoid stalling the rendering pipeline during data store updates. If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated.’

This is basically what’s happening here: if you reuse the buffer, you stall. Uniforms are generally small, so either don’t reuse your buffers or use a function that can work around you trying to ‘reuse’ your buffers.
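One way to follow the "multiple buffer objects" advice is to rotate through a small ring of buffers, so the one you write this frame is not the one the GPU is still reading. Below is a toy bookkeeping sketch in C, not GL code: BufferRing, ring_acquire, ring_mark_in_use and RING_SIZE are invented here, and ticks stand in for real time. In a real program each slot would be its own buffer object, and you would find out when the GPU is done with it from a sync object (glFenceSync / glClientWaitSync), which this sketch does not attempt to show.

```c
#include <assert.h>

#define RING_SIZE 3  /* assumption: three buffers in rotation */

typedef struct {
    long busy_until[RING_SIZE]; /* tick at which the GPU stops reading each slot */
    int  next;                  /* slot to hand out on the next acquire */
} BufferRing;

/* Hand out the next slot at time `now`. If the GPU is still reading that
   slot, the wait is added to *stall. Returns the slot index. */
int ring_acquire(BufferRing *r, long now, long *stall) {
    int i = r->next;
    assert(i >= 0 && i < RING_SIZE);
    if (r->busy_until[i] > now)
        *stall += r->busy_until[i] - now;
    r->next = (i + 1) % RING_SIZE;
    return i;
}

/* Record that a queued draw will read slot `i` until tick `until`. */
void ring_mark_in_use(BufferRing *r, int i, long until) {
    assert(i >= 0 && i < RING_SIZE);
    r->busy_until[i] = until;
}
```

As long as the ring is deep enough that a slot's draw has finished before the rotation comes back around, ring_acquire never adds to the stall counter. If it does come back too early, you are in the same position as the single reused buffer and pay the wait.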