Tuesday, March 27, 2007

Recent optimizations

It's been interesting lately - I've been delving deeper into OpenGL in an effort to optimize iShowU. Read more here: http://forums.shinywhitebox.com/viewtopic.php?t=438.

It all started when I wanted to create some videos showing the best settings for recording games (World of Warcraft being my main "waste of time" :-). I thought to myself ... "I wonder just how inefficient I'm being here". I set to work with the OpenGL Profiler and found that, while recording World of Warcraft, 22-45% of iShowU's time was spent in OpenGL (30-45% when capturing from two screens at the same time).

I ended up pulling lots of capture code apart, and in the end found the problem (many thanks to the OpenGL list hosted by Apple). BTW - if you ever see the Apple code example that implies that a glTexImage2D buffer can be different from a glGetTexImage buffer (i.e., that they point to two different memory locations), it's not true. Well, not true in my case anyway.

The optimizations, from what I've observed, have been excellent.

It's not going to change the world, but on PPC, capturing WoW at 1056x900 went from 17fps to 24fps. The gains are even better on Intel: full screen capture on an Intel MBP (15") went from an average of 18fps to 25fps, with some CPU time to spare. In short, iShowU is now able to pull data from the OpenGL subsystem much more efficiently. The downside is that this can also have a negative performance impact! If more frames can be captured, then more frames need to be compressed and stored. This means that the realtime compressor works harder, and the hard disk bandwidth can be maxed out sooner.

In some ways then, iShowU will now let you "get it into trouble" a little more easily. You can bump up the frame rates with Apple Animation, for example, and watch the automatic capture status pop into view (after a few seconds) telling you that the system is generating more frames than can be written to disk (note that this is easy to trigger on a laptop, because its disk is typically a bit slower than a desktop machine's).

But overall it's all good. In all capture cases, performance is improved. That was the goal :-)