Last week, we talked about how you can use the Core Animation instrument to find sources of stalls in your app's feed. That's all well and good, but what if that instrument gives your feed a clean bill of health and you're still seeing stutters? If that's the case, you might have missed something in Time Profiler, and it's worth jumping back in.
One big source of main thread work that can be easy to miss is JPEG decoding.
But I don't remember decoding any JPEGs...
The fact of the matter is, if your app shows any images downloaded from the web, they're probably JPEGs and you're already decoding and rendering them, even if you don't realize it.
The easiest way to decode and render a JPEG is:
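Something along these lines (a minimal sketch; the image name is just a placeholder):

```swift
import UIKit

// Assigning a JPEG-backed UIImage to an image view is all it takes.
// UIKit copies, decompresses, and renders it for you behind the scenes.
let imageView = UIImageView(frame: UIScreen.main.bounds)
imageView.image = UIImage(named: "feedPhoto.jpg") // still compressed at this point
```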
That's right, any time you have a UIImage created from a JPEG, assigning it to be displayed by an image view means that an implicit CATransaction will be created and your image will be copied, decompressed, and rendered on the next display pass triggered by the run loop.
This all happens on the main thread. If your app isn't very image heavy, you can probably get away with this approach, but if you're dealing with a lot of images, especially in a feed, then you may find this decoding adding up to a significant amount of your app's main thread work.
If you don't believe me, here's the stack trace to prove it.
This stack trace is from a pretty straightforward sample app that renders some images and text in a feed.
The panel on the right shows the heaviest stack trace. If you look at the line I've highlighted, you'll see a call to a function named applejpeg_decode_image_all. If you follow the stack trace up, you'll also notice some copy functions.
The tricky part about this being a big bottleneck in your app is that literally none of the code in the stack is code you wrote directly. If you check "Hide System Libraries" in the Call Tree options, this code won't show up at all.
Even if you aren't hiding system libraries, it's easy to accidentally gloss over Apple's code in traces. Let's be honest: it can be a little intimidating to look at, and it's tempting to subconsciously scan for code we're already familiar with.
To counter this tendency, let's take a quick detour and go over what's happening in this stack trace at a high level.
Core Animation Overview
We've talked about how the UI is rendered via the Render Server before, but let's go into a little more detail.
Anything that happens to a CALayer (or UIView by extension) goes through a pipeline of 5 stages to get from the code you've written to showing something onscreen.
- Under the hood, a CATransaction will be created with your layer hierarchy and committed to the Render Server via IPC.
- On the Render Server, the CATransaction will be decoded and the layer hierarchy re-created.
- Next, the Render Server will issue draw calls with Metal or OpenGL ES, depending on the device.
- Then, the GPU will do the actual rendering once the necessary resources are available.
- If the render work is done before the next vsync, the rendered buffer will be swapped into the frame buffer and shown to the user.
An important thing to note is that stage 1, the CATransaction phase, is the only phase that runs in-process. Everything else happens on the render server!
This means it's the only phase that you can see in your stack traces as well as the only one you can directly influence.
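Since that commit work runs in your code, you can also wrap changes in an explicit transaction when you need to. Here's a minimal sketch (the layer and the opacity change are just placeholders) of batching layer changes and disabling the implicit animations that would otherwise be committed along with them:

```swift
import UIKit

let layer = CALayer()

// Everything between begin() and commit() is recorded into the current
// transaction, which is what gets packaged up for the render server.
CATransaction.begin()
CATransaction.setDisableActions(true) // skip the implicit fade animation
layer.opacity = 0.5
CATransaction.commit()
```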
CATransaction Overview
The transaction phase itself is broken down further into 4 stages (I know, too many stages to remember, right?). The first two should be pretty familiar to you already.
- Layout: Do all the calculations necessary to figure out layer frames. -layoutSubviews overrides are called at this point.
- Display: Do any necessary custom drawing via Core Graphics. -drawRect: overrides are called here and, interestingly, so is all string drawing, which is what UILabel and UITextView rely on. (There's a small sketch of these first two stages right after this list.)
- Prepare: Additional Core Animation work happens here including image decoding when necessary.
- Commit: Finally, the layer tree is packaged up and sent to the render server. This happens recursively and can be expensive for complex hierarchies.
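To make the Layout and Display stages a little more concrete, here's a minimal sketch (the view and its drawing are entirely hypothetical) of the overrides that get invoked during those two stages:

```swift
import UIKit

final class BadgeView: UIView {
    private let label = UILabel()

    override init(frame: CGRect) {
        super.init(frame: frame)
        addSubview(label)
    }

    required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }

    // Runs during the Layout stage of the transaction.
    override func layoutSubviews() {
        super.layoutSubviews()
        label.frame = bounds.insetBy(dx: 8, dy: 8)
    }

    // Runs during the Display stage; this is the Core Graphics drawing pass.
    override func draw(_ rect: CGRect) {
        UIColor.systemRed.setFill()
        UIBezierPath(ovalIn: rect).fill()
    }
}
```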
Now that we have the high level overview of what Core Animation will do, let's take a look at that stack trace again.
Starting from the top, we see that main() kicks off the run loop by calling [UIApplication run]. Then, in the "do observers" phase of the run loop, we see a CATransaction being committed via the CA::Transaction::commit() function. I've highlighted the section that corresponds to this commit.
If you scan through, you'll see a function called prepare_commit(), which we can reasonably assume is the "prepare" phase I mentioned a moment ago. Scan a little further and you'll see that this preparation consists of copying the image into a CGImageProvider and then decoding it.
If you take nothing else away from this, just remember that there's a good amount of work that is happening under the hood when you use a UIKit component and there ain't nothin in this world for free.
Removing This Bottleneck
So if you've found this to be a problem in your app, how the heck can you fix it? After all, you do have to decode JPEGs to be able to show them, so it's not like you can just stop doing that.
Well, you can't make UIImageView do its decoding more quickly, but if you're so inclined, you can preemptively render the JPEG yourself!
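Here's a sketch of one way to do it (the function name is just a placeholder; the decode is forced by drawing the image into an offscreen bitmap context):

```swift
import UIKit

// Draw the JPEG-backed image into an offscreen bitmap context and return
// the result, which is backed by already-decompressed bitmap data.
func predrawnImage(from image: UIImage) -> UIImage {
    let format = UIGraphicsImageRendererFormat()
    format.scale = image.scale
    let renderer = UIGraphicsImageRenderer(size: image.size, format: format)
    return renderer.image { _ in
        image.draw(at: .zero)
    }
}
```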
Using this function, you can pass in a regular UIImage backed by a JPEG and it will return a UIImage that's backed by an already decompressed and rendered version.
Couple of things to note here:
First, there is, unfortunately, no exposed API for telling whether a given UIImage has already been decoded. This means that if you're using this trick, you'll want to carefully cache and re-use these images when you can, since they take up a lot more memory than their compressed counterparts.
Second, this particular method is technically "offscreen rendering" in that it's no longer hardware accelerated, but it's the ok-when-it's-useful kind of offscreen rendering, not the kind that makes your GPU stall.
This is where the idea of "perceived performance", otherwise known as "responsiveness", comes into play. This method of decoding can actually be slower than what the image view will do by default, but the win is that it's under your control and you can jump to a background thread to do the decoding while keeping the main thread free from blockers.
Now you may see an empty imageView for a moment while scrolling, but you've traded a moment of "placeholder" state for not dropping frames if the decoding takes a while.
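Here's a rough sketch of that hand-off, building on the hypothetical predrawnImage(from:) helper from above and folding in the caching note from earlier (the cache and key are also just placeholders):

```swift
import UIKit

// Decoded images are far larger than their compressed counterparts,
// so keep them in a cache that gets purged under memory pressure.
let decodedImageCache = NSCache<NSString, UIImage>()

func display(jpeg image: UIImage, cacheKey: String, in imageView: UIImageView) {
    // Re-use an already-decoded copy if we have one.
    if let cached = decodedImageCache.object(forKey: cacheKey as NSString) {
        imageView.image = cached
        return
    }

    imageView.image = nil // brief placeholder state while we decode
    DispatchQueue.global(qos: .userInitiated).async {
        // Decoding happens off the main thread...
        let decoded = predrawnImage(from: image)
        decodedImageCache.setObject(decoded, forKey: cacheKey as NSString)
        DispatchQueue.main.async {
            // ...and the result is assigned back on the main thread.
            imageView.image = decoded
        }
    }
}
```

In a real feed you'd also want to guard against cell re-use, so a decode that finishes late doesn't stomp on an image view that's already showing a different item.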
Conclusion
Hopefully this medium-depth dive into what's happening under the surface of the humble UIImageView has given you a better appreciation for what UIKit does for you and has also made you a little more comfortable with looking at an intimidating Time Profiler trace.
As far as other resources for looking at JPEG decoding go, this is a topic that's been explored many times in the iOS community.
For instance, Texture (previously AsyncDisplayKit) is basically a responsiveness engine that does everything we talked about (measurement, layout, decoding, and drawing) off of the main thread for you.
Fast Image Cache has a really interesting method of getting around this problem where it decompresses JPEGs up front and stores them on-disk in their decompressed form so that the only thing you need to do is retrieve them from disk when they're needed instead of doing all the decompression work on the fly.
Have you decided to use a method like this in your app? If so, or if you have any questions, let me know in the comments!
References: