Synchronization Strategies on iOS: Mutex Performance


When it comes to making things "faster," concurrency is an easy thing to reach for. GCD and NSOperation make dealing with threads a breeze, and if your task lends itself to concurrency, then it makes a lot of sense.

The thing to remember is that once you've introduced concurrency into a system, it's your responsibility to make sure your data stays in sync. Depending on the application, failing to do so can result in hard-to-debug problems ranging from inconsistent UI to catastrophic data loss.

Since we want to avoid all of those things, it's best to go into multi-threading with synchronization in mind from the start.

That being said, it's also important to remember that ensuring correctness via these strategies will necessarily slow your code down. If you've decided to make a part of your app concurrent, test the performance to verify that the concurrency you've added actually makes things faster.

Strategies

Like anything, you have a couple of options when it comes to keeping your code in sync. To get a feel for how each one looks in use, we'll look at these mechanisms in the context of building out a simple cache. The cache is just a wrapper around an NSMutableDictionary that makes it usable from different threads.
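
The shape of the wrapper is roughly this (LPCache and its method names are illustrative, not the exact code from the test project):

    #import <Foundation/Foundation.h>

    // A tiny cache wrapper around NSMutableDictionary. Each strategy below
    // fills in -setObject:forKey: and -objectForKey: with different locking.
    @interface LPCache : NSObject
    - (void)setObject:(id)object forKey:(id<NSCopying>)key;
    - (id)objectForKey:(id<NSCopying>)key;
    @end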

We'll compare how each performs while inserting 10,000 objects into the dictionary. The scores are given in average nanoseconds per insert.

One thing to take note of before we jump in is the documentation for dispatch_benchmark, the function we can use to do the benchmarking:

  • Code bound by computational bandwidth may be inferred by proportional changes in performance as concurrency is increased.
  • Code bound by memory bandwidth may be inferred by negligible changes in performance as concurrency is increased.
  • Code bound by critical sections may be inferred by retrograde changes in performance as concurrency is increased.
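
dispatch_benchmark lives in libdispatch but isn't declared in the public headers, so you have to declare the symbol yourself. A minimal harness might look like this (LPCache is the illustrative wrapper from above):

    // Returns the average number of nanoseconds the block takes over
    // `count` runs. Private libdispatch symbol, so declare it by hand.
    extern uint64_t dispatch_benchmark(size_t count, void (^block)(void));

    static uint64_t AverageInsertTime(LPCache *cache)
    {
        // Time 10,000 inserts, then report the average cost of one insert.
        uint64_t totalNanos = dispatch_benchmark(1, ^{
            for (NSUInteger i = 0; i < 10000; i++) {
                [cache setObject:@(i) forKey:@(i)];
            }
        });
        return totalNanos / 10000;
    }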

Interestingly, filling this dictionary with 10,000 elements is actually faster if we just do it serially without any multi-threading. Therefore, we can be relatively sure that the changes in speed we see are due to the differences in these locking strategies. To be fair, there's not a lot of other code they could be due to, but it's something to keep in mind when you're looking at more complex situations.

NSLock Objects

First up is the basic NSLock object. To use an NSLock, all you need to do is create one alongside the data it guards, and then bracket each critical section with the -lock and -unlock methods.
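
In the cache, that might look like this (a sketch with _dictionary and _lock as ivars, not the verbatim test code):

    - (instancetype)init
    {
        if (self = [super init]) {
            _dictionary = [NSMutableDictionary dictionary];
            _lock = [[NSLock alloc] init];
        }
        return self;
    }

    - (void)setObject:(id)object forKey:(id<NSCopying>)key
    {
        [_lock lock];                 // enter the critical section
        _dictionary[key] = object;
        [_lock unlock];               // and leave it as soon as possible
    }

    - (id)objectForKey:(id<NSCopying>)key
    {
        [_lock lock];
        id object = _dictionary[key];
        [_lock unlock];
        return object;
    }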

As you can see, its usage is fairly straightforward. Here, we're calling -lock before accessing the dictionary and then -unlock after we're done.

Avg Insert: 7,944 ns

@synchronized Blocks

Next is the @synchronized structure available to you in Objective-C. This one's an oldie but a goodie. Instead of having to worry about creating extra objects for locking, you can use the object you want to guard as the object to lock on.
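
Guarding the dictionary with itself might look like:

    - (void)setObject:(id)object forKey:(id<NSCopying>)key
    {
        // The dictionary doubles as the lock token; no separate lock object.
        @synchronized (_dictionary) {
            _dictionary[key] = object;
        }
    }

    - (id)objectForKey:(id<NSCopying>)key
    {
        @synchronized (_dictionary) {
            return _dictionary[key];
        }
    }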

This is useful if you have a specific object you're trying to synchronize access to, but if you have a more ambiguous chunk of code you need to lock on, then you can just as easily synchronize on self instead.

Avg Insert: 7,936 ns

C++ Scoped Mutexes

This is the type of mutex most often found in Texture. You do have to go through the trouble of implementing it yourself, but once you have, the syntax is actually really nice.

These mutexes work by locking when they're initialized and then unlocking when they go out of scope. Since they're initialized on the stack, that means when the method returns. This is nice because you don't have to worry about extra braces or matching -lock and -unlock calls. If you want to apply the lock to a smaller section, just wrap it and the critical section in an extra set of curly braces to limit its scope.
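
A stripped-down version of the idea might look like the following (Texture's real locker, ASDN::MutexLocker, is more elaborate, and C++11's std::lock_guard works the same way):

    #include <mutex>

    // Minimal RAII locker: locks on construction, unlocks on destruction.
    class MutexLocker {
    public:
        explicit MutexLocker(std::mutex &mutex) : _mutex(mutex) { _mutex.lock(); }
        ~MutexLocker() { _mutex.unlock(); }

        // Non-copyable, so the mutex can't be unlocked twice.
        MutexLocker(const MutexLocker &) = delete;
        MutexLocker &operator=(const MutexLocker &) = delete;

    private:
        std::mutex &_mutex;
    };

    // Inside the cache's @implementation in an Objective-C++ (.mm) file,
    // with a std::mutex ivar named _mutex:
    - (void)setObject:(id)object forKey:(id<NSCopying>)key
    {
        MutexLocker locker(_mutex);   // locks here...
        _dictionary[key] = object;
    }                                 // ...unlocks when the method returns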

Avg Insert: 7,739 ns

Internal Queues with GCD

Finally, we have the most lauded of options, the GCD serial queue. I've heard multiple times now that this is the *right* way to ensure your code is synchronized, but it took me a while to sit down and look at how to do it.

Turns out it's not too crazy.

Since all a lock really does is ensure things happen in order instead of threads stepping on each other's toes, you can accomplish the same thing with a custom dispatch queue.

Once you have your queue ready, you just need to make sure any access to your protected data structure happens in the context of a dispatch to your serial queue.

Since the access happens in a block, you'll need to declare a __block pointer that can be assigned to from inside the block, synchronously dispatch the dictionary access to the queue, and then return the result once the block has run.
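
Putting that together, the queue-backed version might look like this (the queue label is a placeholder):

    - (instancetype)init
    {
        if (self = [super init]) {
            _dictionary = [NSMutableDictionary dictionary];
            // A serial queue runs its blocks one at a time, in FIFO order.
            _queue = dispatch_queue_create("com.example.cache", DISPATCH_QUEUE_SERIAL);
        }
        return self;
    }

    - (void)setObject:(id)object forKey:(id<NSCopying>)key
    {
        dispatch_sync(_queue, ^{
            _dictionary[key] = object;
        });
    }

    - (id)objectForKey:(id<NSCopying>)key
    {
        // Assigned inside the block, so it needs __block storage.
        __block id object = nil;
        dispatch_sync(_queue, ^{
            object = _dictionary[key];
        });
        return object;
    }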

Avg Insert: 6,523 ns

Here we see our fastest time so far. I was honestly skeptical of this, but it does seem to be the case. I always just kind of assumed that the extra work under the hood of creating a block struct with captured arguments and dispatching it to a separate work queue would cost more than these other options, but I guess that's why it's good to actually take some measurements.

One extra neat thing is that writes can actually be dispatched asynchronously, since there's no reason for the writing thread to wait on them to finish. The dictionary access still gets queued up in order and is guaranteed to happen before any subsequent fetches.
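
In practice, that just means the setter becomes:

    - (void)setObject:(id)object forKey:(id<NSCopying>)key
    {
        // Fire-and-forget: the block is still serialized behind earlier
        // work on the queue, so any later read will see this write.
        dispatch_async(_queue, ^{
            _dictionary[key] = object;
        });
    }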

Concurrent Readers/Single Writer

Let's say you've decided that you want your data structure to allow concurrent reads without locking. In our current situation, NSMutableDictionary is safe to read from multiple threads, but you still need to block reads while writing to avoid reading corrupted data mid-write. This is a pretty common optimization and can make things go a lot faster if you have many more reads than writes.

The really cool thing about using queues for synchronization is how easy it is to tweak things a bit to get what we want. Instead of a serial queue, you can use a concurrent queue.

Then, just update your write code to use the dispatch_barrier family of functions. Dispatching a block to a queue with a barrier means the block will wait for all in-progress blocks to complete, run by itself, and then hold back any subsequent blocks until it has finished.

This means that blocking will only occur around writes like we were hoping. All reading threads will get their answer back synchronously without needing to wait on each other due to locks!
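
A sketch of the reader/writer version, reusing the same names as before:

    // In -init, make the queue concurrent instead of serial:
    _queue = dispatch_queue_create("com.example.cache", DISPATCH_QUEUE_CONCURRENT);

    - (id)objectForKey:(id<NSCopying>)key
    {
        // Reads run concurrently with each other on the queue.
        __block id object = nil;
        dispatch_sync(_queue, ^{
            object = _dictionary[key];
        });
        return object;
    }

    - (void)setObject:(id)object forKey:(id<NSCopying>)key
    {
        // The barrier waits for in-flight reads, runs alone, then lets
        // later blocks through.
        dispatch_barrier_async(_queue, ^{
            _dictionary[key] = object;
        });
    }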

Now, this can of course be accomplished in other ways, but it doesn't really come for free with any of the other strategies we've looked at.

Conclusion

As always, your use case may differ, and the findings here may not reflect reality in your app. If your code is performance-sensitive enough, you should always take measurements yourself to see what works best for your use case.

One thing I am a little curious about is whether the performance of dispatch queues suffers as the blocks dispatched to them start capturing more and more objects.

If you want to pull down and play around with the code, here's a small test project on GitHub. If you're still a little fuzzy on why we need any of this, a kind of fun thing to try is to run the tests with all of these mutexes disabled. You're pretty much guaranteed to get a cryptic crash about data being deallocated before it was allocated. That's because, without locking, it's easy to end up in a situation where two threads both drop an object's retain count to 0 and both try to dealloc it.

The nice thing about doing tests like this is that it will show you just how resilient your code is to being used in a multi-threaded environment.

Have you played around with any strategies that blow these out of the water? Just wanna tell me my measurements are st00pid? Go ahead and let me know down in the comments. 

Tuesday 06.05.18
Posted by Luke Parham

Closed Source is Best Source

Wow!

You thought I couldn't come up with the most boring topic of all time? You were sorely mistaken. On both counts, really. It's probably a case of Stockholm Syndrome or something, but I've kind of enjoyed learning some of the ins and outs of the Build Settings tab in Xcode.

This is kind of a specific topic, but it doesn't seem like it's talked about all that much. Sometimes you write code and that code gets turned into an app. Other times you write code for an open source project. Neato.

So what if you work at a company that wants to distribute an SDK to clients, but can't give them the source code directly? The answer is that you'll have to figure out how to package your code up into a pre-compiled framework and give them that.

Unfortunately, there are a lot of things that can go wrong along the way.

Building Pre-Compiled Frameworks That Don’t Break Client Apps

When you’re building a closed-source framework to distribute to clients, there are a few extra things to think about that don’t necessarily come to the forefront when you’re working on internal code or an open source project.

One thing that can be annoying at first is dealing with which architectures are included in the framework you’re giving out.

Just building a framework can be easy. First, make sure your target is actually a "framework" target. If it is, all you need to do is hit the play button in Xcode. Unfortunately, odds are good that you can’t just hand this framework over to your clients.

Differing Architectures

A pre-compiled framework is really just a Mach-O binary packaged up in a conventional folder structure. This folder also includes library headers and any required resources the framework might need.

As far as the binary goes, there are two families of architectures it can be compiled to target. An "architecture" here means machine code built for a specific type of CPU. First, you have the armvX set of architectures, which run on the CPUs in physical devices. Then, you have the i386 and x86_64 architectures, which are built to run on the simulator.

The rub you’ll immediately run into is that your framework will only include the architectures of the device that was selected in Xcode at build time. This means that if you had a simulator selected, your framework will only include simulator architectures, and the client’s app will only be runnable in the simulator. In this case, the client won’t even be allowed to submit their app to the App Store!

The same goes for building for a device. Give them a framework with no simulator slices and you’ll get complaints when your client goes to try something out in the simulator.

Debug vs Release Mode

The next problem you might run into is giving them a framework that wasn’t built in release mode. This won’t be immediately obvious, since they’ll be able to build and test their app locally just fine. The problem comes when their app is once again rejected by iTunes Connect, this time because it includes debug symbols.

In general, release mode means optimizations have been turned on and all apps released to the App Store need to be built in release mode.

How Not to Jack Things Up

Since your goal is to get your framework into your client’s app without changing their mind about this relationship, you’re gonna want to try to avoid these mistakes.

Lucky for you, doing so can be reasonably straightforward, even if it does take a bit of extra thought.

Building a Fat Binary

To solve the problem where your client can only build for simulators or real devices, you can create what’s called a “fat” binary. It’s fat because, instead of including just one set of architectures, you shove them all into one big framework.

You can do this at the command line using the xcodebuild and lipo commands.

    1. First, build a version of the framework for devices. You don’t have to specify a device since the “Generic Device” option is the default.

    2. Next, do the same thing, but specify one of your installed simulators using the -destination argument.

    3. Finally, take the two versions of your framework and mash them together into the final product, as shown below.
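
Concretely, the three steps might look like this (the scheme name, simulator, and build paths are placeholders for your own):

    # 1. Build the Release framework for devices.
    xcodebuild -scheme MyFramework -configuration Release \
        -sdk iphoneos BUILD_DIR=build build

    # 2. Build the Release framework for a simulator.
    xcodebuild -scheme MyFramework -configuration Release \
        -sdk iphonesimulator \
        -destination 'platform=iOS Simulator,name=iPhone 8' \
        BUILD_DIR=build build

    # 3. Mash the two binaries together into one fat binary.
    lipo -create -output MyFramework.framework/MyFramework \
        build/Release-iphoneos/MyFramework.framework/MyFramework \
        build/Release-iphonesimulator/MyFramework.framework/MyFramework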

The lipo command is a little bit harder to parse. First, you’re telling it that you want to create a new binary, and specifying the name with whatever comes after the -output flag.

Then you provide the paths to the two binaries that you’re trying to mash together.

A couple of things to keep in mind: the lipo command just creates a binary. You’ll want to make sure <build-location>/<framework-name>.framework/ already exists and that it includes the necessary headers and extra resources that were automatically included in the two frameworks you created using the xcodebuild command.

Also, notice we did specify that we wanted to build ‘Release’ versions of the frameworks in our xcodebuild commands. This means our second problem is taken care of as well.

The Problem Our Solution Creates

Like all good software fixes, this one creates one new problem in the process of fixing our original problems.

Giving them a fat binary did make it so their app can be built for both simulators and devices while testing, but once again, when they go to submit their app to the App Store, they’ll be met with rejection. It turns out an app is only allowed to contain device architectures to be considered a valid submission!

The only fix for this problem is to once again use the lipo command to strip out any unnecessary architectures before the app is submitted to the App Store. Unfortunately, the app in question is out of your hands, so you’ll need to give your client a script to include in their build phases, or at least give them a heads up that they’ll need to do this before submitting.

Here's the one we use:
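
Ours is essentially the widely circulated strip-frameworks build phase. A sketch of the approach (TARGET_BUILD_DIR, WRAPPER_NAME, and ARCHS are standard Xcode build settings):

    APP_PATH="${TARGET_BUILD_DIR}/${WRAPPER_NAME}"

    # For every framework embedded in the app...
    find "$APP_PATH" -name '*.framework' -type d | while read -r FRAMEWORK
    do
        EXECUTABLE_NAME=$(defaults read "$FRAMEWORK/Info.plist" CFBundleExecutable)
        EXECUTABLE_PATH="$FRAMEWORK/$EXECUTABLE_NAME"

        # ...pull out a thin slice for each architecture the app is built for...
        EXTRACTED_ARCHS=()
        for ARCH in $ARCHS
        do
            lipo -extract "$ARCH" "$EXECUTABLE_PATH" -o "$EXECUTABLE_PATH-$ARCH"
            EXTRACTED_ARCHS+=("$EXECUTABLE_PATH-$ARCH")
        done

        # ...then glue only those slices back together.
        lipo -o "$EXECUTABLE_PATH-merged" -create "${EXTRACTED_ARCHS[@]}"
        rm "${EXTRACTED_ARCHS[@]}"
        mv "$EXECUTABLE_PATH-merged" "$EXECUTABLE_PATH"
    done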

Don't worry, I'm quite sure we didn't write it. Could have come from this blog post; we may never know. 

I agree that this feels overly clunky, but sometimes there’s not much you can do about it.

tags: frameworks, lipo, automation, ios
Monday 04.30.18
Posted by Luke Parham