Imagine a cook in a kitchen making three dishes: a steak, some boiled eggs, and a salad. A cook with a one-track mind could handle only one dish at a time. He would defrost the meat, slice it, put it in the oven, and wait for it to be done; only then would he take the eggs from the fridge, boil them, peel them, and finally prepare the salad.
That is obviously not how cooks behave in a kitchen, because it would be massively inefficient.
While a cook can only do one thing at a time, be it slicing or peeling, there is a vast amount of time to be saved by not sleeping on the job while waiting for something. Make the salad while the meat is in the oven, right?
In the software world, the cook is the CPU, and the tasks it faces usually contain long waiting periods too. The CPU spends most of its time idling and only occasionally does real work.
Retrieving an external resource is a classic example of this kind of task. Let’s take a closer look at it. Imagine we need to fetch 100 URLs from a remote server, and we do so sequentially. The actual work, sending a request or processing a received response, is tiny compared to the long periods spent waiting for the server to respond and for the network to deliver the answer.
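In code, the sequential version might look like the sketch below, assuming the third-party requests library and placeholder URLs:

```python
import requests

# Hypothetical list of URLs; any reachable endpoints would do.
urls = [f"https://example.com/item/{i}" for i in range(100)]

def fetch_all_sequentially(urls):
    # Each requests.get() call blocks until the response arrives,
    # so the total time is roughly the sum of all the waiting periods.
    results = []
    for url in urls:
        results.append(requests.get(url).text)
    return results
```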
The requests are executed synchronously: the next one starts only after the previous one has finished. To save time, we need to execute them in a smarter pattern, juggling them to achieve the best efficiency.
One way to do it is with threads. This way, several requests can be handled in parallel: the scheduler rotates between them constantly, giving each one a fixed amount of time to work. We will still be waiting for the server responses, but at least we will be waiting for several requests at once.
Python provides a convenient API for carrying out this task. With several threads, each executing intermittently within its own time slice, our CPU utilization would look like this:
- Thread execution is intermittent rather than truly parallel: the CPU is constantly switching between the threads.
- Each thread is executed for a fixed amount of time, regardless of whether it is doing real work or just waiting for an action to complete.
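For illustration, a thread-based version of the downloader might use the standard concurrent.futures module; this is a sketch, again assuming the requests library and the same placeholder URLs:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

# Same hypothetical URLs as in the sequential sketch.
urls = [f"https://example.com/item/{i}" for i in range(100)]

def fetch(url):
    return requests.get(url).text

def fetch_all_with_threads(urls):
    # Ten worker threads issue requests concurrently; while one thread
    # is blocked waiting for a response, the scheduler runs another.
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(fetch, urls))
```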
The thread/scheduler model with fixed time slices leaves a few things to be desired:
- Context switching has a cost.
- Threads that have nothing to do still hold their time slices.
It would be great if the switches occurred not at fixed time slices dictated by a supervising scheduler (so-called preemptive multitasking), but exactly when a task has finished its current piece of work and starts waiting for the next action. That is the perfect moment for the task to hand the CPU over to another task. This approach is called “cooperative multitasking.”
Instead of the Scheduler guy with a stopwatch, we have the Loop guy, who receives control whenever a task has nothing left to do and passes it on to the next one.
– Dave, you open?
– Nah
– Bill, can you go?
– Nope
– Jake, your turn. (Jake runs for the basket)
– (some time later): I’m done.
– Dave, how about now?
We need special functions to make this possible. Classical functions do not return control to anyone until they are completely done (or fail). To make them cooperate, we need them to be able to give control back whenever they do not need it for a moment.
These polite functions are called coroutines. A coroutine is a function that can suspend its execution at some point and resume it later. If we implement our downloading application with coroutines, each request hands control back while it waits, so the CPU can work on another request in the meantime.
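A coroutine-based downloader might look like the sketch below, assuming the third-party aiohttp library and the same placeholder URLs as before:

```python
import asyncio

import aiohttp  # third-party async HTTP client

# Same hypothetical URLs as before.
urls = [f"https://example.com/item/{i}" for i in range(100)]

async def fetch(session, url):
    # The coroutine suspends at each 'await' while the network is busy,
    # letting the event loop run other coroutines in the meantime.
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, url) for url in urls))

results = asyncio.run(fetch_all(urls))
```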
This approach keeps the CPU utilized at close to 100% of its capacity and thus greatly improves the application’s performance.
This practice is most beneficial when there is a lot of waiting time to optimize, which is often the case: disk reads, database access, and network requests take up a great deal of time in a typical web application.
With coroutines, we can get rid of that unnecessary waiting and serve additional clients instead. By using the CPU more effectively, we can serve tens of thousands of requests per second.