To understand the importance of goroutines, we must first understand concurrency and parallelism.
As we all know, it’s the CPU, or more precisely, a CPU core, that executes our code, and one core can handle only one instruction at a time. That was fine back in the days when people still ran their programs from punch cards, but today it would hardly be satisfying if we had to run our programs one by one. There are basically two solutions to this problem: have more cores, or interleave programs.
By having more cores, we increase the number of programs that can run simultaneously: n cores can handle n instructions at a time, and that’s parallelism. By interleaving programs, a few instructions from Program A are executed, then the kernel takes over and arranges a switch, then a few instructions from Program B are executed, then the kernel takes over again, then a few more instructions from Program A run, and so on. If your PC is fast enough, it feels as if multiple programs are running simultaneously even with only one CPU core, and that’s concurrency.
To take advantage of parallelism, it’s common to split the workload of a program across multiple threads. But although threads have many advantages over processes, they are essentially lightweight processes and not really cheap. Just as opening a dozen applications makes your computer lag, creating too many threads slows everything down all the same, because the kernel has to schedule and context-switch between all of them.
To avoid creating a whole army of threads and starving other programs (e.g. 1024 threads of one program vs. your Spotify), we should create only as many threads as we need; generally speaking, no more than one thread per core. That’s not easy, however. A server may need to handle hundreds or thousands of requests per second, and even with I/O multiplexing techniques such as epoll, a handful of threads (if we limit ourselves to the number of cores) is far from enough for time-consuming requests. Concurrency is still needed. But we don’t want our threads to compete with other programs for CPU time; we just want some level of concurrency within our own program. What we really need, then, is a scheduler of our own. And that leads us to user-level threading: implementing our own scheduler inside our program.
Just like kernel scheduling, user-level threading comes in two flavors: cooperative (coroutines) and preemptive. Earlier versions of Go used the former. The basic idea behind cooperative threading is simple: you can create many coroutines and switch to another one whenever you like, but you have to do it explicitly. By switching between coroutines by hand you can achieve some level of concurrency, but as you can see, if a coroutine calls a blocking function like read(), nothing else can run until it returns. That’s where preemptive scheduling comes into play: a scheduler ensures that multiple routines are interleaved properly without their cooperation. However, achieving this level of control in a language without native support, like C, requires rewriting much of the language’s runtime, and that brings me to Go.
Goroutines combine kernel-level threads (GOMAXPROCS sets how many of them may execute Go code simultaneously) with user-level threads, taking advantage of both parallelism and concurrency. With goroutines you get a preemptive scheduler straight out of the box, you don’t need to worry about building your own thread pools, and I think that’s exactly what makes Go so useful for servers and cloud applications.