Welcome to Parallelism Week at SQL University. My name is Adam Machanic, and I'm your professor.
Imagine having 8 brains, or 16, or 32. Imagine being able to break up complex thoughts and distribute them across your many brains, so that you could solve problems faster. Now quit imagining that, because you're human and you're stuck with only one brain, and you only get access to the entire thing if you're lucky enough to have avoided abusing too many recreational drugs.
For your database server, on the other hand, that multi-brain dream is reality. It's common for newer servers to have 16 or more logical processors, and numbers above 64--once the hallmark of extremely high-end hardware--are today no reason to raise an eyebrow. SQL Server is designed to take advantage of servers with a high number of processors, utilizing two or more at the same time to help it solve big problems faster. This is called parallelism, and understanding how it works and why is one the keys to making your database applications scale.
This first post for Parallelism Week details the foundation material you need to understand before we dive in and investigate some of the query processing components in more detail. The next post will discuss how parallel query processing works and the various components you'll see when you read parallel query plans, and the final post of the week will get into some of the means by which you can control parallelism at both the server and query level.

Figure 1: A four-brained computer, frying an egg.
Background: Threads, Threading, and Other Sewing Analogies
Each CPU in your computer can do only one thing at a time, albeit very quickly. But if forced to finish an entire task before moving on to the next, certain tasks would seem choppy and sluggish even running on extremely fast processors. To get around this issue modern operating systems are designed to be able to do lots of things at once (concurrently). This is done by taking advantage of the fact that the CPU can switch back and forth between tasks almost as quickly as it can get them done. Each program on your computer runs under a process space--a virtual construct to which resources such as memory and CPU time can be granted. And each process can use one or more logical threads to do work on the CPU. Under the context of individual logical threads, a program is able to appear to do more than one thing at once. In reality the way this works is via time slicing.
The Windows operating system includes a component called a scheduler, whose job it is to control which programs get CPU time and when. Operating system designers have come up with various methodologies that can be used to control CPU scheduling, and Windows uses a system called preemption. The idea behind preemption (or preemptive scheduling) is that by default each active thread (i.e., a thread that actually has some work to do) gets an equal amount of time on the CPU. The amount of CPU time is split equally, and the CPU switches back and forth between the various tasks vying for its attention. Each equal CPU time slice is called a quantum. In reality, some threads are more important than others, and these are said to have a higher priority. In a preemptive environment, the higher priority threads are able to force the lower priority threads to yield, in order that the higher priority threads can get more of the CPU time and finish their work more quickly.
Imagine yourself sitting at your desk, and imagine 10 of your co-workers standing around your desk, each asking you to help them with a different project. Now imagine that each minute you abruptly stop whatever conversation you were having and turn to the next co-worker, and then the next, over and over, until all of the projects are complete. This would quickly become confusing and overwhelming for us, but it's effectively what the CPU does as it works on multiple concurrent threads using time slicing. Each change from one thread to another is called a context switch, and although the CPU can do so much more effectively than the human brain, it's not free. If a CPU is overloaded with logical threads and is doing too many context switches, it is said to be thrashing, and can spend more time switching back and forth than getting actual work done.
SQL Server runs on top of the Windows operating system, and so it must work within the confines of the operating system's preemption model. However, internally SQL Server uses its own scheduler, which is one of the components of the SQL Server Operating System (SQLOS for short). Unlike Windows, the SQLOS scheduler uses a cooperative (also called non-preemptive) scheduling model. In this model each thread (abstracted within SQL Server by an entity called a worker) can spend as much time as it needs on the CPU until its task is finished. The idea is that preemption should not be necessary because threads will eventually have to stop and acquire non-CPU resources (e.g. data from the disk system) and will enter wait states, thereby liberating the CPU for other tasks. SQL Server's cooperative model is somewhat limited in its flexibility since again, as a Windows application, it must also abide by the operating system's preemptive model. This means that SQL Server's threads will still be subject to time slicing if there are competing processes running on the server. But in optimally-configured environments where that’s not the case, SQL Server’s cooperative scheduler will effectively be in control of the full lifetime of its threads, keeping them active until they are either finished or need to enter a wait state.
Note that for brevity I've simplified descriptions of the inner workings of SQLOS and scheduling in general. I have also completely bypassed much of the background on wait states. For more information on these topics, especially on the various wait states and how they impact performance, I highly recommend reading Tom Davidson's white paper, "SQL Server 2005 Waits and Queues."
Multitasking and Parallel Processing
Much of the detail around the execution models described in the previous section involves interaction between multiple programs, which the operating system makes us believe are running on the same CPU at the same time. This is known as multitasking, and enables me to write this post in my Web browser while two instances of SQL Server, PowerPoint, Word, and a number of other background processes are all doing work on my laptop. The quanta are so fast that I don't even notice, as I type this sentence, that the Web browser is being switched on and off of the CPU to which it's assigned.
SQL Server internally participates in a form of multitasking. Each worker within the process space is bound to a scheduler, which handles the cooperative scheduling for a single logical processor. Each scheduler can have many associated workers, but just like in the operating system only one worker at a time can actively work with the CPU. Therefore, it's possible on a dual-core server to have, e.g., 10 concurrent queries running. What's really happening is that the various queries--by virtue of the workers actually doing the processing--are switching in and out of wait states, so only up to two queries are actually consuming CPU time at any given moment. This assumes that each query uses only a single worker--which we'll see is not a safe assumption.
Above and beyond simple multitasking, SQL Server can break certain queries into component parts, each of which can be processed on a separate logical CPU. This is known as parallel processing, or, more simply, parallelism. When the query optimizer determines that a query--or, more accurately, one or more components of a query--can participate in parallelism (or can "be parallelized" or "go parallel"), two or more workers are used to do the processing necessary to satisfy the query. The various workers for all of the queries running concurrently within SQL Server are each subject to the cooperative scheduling model, and go on and off the schedulers and in and out of wait states as necessary. This means that large systems can often support hundreds or thousands of workers serving tens or hundreds of concurrent queries, even with comparatively few logical processors actually doing the work.
The Pros and Cons of Parallel Processing
Parallelism, as it turns out, is quite the double-edged sword.
On one hand, taking all of the work that a query has to do and doing it in parallel on several CPUs rather than on a single one can mean extreme performance gains. Certain queries can scale almost linearly as more CPUs are introduced. This means that for each additional CPU, run-time decreases proportionally, so a query running on 10 CPUs will run in approximately 1/10th as much time as the same query running on one CPU.
Certain operations, such as sorts, are especially well-suited to parallel processing due to the way their algorithms work. The best sort algorithms operate based on a predictable scale of N * log(N) operations per set, where N is the number of elements in the set. We can show mathematically that parallel sorts can in fact use less overall CPU time than the same sort on a single processor. If C is the number of CPUs across which a sort is being processed, then for any value of N or C greater than 1, C * ((N/C) * log(N/C)) is less than (N * log(N)). This means that as long as the cost of merging the sorted sub-ranges from each worker thread is not too high, the overall sort operation will require fewer clock ticks done in parallel than done in serial. More on this in a future post.
Taking an opposite stance on the performance question, consider that splitting up work and binding a single query to multiple workers does require some overhead. There is memory overhead in maintaining state for each worker, CPU overhead due to a greater number of context switches that must occur, and wait state overhead at the points at which worker threads must join, or synchronize with one another. In addition to these basic costs associated with actually processing queries in parallel, SQL Server has a limited number of pooled (reusable) threads to work with and if too many queries go parallel at the same time the thread pool can be exhausted, causing any new queries to be forced to wait until an active thread becomes available.
To combat these situations, SQL Server's query optimizer only creates parallel plans when parts of the query exceed a given cost threshold, meaning that the optimizer believes that there should be sufficient return to warrant the parallelism investment. This system eliminates unnecessary parallelism in many--but certainly not all--cases, but also inhibits parallelism in some situations where it may be beneficial. A DBA or developer concerned with creating a well-tuned system must understand parallelism well enough to monitor for its proper use, and must know how to modify settings appropriately to guide the decisions made by the optimizer. In this way, a balance can be achieved such that parallelism is used to make the biggest queries run faster, and not used for small queries for which there would be no gain.
Looking Forward
This concludes the first of three parts of parallelism week here at SQL University. Part two will cover parallel query plans, and part three will get into the various ways that we, as end-users, can control parallelism in the query engine. If you have any questions on what's covered in the post, please leave a comment below and I'll answer as soon as possible.
Enjoy!