Concurrency in C# Cookbook (Chinese translation): 1.2 Concurrency Overview: Introduction to Parallel Programming

Introduction to Parallel Programming


Parallel programming should be used any time you have a fair amount of computation work that can be split up into independent chunks. Parallel programming increases the CPU usage temporarily to improve throughput; this is desirable on client systems where CPUs are often idle, but it’s usually not appropriate for server systems. Most servers have some parallelism built in; for example, ASP.NET will handle multiple requests in parallel. Writing parallel code on the server may still be useful in some situations (if you know that the number of concurrent users will always be low), but in general, parallel programming on the server would work against its built-in parallelism and therefore wouldn’t provide any real benefit.


There are two forms of parallelism: data parallelism and task parallelism. Data parallelism is when you have a bunch of data items to process, and the processing of each piece of data is mostly independent from the other pieces. Task parallelism is when you have a pool of work to do, and each piece of work is mostly independent from the other pieces. Task parallelism may be dynamic; if one piece of work results in several additional pieces of work, they can be added to the pool of work.


There are a few different ways to do data parallelism. Parallel.ForEach is similar to a foreach loop and should be used when possible. Parallel.ForEach is covered in Recipe 4.1. The Parallel class also supports Parallel.For, which is similar to a for loop, and can be used if the data processing depends on the index. Code that uses Parallel.ForEach looks like the following:


void RotateMatrices(IEnumerable<Matrix> matrices, float degrees)
{
    Parallel.ForEach(matrices, matrix => matrix.Rotate(degrees));
}
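For the index-dependent case, a minimal sketch using Parallel.For might look like the following. The class and method names (ParallelForSketch, ScaleByIndex) are hypothetical and exist only for illustration; the point is that each iteration touches only its own array slot, so no synchronization is needed.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelForSketch
{
    // Hypothetical helper: multiplies each element by its own index.
    // Every iteration writes only to values[i], so iterations are independent.
    public static void ScaleByIndex(double[] values)
    {
        Parallel.For(0, values.Length, i =>
        {
            values[i] = values[i] * i;
        });
    }
}
```

Calling ParallelForSketch.ScaleByIndex on an array of ones should leave each element equal to its index.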

Another option is PLINQ (Parallel LINQ), which provides an AsParallel extension method for LINQ queries. Parallel is more resource friendly than PLINQ; Parallel will play more nicely with other processes in the system, while PLINQ will (by default) attempt to spread itself over all CPUs. The downside to Parallel is that it’s more explicit; PLINQ in many cases has more elegant code. PLINQ is covered in Recipe 4.5 and looks like this:

IEnumerable<bool> PrimalityTest(IEnumerable<int> values)
{
    return values.AsParallel().Select(value => IsPrime(value));
}
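The snippet above assumes an IsPrime helper that is not shown. A self-contained sketch follows; its trial-division IsPrime and the maxDegree parameter are assumptions made for illustration, not part of the original. It also uses two standard PLINQ operators: AsOrdered, which keeps results in source order, and WithDegreeOfParallelism, which is one way to stop PLINQ from spreading over every CPU.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PlinqSketch
{
    // Hypothetical trial-division primality check, for illustration only.
    static bool IsPrime(int value)
    {
        if (value < 2) return false;
        for (int d = 2; (long)d * d <= value; d++)
        {
            if (value % d == 0) return false;
        }
        return true;
    }

    // AsOrdered preserves the source ordering in the results;
    // WithDegreeOfParallelism caps how many tasks PLINQ runs concurrently.
    public static List<bool> PrimalityTest(IEnumerable<int> values, int maxDegree)
    {
        return values.AsParallel()
            .AsOrdered()
            .WithDegreeOfParallelism(maxDegree)
            .Select(value => IsPrime(value))
            .ToList();
    }
}
```

Without AsOrdered, PLINQ makes no ordering guarantee, so the booleans could not be matched back to their inputs by position.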

Regardless of the method you choose, one guideline stands out when doing parallel processing.


TIP: The chunks of work should be as independent from one another as possible.


As long as your chunk of work is independent from all other chunks, you maximize your parallelism. As soon as you start sharing state between multiple threads, you have to synchronize access to that shared state, and your application becomes less parallel. Chapter 12 covers synchronization in more detail.


The output of your parallel processing can be handled in various ways. You can place the results in some kind of a concurrent collection, or you can aggregate the results into a summary. Aggregation is common in parallel processing; this kind of map/reduce functionality is also supported by the Parallel class method overloads. Recipe 4.2 looks at aggregation in more detail.
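As a rough sketch of aggregation, the Parallel.ForEach overload that takes localInit and localFinally delegates lets each worker keep a private running total and merge it into the shared result only once at the end. The names AggregationSketch and ParallelSum are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

static class AggregationSketch
{
    public static long ParallelSum(IEnumerable<int> values)
    {
        object mutex = new object();
        long result = 0;
        Parallel.ForEach(
            source: values,
            // Each worker starts with its own local total of zero.
            localInit: () => 0L,
            // The body only touches the worker-local total, so it needs no lock.
            body: (item, state, localValue) => localValue + item,
            // Merging into the shared total is the only synchronized step.
            localFinally: localValue =>
            {
                lock (mutex) { result += localValue; }
            });
        return result;
    }
}
```

The lock is taken once per worker rather than once per item, which keeps the shared state from becoming a bottleneck.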


Now let’s turn to task parallelism. Data parallelism is focused on processing data; task parallelism is just about doing work. At a high level, data parallelism and task parallelism are similar; “processing data” is a kind of “work.” Many parallelism problems can be solved either way; it’s convenient to use whichever API is more natural for the problem at hand. Parallel.Invoke is one type of Parallel method that does a kind of fork/join task parallelism. This method is covered in Recipe 4.3; you just pass in the delegates you want to execute in parallel:


void ProcessArray(double[] array)
{
    Parallel.Invoke(
        () => ProcessPartialArray(array, 0, array.Length / 2),
        () => ProcessPartialArray(array, array.Length / 2, array.Length)
    );
}

void ProcessPartialArray(double[] array, int begin, int end)
{
    // CPU-intensive processing...
}

The Task type was originally introduced for task parallelism, though these days it’s also used for asynchronous programming. A Task instance—as used in task parallelism—represents some work. You can use the Wait method to wait for a task to complete, and you can use the Result and Exception properties to retrieve the results of that work.
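A small sketch of those members in use is shown below. The class and method names are hypothetical, and Task.Run is assumed as the way to start the work (it requires .NET 4.5 or later).

```csharp
using System;
using System.Threading.Tasks;

static class TaskSketch
{
    public static int ComputeInBackground()
    {
        // Task.Run queues the delegate to the thread pool.
        Task<int> task = Task.Run(() => 6 * 7);
        task.Wait();        // blocks until the work completes
        return task.Result; // the value produced by the delegate
    }

    public static string ObserveFailure()
    {
        // int.Parse throws FormatException for this input.
        Task<int> task = Task.Run(() => int.Parse("not a number"));
        try
        {
            task.Wait();
        }
        catch (AggregateException)
        {
            // Wait surfaces the failure wrapped in an AggregateException.
        }
        // The same failure is also available on the Exception property.
        return task.Exception.InnerException.GetType().Name;
    }
}
```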


Code using Task directly is more complex than code using Parallel, but it can be useful if you don’t know the structure of the parallelism until runtime. With this kind of dynamic parallelism, you don’t know how many pieces of work you need to do at the beginning of the processing; you find out as you go along. Generally, a dynamic piece of work should start whatever child tasks it needs and then wait for them to complete.


The Task type has a special flag, TaskCreationOptions.AttachedToParent, which you could use for this. Dynamic parallelism is covered in Recipe 4.4.
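One possible shape for this is sketched below: a binary tree traversal where each node starts attached child tasks for its subtrees. The Node class, the counter array, and the method names are all hypothetical. Note that the root task is started with Task.Factory.StartNew rather than Task.Run, because Task.Run creates tasks that refuse attached children.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical binary tree node, used only for this illustration.
class Node
{
    public Node Left;
    public Node Right;
}

static class DynamicParallelismSketch
{
    static void Traverse(Node current, int[] counter)
    {
        // Stand-in for real per-node work: count the visit atomically.
        Interlocked.Increment(ref counter[0]);

        if (current.Left != null)
        {
            Task.Factory.StartNew(() => Traverse(current.Left, counter),
                CancellationToken.None,
                TaskCreationOptions.AttachedToParent,
                TaskScheduler.Default);
        }
        if (current.Right != null)
        {
            Task.Factory.StartNew(() => Traverse(current.Right, counter),
                CancellationToken.None,
                TaskCreationOptions.AttachedToParent,
                TaskScheduler.Default);
        }
    }

    public static int ProcessTree(Node root)
    {
        var counter = new int[1];
        Task task = Task.Factory.StartNew(() => Traverse(root, counter),
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default);
        // A parent task does not complete until its attached children do,
        // so waiting on the root waits for the whole tree.
        task.Wait();
        return counter[0];
    }
}
```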


Task parallelism should strive to be independent, just like data parallelism. The more independent your delegates can be, the more efficient your program can be. Also, if your delegates aren’t independent, then they need to be synchronized, and it’s harder to write correct code if that code needs synchronization. With task parallelism, be especially careful of variables captured in closures. Remember that closures capture the variables themselves (by reference, not as copies of their current values), so you can end up with sharing that isn’t obvious.
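The classic place this shows up is a for loop that builds delegates. The sketch below (hypothetical names throughout) copies the loop counter into a per-iteration local before capturing it, so each delegate gets its own variable.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ClosureSketch
{
    public static int[] FillByIndex()
    {
        var results = new int[4];
        var actions = new List<Action>();
        for (int i = 0; i < results.Length; i++)
        {
            // Capturing i directly would share ONE variable across all delegates;
            // by the time they run, i would already equal results.Length.
            int index = i; // fresh variable per iteration, captured separately
            actions.Add(() => results[index] = index * 10);
        }
        Parallel.Invoke(actions.ToArray());
        return results;
    }
}
```

If the lambda used i instead of index, every delegate would read the final value of i and index past the end of the array.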


Error handling is similar for all kinds of parallelism. Because operations are proceeding in parallel, it’s possible for multiple exceptions to occur, so they are wrapped up in an AggregateException that’s thrown to your code. This behavior is consistent across Parallel.ForEach, Parallel.Invoke, Task.Wait, etc. The AggregateException type has some useful Flatten and Handle methods to simplify the error handling code:

try
{
    Parallel.Invoke(() => { throw new Exception(); },
        () => { throw new Exception(); });
}
catch (AggregateException ex)
{
    ex.Handle(exception => true); // returning true marks the exception as "handled"
}
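To see Flatten and Handle together, here is a sketch that records the type of each inner exception. The class and method names are hypothetical. How many inner exceptions arrive depends on how many delegates actually ran and failed, so the code makes no assumption about the count.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ErrorHandlingSketch
{
    public static List<string> CollectFailures()
    {
        var seen = new List<string>();
        try
        {
            Parallel.Invoke(
                () => { throw new InvalidOperationException("first"); },
                () => { throw new ArgumentException("second"); });
        }
        catch (AggregateException ex)
        {
            // Flatten unwraps any nested AggregateException instances;
            // Handle then runs the predicate once per inner exception.
            ex.Flatten().Handle(inner =>
            {
                seen.Add(inner.GetType().Name);
                return true; // "handled": nothing is rethrown
            });
        }
        return seen;
    }
}
```

If the predicate returned false for any inner exception, Handle would throw a new AggregateException containing the unhandled ones.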

Usually, you don’t have to worry about how the work is handled by the thread pool. Data and task parallelism use dynamically adjusting partitioners to divide work among worker threads. The thread pool increases its thread count as necessary. The thread pool has a single work queue, and each threadpool thread also has its own work queue. When a threadpool thread queues additional work, it sends it to its own queue first because the work is usually related to the current work item; this behavior encourages threads to work on their own work, and maximizes cache hits. If another thread doesn’t have work to do, it’ll steal work from another thread’s queue. Microsoft put a lot of work into making the thread pool as efficient as possible, and there are a large number of knobs you can tweak if you need maximum performance. As long as your tasks are not extremely short, they should work well with the default settings.


TIP: Tasks should neither be extremely short, nor extremely long.


If your tasks are too short, then the overhead of breaking up the data into tasks and scheduling those tasks on the thread pool becomes significant. If your tasks are too long, then the thread pool cannot dynamically adjust its work balancing efficiently. It’s difficult to determine how short is too short and how long is too long; it really depends on the problem being solved and the approximate capabilities of the hardware. As a general rule, I try to make my tasks as short as possible without running into performance issues (you’ll see your performance suddenly degrade when your tasks are too short). Even better, instead of using tasks directly, use the Parallel type or PLINQ. These higher level forms of parallelism have partitioning built in to handle this automatically for you (and adjust as necessary at runtime).
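When the per-item work is tiny, one way to avoid overly short tasks while still using Parallel is to hand it index ranges instead of single items. The sketch below uses Partitioner.Create from System.Collections.Concurrent for this; the class and method names are hypothetical, and whether chunking actually helps depends on the workload.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ChunkingSketch
{
    public static void SquareInPlace(double[] values)
    {
        // Partitioner.Create rejects an empty range, so bail out early.
        if (values.Length == 0) return;

        // Each partition is a Tuple<int, int> of (fromInclusive, toExclusive).
        var ranges = Partitioner.Create(0, values.Length);
        Parallel.ForEach(ranges, (range, loopState) =>
        {
            // One delegate call processes a whole range, amortizing the
            // scheduling overhead over many cheap iterations.
            for (int i = range.Item1; i < range.Item2; i++)
            {
                values[i] = values[i] * values[i];
            }
        });
    }
}
```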


If you want to dive deeper into parallel programming, the best book on the subject is Parallel Programming with Microsoft .NET, by Colin Campbell et al. (Microsoft Press).

