What's New

License Change from GPL to a more permissive Apache License

Like Intel® TBB 2.0, Intel® TBB 2017 both brings technical improvements and becomes more open by switching to the Apache* 2.0 license. This should let it take root in more environments while it continues to simplify effective use of multicore hardware.

Reduce overhead on well-balanced workloads with an expanded set of partitioners

Intel® TBB 2017 expands the set of partitioners with tbb::static_partitioner. It can be used in tbb::parallel_for and tbb::parallel_reduce to split the work uniformly among workers. The work is initially split into chunks of approximately equal size. The number of chunks is determined at runtime to minimize the overhead of work splitting while providing enough tasks for the available workers. Whether these chunks may be further split is unspecified. This reduces overhead when the work is already well-balanced. However, it limits available parallelism and therefore might cause a performance loss for unbalanced workloads.

Improved Dynamic Memory Allocation

Dynamic replacement of memory allocation functions on Windows* OS now skips DLLs for which replacement cannot be done, instead of aborting. On 64-bit platforms, the worst-case limit on the amount of memory the Intel® TBB allocator can handle has been quadrupled. Intel® TBB no longer performs dynamic replacement of memory allocation functions for Microsoft Visual Studio 2008 and earlier versions.

Fully supported flow graph feature, with enhancements for specifying concurrency and external communication, plus a composability layer for heterogeneous computing

tbb::flow::async_node is re-implemented using the tbb::flow::multifunction_node template, which makes it possible to specify a concurrency level for the node. In Intel® TBB 4.4 Update 3, a special tbb::flow::async_msg message type was introduced to support communication between the flow graph and external asynchronous activities.

Support for streaming workloads to external computing devices has been significantly reworked in Intel® TBB 2017 and is introduced as a preview feature. The Intel® TBB flow graph can now be used as a composability layer for heterogeneous computing.

Unlock additional performance for multi-threaded Python by enabling threading composability

Intel® TBB 2017 includes an experimental module that unlocks additional performance for multi-threaded Python programs by enabling threading composability between two or more thread-enabled libraries.

Threading composability can accelerate programs by avoiding oversubscription, an inefficient allocation of threads in which there are more software threads than available hardware resources.


Features

Performance & Productivity

Parallel Algorithms - Generic implementation of common parallel performance patterns

Generic implementations of parallel patterns such as parallel loops, flow graphs, and pipelines can be an easy way to achieve a scalable parallel implementation without developing a custom solution from scratch.

Scheduler - Engine that manages parallel tasks and task groups

The Intel® TBB task scheduler supports task-based programming and uses task stealing for dynamic workload balancing, a scalable, higher-level alternative to managing OS threads manually. The implementation supports C++ exceptions, task and task group priorities, and cancellation, which are essential for large and interactive parallel C++ applications.

Concurrent Containers - Generic implementation of common idioms for concurrent access

Intel® TBB concurrent containers are a scalable alternative to serial data containers. Serial data structures (such as C++ STL containers) often require a global lock to protect them from concurrent access and modification. Concurrent containers allow multiple threads to access and update items concurrently, which maximizes the amount of parallel work and improves the application's scalability.

Synchronization Primitives - Exception-safe locks, mutexes, condition variables, and atomic operations

Intel® TBB provides a comprehensive set of synchronization primitives with different qualities that are applicable to common synchronization strategies. Exception-safe lock implementations help avoid deadlocks in C++ programs that use exceptions. Using Intel® TBB atomic variables instead of a C-style atomic API minimizes potential data races.

Scalable Memory Allocators - Scalable memory manager and false-sharing free memory allocator

The scalable memory allocator avoids scalability bottlenecks by minimizing access to a shared memory heap through per-thread memory pool management. Special management of large (8KB) blocks allows more efficient resource usage while still offering scalability and competitive performance. The cache-aligned memory allocator avoids false sharing by not allowing allocated memory blocks to split a cache line.


Flexibility

Applicable to Various Application Domains

The Intel® TBB flow graph as well as generic template functions are customizable to a wide variety of problems.

User-Defined Tasks

When an algorithm cannot be expressed with high-level Intel® TBB constructs, the user can choose to create arbitrary task trees. Tasks can be spawned for better locality and performance or enqueued to maintain FIFO-like order and ensure starvation-resistant execution.


Forward Scaling

Dynamic Task Scheduling

Intel® TBB allows a developer to think about parallelism at a higher level and avoid low-level threading details. A developer expresses a parallel model in terms of parallel tasks and relies on Intel® TBB to execute them efficiently, dynamically detecting the appropriate number of threads. This makes Intel® TBB based solutions independent of the number of CPUs and allows performance and scalability to improve as core counts grow.


Composability

Support for Various Types of Parallelism

The Intel® TBB task scheduler and parallel algorithms support nested and recursive parallelism as well as running parallel constructs side by side. This is useful for introducing parallelism gradually, and it lets different components of an application implement parallelism independently.


Compatibility

Co-existence with Other Threading Packages

Intel® TBB is designed to co-exist with other threading packages and technologies (Intel® Cilk™ Plus, Intel® OpenMP, OS threads, etc.). Different components of Intel® TBB can be used independently and mixed with other threading technologies.

Compiler-independent Solution

Intel® TBB is a library solution and can be used in software projects built by multiple compilers, across numerous platforms.


Simple Licensing under Apache v2.0

Royalty-free Distribution

Redistribute unlimited copies of the Intel® TBB libraries and header files with your application.

Open Source Version

Available for download from threadingbuildingblocks.org. Broad support from an involved community gives developers access to additional platforms and operating systems.