Intel® Threading Building Blocks Concurrent Vector

Use concurrent_vector for thread safe vector operations

GDCE: Parallelize your Games with Intel® TBB

During GDC, we had the opportunity to talk to Dr. Mario Deilmann from Intel about Intel® TBB and why game developers should consider using this template library. We also learned what the main benefits of Intel® TBB are and how it can help developers parallelize their games.

An Introduction to the Intel® TBB Flow Graph

This video will introduce you to the flow graph feature in Intel® Threading Building Blocks (Intel® TBB) and provide examples of how it can be used. The flow graph feature provides a flexible and convenient API for parallel reactive and streaming applications.

Flow Graph Designer - Performance Analysis

This video explains the performance analysis features available in Flow Graph Designer. It includes an overview of the performance timelines and an example workflow for analyzing a Flow Graph application.

Flow Graph Designer Features and Sample Walk-Through

This video demonstrates the features of Flow Graph Designer while stepping through its workflow using one of the included sample files: samples/code_generation/feature_detection.graphml.

Why Use Intel® TBB?

Why use it?

Intel® Threading Building Blocks (Intel® TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.

What is it?

Widely used C++ template library for task parallelism


Primary features
  • Parallel algorithms and data structures
  • Scalable memory allocation and task scheduling


Reasons to use
  • Rich feature set for general purpose parallelism
  • C++; Windows*, Linux*, OS X* and other OSes

Learn More

Latest Posts

November 24, 2015: Intel® TBB 4.4 Update 2 released
Files are available in the stable releases section. Download it!

Find out more about the changes.



November 19, 2015: Intel® TBB 4.4 Update 1 released

Files are available in the stable releases section. Download it!

Find out more about the changes.


August 26, 2015: Intel® TBB 4.4 released

Files are available in the stable releases section. Download it!

Find out more about the changes.

Hi all,

I have an app that captures 6 x HD television feeds in real time via 6 separate threads. The second part of the app requires that all of the 6 HD buffers get resized (1 into 1280x720 and 5 into 640x360). The third part of the app is that once the resizing is completed then the 6 resized images are composited together to recreate one full HD image (1920x1080) which is then output back to TV.

The problem is that the final output is not stable and seems to drop frames in some of the sub-windows but not all of them. I am assuming that this is most likely a timing issue compounded by the WaitForMultipleObjects construct that I am using.

From what I have read about TBB, I assume there may be a more productive way of streamlining this application with it, but I am not sure where I should begin.

Any suggestions greatly appreciated.

Warren Brennan



I have a dynamically growing 3D grid of pointers to objects (containers of 3D points, amongst other meta data). Each cell's pointer to container can be accessed by a 3D address I,J,K into the grid.

Essentially the 3D grid represents 3D space and points are added to n grid cells dependent on their spatial extents (all cells are of the same size). As the space is further explored points fall into new potential cells that do not yet exist in the grid, so the grid is expanded and new cell container objects are created to hold the new points.

I am currently trying to use read-write mutexes (and other mechanisms) to provide concurrent cell object addition, read-only access, and write access, so that multiple threads can (concurrently) read from multiple cells, a thread can get write access to a cell while all other threads block when trying to access that cell, and other threads can add new cells on the fly. Btw, the number of cells grows from 1 to potentially 100s of cells.

Would this be a good candidate for a concurrent_hash_map, where the key is an IJK address and the value is a pointer to one of my cell container objects? Can the map provide the kind of manipulation of the grid I described? I am wondering whether only 100s of cells is OK (I was reading that 1000s are typical), whether adding to the map can be handled safely, how to hash IJKs effectively, etc. Kinda vague questions, I know. Or would another container be more appropriate?

I am a newbie to TBB and any advice would be appreciated.



Hi guys. I'm testing Intel TBB and I would appreciate any comments.

"To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner."

So why does a Boost thread pool perform better than Intel TBB?

Intel TBB is a task-oriented model that knows the hardware's features, so it should know a better way to schedule the work. So I don't understand why Intel TBB performs worse than the Boost thread pool.

PS: I have a Core i7 running Windows Pro, and the Intel TBB test creates 4 threads to execute the tasks.

Thank you very much for your time.

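#include <windows.h>            // DWORD, GetTickCount
#include <iostream>
#include <string>
#include <vector>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

class Engine
{
public:
    Engine() : m_v( Engine::Empty ) {}
    Engine( const Engine& eng ) : m_v( eng.m_v ) {}
    Engine( std::vector< std::string >& v ) : m_v( v ) {}

    // Body for tbb::parallel_for; TBB passes each sub-range by const reference.
    void operator()( const tbb::blocked_range< size_t >& r ) const
    {
        std::vector< std::string >& v = m_v;

        for( size_t iIndex = r.begin(); iIndex != r.end(); ++iIndex )
            Verify( v[ iIndex ] );
    }

    void Verify( std::string& str ) const
    {
        // (body not shown in the post)
    }

    void Start()
    {
        boost::thread_group grp;

        for( int iIndex = 0; iIndex < 10; iIndex++ ) // creating 10 threads...
            grp.create_thread( boost::bind( &Engine::WorkThread, this, iIndex ) );

        // Assumed: not shown in the post. Without it, the caller's timer
        // would stop before the threads have finished.
        grp.join_all();
    }

    // Each thread takes a range of the vector: thread 0 handles m_v[0]...m_v[99],
    // thread 1 handles m_v[100]...m_v[199], and so on (needs at least 1000 elements).
    void WorkThread( int iIdx )
    {
        int iStart = ( iIdx * 100 );
        int iEnd = iStart + 100;

        for( int iIndex = iStart; iIndex < iEnd; iIndex++ )
            Verify( m_v[ iIndex ] );
    }

private:
    static std::vector< std::string > Empty; // assumed declaration; not shown in the post
    std::vector< std::string >& m_v;
};

std::vector< std::string > Engine::Empty;

void ParallelApply( std::vector< std::string >& v ) // low performance (Intel TBB)
{
    DWORD dwStart = GetTickCount();

    // blocked_range is half-open, [begin, end), so the upper bound is v.size();
    // v.size() - 1 would skip the last element.
    tbb::parallel_for( tbb::blocked_range< size_t >( 0, v.size() ), Engine( v ) );

    DWORD dwEnd = GetTickCount();

    std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~1000 milliseconds
}

void ThreadLevelApply( std::vector< std::string >& v ) // high performance (boost::thread_group)
{
    DWORD dwStart = GetTickCount();

    Engine eng( v );
    eng.Start(); // assumed: the call is missing from the post

    DWORD dwEnd = GetTickCount();

    std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~500 milliseconds
}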


Using Intel TBB's new flow graph feature, we accomplished what was previously not possible, parallelize a very sizable task graph with thousands of interrelationships - all in about a week.
Robert Link
GCAM Project Scientist, Pacific Northwest National Laboratory


User Guide and Design Patterns
Reference Manual
Stable Documentation
Latest Documentation
Release Notes


TBB Forums
Code Samples
Hot Topics
TBB Getting Started


Structured Parallel Programming:
Patterns for Efficient Computation

Buy Now

Intel Threading Building Blocks:
Outfitting C++ for Multi-Core Processor Parallelism

Buy Now