Using Intel® Threading Building Blocks

Intro to Intel® TBB parallel_for

Use parallel_for to parallelize iterations within a for loop

Introduction to Intel® Threading Building Blocks

This video introduces you to Intel® Threading Building Blocks and provides some examples of how to use it.

An Introduction to the Intel® TBB Flow Graph

This video will introduce you to the flow graph feature in Intel® Threading Building Blocks (Intel® TBB) and provide examples of how it can be used. The flow graph feature provides a flexible and convenient API for parallel reactive and streaming applications.
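As a rough illustration of that API, here is a minimal flow graph sketch; the two nodes (a doubler and a printer) are invented for this example and are not taken from the video:

#include <iostream>
#include "tbb/flow_graph.h"

int main() {
    using namespace tbb::flow;

    graph g;

    // function_node applies its body to every message it receives;
    // 'unlimited' allows any number of messages to be processed concurrently.
    function_node< int, int > doubler( g, unlimited,
        []( int v ) { return v * 2; } );

    // 'serial' ensures only one print executes at a time.
    function_node< int, int > printer( g, serial,
        []( int v ) { std::cout << v << std::endl; return v; } );

    make_edge( doubler, printer );

    // Feed a few messages into the graph, then wait for all work to drain.
    for( int i = 0; i < 10; ++i )
        doubler.try_put( i );
    g.wait_for_all();

    return 0;
}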

Optimizing Game Architecture with Intel® TBB

Brad Werth, Intel Senior Software Engineer, spoke to game developers at GDC about how the cross-platform library, Intel® TBB, can enhance game performance.

Using Intel® Threading Building Blocks

Introduction to using the parallel_for template from Intel® TBB. Follow along with the sample code from the Add Parallelism Evaluation Guide.
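As a reminder of what the template looks like, here is a minimal parallel_for call using the C++11 lambda form; the function name and the doubling operation are placeholders, not the Evaluation Guide sample itself:

#include <cstddef>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

void ApplyDouble( float a[], size_t n ) {
    // Split the half-open range [0, n) into chunks and hand them to worker threads.
    tbb::parallel_for( tbb::blocked_range< size_t >( 0, n ),
        [=]( const tbb::blocked_range< size_t >& r ) {
            for( size_t i = r.begin(); i != r.end(); ++i )
                a[ i ] *= 2.0f;   // placeholder per-element work
        } );
}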

Why Use Intel® TBB?

Why use it?

Intel® Threading Building Blocks (Intel® TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.


What is it?

Widely used C++ template library for task parallelism



Primary features
  • Parallel algorithms and data structures
  • Scalable memory allocation and task scheduling (see the short sketch after the lists below)



Reasons to use
  • Rich feature set for general purpose parallelism
  • C++; Windows*, Linux*, OS X* and other OSes
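As a small taste of how these pieces fit together (a sketch only; the element count is arbitrary), a concurrent container can be filled from a parallel algorithm while the scalable allocator serves memory requests from all worker threads:

#include <iostream>
#include "tbb/concurrent_vector.h"
#include "tbb/parallel_for.h"
#include "tbb/scalable_allocator.h"

int main() {
    // concurrent_vector supports safe concurrent push_back; the scalable
    // allocator reduces heap contention between worker threads.
    tbb::concurrent_vector< int, tbb::scalable_allocator< int > > squares;

    tbb::parallel_for( 0, 1000, [&]( int i ) {
        squares.push_back( i * i );   // element order across threads is not deterministic
    } );

    std::cout << squares.size() << " values stored" << std::endl;   // prints 1000
    return 0;
}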

Learn More


Latest Posts

News
February 13, 2014: Intel® TBB 4.2 update 3 released

Files are available in the stable releases section. Download it!

Find out more about the changes.


December 2, 2013: Intel® TBB 4.2 update 2 released

Files are available in the stable releases section. Download it!

Find out more about the changes.

October 28, 2013: Intel® TBB 4.2 update 1 released

Files are available in the stable releases section. Download it!

Find out more about the changes.

Forums

Hi all,

I have an app that captures 6 x HD television feeds in real time via 6 separate threads. The second part of the app requires that all of the 6 HD buffers get resized (1 into 1280x720 and 5 into 640x360). The third part of the app is that, once the resizing is completed, the 6 resized images are composited together to recreate one full HD image (1920x1080), which is then output back to TV.

The problem is that the final output is not stable and seems to drop frames in some of the sub-windows but not all of them. I am assuming that this is most likely a timing issue compounded by the WaitForMultipleObjects construct that I am using.

From what I have read about TBB, I am assuming there may be a more productive way of streamlining this application using TBB, but I am not sure where I should begin.

Any suggestions greatly appreciated.

Warren Brennan
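For the resize stage described above, one possible TBB arrangement is simply a parallel loop over the six buffers. This is only a sketch; the Frame type and Resize routine are hypothetical stand-ins for the real capture buffers and scaling code:

#include "tbb/parallel_for.h"

// Hypothetical frame type standing in for the real capture buffers.
struct Frame { int width; int height; /* pixel data ... */ };

// Placeholder resize; the real application would scale the pixel data here.
static void Resize( const Frame& src, Frame& dst, int w, int h ) {
    (void)src;
    dst.width = w;
    dst.height = h;
}

// Resize all six captured buffers in parallel; compositing runs after this returns.
void ResizeAll( const Frame* src, Frame* dst ) {
    tbb::parallel_for( 0, 6, [=]( int i ) {
        if( i == 0 )
            Resize( src[ i ], dst[ i ], 1280, 720 );   // main window
        else
            Resize( src[ i ], dst[ i ], 640, 360 );    // five sub-windows
    } );
}

The capture, resize, and composite stages as a whole could also be expressed with tbb::parallel_pipeline or the flow graph feature, letting the library handle the synchronization that WaitForMultipleObjects does now.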


I have a dynamically growing 3D grid of pointers to objects (containers of 3D points, amongst other meta data). Each cell's pointer to container can be accessed by a 3D address I,J,K into the grid.

Essentially the 3D grid represents 3D space and points are added to n grid cells dependent on their spatial extents (all cells are of the same size). As the space is further explored points fall into new potential cells that do not yet exist in the grid, so the grid is expanded and new cell container objects are created to hold the new points.

I am currently trying to use read-write mutexes (and other mechanisms) to provide concurrent cell object addition, read-only access, and write access, so that multiple threads can concurrently read from multiple cells, a thread can get write access to a cell (with all other threads blocking when they try to access that cell), and other threads can be adding new cells on the fly. Btw the number of cells grows from 1 to potentially 100s of cells.

Would this be a good candidate for using a concurrent_hash_map, where the key is an IJK address and the value is a pointer to one of my cell container objects? Can the map provide the kind of manipulation of the grid I described? Wondering if only 100s of cells is OK (I was reading 1000s are typical), if adding to the map can be handled safely, how to effectively hash IJKs, etc. Kinda vague questions, I know. Or would another container be more appropriate?

I am a newbie to TBB and any advice would be appreciated.

-Ryan
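A concurrent_hash_map keyed on the (I,J,K) triple is one reasonable fit for this pattern. Below is only a sketch, with the Cell type and the hash constants invented for illustration:

#include <cstddef>
#include "tbb/concurrent_hash_map.h"

// Hypothetical cell container, standing in for the real container of 3D points.
struct Cell { /* points, meta data ... */ };

struct IJK { int i, j, k; };

// Hashing and equality policy required by concurrent_hash_map.
struct IJKHashCompare {
    static size_t hash( const IJK& a ) {
        // Simple mixing of the three indices; adequate for hundreds of cells.
        return ( (size_t)a.i * 73856093u ) ^ ( (size_t)a.j * 19349663u ) ^ ( (size_t)a.k * 83492791u );
    }
    static bool equal( const IJK& a, const IJK& b ) {
        return a.i == b.i && a.j == b.j && a.k == b.k;
    }
};

typedef tbb::concurrent_hash_map< IJK, Cell*, IJKHashCompare > Grid;

// Insert-or-find a cell; accessor holds a per-element write lock while in scope.
Cell* GetOrCreateCell( Grid& grid, const IJK& key ) {
    Grid::accessor acc;
    if( grid.insert( acc, key ) )      // true if the key was newly inserted
        acc->second = new Cell();
    return acc->second;
}

// Read-only lookup; const_accessor takes a shared (reader) lock on the element.
Cell* FindCell( const Grid& grid, const IJK& key ) {
    Grid::const_accessor acc;
    return grid.find( acc, key ) ? acc->second : 0;
}

The accessor and const_accessor types give per-element reader/writer locking, which matches the concurrent read / exclusive write / add-on-the-fly pattern described in the post, and hundreds of elements is well within what the container handles.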


Hi guys. I'm testing Intel TBB and I would appreciate any comments.

"To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner."

So, why does a boost::thread pool give higher performance than Intel TBB?

Intel TBB is a task-oriented model that knows the hardware features, so it should know a better way to do things. I don't understand why Intel TBB shows lower performance than the boost thread pool.

PS: I have a Core i7 running Windows Pro, and the Intel TBB test creates 4 threads to execute the tasks.

Thank you very much for your time.


// Headers required to build this snippet.
#include <iostream>
#include <string>
#include <vector>
#include <windows.h>                 // DWORD, GetTickCount
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

class Engine
{
public:
    Engine() : m_v( Engine::Empty ) {} // Engine::Empty: assumed to be a static empty std::vector defined elsewhere
    Engine( const Engine& eng ) : m_v( eng.m_v ){}
    Engine( std::vector< std::string >& v ) : m_v( v ){}

    // Body invoked by tbb::parallel_for for each subrange of indices.
    void operator()( const tbb::blocked_range< size_t >& r ) const
    { // parallel_for
        std::vector< std::string >& v = m_v;

        for( size_t iIndex = r.begin(); iIndex != r.end(); ++iIndex ) 
            Verify( v[ iIndex ] );
    }

    void Verify( std::string& str ) const
    {

       ...

    }

    std::vector< std::string >& m_v;

    void Start()
    {
        boost::thread_group grp;

        for( int iIndex = 0; iIndex < 10; iIndex++ ) //creating 10 threads...
        {
            grp.create_thread( boost::bind( &Engine::WorkThread, this, iIndex ) );
        }

        grp.join_all();

    }

    void WorkThread( int iIdx ) // Each thread takes a contiguous range from the vector: thread 0 handles m_v[0]...m_v[99], thread 1 handles m_v[100]...m_v[199], ....
    {
        int iStart = ( iIdx * 100 );
        int iEnd = iStart + 99 + 1;

        for( int iIndex = iStart; iIndex < iEnd; iIndex++ )
            Verify( m_v[ iIndex ] );

    }

...

    void ParallelApply( std::vector< std::string >& v ) // low performance(Intel TBB)
    {
        DWORD dwStart = GetTickCount();

        tbb::parallel_for( tbb::blocked_range< size_t >( 0, v.size() ), Engine( v ) ); // blocked_range is half-open, so the end of the range is v.size()

        DWORD dwEnd = GetTickCount();

        std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~1000 milliseconds
    }

    void ThreadLevelApply( std::vector< std::string >& v ) // high performance(boost::threading pool)
    {
        DWORD dwStart = GetTickCount();

        Engine eng( v );
        eng.Start();

        DWORD dwEnd = GetTickCount();

        std::cout << "(" << dwEnd - dwStart << ")" << "Elapsed" << std::endl; // ~500 milliseconds
    }
};


Intel TBB provided us with optimized code that we did not have to develop or maintain for critical system services. I could assign my developers to code what we bring to the software table - crowd simulation software.
Michaël Rouillé
CTO, Golaem

Documentation

User Guide and Design Patterns
Reference Manual
Stable Documentation
Latest Documentation
Doxygen
Release Notes
CHANGES file

Resources

TBB Forums
Code Samples
FAQs
Licensing
Hot Topics


Structured Parallel Programming:
Patterns for Efficient Computation


Intel Threading Building Blocks:
Outfitting C++ for Multi-Core Processor Parallelism
