Tutorial : Flow Graph


Flow Graph

The Intel TBB library is well known for its support of loop parallelism; the flow graph interface, introduced in Intel TBB 4.0, extends these capabilities to allow fast, efficient implementations of dependency graph and data flow algorithms, enabling developers to exploit parallelism at higher levels in their applications.

Consider Fig 1(a), where a simple application invokes four functions sequentially. A loop-parallel approach, Fig 1(b), exploits the concurrency within each function using algorithms such as an Intel TBB parallel_for or parallel_reduce, reducing execution time; a minimal parallel_for sketch follows Fig 1. However, Fig 1(b) may still be overly constrained: functions B and C may both require the output generated by A, yet C may not depend on the output of B. Fig 1(c) shows a graph- and loop-parallel implementation of this example. In this implementation, the loop-level parallelism is still exposed, and the overly constrained total ordering is replaced with a partial ordering that allows B and C to execute concurrently.

Fig 1.  (a) a sequential execution; (b) a loop-parallel execution; (c) a graph- and loop-parallel execution
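
To make Fig 1(b) concrete, each of the four functions might internally use a loop-parallel algorithm. The sketch below is illustrative only: the function name, the vector data, and the per-element computation are assumptions, not part of this tutorial's example.

#include "tbb/parallel_for.h"
#include <vector>

// A hypothetical loop-parallel function, as in Fig 1(b): each iteration
// is independent, so TBB may execute them concurrently.
void loop_parallel_function(std::vector<float> &data) {
    tbb::parallel_for(size_t(0), data.size(), [&](size_t i) {
        data[i] = data[i] * 2.0f + 1.0f;
    });
}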

When using a flow graph, computations are represented by nodes and the communication channels between these computations are represented by edges.  The user is responsible for using edges to express all dependencies that must be respected when nodes are scheduled for execution, giving the Intel TBB scheduler the flexibility to exploit the parallelism that is explicit in the graph topology.  When a node in the graph receives a message, an Intel TBB task is spawned to execute its body object on the incoming message. 

The flow graph interface supports several different types of nodes, as shown in Fig 2: functional nodes that execute user-provided body objects, buffering nodes that can be used to order and buffer messages as they flow through the graph, aggregation and deaggregation nodes that join and split messages, and other special-purpose nodes. Users connect instances of these node types together with edges to specify the dependencies between them and provide body objects to perform their work; a small aggregation sketch follows Fig 2.

Fig 2.  The types of nodes supported by the flow graph interface
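
As one example of aggregation, a join_node can combine the outputs of several functional nodes into a tuple. The sketch below is a hedged illustration rather than code from this tutorial: the node names (doubler, squarer, pair_up, printer) and the arithmetic they perform are assumptions.

#include "tbb/flow_graph.h"
#include <iostream>
#include <tuple>

using namespace tbb::flow;

int main() {
    graph g;
    // Two serial function_nodes transform each input; serial concurrency
    // preserves input order, so matching results reach the join together.
    function_node<int, int> doubler(g, serial, [](int v) { return 2 * v; });
    function_node<int, int> squarer(g, serial, [](int v) { return v * v; });
    // A join_node waits for one message on each input port, then emits
    // both together as a single tuple (aggregation).
    join_node<std::tuple<int, int>> pair_up(g);
    function_node<std::tuple<int, int>, continue_msg> printer(g, serial,
      [](const std::tuple<int, int> &t) -> continue_msg {
          std::cout << std::get<0>(t) << ", " << std::get<1>(t) << "\n";
          return continue_msg();
      }
    );
    make_edge(doubler, input_port<0>(pair_up));
    make_edge(squarer, input_port<1>(pair_up));
    make_edge(pair_up, printer);
    for (int i = 1; i <= 3; ++i) {
        doubler.try_put(i);
        squarer.try_put(i);
    }
    g.wait_for_all();
    return 0;
}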

Below is the source code for a simple “Hello World” flow graph application. This example does not contain any parallelism, but it demonstrates the syntax of the interface. In this code, two nodes are created, hello and world, which are constructed with lambda expressions that print “Hello” and “World” respectively. Each node is of type continue_node, a functional node type provided by the interface. The make_edge call creates an edge between the hello node and the world node. Whenever a task spawned by the hello node completes, it sends a message to the world node, causing it to spawn a task to execute its own lambda expression.

#include "tbb/flow_graph.h"
#include <iostream>

using namespace std;
using namespace tbb::flow;

int main() {
    graph g;
    // A functional node that prints "Hello" when it receives a message.
    continue_node<continue_msg> hello( g,
      []( const continue_msg &) {
          cout << "Hello";
      }
    );
    // A functional node that prints " World" when it receives a message.
    continue_node<continue_msg> world( g,
      []( const continue_msg &) {
          cout << " World\n";
      }
    );
    make_edge(hello, world);       // world runs after hello completes
    hello.try_put(continue_msg()); // send the initial message to hello
    g.wait_for_all();              // block until all spawned tasks finish
    return 0;
}

 

In the code above, the call to hello.try_put(continue_msg()) sends a message to the hello node, causing it to spawn a task to execute its body object. When that task completes, it sends a message to the world node. Only when all of the tasks spawned by the nodes are complete does the call to g.wait_for_all() return.
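
The same pattern extends to the partial ordering of Fig 1(c). In the hedged sketch below, the four continue_node bodies merely print a letter in place of real work; d executes only after receiving a continue_msg from both b and c, while b and c are free to run concurrently, so “B” and “C” may print in either order.

#include "tbb/flow_graph.h"
#include <iostream>

using namespace tbb::flow;

int main() {
    graph g;
    // Placeholder bodies standing in for the real work of functions A-D.
    continue_node<continue_msg> a(g, [](const continue_msg &) { std::cout << "A "; });
    continue_node<continue_msg> b(g, [](const continue_msg &) { std::cout << "B "; });
    continue_node<continue_msg> c(g, [](const continue_msg &) { std::cout << "C "; });
    continue_node<continue_msg> d(g, [](const continue_msg &) { std::cout << "D\n"; });
    make_edge(a, b);  // B and C each depend only on A
    make_edge(a, c);
    make_edge(b, d);  // a continue_node fires after hearing from all
    make_edge(c, d);  // predecessors, so D waits for both B and C
    a.try_put(continue_msg());
    g.wait_for_all();
    return 0;
}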

The Intel TBB flow graph interface enables the expression of very complex graphs that may include thousands of nodes and edges, cycles, buffering and more.  There are a number of resources available to learn more about the flow graph feature in the Intel® Threading Building Blocks library.

 
