GRAPH TRAVERSER

Applies recursive operations to each element of the input stream. It is suited to navigating a graph or tree of data.

This page discusses:

Concepts
Example

Concepts

To configure the GRAPH TRAVERSER to process your graph of data the way you want, you must define one or more processing units, called Node processors.

A Node processor receives a node, applies treatments to it, and then forwards the node's children to its successor Node processor. Node processors call each other recursively to traverse the whole graph until all nodes have been processed or a stop condition has been reached.

Processor

A processor treats a single type of node forwarded by its predecessor. To define which processor receives the element from the input stream, you must define an Initial processor in your configuration.

A processor applies the following treatments in this order:


Action	Description
`Update context variables`	Updates context variables (see next sections).
`Emit expression`	Defines what the GRAPH TRAVERSER operator will output (all emit expressions of all processors must return the same type).
`Emit predicate`	Expression returning a Boolean. This expression decides whether to call the Emit Expression for the current node or not.
`Transition`	Forwards elements to the successor Node processor. For more information, see Transition.
`Stop node condition`	Expression returning a Boolean. This expression decides whether to call a Transition for the current node or not.
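
The treatments above can be modeled in a few lines of Python. This is only an illustrative sketch, not the operator's implementation: `NodeProcessor`, `process`, and the toy `tree` are hypothetical names, and the sketch assumes that each predicate is evaluated just before the action it controls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class NodeProcessor:
    """Hypothetical model of a Node processor (not the product's API)."""
    update_context: Callable[[Any, Dict[str, Any]], None]
    emit_predicate: Callable[[Any, Dict[str, Any]], bool]
    emit_expression: Callable[[Any, Dict[str, Any]], Any]
    stop_condition: Callable[[Any, Dict[str, Any]], bool]
    transition: Callable[[Any], Iterable[Any]]  # returns the node's children


def process(proc: NodeProcessor, node: Any, ctx: Dict[str, Any], out: List[Any]) -> None:
    proc.update_context(node, ctx)              # update context variables
    if proc.emit_predicate(node, ctx):          # emit only if the predicate holds
        out.append(proc.emit_expression(node, ctx))
    if proc.stop_condition(node, ctx):          # stop: no transition for this node
        return
    for child in proc.transition(node):         # forward children to the successor
        process(proc, child, ctx, out)          # (here the processor is its own successor)


# Toy tree (assumption): each key maps to the list of its children.
tree = {"a": ["b", "c"], "b": [], "c": ["d"], "d": []}


def count_node(node: str, ctx: Dict[str, Any]) -> None:
    ctx["count"] += 1


leaf_emitter = NodeProcessor(
    update_context=count_node,
    emit_predicate=lambda node, ctx: not tree[node],  # emit leaves only
    emit_expression=lambda node, ctx: node,
    stop_condition=lambda node, ctx: False,           # never stop early
    transition=lambda node: tree[node],
)

context = {"count": 0}
emitted: List[str] = []
process(leaf_emitter, "a", context, emitted)
print(emitted, context["count"])
```

In this toy run, the processor counts every node it receives and emits only the leaves.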

Transition

A Transition is a function that provides elements to forward to the next processor. There are several types of transitions.


Transition	Description
`Explicit Nodes Transition`	Expects an expression providing children to the successor Node Processor referenced by `To`. You can use this kind of transition for most use cases.
`Search Transition`	Searches items directly from the index (similar to `SEARCH` operator). Expects an Item class, an Index unit, and a predicate. It provides matching items to the successor Node Processor referenced by `To`.
`Conditional Transition`	Advanced Transition type that targets 2 processors. It determines the processor to apply at runtime using the `Switch predicate`: if the predicate is `true`, it uses the `Main transition`; if it is `false`, it uses the `Fallback Transition`.
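
As a rough illustration of how the three transition types relate, the sketch below models each one as a Python function that returns the target processor name and the nodes to forward. All names (`explicit_nodes`, `search`, `conditional`, the sample data) are hypothetical, and the index is reduced to a plain list: the Item class and Index unit are not modeled.

```python
from typing import Any, Callable, Dict, List, Tuple

Node = Dict[str, Any]
# A transition maps the current node to (target processor name, nodes to forward).
Transition = Callable[[Node], Tuple[str, List[Node]]]


def explicit_nodes(to: str, expression: Callable[[Node], List[Node]]) -> Transition:
    # The expression provides the children directly.
    return lambda node: (to, list(expression(node)))


def search(to: str, index: List[Node], predicate: Callable[[Node, Node], bool]) -> Transition:
    # Simplification: the index is a plain list filtered by the predicate.
    return lambda node: (to, [item for item in index if predicate(node, item)])


def conditional(switch: Callable[[Node], bool], main: Transition, fallback: Transition) -> Transition:
    # The switch predicate picks the main or the fallback transition at runtime.
    return lambda node: main(node) if switch(node) else fallback(node)


index = [
    {"name": "doc1", "owner": "alice"},
    {"name": "doc2", "owner": "bob"},
]
alice = {"name": "alice", "children": [{"name": "carol", "children": []}]}
bob = {"name": "bob", "children": []}

to_children = explicit_nodes("user", lambda n: n["children"])
to_documents = search("document", index, lambda n, item: item["owner"] == n["name"])
route = conditional(lambda n: bool(n["children"]), to_children, to_documents)

alice_target, alice_nodes = route(alice)  # has children: main transition
bob_target, bob_nodes = route(bob)        # no children: fallback transition
print(alice_target, bob_target)
```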

Context

You can define context variables to store the current state of your traversal. Node processors can access and update these variables. To define a variable, you must provide its static type and initialization value.

There are 3 kinds of variables to cover different use cases. They differ in their visibility and in the number of instances created for each variable.


Variable type	Description
global variable	There is only one instance of this variable for the whole traversal, visible to all Node processors. When a Node processor updates the value, the new value is available to all Node processors. Common use case: count the number of nodes processed during the whole traversal.
node variable	One instance of this variable is created each time a new node is processed. The second time a node is processed, it can fetch the value that was set when the node was processed for the first time. Common use case: count the number of times a node has been processed during the traversal.
path variable	When a Node processor updates a path variable, it copies the new value for each child. This way, changes made to a path variable are only visible to the node's children. Common use case: for each node, store the path, that is, the succession of nodes from the root that led to this node.
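
The difference in scope can be pictured with a small Python model. This is a sketch under assumptions, not the operator's internals: the global variable is a single shared dictionary, node variables are stored per node name, and the path variable is a list copied at each step. The `graph`, `visit`, and variable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Toy graph (assumption): D is reachable through both B and C.
graph: Dict[str, List[str]] = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

global_vars = {"processed": 0}                  # one instance for the whole traversal
node_vars = defaultdict(lambda: {"visits": 0})  # one instance per node, kept across visits
paths_seen: List[List[str]] = []                # records the path variable seen at each node


def visit(node: str, path: List[str]) -> None:
    global_vars["processed"] += 1       # visible to every later step
    node_vars[node]["visits"] += 1      # remembered if the node is processed again
    path = path + [node]                # new list: the update is visible to children only
    paths_seen.append(path)
    for child in graph[node]:
        visit(child, path)


visit("A", [])
print(global_vars["processed"], node_vars["D"]["visits"], paths_seen)
```

Because `path + [node]` builds a new list, the siblings B and C each start from `["A"]` and never see each other's additions, whereas the counters accumulate across the whole traversal.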

Depth-first search (DFS)

The GRAPH TRAVERSER operator performs a Depth-first search (DFS) graph traversal. This means that it always processes the children of a given node before its siblings.
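
To make the processing order concrete, here is a minimal recursive depth-first traversal in Python over a hypothetical tree. It only illustrates the order described above and does not reflect how the operator is implemented.

```python
from typing import Dict, List


def dfs_order(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return the nodes in the order a recursive depth-first traversal processes them."""
    order: List[str] = []

    def visit(node: str) -> None:
        order.append(node)
        for child in graph[node]:  # all descendants of a child come before the next sibling
            visit(child)

    visit(start)
    return order


# Toy tree (assumption): "a" and "b" are siblings under "root".
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"], "a1": [], "a2": [], "b1": []}
order = dfs_order(tree, "root")
print(order)
```

The children of `a` (`a1`, `a2`) are processed before its sibling `b`.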

Example

In this example, our graph of data is a social network indexed with the following Data Model:

class User {
    name: String;
    friends: List<User>;
}

We want to compute all paths that lead from UserA to UserD, with UserA as the root and UserD as the end condition. The social network looks like this:

graph TD
A[UserA] -->|friends with| B[UserB]
A[UserA] -->|friends with| C[UserC]
C[UserC] -->|friends with| D[UserD]
B[UserB] -->|friends with| D[UserD]

The correct output for this use case is [UserA, UserC, UserD] and [UserA, UserB, UserD].

To do this, we must configure the GRAPH TRAVERSER as follows:

graph TD
A[user] -->|user->friends| A[user]

We need:

A single Node Processor called user, treating nodes of type: Item<User>
A Path variable called path, of type: List<Item<User>>, to collect the graph paths from UserA to UserD.

As the initial processor is user, the configuration of the Node processor is:

Context update: path = [...path, user]
Emit expression: path 
Emit predicate: user == UserD 
Explicit node transition: user->friends targeting To: user
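
The configuration above can be mirrored in plain Python to check the expected output. This is an illustrative sketch, not the operator itself: the `User` dataclass stands in for the indexed Data Model, `traverse` is a hypothetical function, and the path stores user names instead of `Item<User>` values for readability.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class User:
    name: str
    friends: List["User"] = field(default_factory=list)


# The social network from the example: A -> B -> D and A -> C -> D.
user_d = User("UserD")
user_b = User("UserB", [user_d])
user_c = User("UserC", [user_d])
user_a = User("UserA", [user_b, user_c])


def traverse(user: User, path: List[str], target: User, out: List[List[str]]) -> None:
    path = path + [user.name]        # context update: path = [...path, user]
    if user is target:               # emit predicate: user == UserD
        out.append(path)             # emit expression: path
    for friend in user.friends:      # explicit nodes transition: user->friends, To: user
        traverse(friend, path, target, out)


paths: List[List[str]] = []
traverse(user_a, [], user_d, paths)  # UserA is sent to the initial processor
print(paths)
```

The run emits one path through UserB and one through UserC, which matches the expected output stated earlier.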

In this example, relationships are oriented and UserD has no children, so the traversal stops because there are no more nodes to process. However, in a more realistic model, relationships are not oriented and the graph can contain cycles (for example, UserD can be friends with UserA). You can handle this kind of use case with a Node variable, visited: Boolean = false, that flags whether a node has already been processed. If you add the Stop Node condition visited to user, the traversal stops on each path as soon as the current node has already been processed, so there is no risk of looping infinitely on cycles.
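
The effect of the visited flag on a cyclic graph can be sketched in Python as follows. The sketch is hypothetical: a set plays the role of the per-node visited variable, and it assumes that the stop condition reads the flag as it was when the node arrived, before the current processing marks the node as visited.

```python
from typing import Dict, List, Set

# Same network as in the example, plus a cycle: UserD is friends with UserA.
friends: Dict[str, List[str]] = {
    "UserA": ["UserB", "UserC"],
    "UserB": ["UserD"],
    "UserC": ["UserD"],
    "UserD": ["UserA"],
}


def find_paths(start: str, target: str) -> List[List[str]]:
    visited: Set[str] = set()        # models the node variable: one flag per node
    out: List[List[str]] = []

    def process(user: str, path: List[str]) -> None:
        path = path + [user]         # path variable, copied for each child
        if user == target:           # emit predicate
            out.append(path)         # emit expression
        already_visited = user in visited
        visited.add(user)
        if already_visited:          # stop node condition: do not forward the friends
            return
        for friend in friends[user]:
            process(friend, path)

    process(start, [])
    return out


cyclic_paths = find_paths("UserA", "UserD")
print(cyclic_paths)
```

Without the stop condition, the edge from UserD back to UserA would make this recursion run forever; with it, the traversal terminates and still emits both paths to UserD.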