Main Workflows

This section describes the most important steps to configure Data Factory Studio. Read it carefully and follow the links to the procedures detailing the configuration of each step.


Tip: The first question to ask yourself is: what do you want to search? To configure Data Factory Studio properly, you need a good understanding of the data to index from the S3 bucket.
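One quick way to build that understanding is to survey what the source bucket actually contains before configuring anything. The sketch below is a plain-Python illustration, not part of Data Factory Studio; the key names stand in for an S3 bucket listing (for example, one obtained with `aws s3 ls --recursive`):

```python
# Survey file types in a source listing before configuring Data Factory Studio.
# The keys below are placeholders standing in for an S3 bucket listing.
from collections import Counter

keys = [
    "data/orders/2024-01.csv",
    "data/orders/2024-02.csv",
    "data/events/orders.json",
    "docs/spec.pdf",
]

# Tally file extensions: tabular files (csv) and serialized event files (json)
# are handled by different workflows, so knowing the mix guides the setup.
extensions = Counter(key.rsplit(".", 1)[-1].lower() for key in keys)
print(dict(extensions))  # → {'csv': 2, 'json': 1, 'pdf': 1}
```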

Index Files or Events in a Semantic Graph Index Unit

| Step | Description | See ... |
|------|-------------|---------|
| 1 | To index tabular files, create a dedicated account with the permissions required by Data Factory Studio on the S3 bucket. To index event orders, prepare a JSON source with serialized event data. | Preparing the Source S3 for Amazon; JSON Source for Event Data |
| 2 | Define an external S3 source storage by specifying its connection parameters. | Defining External S3 Buckets |
| 3 | Add or import a Semantic Graph Index Unit. | Add a New Index Unit |
| 4 | Configure the Data Model in the index unit. You can also import Data Model classes from an ontology previously defined in Ontology Editor. | Define the Data Model; Import a Class from an Ontology |
| 5 | Add a new ingestion pipeline to push data into your Data Model classes. | Add a New Pipeline |
| 6 | Configure the pipeline S3 source by referencing the external S3 source storage defined in step 2. | Configure the Source |
| 7 | Optionally, configure data processing. | Configure the Processing |
| 8 | Configure the pipeline destination, that is, the Data Model class where you want to push indexed data. | Configure the Destination |
| 9 | Run the pipeline configuration to index data from the S3 bucket. | Run the Pipeline Configuration |
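For step 1, preparing a JSON source with serialized event data can be sketched as follows. The field names and the one-event-per-line layout are illustrative assumptions, not the schema expected by Data Factory Studio; refer to the "JSON Source for Event Data" procedure for the actual format:

```python
# Hypothetical sketch: serialize event data to a JSON file for the S3 source.
# Field names (event_id, type, timestamp, amount) are illustrative only.
import json

events = [
    {"event_id": "ord-001", "type": "order_created",
     "timestamp": "2024-05-01T10:15:00Z", "amount": 129.90},
    {"event_id": "ord-002", "type": "order_shipped",
     "timestamp": "2024-05-02T08:30:00Z", "amount": 129.90},
]

# One serialized JSON document per line (JSON Lines) keeps large event
# files streamable; the resulting file is what you place in the S3 source.
with open("orders.jsonl", "w", encoding="utf-8") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
```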

Copy Binary Objects in a DFS Object Storage

| Step | Description | See ... |
|------|-------------|---------|
| 1 | Add or import an Object Storage Bucket. | Add a New Object Storage Bucket |
| 2 | Upload binary files manually to the DFS Object Storage bucket, specifying prefixes to classify them. Alternatively, to schedule uploads automatically, create a Copy Object Data pipeline. | Upload Files in the Bucket; Copying Object Data |
| 3 | You can then typically store the ids of the object files uploaded to the DFS Object Storage in a Semantic Graph Index unit attribute value. | Procedure to show images for the Image, Result List, and Hit Details visualizations in Data Perspective Studio |
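The bookkeeping behind steps 2 and 3 can be sketched as follows: classify each file under a prefix, then keep the resulting object ids so they can be stored in an index unit attribute. The prefix names and the id scheme are illustrative assumptions; a real upload would return the object id from the storage rather than generating it locally:

```python
# Hypothetical sketch: classify binary files with prefixes (step 2) and
# record object ids for later use in an index unit attribute (step 3).
from pathlib import PurePosixPath
import uuid

local_files = ["photos/front.jpg", "photos/back.jpg", "manuals/setup.pdf"]

def classify(path: str) -> str:
    # Route images and documents under different prefixes (illustrative).
    return "images" if path.endswith((".jpg", ".png")) else "documents"

object_ids = {}
for path in local_files:
    key = f"{classify(path)}/{PurePosixPath(path).name}"
    # A real upload would return the object id; we fabricate one locally
    # to show the mapping you would keep for the index unit attribute.
    object_ids[key] = str(uuid.uuid4())

print(object_ids)
```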