Frequently Asked Questions

This page answers several questions that you may have concerning Data Factory Studio.

See Also
Basic Concepts
Main Workflows

Can I Add Sources Other Than CSV?

Yes. The application can also index JSON and Parquet files.

What Happens When You Push a New File?

Data Factory Studio detects when new source files are added to the S3 bucket (specified in Pipeline > Source > S3 Bucket) and indexes these files automatically. It also detects updates made to existing source files and reindexes them to keep the index up to date.

What Happens When Source Files Are Deleted?

The behavior depends on the pipeline type.

Pipeline Type | Description
Index Tabular Data Pipeline | Data Factory Studio detects when files are deleted and deletes all associated items from the index.
Index Event Data Pipeline | Nothing happens; your index stays the same. Events follow the append-only paradigm, like event logs in other databases. Unlike tabular pipelines, Data Factory Studio does not try to reverse the changes brought by removed files. If your use case requires removing items from the index, the JSON event format provides actions such as DeleteItem. For more information, see Delete Item.
Copy Object Data Pipeline | Files removed from the source are removed from the destination too.

What Happens With Lines Deleted in Source Files?

Data Factory Studio detects when lines are deleted or updated in source files. It then deletes all items associated with those files from the index and recrawls the files.
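
The delete-and-recrawl behavior can be pictured with a toy in-memory model. This is only a sketch of the observable effect under simplifying assumptions, not Data Factory Studio's implementation: the recrawl_file function, the dictionary used as an index, and the people.csv file name are all hypothetical and introduced for illustration.

```python
import csv
import io


def recrawl_file(index, source_name, csv_text):
    """Toy model: drop every item that came from source_name, then re-add its current lines.

    index maps a URI to a (source_name, row) tuple. Hypothetical structure, for illustration only.
    """
    # Step 1: delete all items associated with the file.
    stale = [uri for uri, (src, _) in index.items() if src == source_name]
    for uri in stale:
        del index[uri]
    # Step 2: recrawl the file and index the lines it contains now.
    for row in csv.DictReader(io.StringIO(csv_text), skipinitialspace=True):
        index[row["uri"]] = (source_name, row)
    return index


index = {}
recrawl_file(index, "people.csv", "uri, name\n1, Michael\n5, Jane\n")
# A later version of the file no longer contains the line for Jane.
recrawl_file(index, "people.csv", "uri, name\n1, Michael\n")
print(sorted(index))
```

Because every item from the file is removed before the file is read again, a line that no longer exists in the source simply never comes back into the index.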

How Do I Manage URIs (Item Identifiers)?

It is better to specify URI values in your source files. For CSV files, create a dedicated column with uri as header, and define URI values in its cells. The URI value must be unique in the entire Index Unit. You can consider it as the global primary key of the index.

Important: Pushing an item with an existing URI overrides it in the Index Unit.

To point to other elements in the index, use an attribute of Reference type. The value of this attribute must be the URI of the target element (even if the target does not exist yet in the Index Unit). In the following example, you specify who Michael's neighbor is by referencing the URI value associated with Jane, and the reverse for Jane.

uri, name, neighbor
1, Michael, 5
5, Jane, 1
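
Before pushing a file, it can be useful to check it locally for duplicate URIs and for references that do not match any URI in the same file. The following is a minimal sketch, assuming the CSV layout shown above; check_uris is a hypothetical helper, not part of Data Factory Studio, and it only inspects a single file, whereas uniqueness applies to the entire Index Unit.

```python
import csv
import io
from collections import Counter

SAMPLE = """uri, name, neighbor
1, Michael, 5
5, Jane, 1
"""


def check_uris(csv_text, reference_columns):
    """Return (duplicate URIs, references that match no URI in this file)."""
    rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
    counts = Counter(row["uri"] for row in rows)
    # A URI that appears more than once would override the earlier item.
    duplicates = sorted(uri for uri, n in counts.items() if n > 1)
    known = set(counts)
    # Unresolved references are reported as (item URI, column, target URI).
    unresolved = sorted(
        (row["uri"], col, row[col])
        for row in rows
        for col in reference_columns
        if row[col] and row[col] not in known
    )
    return duplicates, unresolved


duplicates, unresolved = check_uris(SAMPLE, ["neighbor"])
print(duplicates, unresolved)
```

An unresolved reference is not necessarily an error, since the target may live in another file or be indexed later; treat that list as a warning. A duplicate URI deserves more attention, because the later item overrides the earlier one.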
Note: If you do not specify URI values in your source files, Data Factory Studio creates them automatically for all items (CSV lines) in the Index Unit, using the following elements: [s3bucketname]:[csvFolderPath]/[csvFileName]:[lineNumber]

For example, semanticgraphindex-814590966889-euw:test-samples/avocados/avocados.csv:98
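
To predict the identifier that an item receives when no uri column is present, the documented pattern can be reproduced as follows. This is a sketch only: default_uri is a hypothetical helper, and the source does not state how line numbers are counted (for example, whether the header line is included), so the function takes the line number as given.

```python
def default_uri(bucket, object_key, line_number):
    """Build a URI following [s3bucketname]:[csvFolderPath]/[csvFileName]:[lineNumber].

    object_key is the folder path and file name joined by "/", as in an S3 object key.
    """
    return f"{bucket}:{object_key}:{line_number}"


example = default_uri(
    "semanticgraphindex-814590966889-euw",
    "test-samples/avocados/avocados.csv",
    98,
)
print(example)
```

Since these generated URIs depend on the file location and the line position, moving a file or inserting a line changes them, which is one reason to prefer explicit URI values.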