Add a New Pipeline
Define one pipeline per Data Model class.
Before you begin: You have created an index unit.
- From the left panel, select Storage.
- Click Add pipeline.
- Click Create pipeline.
- Enter a Pipeline name.
- For Pipeline type, select Index Tabular Data.
- Click Create.
The General information page opens.
Recommendation: Click Save after any editing operation.
Configure the Source
Once the pipeline is created, configure the source, that is, the connection to the storage from which Data Factory Studio fetches source files.
- Select the Source tab.
- Configure the Storage parameters.
- Optional: If lineage with Datasets Governance is activated, a Source Lineage section appears on the page. In that case, you must select a catalog previously defined in Datasets Governance from Source Catalog.
- Configure the Scheduling, that is, the refresh frequency of the source.
  By default, Data Factory Studio refreshes data every minute. You can specify longer intervals, such as every 10 minutes, every hour, or every day. You can also set the source scheduling to run continuously, which means that each time an S3 fetch request ends, a new one starts automatically. Finally, you can define a custom schedule using a Quartz cron expression. For example:
  - "0 0 13 * * ?" runs the processor at 1:00 PM every day.
  - "0 20 14 ? * MON-FRI" runs the processor at 2:20 PM, Monday through Friday.
  - "0 15 10 ? * 6L 2011-2017" runs the processor at 10:15 AM on the last Friday of every month, from 2011 through 2017.
  For more information on the Quartz format, see https://www.quartz-scheduler.org/documentation/
- Select a Data format.
- Click Save.
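Quartz cron expressions use a fixed field order: seconds, minutes, hours, day of month, month, day of week, and an optional year. As a rough illustration only, the Python sketch below checks that structure; it is not the validator Data Factory Studio uses, and the patterns only approximate real Quartz syntax.

```python
# Sketch: a minimal structural check for Quartz cron expressions.
# Illustration only -- full Quartz semantics are richer than checked here.
import re

# One simplified pattern per field, in Quartz field order.
FIELD_PATTERNS = [
    r"[\d*,/-]+",        # seconds (0-59)
    r"[\d*,/-]+",        # minutes (0-59)
    r"[\d*,/-]+",        # hours (0-23)
    r"[\d*,/?LW-]+",     # day of month (1-31, ?, L, W)
    r"[\dA-Z*,/-]+",     # month (1-12 or JAN-DEC)
    r"[\dA-Z*,/?L#-]+",  # day of week (1-7 or SUN-SAT, ?, L, #)
    r"[\d*,/-]+",        # year (optional)
]

def looks_like_quartz_cron(expr: str) -> bool:
    """Return True if expr has 6 or 7 fields matching the simplified patterns."""
    fields = expr.split()
    if len(fields) not in (6, 7):
        return False
    return all(re.fullmatch(p, f) for p, f in zip(FIELD_PATTERNS, fields))

# The three examples from the scheduling step:
for expr in ("0 0 13 * * ?", "0 20 14 ? * MON-FRI", "0 15 10 ? * 6L 2011-2017"):
    print(expr, looks_like_quartz_cron(expr))
```

This kind of pre-check can catch an expression with a missing or extra field before you save the pipeline configuration.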
Configure the Processing
Advanced users who are already familiar with Semantic Graph Index Data Queries can optionally define a MAP expression in the Processing tab.
The expression input must be a tuple, called record, containing your data source values. For more information, see Appendix - About Processing.
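The snippet below is a conceptual analogue only, written in Python rather than in the Semantic Graph Index Data Query language: it mirrors the idea of a MAP expression, which takes one record of source values and returns the transformed values to index. The field names (brand, mileage_km) are hypothetical.

```python
# Illustration only: real MAP expressions are written in the Semantic Graph
# Index Data Query language, not Python. This sketch mirrors the concept.
# Field names are hypothetical.

def map_record(record: dict) -> dict:
    """Transform one source record before indexing (conceptual analogue)."""
    return {
        "brand": record["brand"].strip().upper(),                    # normalize text
        "mileage_miles": round(record["mileage_km"] * 0.621371, 1),  # unit conversion
    }

print(map_record({"brand": " renault ", "mileage_km": 100.0}))
# → {'brand': 'RENAULT', 'mileage_miles': 62.1}
```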
Configure the Destination
Define the Data Model class into which the ingestion pipeline must push fetched data. The index unit must match the CSV columns one to one.
Note: You cannot use 3DSpace as the destination target. No pipeline can modify it.
- Select the Destination tab.
- Enter the Index unit where you want to push data.
- For Package, select a data model package, for example, org.common.
- For Class, select the name of the class that must store the indexed data, for example, car. Remember that you have one pipeline per class.
- Optional: If your Data Model contains a property of type List, scroll down to the Index Properties table and specify a separator in the Multivalued separator column.
  Note: The Multivalued separator must be different from the Value separator defined in the Source tab.
- Click Save.
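To see why the two separators must differ, the Python sketch below parses a hypothetical CSV source where "," is the value separator (splitting columns) and ";" is the multivalued separator (splitting one column into list items). The column names and data are invented for illustration.

```python
# Sketch: value separator vs. multivalued separator in a CSV source.
# "," splits the row into columns; ";" then splits one column into a list.
# If both were ",", the list items would be indistinguishable from columns.
import csv
import io

source = "name,colors\ncar1,red;blue;green\ncar2,black\n"

rows = []
for row in csv.DictReader(io.StringIO(source)):  # value separator: ","
    row["colors"] = row["colors"].split(";")     # multivalued separator: ";"
    rows.append(row)

print(rows[0]["colors"])   # → ['red', 'blue', 'green']
print(rows[1]["colors"])   # → ['black']
```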
Run the Pipeline Configuration
Once the configuration is complete, you can apply it to start indexing documents.
- Select the Run tab. The running status of the indexing displays at the top of the screen.
- To start the pipeline indexing, click Run. You can stop the indexing at any time and restart it when ready.