Once the pipeline is created, configure the source, that is, the connection to the
storage from which Data Factory Studio must upload object files.
- Select the Source tab.
- Configure the Storage parameters.
- Optional: If lineage with Datasets Governance is activated, you see a Source Lineage section on the page. In that case, you must select a catalog previously defined in Datasets Governance from Source Catalog.
- Configure the Scheduling, that is, how often the source is fetched.
By default, Data Factory Studio refreshes data every minute. You can specify longer intervals, such as every 10 minutes, every hour, or every day. You can also set the source scheduling to run continuously: each time an S3 fetch request completes, a new one starts automatically.
You can define your own custom source scheduling using a Quartz cron expression. For example:
  - "0 0 13 * * ?" schedules the processor to run at 1:00 PM every day.
  - "0 20 14 ? * MON-FRI" schedules the processor to run at 2:20 PM, Monday through Friday.
  - "0 15 10 ? * 6L 2011-2017" schedules the processor to run at 10:15 AM on the last Friday of every month, from 2011 to 2017.
For more information on the Quartz format, see https://www.quartz-scheduler.org/documentation/
- Click Save.
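The Quartz cron expressions used for custom source scheduling have six or seven space-separated fields: seconds, minutes, hours, day of month, month, day of week, and an optional year. Quartz also requires exactly one of the day-of-month and day-of-week fields to be "?". As an illustration only, the following Python sketch splits an expression into named fields and checks those two structural rules before you paste it into the Scheduling setting. The helper `split_quartz_cron` is hypothetical and is not part of Data Factory Studio. It does not validate the values inside each field, so Data Factory Studio or Quartz may still reject an expression that passes.

```python
from typing import Dict

# Field order of a Quartz cron expression; the trailing year field is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def split_quartz_cron(expression: str) -> Dict[str, str]:
    """Map each field of a Quartz cron expression to its name.

    Hypothetical helper for illustration: it checks only the field count
    and the '?' rule, not the allowed values inside each field.
    """
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 fields, got {len(parts)}: {expression!r}"
        )
    # zip stops at the shorter list, so "year" is absent for 6-field expressions.
    fields = dict(zip(QUARTZ_FIELDS, parts))
    # Quartz does not support setting both day-of-month and day-of-week:
    # exactly one of them must be '?'.
    if (fields["day_of_month"] == "?") == (fields["day_of_week"] == "?"):
        raise ValueError(
            "exactly one of day-of-month and day-of-week must be '?': "
            f"{expression!r}"
        )
    return fields


if __name__ == "__main__":
    # The three example schedules from this page.
    for expr in ("0 0 13 * * ?", "0 20 14 ? * MON-FRI", "0 15 10 ? * 6L 2011-2017"):
        print(expr, "->", split_quartz_cron(expr))
```

For example, "0 15 10 ? * 6L 2011-2017" maps "6L" to the day-of-week field and "2011-2017" to the year field. An expression such as "0 0 13 * * *", which sets both day fields, raises a `ValueError`.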