Appendix - About Processing

You can write expressions to process your data before indexing. Data Factory Studio applies these expressions to each line of the source file, like a map function.

Important:

Source files must contain a header row. Data Factory Studio uses this header to map columns to index properties.
Expressions do not support the Geometry.ofWkt() and Geometry.toWkt() built-in functions.
To insert comments in your expression, use /* */ and not //.

This page discusses:

Expression Input
Expression Output
Examples

Expression Input

The record object represents the current source file line, using headers as variable names and cells as values. It is a Named Tuple containing each value of the source file line as a String. For more information, see Expression Language.

For example, with the following CSV file:

name, age, status
Alice, 20, single
Bob, 40, married
Camille, 70, widow

Data Factory Studio calls the expression three times, once per data line, and record successively holds the following values:

{name: "Alice", age: "20", status: "single"}
{name: "Bob", age: "40", status: "married"}
{name: "Camille", age: "70", status: "widow"}

In this example, you can access the age value with record.age.
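
As a rough illustration of this map-like behavior outside Data Factory Studio, the Python sketch below builds one record per data line from the sample CSV. It is only a model, not the product's implementation: `read_records` is a hypothetical helper, and dropping the space after each comma is an assumption made so that the values match the ones listed above.

```python
import csv
import io

SAMPLE = """name, age, status
Alice, 20, single
Bob, 40, married
Camille, 70, widow
"""

def read_records(text):
    """Yield one dict per data line, keyed by header, with every value kept as a string."""
    # skipinitialspace drops the blank after each comma (an assumption for this sample).
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        yield dict(row)

records = list(read_records(SAMPLE))
for record in records:
    # Equivalent of reading record.name and record.age in an expression.
    print(record["name"], record["age"])
```

Every value stays a string, which is why the examples later on this page parse numbers and dates explicitly.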

Expression Output

Your expression must return a Tuple matching the targeted Index Unit data model. For more information, see Expression Language.

Note: The attribute order does not matter. Only attribute names are relevant. For example, {x: 123, y: 456} is equivalent to {y: 456, x: 123}.

Supported Types for the Output Tuple Elements


Type	Support
`NULL`	Yes
`STRING`	Yes
`TEXT_MATCH`	Yes
`HIERARCHICAL_STRING`	No
`BOOLEAN`	Yes
`DECIMAL`	Yes
`FLOAT`	Yes
`INTEGER`	Yes
`DATE`	Yes
`DATE_TIME`	Yes
`LOCAL_DATE_TIME`	Yes
`PERIOD`	Yes
`DURATION`	Yes
`TYPE`	No
`UNIT`	No
`BINARY`	No
`GEOMETRY`	No
`FUNCTION`	No
`ZONE_OFFSET`	No

`LIST`	Yes
`TUPLE`	No
`SET`	Yes
`MAP`	No

`ITEM`	No
`DICTIONARY_CODED_STRING`	No

`TYPE_PARAMETER`	No
`GRAPH_TYPE_PARAMETER`	No

Examples

Parsing Dates and Time

Input data:

a_date, a_datetime, a_time
01/01/2020, 01/01/2020 20:15:36, 20:15:36
01/01/1970, 01/01/1970 00:01:01, 00:01:01
29/12/1969, 29/12/1969 23:59:59, 23:59:59

You can parse dates that use a format other than ISO 8601 with an expression like:

{a_date: Date.parse(text: record.a_date, format: "dd/MM/yyyy"), 
a_datetime: LocalDateTime.parse(text: record.a_datetime, format: "dd/MM/yyyy HH:mm:ss"), 
a_time: LocalDateTime.parse(text: "01/01/1970 "+record.a_time, format: "dd/MM/yyyy HH:mm:ss")}
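
For comparison, the same parsing logic can be sketched in Python. This is not Data Factory Studio's implementation: `parse_row` is a hypothetical helper, and the `strptime` directives are Python's counterparts of the patterns used above. The sketch mirrors the workaround on the last attribute, where a time-only value is prefixed with a fixed date so that it can be parsed as a date-time.

```python
from datetime import datetime

def parse_row(record):
    """Parse day-first date strings; times are anchored to 01/01/1970 as in the expression."""
    return {
        # "dd/MM/yyyy" corresponds to "%d/%m/%Y" in Python.
        "a_date": datetime.strptime(record["a_date"], "%d/%m/%Y").date(),
        # "dd/MM/yyyy HH:mm:ss" corresponds to "%d/%m/%Y %H:%M:%S".
        "a_datetime": datetime.strptime(record["a_datetime"], "%d/%m/%Y %H:%M:%S"),
        # Time-only value: prepend a fixed date, then parse it as a date-time.
        "a_time": datetime.strptime("01/01/1970 " + record["a_time"], "%d/%m/%Y %H:%M:%S"),
    }

row = parse_row({
    "a_date": "29/12/1969",
    "a_datetime": "29/12/1969 23:59:59",
    "a_time": "23:59:59",
})
print(row["a_date"].isoformat())
```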

Parsing Lists

Input data:

name, neighbors, neighbors_distance
alice, bob|camille, 123;456
bob, david|camille|alice, 147;258;369
camille, ,
david, emilie, 741
emilie, bob|david, 159;753

You can process this data with the following expression:

{...record,
neighbors: record.neighbors IS NULL ? [] AS List<String> : record.neighbors.split("|"),
neighbors_distance: record.neighbors_distance IS NULL ? [] AS List<Integer> : record.neighbors_distance.split(";").map(function: x => Integer.parse(value: x))}
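
To make the intent of this expression easier to follow, here is a Python sketch of the same logic. It is a model only: `parse_neighbors` is a hypothetical helper, and treating an empty cell as either `None` or an empty string is an assumption about how missing values arrive.

```python
def parse_neighbors(record):
    """Split the pipe- and semicolon-separated cells into lists; empty cells become []."""
    neighbors = record.get("neighbors")
    distances = record.get("neighbors_distance")
    return {
        **record,  # keep the other attributes unchanged, like {...record}
        "neighbors": neighbors.split("|") if neighbors else [],
        "neighbors_distance": [int(x) for x in distances.split(";")] if distances else [],
    }

bob = parse_neighbors({
    "name": "bob",
    "neighbors": "david|camille|alice",
    "neighbors_distance": "147;258;369",
})
camille = parse_neighbors({"name": "camille", "neighbors": None, "neighbors_distance": None})
print(bob["neighbors"], bob["neighbors_distance"])
```

The explicit empty-list branch matters because the output attribute must always be a list, even when the source cell is empty.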

Parsing Lists with Trimming and Unquoting

Input data:

name, neighbors
alice, "bob"| camille
bob, david | "camille" | alice
camille, 
david, "emilie"
emilie, "bob"| david

You can process this data with the following expression:

{...record,
neighbors: record.neighbors IS NULL ? [] AS List<String> : record.neighbors.split("|")
.map(s => s.trim())
.map(s => s.endsWith("\"") AND s.startsWith("\"") ? s.substring(1, - 1) : s )}
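
The two map steps (trim, then strip surrounding quotes) can be sketched in Python as follows. This is an illustrative model rather than the expression language itself: `unquote` and `parse_quoted_neighbors` are hypothetical helper names, and an empty cell is assumed to arrive as `None` or an empty string.

```python
def unquote(item):
    """Trim whitespace, then drop one pair of surrounding double quotes if present."""
    item = item.strip()
    if item.startswith('"') and item.endswith('"'):
        # Same idea as substring(1, -1): drop the first and last characters.
        return item[1:-1]
    return item

def parse_quoted_neighbors(value):
    """Split a pipe-separated cell and clean each item; an empty cell gives []."""
    if not value:
        return []
    return [unquote(part) for part in value.split("|")]

print(parse_quoted_neighbors('david | "camille" | alice'))
```

Trimming has to happen before the quote check, otherwise an item such as ` "camille" ` would not be recognized as quoted.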

Simple Cleaning

Input data:

name, age, status
Alice, 20, Single
Bob, 40, married
Camille, 70, Widow
David, 42, single
Emilie, 25, not married

You can clean this data with the following expression:

{...record, status: record.status.toLowerCase().replace("not married", "single")}
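
As a quick check of what this expression produces for each line, the Python sketch below applies the same two steps (lowercase, then replace). `clean_status` is a hypothetical helper used only for illustration, not part of Data Factory Studio.

```python
def clean_status(record):
    """Lowercase the status, then map "not married" to "single"; other attributes are kept."""
    status = record["status"].lower().replace("not married", "single")
    return {**record, "status": status}

rows = [
    {"name": "Alice", "age": "20", "status": "Single"},
    {"name": "Bob", "age": "40", "status": "married"},
    {"name": "Camille", "age": "70", "status": "Widow"},
    {"name": "David", "age": "42", "status": "single"},
    {"name": "Emilie", "age": "25", "status": "not married"},
]
cleaned = [clean_status(row) for row in rows]
for row in cleaned:
    print(row["name"], row["status"])
```

Lowercasing first means the replacement would also catch variants such as "Not Married".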