About Datasets

Datasets are the files and tables that data workers need to find and access.

A dataset is a collection of data that can be structured (it has a known, fixed format and schema) or unstructured (it does not fit into a spreadsheet or database, and can be textual, non-textual, or machine-generated).

  • My Datasets tab: Gives access to the list of your Datasets.
  • All Datasets tab: Gives access to all Datasets that other users have shared with you, in addition to your own.

A Dataset has the following information displayed in its tabs:

Properties: Displays the key attributes of the dataset, such as:
  • Title
  • Description
  • Publisher
  • Descriptor datasets in the Described by section
  • Distribution in the Access section

Access Rights: Displays the list of members and User Groups that have access to the dataset, along with their access rights.

Lineage: Shows the lineage of the dataset: the activity from which it originates and the activities in which it is used as an input.

Preview: Provides a preview of the files (standard text, HTML, CSV) that are smaller than 1 MB.

Catalogs: Shows the catalogs that contain this dataset.

Additional information: Shows the other attributes of the dataset.

Data properties

Properties define the dataset.

You can choose one or more properties for your dataset:

  • Type: the nature of the dataset (Data schema or Ontology)
  • Creator: the agent responsible for producing the dataset
  • Publisher: the agent responsible for making the dataset available
  • Issued: the issuance date
  • Modified: the date of the last update
  • Identifier: a unique identifier of the dataset
  • Landing page: a web page that gives access to the dataset, additional information, or both
  • Spatial resolution in meters: the minimum spatial separation resolvable in the dataset, measured in meters
  • Temporal resolution: the minimum time period resolvable in the dataset
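
As an illustration only, the properties listed above can be pictured as a record in which every field is optional. The following Python sketch is not the product's API: the class name DatasetProperties, its field names, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, fields
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DatasetProperties:
    """Hypothetical container for dataset properties; every field is optional."""
    dataset_type: Optional[str] = None          # "Data schema" or "Ontology"
    creator: Optional[str] = None               # agent producing the dataset
    publisher: Optional[str] = None             # agent making the dataset available
    issued: Optional[date] = None               # issuance date
    modified: Optional[date] = None             # date of the last update
    identifier: Optional[str] = None            # unique identifier
    landing_page: Optional[str] = None          # URL of a web page
    spatial_resolution_m: Optional[float] = None        # meters
    temporal_resolution: Optional[timedelta] = None     # minimum resolvable period

    def filled(self) -> List[str]:
        """Return the names of the properties that are set, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


# Sample values are invented for the example.
props = DatasetProperties(
    dataset_type="Ontology",
    publisher="Example Org",
    issued=date(2023, 1, 15),
    modified=date(2023, 6, 1),
    spatial_resolution_m=30.0,
    temporal_resolution=timedelta(days=1),
)

print(props.filled())
```

Because every field defaults to None, the sketch mirrors the fact that you can fill in one property or several and leave the rest empty.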

Data Distributions

A data distribution is a specific representation of a dataset. A dataset can be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles.

A distribution has several attributes: a title, a description, and an identifier.

In the Conforms to field, you can select the model, schema, or ontology to which the distribution of the dataset conforms.

You can choose one or more of these properties for your distribution:

  • Identifier: the ID of the dataset for a given distribution
  • Data source: lets you select a data source that you created previously
  • Access URL: gives direct access to a web page, SPARQL endpoint, feed, etc.
  • Download URL: the URL of a downloadable file
  • Format: the format of the distribution (if the format is defined by IANA, use Media type instead)
  • Media type: the media type of the distribution, as defined by IANA
  • Compress Format: the compression format of the distribution, in which the data is stored in compressed form, e.g. to reduce the size of the downloadable file
  • Package format: the package format of the distribution, in which one or more data files are grouped together, e.g. to allow a set of related files to be downloaded together
  • Byte size: the size of the distribution in bytes
  • Issued: the issuance date
  • Modified: the date of the last update
  • Spatial resolution in meters: the minimum spatial separation resolvable in the dataset, measured in meters
  • Temporal resolution: the minimum time period resolvable in the dataset
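
To make the idea of one dataset with several distributions more concrete, here is a minimal Python sketch. It is not the product's API: the Distribution class, the find_by_media_type helper, the example.org URLs, and the byte sizes are hypothetical and exist only for this example. It models two serializations of the same dataset and selects one by its IANA media type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Distribution:
    """Hypothetical stand-in for one distribution of a dataset."""
    identifier: str
    title: str
    media_type: Optional[str] = None        # IANA media type, e.g. "text/csv"
    compress_format: Optional[str] = None   # e.g. "application/gzip"
    download_url: Optional[str] = None      # URL of a downloadable file
    byte_size: Optional[int] = None         # size in bytes


def find_by_media_type(distributions: List[Distribution], media_type: str) -> List[Distribution]:
    """Return the distributions whose media type matches exactly."""
    return [d for d in distributions if d.media_type == media_type]


# Two serializations of the same (invented) dataset.
distributions = [
    Distribution(
        identifier="sales-2023-csv",
        title="Sales 2023 (CSV, gzipped)",
        media_type="text/csv",
        compress_format="application/gzip",
        download_url="https://example.org/sales-2023.csv.gz",
        byte_size=48_213,
    ),
    Distribution(
        identifier="sales-2023-json",
        title="Sales 2023 (JSON)",
        media_type="application/json",
        download_url="https://example.org/sales-2023.json",
        byte_size=131_072,
    ),
]

csv_only = find_by_media_type(distributions, "text/csv")
print([d.identifier for d in csv_only])
```

Keeping the media type separate from the compression format lets a consumer filter on the actual content type (CSV here) regardless of how the file is compressed for download.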