ETL data transform: supported document formats

This page describes the document formats supported by the FinBIF data warehouse (DW) ETL process. New formats and features will be added on request.

For most external systems, it might be best to send your data in your own internal format. It is then up to us to parse the data in the way that best suits the FinBIF DW data model.

Please contact info@laji.fi and we can advise on the finer details of the different formats, which are impossible to document here.

Name | Format | Examples | Schema | Description
GIS standards | XML/JSON | gis-example.json | gis-vocabulary.xlsx | Properties vocabulary for GIS data exchange formats
Simple Darwin Core (dwc) | XML | dwc.xml | dwc.tdwg.org/terms | Simple format; can be freely extended
DwC-A | CSV | dwca-example.csv | dwc.tdwg.org/terms | Simple format; for large quantities of occurrences; can be freely extended
Laji-ETL | JSON | laji-etl.json | laji-etl-swagger.json | FinBIF DW internal data model; most robust

GIS standards

FinBIF supports (or will support when needed) parsing all common GIS standards, including GeoJSON and GML. The fields documented in gis-example.json and gis-vocabulary.xlsx define a common vocabulary for non-geospatial properties, such as taxon name, time and other information about the occurrence.

Each dataset uses slightly different field names and data types. If we at FinBIF had to map every dataset's field names to our internal data model, it would be too much work and impossible to maintain. Instead, we encourage data-sharing partners to provide a data product (GIS layer) that uses the field names and data types defined in the documentation.

This vocabulary can be expanded to cover more fields as new needs arise with new datasets.
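With this approach, the field mapping happens on the data provider's side. As a minimal sketch, a partner could rename its own attribute names to the shared vocabulary before publishing its GIS layer. The source names (`species_name`, `obs_date`, `individuals`) and target names (`taxon`, `date`, `count`) below are hypothetical placeholders; the real target names are the ones defined in gis-vocabulary.xlsx.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GisPropertyMapper {
    // Hypothetical mapping: partner's own attribute name -> shared vocabulary name.
    // The real target names are defined in gis-vocabulary.xlsx.
    static final Map<String, String> FIELD_NAMES = Map.of(
            "species_name", "taxon",
            "obs_date", "date",
            "individuals", "count");

    // Returns a copy of the feature properties with known attribute names renamed.
    // Attributes without a mapping keep their original names, so no data is dropped.
    static Map<String, Object> toVocabulary(Map<String, Object> properties) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            String target = FIELD_NAMES.getOrDefault(entry.getKey(), entry.getKey());
            result.put(target, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("species_name", "Parus major");
        source.put("obs_date", "2021-05-14");
        source.put("individuals", 2);
        source.put("observer_note", "singing");
        // Prints {taxon=Parus major, date=2021-05-14, count=2, observer_note=singing}
        System.out.println(toVocabulary(source));
    }
}
```

Only names are handled here; converting data types (for example, date formats) to those the vocabulary specifies would be a separate step.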

Simple Darwin Core / DwC-A

Simple Darwin Core records and Darwin Core Archives (DwC-A) can only contain "flat" data. Each occurrence is independent, so occurrences cannot form a hierarchical structure (such as the ones monitoring schemes usually have).
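Because every record must stand alone, structured data such as a survey visit with several observations has to be flattened before export: the shared visit fields are repeated on every occurrence row. The sketch below assumes a simple in-memory survey model (the `Visit` and `Observation` records are hypothetical); the column headers are standard Darwin Core term names. Repeating `eventID` on each row records which occurrences came from the same visit, although the rows remain independent records.

```java
import java.util.ArrayList;
import java.util.List;

class DwcFlattener {
    // Hypothetical source model: one survey visit holding several observations.
    record Observation(String occurrenceID, String scientificName, int individualCount) {}
    record Visit(String eventID, String eventDate, double latitude, double longitude,
                 List<Observation> observations) {}

    // Darwin Core term names used as the CSV header.
    static final String HEADER = "occurrenceID,eventID,eventDate,decimalLatitude,"
            + "decimalLongitude,scientificName,individualCount";

    // Quotes a CSV field (RFC 4180 style) when it contains a comma, quote or line break.
    static String csv(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Emits the header plus one independent row per observation;
    // the visit-level fields are repeated on every row.
    static List<String> toRows(Visit visit) {
        List<String> rows = new ArrayList<>();
        rows.add(HEADER);
        for (Observation o : visit.observations()) {
            rows.add(String.join(",",
                    csv(o.occurrenceID()), csv(visit.eventID()), csv(visit.eventDate()),
                    String.valueOf(visit.latitude()), String.valueOf(visit.longitude()),
                    csv(o.scientificName()), String.valueOf(o.individualCount())));
        }
        return rows;
    }

    public static void main(String[] args) {
        Visit visit = new Visit("visit-1", "2021-05-14", 60.17, 24.94, List.of(
                new Observation("occ-1", "Parus major", 2),
                new Observation("occ-2", "Cyanistes caeruleus", 1)));
        toRows(visit).forEach(System.out::println);
    }
}
```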

By default, dwc provides only a limited set of fields, but the dwc:dynamicProperties field holds JSON that can, in theory, contain anything, including polygons and FinBIF DW data model definitions.
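To illustrate how dwc:dynamicProperties can carry extra data, the sketch below builds one occurrence row for a DwC-A CSV file and stores a GeoJSON polygon in the dynamicProperties column as a JSON string. The JSON key `geometry` is hypothetical: which keys FinBIF actually reads from dynamicProperties is not documented here, so agree on them with FinBIF first. What the sketch does show is the encoding: the JSON value must be CSV-quoted, with its inner double quotes doubled.

```java
class DynamicPropertiesRow {
    // Quotes a CSV field (RFC 4180 style): wrap in quotes and double any embedded quotes.
    static String csv(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Builds a minimal GeoJSON Polygon with one ring of [longitude, latitude] positions.
    // The caller must pass a closed ring (first position equal to the last).
    static String polygonJson(double[][] ring) {
        StringBuilder sb = new StringBuilder("{\"type\":\"Polygon\",\"coordinates\":[[");
        for (int i = 0; i < ring.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('[').append(ring[i][0]).append(',').append(ring[i][1]).append(']');
        }
        return sb.append("]]}").toString();
    }

    // One CSV row: occurrenceID, scientificName, eventDate, dynamicProperties.
    // "geometry" is a hypothetical key; agree on the real keys with FinBIF.
    static String row(String occurrenceID, String scientificName, String eventDate, double[][] ring) {
        String dynamicProperties = "{\"geometry\":" + polygonJson(ring) + "}";
        return String.join(",", csv(occurrenceID), csv(scientificName), csv(eventDate), csv(dynamicProperties));
    }

    public static void main(String[] args) {
        double[][] ring = {{24.0, 60.0}, {24.1, 60.0}, {24.1, 60.1}, {24.0, 60.0}};
        System.out.println("occurrenceID,scientificName,eventDate,dynamicProperties");
        System.out.println(row("occ-1", "Parus major", "2021-05-14", ring));
    }
}
```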

Support for transferring dwc:dynamicProperties to the FinBIF DW internal data model is currently limited. These capabilities can be extended as needed; they are not documented here at the moment.

Laji-ETL JSON

Because this is the internal data model of the FinBIF DW, this format can be used to transfer any kind of occurrence data that FinBIF supports.

Its features include:

  • Ability to provide a separate version for public and private sides of the DW
  • Annotations
  • Media
  • Quality issues

One way to generate Laji-ETL JSON is to use the following Java library. The developer transforms the data of the original data source into the provided Java classes, which can then be serialized into JSON, ready to be sent to the FinBIF API.
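The three steps described above (transform source data into model objects, serialize them to JSON, send the JSON to the API) can be sketched with the Java standard library alone. The `SourceRow` and `Occurrence` records, their JSON field names and the endpoint URI are hypothetical stand-ins, not the classes or API of the FinBIF library; in practice, use the library's own classes and the endpoint FinBIF gives you. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

class LajiEtlSketch {
    // Hypothetical row from the partner's own system.
    record SourceRow(String id, String species, String date) {}

    // Hypothetical stand-in for a model class; the real library defines its own classes.
    record Occurrence(String id, String taxon, String date) {
        // Naive JSON serialization for illustration only.
        String toJson() {
            return "{\"id\":\"" + esc(id) + "\",\"taxon\":\"" + esc(taxon) + "\",\"date\":\"" + esc(date) + "\"}";
        }
    }

    // Escapes backslashes and double quotes only; control characters are not handled.
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Step 1: transform source data into model objects.
    static Occurrence transform(SourceRow row) {
        return new Occurrence(row.id(), row.species(), row.date());
    }

    // Step 2: serialize a batch of model objects into a JSON array.
    static String toJsonArray(List<Occurrence> occurrences) {
        return occurrences.stream().map(Occurrence::toJson).collect(Collectors.joining(",", "[", "]"));
    }

    // Step 3: build (but do not send) a POST request; the URI is a placeholder.
    static HttpRequest buildRequest(String json) {
        return HttpRequest.newBuilder(URI.create("https://api.example.org/hypothetical-endpoint"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        List<Occurrence> batch = List.of(transform(new SourceRow("occ-1", "Parus major", "2021-05-14")));
        String json = toJsonArray(batch);
        System.out.println(json);
        System.out.println(buildRequest(json).method());
    }
}
```

A real JSON library (or the serialization built into the provided classes) should replace the hand-written `toJson`, which exists only to keep the sketch dependency-free.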

Usage example: laji-etl-models-example-v2.java