Checklist for starting data transfer
- Contact email@example.com. We are happy to discuss the following steps with you. We will also draft an information transfer agreement.
- We register your IT system in FinBIF. It is given a name and an identifier. If you use api.laji.fi/datawarehouse/push, your access token will be linked to the IT system.
- Decide how your data will be divided into collections (datasets) in FinBIF. Each collection needs a globally unique URI identifier and metadata. Either we register your metadata descriptions in FinBIF, or you provide an API for reading the metadata in a machine-readable format. (See more below.)
- Decide what data to send to FinBIF (which occurrences, which data fields, etc). What do you want to be visible publicly as open data? What do you want to provide only for the private side of the data warehouse (usable only by government agencies, etc)?
- Decide which method you want to use to transport the data into FinBIF (push, pull, custom implementation, manual upload). How frequently does the data need to be updated?
- If you choose push or pull, decide what data format to use. This can be one of the formats that we already support, or you can use your internal format (we will implement support for reading it). In both cases we can plan together how best to map your data into one of the existing formats or how best to extract data from your format.
- Decide how URI identifiers for documents, occurrences, etc. will be generated. (See more below.)
- If your occurrence data contains media, the media is provided together with the occurrence data as links (to thumbnails and full versions), along with the license, author, etc. We do not receive or store the actual images, only links.
- How are you going to handle deletions? Usually it is easy to transfer inserts and updates, but keeping track of deletions and making sure FinBIF also receives them usually requires modifications to source systems.
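When the source system keeps no record of deletions, one common approach is to compare the identifiers that were sent earlier with those currently present in the source. The following is a minimal Python sketch of that idea, not a FinBIF-provided tool: the function name and the example identifiers are hypothetical, and how the detected deletions are then delivered to FinBIF depends on the transport method agreed on.

```python
from __future__ import annotations


def find_deleted_ids(previously_sent: set[str], currently_present: set[str]) -> set[str]:
    """Return identifiers that were sent earlier but no longer exist in the source."""
    return previously_sent - currently_present


# Hypothetical example: identifiers recorded at the previous transfer
# versus identifiers found in the source system now.
previously_sent = {
    "https://tun.fi/KE.389/1",
    "https://tun.fi/KE.389/2",
    "https://tun.fi/KE.389/3",
}
currently_present = {
    "https://tun.fi/KE.389/1",
    "https://tun.fi/KE.389/3",
}

for uri in sorted(find_deleted_ids(previously_sent, currently_present)):
    print("deleted in source:", uri)
```

This requires the source system (or the transfer job) to remember which identifiers it has already sent, which is the kind of modification mentioned above.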
For collections/datasets we need at least the following metadata:
- Globally unique URI identifier
- Contact person, email
- Data quality (see Data quality section of this documentation)
In addition, the following can be defined:
- Citation recommendation
- Special terms for data use
- Data embargo period (years): If you want the public version of the data to not be usable in full detail for a period of time, this can be defined in the metadata.
- Person(s) responsible for handling data requests about this dataset.
- Taxonomic coverage
- Temporal coverage
- Geographical coverage
- Coverage basis (more detailed, for example "Winter birds of Finland")
- Data quality description
- Type (one of https://schema.laji.fi/alt/MY.collectionTypes)
- Methods (such as census methods, tools, ...)
- % digitized (if some parts are still, for example, in paper format)
- Location of data and backups
- Language (of the data)
- URL (to additional information)
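Before sending collection metadata, it can help to check that the required fields listed above are filled in. The sketch below is only an illustration: the key names (uri, contact_person, contact_email, data_quality) and the example values are assumptions made for this example, not FinBIF's actual metadata schema.

```python
from __future__ import annotations

# Hypothetical key names for the required metadata listed above.
REQUIRED_FIELDS = ("uri", "contact_person", "contact_email", "data_quality")


def missing_required_fields(metadata: dict) -> list[str]:
    """Return the required keys that are absent or empty in the metadata."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Hypothetical collection description; optional fields may be added freely.
collection = {
    "uri": "https://example.org/collections/winter-birds",
    "contact_person": "Example Person",
    "contact_email": "email@example.com",
    "coverage_basis": "Winter birds of Finland",
}

print("missing:", missing_required_fields(collection))
```

Here the data quality field has been left out on purpose, so the check reports it as missing.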
Each document, gathering, unit, etc needs a globally unique URI identifier. If the original source does not provide URI identifiers, we must define how FinBIF generates the identifiers. For example we can use the identifier of the source and then add the identifier of the occurrence after it to form the identifier.
For example, if the identifier of the source in FinBIF is https://tun.fi/KE.389 and the source has numeric, increasing identifiers, the final URI for one occurrence could be https://tun.fi/KE.389/12345.
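The identifier scheme described above can be sketched in Python as follows. The function name is hypothetical, and percent-encoding the local identifier is an extra precaution assumed here for sources whose identifiers are not purely numeric; it is not something the scheme above requires.

```python
from __future__ import annotations

from urllib.parse import quote


def build_occurrence_uri(source_uri: str, local_id: int | str) -> str:
    """Append the source system's own identifier to the source's FinBIF URI."""
    # Percent-encode the local identifier so characters such as "/" or spaces
    # cannot change the structure of the resulting URI (an assumption made here).
    return f"{source_uri.rstrip('/')}/{quote(str(local_id), safe='')}"


print(build_occurrence_uri("https://tun.fi/KE.389", 12345))
# prints https://tun.fi/KE.389/12345
```

Because the source identifier is globally unique and the local identifier is unique within the source, the combined URI is unique as long as the source never reuses its identifiers.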