# Data glossary
- Ingest
- the process of bringing data from your sources, through files or an API, into the Elasticsearch database at the heart of CIIM.
- Data integrity
- the state of your data being consistent with itself, and with the indexes that have been created to search it. (It doesn’t mean that your records are correct about the objects that they describe.) If there’s a problem with your data integrity, there may be an error at one of the stages of your ingest.
- Record
- the set of data that describes an object or item in your collection. A search returns a set of records, which can then be filtered.
- Field
- one part of a record that holds a single aspect of information about the object or item. Fields can be free text (like the title of an object), unique (like a collections identification number) or populated with terms from a pre-defined set (like the material that an object is made from).
- Template
- a way of presenting data from a record, selecting and grouping the fields to be presented, so that details of the record can be quickly scanned.
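To make the relationship between records, fields and templates concrete, here is a minimal Python sketch. It is not CIIM code: the `Record` class, the `MATERIALS` term list, and the `duplicate_identifiers` and `render_summary` helpers are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical pre-defined set of terms for the "material" field.
MATERIALS = {"bronze", "oak", "paper"}


@dataclass
class Record:
    identifier: str  # unique field, e.g. a collections identification number
    title: str       # free-text field
    material: str    # field populated from a pre-defined set of terms

    def __post_init__(self):
        # Reject terms that are not in the pre-defined set.
        if self.material not in MATERIALS:
            raise ValueError(f"unknown material: {self.material!r}")


def duplicate_identifiers(records):
    """Return identifiers that appear on more than one record."""
    seen, duplicates = set(), set()
    for record in records:
        if record.identifier in seen:
            duplicates.add(record.identifier)
        seen.add(record.identifier)
    return duplicates


def render_summary(record):
    """A stand-in for a template: select and group fields for quick scanning."""
    return f"{record.title} ({record.identifier}), material: {record.material}"
```

For example, `render_summary(Record("2001.14", "Roman oil lamp", "bronze"))` produces a one-line summary built from three fields, while a record with a material outside the term list is rejected when it is created.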
# The four stages of an ingest
- 1. Extraction
- when CIIM takes the data out of your collections systems and databases, through an API or a file upload.
- 2. Processing
- when CIIM examines the data from your collections systems and databases, checks that it’s correct, and maps it to the fields that you have defined for your public collections search.
- 3. Indexing
- when CIIM uses Elasticsearch to create an index, an internal guide to your data that allows it to search and return results very quickly (a bit like Google).
- 4. Publication
- when CIIM makes your data available online, for your website users to search and retrieve.
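The four stages above can be pictured as a small pipeline. The Python sketch below is only an illustration under simplifying assumptions: it does not call CIIM or Elasticsearch, a plain dictionary stands in for the search index, and every name in it (`REQUIRED_FIELDS`, `extract`, `process`, `index`, `publish`) is hypothetical.

```python
from collections import defaultdict

# Hypothetical validation rule: every row needs these fields to be non-empty.
REQUIRED_FIELDS = ("id", "title")


def extract(source_rows):
    """Stage 1: take rows out of a source (a list stands in for an API or file)."""
    return list(source_rows)


def process(rows):
    """Stage 2: keep rows with all required fields and map them to public fields."""
    processed = []
    for row in rows:
        if all(row.get(field) for field in REQUIRED_FIELDS):
            processed.append({"id": row["id"], "title": row["title"].strip()})
    return processed


def index(records):
    """Stage 3: build a word -> record ids lookup so a search need not scan everything."""
    lookup = defaultdict(set)
    for record in records:
        for word in record["title"].lower().split():
            lookup[word].add(record["id"])
    return lookup


def publish(records, lookup):
    """Stage 4: expose a search function, standing in for the public search."""
    by_id = {record["id"]: record for record in records}

    def search(term):
        return [by_id[i] for i in sorted(lookup.get(term.lower(), ()))]

    return search
```

Chaining the stages, as in `records = process(extract(rows))` followed by `search = publish(records, index(records))`, gives a `search` function that only ever sees rows which passed the processing stage. This mirrors why an error at an early stage can show up later as missing or inconsistent search results.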
# Data statuses
Each individual record has a status, which indicates whether or not it has been validated and published.
- Public
- a record which passes validation, is either an object or linked to a published object (compare Retained), and is expected to be in all endpoints.
- Unavailable
- records which have either been:
  - deleted automatically by CIIM processes, or
  - found by CIIM as a reference in another record, but without a real representation of the referenced record.

  In each of these cases, a stub is published to a special 'unavailable' index in endpoints so that it can be identified.
- Invalid
- records which have failed one or more CIIM validation criteria and are never published.
- Retained
- records which have passed validation but are not primary data items (usually objects) and do not link to any other published primary data items.
- Recalled
- records which have been subject to a manual recall (usually because of sensitive or offensive data) and are never published.
- Private
- records which match the privacy rules and are therefore never published.
- Draft
- records which have been created in the CIIM User Interface (usually related to bespoke functionality) and are never published.
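The statuses above can be summarised as a simple lookup from status to publication outcome. This Python sketch is an illustration only: the `Status` enum, the `NEVER_PUBLISHED` set and the `publication_outcome` function are hypothetical names, not part of CIIM, and the outcome for Retained is left open because the definition above does not state one.

```python
from enum import Enum


class Status(Enum):
    PUBLIC = "public"
    UNAVAILABLE = "unavailable"
    INVALID = "invalid"
    RETAINED = "retained"
    RECALLED = "recalled"
    PRIVATE = "private"
    DRAFT = "draft"


# Statuses that the glossary describes as "never published".
NEVER_PUBLISHED = frozenset(
    {Status.INVALID, Status.RECALLED, Status.PRIVATE, Status.DRAFT}
)


def publication_outcome(status):
    """Summarise what the glossary says happens to a record with this status."""
    if status is Status.PUBLIC:
        return "published to all endpoints"
    if status is Status.UNAVAILABLE:
        return "stub published to the 'unavailable' index"
    if status in NEVER_PUBLISHED:
        return "never published"
    # Retained: the glossary does not state a publication outcome.
    return "not stated in this glossary"
```

A report script could use such a lookup to group records by outcome, for example to list every record whose status means it will never appear in a public search.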