# Flatten Lens

Flatten lenses are supported since 5.1.0.

Flattening or unnesting an array is the process of transforming a nested array into an array of lower dimensionality, by "pulling" each nested entry into its "outer" entry. In databases, specifically, it represents a function that takes a column containing an array and transforms it into a table that has each of the outer-most elements as one of its rows.

Flattening is need when we want to map values inside arrays.

In Ontop, a flatten lens is a type of lens over a single base relation that takes as an input the column that should be flattened, the names of columns that should be retained after flattening, and the name of the column that should be added containing the flattened output. Additionally, the name for a position column can be passed to the lens. When flattening, the position column will hold the index of the current row's flattened element in the original array.

For this example, we will first look at the table workers. This table has the following schema:

column type
worker_id integer
full_name string
role string
museum_id integer
titles array of strings
access_level integer

The column titles is an array of strings that contains all the degrees and titles the employee has received. Our goal is to populate the datatype properties :hasTitle defined from the :Worker class to any number of string literals taken from the table, and :preferredTitle, defined from the :Worker class to a single literal, which is the first entry of the titles array. To do so efficiently, we need to flatten the titles column.

The flatten lens has the following structure:

{
    "name": [String],
    "baseRelation": [String]
    "flattenedColumn": {
        "name": String,
        "datatype": String
    },
    "columns": {
        "kept": [String],
        "new": String,
        "position": String
    },
    "type": "FlattenLens"
}

The field kept takes a list of column names from the original table that should be retained after flattening. The field new takes the name of the column that contains the flattened output and the field position takes the name of the column that contains the index of the current output. The flattened column takes two arguments: name determines the name of the column that should be flattened, and datatype indicates the column's type. Depending on the SQL dialect, this can vary from ARRAY<T> or T[] where T is a different SQL data type to JSON or VARCHAR if arrays are represented and flattened in JSON format.

In DuckDB, the array data type is defined as T[], so in our specific case, the field datatype will take the value STRING[].

A possible lenses.json file for this task may look like this:

{
    "relations": [
        {
            "name": ["lenses", "flattened_titles"],
            "baseRelation": ["workers"],
            "flattenedColumn": {
                "name": "titles",
                "datatype": "STRING[]"
            },
            "columns": {
                "kept": [
                    "worker_id"
                ],
                "new": "title",
                "position": "index"
            }
        }
    ]
}

As we just need to assign a title to a :Worker individual, the only column that has to be kept is worker_id.

# Mapping

Now, we can already define the mapping entries for the wanted properties:

mappingId	MAPID-has-title
target		data:worker/{worker_id} :hasTitle {title} .
source		SELECT worker_id, title FROM lenses.flattened_titles;
mappingId	MAPID-preferred-title
target		data:worker/{worker_id} :preferredTitle {title} .
source		SELECT worker_id, title FROM lenses.flattened_titles WHERE index = 1;

NOTE

Instead of using a source query with a WHERE condition in the second mapping, we could also wrap a basic lens around the flatten lens that performs the filter operation, but we kept it like this for the sake of simplicity.

Let us now run a SPARQL query over the Ontop endpoint to test our mappings and lenses.

PREFIX : <http://example.org/museum_kg/>
SELECT ?name ?title ?prefTitle WHERE {
    ?worker a :Worker .
    ?worker :name ?name .
    ?worker :hasTitle ?title .
    ?worker :preferredTitle ?prefTitle
}

This should result in a list of all employees, together with their titles and their preferred title.

# Flattening other types of arrays

The flatten lens works on all types of arrays, and, depending on the capabilities of the dialect, can infer the output type of the flattened column. As a further exercise, try looking at the table museums.

column type
museum_id integer
name string
address string
yearly_income integer
yearly_spendings integer
ratings array of floats

The field ratings is an array of floating point numbers between 1 and 10. Try creating a flatten lens that can unnest this array to populate the datatype property :hasRating! Ontop will be able to automatically detect that the flattened column has the type FLOAT. Keep in mind that we do not necessarily need a "position" column for this use case.

WARNING

When arrays in a given dialect are defined as ARRAY<T>, T[], or similarly, Ontop is able to infer the data type of the output column after flattening. However, when this is not the case (either the array was provided as JSON or the array data type of the dialect is simply called ARRAY or similarly), Ontop cannot perform this inference. In those cases, it is suggested to put a basic lens over the flatten lens that explicitly performs a CAST on the output, to allow Ontop to know the column type once again.

WARNING

The level of support for the flatten lens depends strongly on the dialect. Please consult the flatten lens documentation page for more info on each supported dialect.

Last Updated: 11/26/2024, 1:08:23 PM