Import Data from a JSON File

JSON files provide a platform-independent way of specifying complex structured or unstructured data.  You can learn more about the JSON file structure in many places on the web, including https://towardsdatascience.com/an-introduction-to-json-c9acb464f43e.

A JSON file can be import into LityxIQ similar to other types of files. Whether creating a new Raw Dataset (https://support.lityxiq.com/319229-Create-New-Dataset) or editing an existing one (https://support.lityxiq.com/261835-Define-the-Source-Settings-for-a-Raw-Dataset), the options you will use are on the Data Source tab. After selecting the Location of the file, and the file itself, ensure the File Type is set to JSON. A number of options are then available for specifying how the data in the JSON file is to be interpreted by LityxIQ.  These options are described in detail below.</p>

As always, the Preview Data Source button will allow you to get a snapshot of the dataset. This is useful to understand how your JSON settings will be interpreted by LityxIQ.

 

JSON Format

This option allows you to specify the format of the JSON document.

  • JSON - This is the default format and should be used in the majority of cases.
  • JSONRows - This is a specific format in which data is returned in a relational format consisting of rows of data contained within primitive arrays. Column information is returned as well in a separate array. Note: Data Model does not apply when using this JSONFormat. An example of a JSONRows formatted file:

    { "dataset": { "column_names": ["Name","Age","Gender"],
    "data": [["John Doe", 37, "M"], ["David Thomas", 25, "M"]] } }

    The JSONPath property requires special syntax to identify the column and row paths. The syntax consists of specifying a path for each using a "column:" and "row:" prefix. Using the example above, the JSONPath would be set to: column:$.dataset.column_names;row:$.dataset.data. In the case that columns are returned in an object with additional data, an additional "columnname:" prefix can be specified to identify the path to the value containing the column name. For example:

    { "columns":[{"name":"first_name","type":"text"},{"name":"last_name","type":"text"}], "rows": [["John","Doe"],["David","Thomas"]] }

    In this example, JSONPath would be set to: column:$.columns;columnname:$.columns.name;row:$.rows
  • LDJSON (Line-Delimited JSON) - This format is used to parse line-delimited JSON files (also known as NDJSON or JSONLines). Line-delimited JSON files contain a separate JSON document on each line. An example LDJSON File might look like:

    {"Name": "John Doe", "Age": 37, "Gender": "M"} {"Name": "David Thomas", "Age": 25, "Gender": "M"} {"Name": "Susan Price", "Age": 35, "Gender": "F"}

    In this case, the JSONPath value is treated the same as when using the regular JSON format. The only difference is that the root path ($.) is always used (therefore treating all the lines of JSON as it is contained within an array). In the above example, the JSONPath will be "$.", which will return 3 rows containing the columns: Name, Age, and Gender.

 

JSONPath

This setting specifies the JSONPath of an array element that defines the separation of rows. It is the path of an element that repeats at the same height within the document (used to split the document into multiple rows). LityxIQ automatically finds the object arrays in the document and models them as rows. This parameter allows you to explicitly define the object arrays using XPaths in more complex situations. Note that multiple paths can be specified using a semicolon-separated list.

$.people;$.people.vehicles;$.people.vehicles.maintenance;

The Data Model setting governs how the nested object arrays are modeled as tables. A wildcard Path can also be used and is helpful in the case that the Paths are all at the same height but contain different names: $.feed.*

Data Model

This setting specifies the data model to use when parsing JSON documents and generating the database metadata. The driver splits JSON documents into rows based on the objects nested in arrays. Select a Data Model configuration to configure how the driver models nested object arrays into tables.

  • Document - Returns a single table representing a row for each top-level object. In this data model, any nested object arrays will not be flattened and will be returned as aggregates. Unless an JSONPath value is explicitly specified, the driver will identify and use the top-most object array found as the XPath.
  • FlattenedDocuments - Returns a single table representing a JOIN of the available documents in the file. In this data model, nested JSONPath values will act in the same manner as a SQL JOIN. Additionally, nested sibling JSONPath values (child paths at the same height), will be treated as a SQL CROSS JOIN. Unless explicitly specified, the driver will identify the JSONPath values available by parsing the file and identifying the available documents, including nested documents.
  • Relational - Returns multiple tables, one for each JSONPath value specified. In this data model, any nested documents (object arrays) will be returned as relational tables that contain a primary key and a foreign key that links to the parent table. Unless explicitly specified, the driver will identify the JSONPath values available by parsing the file and identifying the available documents (including nested documents).

 

Qualify Columns

This setting controls whether the driver will use relative column names. By default the driver will only qualify a column name as much as is necessary to make it unique. For example, in this document the driver will produce the columns "id" (referring to the company id) and "employee.id".

{ "company": [{ "id": "Smith Holdings", "employee": [ {"id": "George Smith"}, {"id": "Mike Johnson"} ] }] }

  • When this option is set to Parent, the driver uses a similar procedure to the one above. However, the driver will always qualify columns by one level so that their table name is included, even if the column name is unique. For example, the above document would generate the columns company.id and employee.id because both are unique when including their parent.
  • When this option is set to Full, the driver will qualify all column names with their full XPath. This generates longer column names but ensures that it is clear where each column name comes from within the document. For the example above, the driver would generate the columns json.company.id and json.company.employee.id.

 

Flatten Objects

 

  • Check the Flatten Objects box to flatten object properties into columns of their own. To generate the column name, the driver concatenates the property name onto the object name with a dot. For example, you can flatten the nested objects below at connection time:

    [ { "grade": "A", "score": 2 }, { "grade": "A", "score": 6 } ]

    When Flatten Objects is checked and Flatten Arrays (see below) is set to 1, the preceding array is flattened into the following table:

    Column Name Column Value grades.0.grade A grades.0.score 2

  • If the Flatten Objects box is unchecked, objects nested in arrays are returned as strings of JSON instead of their own columns in the dataset.

 

Flatten Arrays

By default, nested arrays are returned as strings of JSON. The Flatten Arrays setting can be used to flatten the elements of nested arrays into columns of their own. Set Flatten Arrays to the number of elements you want to return from nested arrays. The zero-based index is concatenated to the column name. Other elements are ignored. This is only recommended for arrays that are expected to be short. For example, you can return an arbitrary number of elements from an array of strings:

["FLOW-MATIC","LISP","COBOL"]

  • When FlattenArrays is set to 1, the preceding array is flattened into the following table:

    Column Name Column Value languages.0 FLOW-MATIC

  • Setting FlattenArrays to -1 will flatten all the elements of nested arrays.

 

Table

When a Relational data model is selected, use this to enter the name of the table you want to import. You can retrieve the list of available tables by clicking on the View Tables button.

Offset

You can specify an Offsetfrom where to start returning data. Use 0 to begin importing with the first row (meaning, no offset).

Row Limit

Enter the maximum number of rows to import. Enter 0 to import all rows.