Import Data from an XML File

XML files provide an unstructured and potentially very complex way to represent data.  You can learn about XML files in many places on the web, including https://www.w3schools.com/xml/.

An XML file can be import into LityxIQ similar to other types of files.  Whether creating a new Raw Dataset (https://support.lityxiq.com/319229-Create-New-Dataset) or editing an existing one (https://support.lityxiq.com/261835-Define-the-Source-Settings-for-a-Raw-Dataset), the options you will use are on the Data Source tab. After selecting the Location of the file, and the file itself, ensure the File Type is set to XML.  A number of options are then available for specifying how the data in the XML file is to be interpreted.  These options are described in detail below. 

As always, the Preview Data Source button will allow you to get a snapshot of the dataset.  This is useful to understand how your XML settings will be interpreted by LityxIQ.

 

 

XML Format

Specifies the format of the XML document.

  • XML
    • This is the default format and should be used in the majority of cases.
    • The element or attribute name containing each value is used as the column name for that value.
    • XMLTable
      • This format is for when the column name is separate from the data contained in that column.
      • This is useful when the element or attribute containing the data has a generic name (like "Value") instead of being specific to each column.
      • Note: Data Model does not apply when using this XMLFormat.
      • Example:

        <Report>
         <Table>
             <Row>
                <Value label="Customer">Mark Rodgers</Value>
                <Value label="SupportCost">89.28</Value>
                <Value label="ContractValue">299.99</Value>
             </Row>
          </Table>
        </Report>

XPath

  • This represents the Path of an element that repeats at the same height within the document (used to split the document into multiple rows).
  • The value of this option depends on the current XMLFormat. By default, LityxIQ automatically finds the object arrays in the document and models them as rows. This parameter allows you to explicitly define the object arrays using XPaths.
  • Multiple paths can be specified using a semicolon-separated list. For example: /people;/people/vehicles;/people/vehicles/maintenance;
  • A wildcard Path can also be used and is helpful in the case that the Paths are all at the same height but contain different names: /feed/*
  • When using the XMLTable XMLFormat, this option uses a special format. Each path is given with a prefix depending on the type of path. All of the following are required:
    • row: This is the XPath of the element that contains each row's data. Example: row:/Report/Table/Row
    • name: This is the XPath of the element containing the column name. Example: name:/Report/Table/Row/Value@label
    • value: This is the XPath of the element that contains each column's data inside the row. Example: value:/Report/Table/Row/Value
    • Each of these paths is separated by a semicolon, so the complete XPath for the above example is: row:/Report/Table/Row;name:/Report/Table/Row/Value@label;value:/Report/Table/Row/Value

Data Model

Specifies the data model to use when parsing XML documents and generating the database metadata. The driver splits XML documents into rows based on elements that repeat at the same level. By default, the driver projects columns over the properties of objects and returns arrays as XML aggregates.

An Object is any parent element that does not repeat at the same height. In the following example, maintenance is an object array, since each maintenance node has child elements

<maintenance>
  <date>07-17-2017</date>
  <desc>oil change</desc>
</maintenance>
<maintenance>
  <date>01-03-2018</date>
  <desc>new tires</desc>
</maintenance>

An Array is any element that repeats at the same height. In the following example, jobs is an Array:

<jobs>sales</jobs>
<jobs>marketing</jobs>

The options available for the Data Model are:

  • Document - Returns a single table representing a row for each top-level object. In this data model, any nested object arrays will not be flattened and will be returned as aggregates. Unless an XPath value is explicitly specified, the driver will identify and use the top-most object array found as the XPath.
  • FlattenedDocuments - Returns a single table representing a JOIN of the available documents in the file. In this data model, nested XPath values will act in the same manner as a SQL JOIN. Additionally, nested sibling XPath values (child paths at the same height), will be treated as a SQL CROSS JOIN. Unless explicitly specified, the driver will identify the XPath values available by parsing the file and identifying the available documents, including nested documents.
  • Relational - Returns multiple tables, one for each XPath value specified. In this data model, any nested documents (object arrays) will be returned as relational tables that contain a primary key and a foreign key that links to the parent table. Unless explicitly specified, the driver will identify the XPath values available by parsing the file and identifying the available documents (including nested documents).

 

Qualify Columns

This controls whether the driver will use relative column names. By default the driver will only qualify a column name as much as is necessary to make it unique. For example, in this document the driver will produce the columns id (referring to the company id) and employee.id:

<company>
<id>Smith Holdings</id>
<employees>
<employee>
<id>George Smith</id>
</employee>
<employee>
<id>Mike Johnson</id>
</employee>
</employees>
</company>

  • When this option is set to Parent, the driver uses a similar procedure to the one above. However, the driver will always qualify columns by one level so that their table name is included, even if the column name is unique. For example, the above document would generate the columns company.id and employee.id because both are unique when including their parent.
  • When this option is set to Full, the driver will qualify all column names with their full XPath. This generates longer column names but ensures that it is clear where each column name comes from within the document. For the example above, the driver would generate the columns company.id and company.employees.employee.id.

Flatten Objects

Check the Flatten Objects box to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of XML. For example, you can use this property to flatten the nested objects below:

<grades>
<grade>A</grade>
<score>2</score>
</grades>
<grades>
<grade>A</grade>
<score>6</score>
</grades>

To generate the column name, the driver concatenates the property name onto the object name with a dot. When Flatten Objects is checked and Flatten Arrays is set to 1, the preceding array is flattened into the following table:

Column Name Column Value
grades.0.grade A
grades.0.score 2

 

Flatten Arrays

By default, nested arrays are returned as strings of XML. The Flatten Arrays property can be used to flatten the elements of nested arrays into columns of their own. Set Flatten Arrays to the number of elements you want to return from nested arrays. The specified elements are returned as columns. The zero-based index is concatenated to the column name. Other elements are ignored.

For example, you can return an arbitrary number of elements from an array of strings:

<languages>FLOW-MATIC</languages>
<languages>LISP</languages>
<languages>COBOL</languages>

When FlattenArrays is set to 1, the preceding array is flattened into the following table:

Column Name Column Value
languages.0 FLOW-MATIC

 

Setting FlattenArrays to -1 will flatten all the elements of nested arrays.

 

Push Attributes

Click the Push Attributes check box to push any identified attributes on an XML tag as columns in the resulting dataset. Attributes will also be included in any aggregates generated. When unchecked, attributes will not be pushed as columns or in aggregates.

For example, the following tag will lead to Id and Name becoming columns in the dataset if Push Attributes is checked.

<Registrant Id="14579010" Name="Nicolas G.">...</Registrant>
 

 

Table

When a Relational data model is selected, use this to enter the name of the table you want to import. You can retrieve the list of available tables by clicking on the View Tables button.

 

Offset

You can specify an Offsetfrom where to start returning data. Use 0 to begin importing with the first row (meaning, no offset).

 

Row Limit

Enter the maximum number of rows to import. Enter 0 to import all rows.