Define the Source Settings for a Raw Dataset

To import a file or other external data into LityxIQ (creating a LityxIQ dataset), you must define the source settings for the dataset.  The dataset in LityxIQ is referred to as a Raw Dataset because it points to, and imports data from, an external raw data source.  That data source may be a file on an FTP site or in S3, a database connection, a file uploaded directly into LityxIQ using the File Manager (see https://support.lityxiq.com/125826-Uploading-a-File), or data in a CRM system, among other possibilities.  If you plan to import data from an external location, you first need to set up a Connection to that location in LityxIQ (see https://support.lityxiq.com/277108-Data-Connections).  Once that is done, you are ready to define the raw dataset's source settings.

0) If you haven't already created the new raw dataset, see https://support.lityxiq.com/319229-Create-New-Dataset

1) Select the dataset in the dataset list, then select Define Dataset Source either from the Selected Dataset menu or from the menu that appears when you right-click the dataset.



2) The Source Definition dialog opens.  Its tabs, described below, contain settings that control how the data is imported, along with some post-processing and advanced options.

 

Data Source Tab




The Data Source tab controls the main settings for the imported data source.  The settings available depend on the type of data source and/or file being imported.  The starting point is the Location box.  Its options always include File Manager, plus any Connections (FTP, Database, etc.) that you have created or have permission to access.  From this list, select the location of the data you wish to import.

The tab also lets you preview the raw data before fully importing it.  After confirming that all of the settings on this tab are correct, click Preview Data Source to start a preview.  Note that previewing may not be supported for all data sources or file types.
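How LityxIQ performs its preview internally is not described in this article.  Purely to illustrate the idea, the following Python sketch (an assumption, not LityxIQ code) parses only the header and the first few rows of a delimited text file, which is roughly what a preview lets you check, such as column names and delimiters, before committing to a full import.  The function name `preview_rows` and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def preview_rows(stream, n=5, delimiter=","):
    """Return the header and at most n data rows from a delimited text stream.

    Only the header and the first n rows are parsed; the rest of the
    stream is left unread by the csv reader.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader, [])        # first row treated as variable names
    rows = list(islice(reader, n))   # stop after n data rows
    return header, rows


if __name__ == "__main__":
    # Hypothetical sample standing in for a raw data file.
    sample = io.StringIO("id,name,amount\n1,Ann,10.5\n2,Bob,7\n3,Cy,3.25\n")
    header, rows = preview_rows(sample, n=2)
    print(header)
    for row in rows:
        print(row)
```

If the preview shows a single wide column or unexpected names, that usually points to a wrong delimiter or header setting, which is cheaper to catch here than after a full import.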

All other settings on this tab are specific to the type of data you are importing.  Here are links to a variety of data and file type examples that describe these settings in more detail.

 

Dictionary Tab

The Dictionary tab contains important settings that describe the data in the data source.  The dataset dictionary is a listing of the variable names and data types found in the data source that will be imported into LityxIQ.  How that dictionary is created is controlled by the Dictionary Load Options setting, described below.

Dictionary Load Options - this option controls how the dictionary is created.  Three methods are available:

  • Create Automatically from Data - with this option, LityxIQ will look at a snapshot of the data in the data source and automatically determine the variable names and types using an intelligent AI process. 
    • With this option, the dataset dictionary as defined in LityxIQ remains static over time.  If the raw data source changes (e.g., variables are added or deleted, or their data types change), the LityxIQ dictionary will not change, which may cause issues with future imports of the data.
    • However, if the structure of the data source is unlikely to change, this is the best option for ensuring a consistent dictionary.
    • In situations where no variable names are provided for some or all fields, LityxIQ will create names such as Column_xyz, with xyz replaced with a number.
    • When using this option, you must also click the Create Dictionary button (or Re-create Dictionary if there is already a dictionary in place) to start the process of loading the dictionary.
  • Create Dynamically Each Import - with this option, LityxIQ will always automatically create the dictionary each time the data is imported based on a review of the data in the data source at that time.
    • This option provides a very flexible way to account for data sources that may change structure from one time to the next.
    • On each import, the number of variables and their names and types are all determined directly from the data source.  A potential issue is that if the data source does not provide consistent variable names from one import to the next, anything that refers to those variables by name may no longer work as expected.
    • With this setting, it is not necessary to use the Create or Re-create Dictionary button because the dictionary will be automatically revised on each import.
  • Use a Dictionary File - this option lets you use a specially formatted file that describes the data dictionary.  The location of the file can be any file-based data connection you have access to, or the File Manager.
    • This option provides the most control over the data dictionary.  However, it requires additional effort to create the dictionary file.
    • When using this option, after selecting the dictionary file you must click Create or Re-create Dictionary to read the file and create the dictionary.  The dictionary file is read only when you click that button; it is not re-read each time the dataset is imported.
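LityxIQ's intelligent AI process for building a dictionary is not documented here, so the following is only a simple rule-based stand-in showing what "creating a dictionary from the data" involves.  This Python sketch (an assumption, not LityxIQ code) looks at a snapshot of rows, picks the narrowest type that every non-empty value in a column fits (integer, then float, then an ISO `YYYY-MM-DD` date, otherwise string), and names any unnamed column Column_<position>.  The function names, type labels, and the numbering scheme are hypothetical.

```python
from datetime import datetime


def _parse_date(value):
    # Assumption: only ISO dates such as 2024-01-05 are recognized.
    return datetime.strptime(value, "%Y-%m-%d")


def _fits(values, parse):
    """True if every non-empty value can be parsed by `parse`."""
    for v in values:
        if v == "":
            continue
        try:
            parse(v)
        except ValueError:
            return False
    return True


def infer_type(values):
    """Pick the narrowest type that fits every non-empty sample value."""
    if all(v == "" for v in values):
        return "string"
    for type_name, parse in (("integer", int), ("float", float), ("date", _parse_date)):
        if _fits(values, parse):
            return type_name
    return "string"


def infer_dictionary(header, rows):
    """Build a list of (variable name, type) pairs from a data snapshot.

    Missing or blank header names become Column_<position> (1-based),
    a hypothetical stand-in for the Column_xyz naming described above.
    """
    width = max([len(header)] + [len(r) for r in rows])
    dictionary = []
    for i in range(width):
        has_name = i < len(header) and header[i].strip()
        name = header[i].strip() if has_name else f"Column_{i + 1}"
        column = [r[i].strip() if i < len(r) else "" for r in rows]
        dictionary.append((name, infer_type(column)))
    return dictionary


if __name__ == "__main__":
    header = ["id", "signup", ""]
    rows = [["1", "2024-01-05", "3.5"], ["2", "2024-02-10", ""], ["3", "", "x"]]
    stored = infer_dictionary(header, rows)  # built once, then reused (static)
    print(stored)
    # [('id', 'integer'), ('signup', 'date'), ('Column_3', 'string')]

    # A later snapshot whose structure has changed (new column, new types).
    new_header = ["id", "signup", "", "region"]
    new_rows = [["4", "2024-03-01", "1.0", "East"]]
    if infer_dictionary(new_header, new_rows) != stored:
        print("source structure changed; a static dictionary may no longer match")
```

Building the dictionary once and reusing `stored` loosely mirrors Create Automatically from Data, while re-running `infer_dictionary` on every snapshot loosely mirrors Create Dynamically Each Import.  The final comparison shows how a static dictionary can fall out of step with a source whose structure changes.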

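Create / Re-create Dictionary Button - this button builds the dictionary, either automatically from the data in the data source (using the intelligent AI process) or by reading the selected dictionary file, depending on the Dictionary Load Options setting.  Its use is described under the options above.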

View/Edit Dictionary Button - this button allows you to manually change the dictionary settings after it has been loaded.  See https://support.lityxiq.com/618334-View-and-Edit-a-Dataset-Dictionary for more help.

 

Filter Tab

For more information on how to use the filter dialog, see the following article: https://support.lityxiq.com/806706-Using-the-Filter-Dialog

 

Configuration Dataset Settings Tab

For information on using the Configuration Dataset Settings tab, see https://support.lityxiq.com/110354-Configuration-Dataset-Settings-Options.

 

Advanced Tab

The Advanced tab is described here: https://support.lityxiq.com/928057-Creating-Datasets---Advanced-Tab.

 


3) Exit the dialog:

  • Click Cancel to exit the dialog and cancel any changes you have made.
  • Click Save and Close to save your settings and close the dialog.
  • Click Save and Load Data Now to save the settings, close the dialog, and immediately load the data (a final confirmation screen is displayed before the data loads).