View and Edit a Dataset Dictionary

A dataset's dictionary in LityxIQ lists every variable (field) in the dataset, along with its name and the type of data it contains.

For datasets created from raw data sources, LityxIQ usually creates the dictionary automatically by analyzing the data source to determine the variables and their data types.  However, you can always change the dataset dictionary, including variable names and data types.
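To make the idea of an automatically created dictionary concrete, the following is a minimal Python sketch of how variables and coarse data types could be derived from a CSV source.  It is not LityxIQ's actual logic, which is not described in this article: the function names (infer_type, build_dictionary) and the type labels ("integer", "number", "text") are hypothetical and chosen only for illustration.

```python
import csv
import io


def infer_type(values):
    """Return a coarse, hypothetical type label for a column of raw strings."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "text"
    # Try the strictest type first, then fall back to looser ones.
    for label, caster in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"


def build_dictionary(csv_text):
    """Build one dictionary entry per column, numbered from 0 in file order."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    dictionary = []
    for number, name in enumerate(header):
        column = [row[number] for row in data if number < len(row)]
        dictionary.append(
            {"number": number, "name": name, "type": infer_type(column)}
        )
    return dictionary


sample = "id,amount,city\n1,10.5,Austin\n2,7,Boston\n"
for entry in build_dictionary(sample):
    print(entry)
```

In this sketch the variable names come from the header row and the types are guesses from the values, which is why being able to review and correct the result afterwards is useful.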

For Derived Datasets, LityxIQ always creates the dictionary automatically, based on how the variables were created.  In this case, you cannot manually edit the dataset dictionary, but you can always view it.

For any dataset, you can view or edit the dictionary from the Selected Dataset menu.  In the case of derived datasets, you will only have the option to view the dictionary (not edit).

 

For raw datasets, you can also view or edit the dictionary from within the Define Source Settings dialog, as explained here: https://support.lityxiq.com/261835-Define-the-Source-Settings-for-a-Raw-Dataset.

Whether editing or simply viewing, the dialog that appears looks like the following:

The list of variables in the dataset can be scrolled, paginated, and sorted (by clicking on column headers).  The variables in the dictionary always have a defined order, which is determined by the Number column in the dictionary list, regardless of how you sort the list.  The columns of the dictionary list are explained below:

Number - the order of the variable in the dataset, starting with 0.  For raw datasets, this also reflects the expected order of the data in the raw data source itself.  For example, when importing a delimited CSV file, the variable with Number 0 is expected to be the first field found in the file.

Variable Name - the name of the variable as it will be shown to you, and as you will reference it, within LityxIQ.  When a dictionary is first automatically created from a raw data source, the variable name typically is created from the source itself (for example, the header row of a CSV file, or the field name in a SQL table).  See the following document for restrictions on variable names in LityxIQ: https://support.lityxiq.com/378119-Variable-Names.

Data Type and Format - the type of data stored in the variable.  See https://support.lityxiq.com/241966-Data-Types for more information.  Additionally, the type "ignore" can be used for raw datasets to indicate that the field should not be imported.  Variables marked "ignore" are still included in the data dictionary, since the dictionary is a mapping from the raw data source to LityxIQ.

Analysis Role - the role the variable is expected to play in analysis performed on the dataset.  Within LityxIQ, the chosen role will dictate how the variable will be allowed to be used in the analysis.  For example, the "continuous" role will allow a variable to be used as a predictor in predictive models, but may limit its use in aggregating the dataset.

Prefix, Suffix, Decimal Places - these relate strictly to how the variable's values are displayed in results within LityxIQ.
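The roles of the Number column and the "ignore" type can be sketched in a few lines of Python.  This is an illustrative model only, not LityxIQ code: the entry layout, the import_rows function, and the sample variable names are all assumptions made for the example.  It shows that fields are matched to variables by position (Number), that "ignore" entries keep their position but are skipped on import, and that re-sorting the displayed list does not change the mapping.

```python
import csv
import io

# Hypothetical dictionary entries; "number" is the 0-based field position.
dictionary = [
    {"number": 0, "name": "customer_id", "type": "integer"},
    {"number": 1, "name": "internal_note", "type": "ignore"},
    {"number": 2, "name": "balance", "type": "number"},
]


def import_rows(csv_text, dictionary, has_header=True):
    """Map each CSV field to a variable by position, skipping "ignore" entries."""
    # The mapping always follows "number", whatever order the list is shown in.
    ordered = sorted(dictionary, key=lambda entry: entry["number"])
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # names come from the dictionary, not the file
    records = []
    for row in reader:
        record = {}
        for entry in ordered:
            if entry["type"] == "ignore":
                continue  # still occupies a position, but is not imported
            record[entry["name"]] = row[entry["number"]]  # kept as raw strings
        records.append(record)
    return records


raw = "id,note,bal\n17,call back,250.00\n"
by_name = sorted(dictionary, key=lambda entry: entry["name"])
print(import_rows(raw, dictionary))
print(import_rows(raw, by_name) == import_rows(raw, dictionary))
```

Keeping the ignored field in the list is what lets the third field in the file still line up with the variable whose Number is 2.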

 

For raw datasets, the dictionary list is in edit mode.  In this case, you can click in any cell (except in the Number column) to change its value.  For example, you can click the name of a variable and type a new name.  Clicking in a Data Type cell provides a dropdown list of possible data types; the data type you choose in turn determines the Formats and Analysis Roles that you can specify.  To save your changes after editing the dictionary, click the Save button, or click Cancel to discard them.