Define Finalization Settings for a Derived Dataset

Finalize and QC is the last step in processing a derived dataset.  It is optional, but it provides a range of functionality for completing the dataset.  It allows you to:

  • Drop or re-order the variables in the dataset before it is saved.
  • Add QC rules that are checked before the dataset processing is finalized.
  • Add a row number field to the dataset.
  • Add special settings related to configuration variables.
  • Apply other advanced settings.

To edit the finalization settings, follow these steps.

1) Edit the derived dataset, and open the Finalize and QC panel.  The first time you do this for a derived dataset, the panel will look like below:

The panel displays any settings currently defined for the finalization and QC steps.  Clicking the Delete button will completely remove any current settings, if there are any.  Clicking the Edit button will open the dialog to edit the settings, which are described in more detail below.  The Setup Automated Insights button is explained in more detail here:


2) After clicking the Edit button, the Finalize dialog will open.  Each tab is explained below.


Drop and Reorder Fields Tab

This tab allows you to remove fields from the resulting dataset.  This is a good method for cleaning up a dataset to remove variables that might have been temporarily used along the way, or other unnecessary variables.  Simply uncheck any variable that you wish to drop.

You can also re-order the variables using drag-and-drop, or automatically order them in alphabetical order by clicking the "Order Alpha" link.  The main effect of re-ordering variables is the display order when browsing the dataset.
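
The effect of this tab can be pictured with a small sketch.  The Python below is only an illustration, not the product's implementation: the function name drop_and_reorder and the sample field names are hypothetical, and the case-insensitive sort is an assumption about how "Order Alpha" behaves.

```python
def drop_and_reorder(rows, keep, alphabetical=False):
    """Return copies of rows containing only the fields in `keep`.

    `keep` plays the role of the checked variables, in their
    drag-and-drop order. With alphabetical=True the kept fields are
    sorted by name instead (case-insensitive; an assumption).
    """
    order = sorted(keep, key=str.lower) if alphabetical else list(keep)
    return [{name: row[name] for name in order} for row in rows]


# Hypothetical sample data: tmp_flag is a temporary variable to drop.
rows = [{"id": 1, "tmp_flag": "x", "Name": "Ann", "age": 30}]

manual = drop_and_reorder(rows, keep=["id", "Name", "age"])
alpha = drop_and_reorder(rows, keep=["id", "Name", "age"], alphabetical=True)

print(list(manual[0]))  # fields in the order given by `keep`
print(list(alpha[0]))   # fields in alphabetical order
```

As in the tab itself, only the set of fields and their display order change; the values in the remaining fields are untouched.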


QC Tab

The QC rules tab is explained further here:


Row Numbers Tab

Use the Row Numbers tab to add a variable holding row numbers to the dataset.  Row numbers can be assigned over the entire dataset, or within partitions (groups) of the dataset.  When partitions are used, row numbering starts over at 1 with each new partition.

Row Number Field Name - Enter the name to give to the newly created row number variable.  Leaving it blank will not create a row number variable.  If you enter the name of an existing variable in the dataset, you will receive an error.

Row Number Partitioning - Select the variable(s) that will define partitions (groups) within which Row Numbers will be determined.  If you select no variables, row number counting continues over the entire dataset.

Row Number Ordering within Partitions - Check the fields that you want to use to order the data within partitions.  This ordering determines the row numbers within each partition, starting over at 1 within each partition.  You can drag-and-drop to re-order the selected variables to define the sorting priority.  At this time, sorting is always done in ascending order.  If you do not select any ordering variables, row numbers are assigned within partitions in an undefined order; this is allowed, but not recommended.

Nulls First?  - Set this checkbox if you want NULL values to be ordered before all other values within each partition. If you leave it unset, NULLs will be sorted to the bottom within partitions. This option only matters if you have selected at least one ordering variable.
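
To make the interaction of these four settings concrete, here is a minimal sketch in Python.  It is an illustration of the behavior described above, not the product's implementation: the function add_row_numbers, its parameter names, and the sample fields (site, visit) are all hypothetical.

```python
def add_row_numbers(rows, field_name, partition_by=(), order_by=(),
                    nulls_first=False):
    """Add a row number field, restarting at 1 in each partition."""
    if not field_name:
        # A blank field name means no row number variable is created.
        return [dict(row) for row in rows]
    if any(field_name in row for row in rows):
        raise ValueError(f"variable {field_name!r} already exists")

    def sort_key(row):
        key = []
        for name in order_by:
            value = row[name]
            is_null = value is None
            if nulls_first:
                rank = 0 if is_null else 1
            else:
                rank = 1 if is_null else 0
            # The placeholder 0 is only ever compared with other NULLs.
            key.append((rank, 0 if is_null else value))
        return tuple(key)

    counters = {}
    numbered = []
    # Ascending sort only. With no ordering variables this sketch keeps
    # the input order, but the documented behavior is undefined.
    for row in sorted(rows, key=sort_key):
        partition = tuple(row[name] for name in partition_by)
        counters[partition] = counters.get(partition, 0) + 1
        numbered.append({**row, field_name: counters[partition]})
    return numbered


# Hypothetical sample data: partition by site, order by visit.
rows = [
    {"id": "a", "site": "A", "visit": 2},
    {"id": "b", "site": "A", "visit": None},
    {"id": "c", "site": "B", "visit": 5},
    {"id": "d", "site": "A", "visit": 1},
]

nulls_last = {r["id"]: r["row_num"] for r in add_row_numbers(
    rows, "row_num", partition_by=["site"], order_by=["visit"])}
nulls_top = {r["id"]: r["row_num"] for r in add_row_numbers(
    rows, "row_num", partition_by=["site"], order_by=["visit"],
    nulls_first=True)}

print(nulls_last)
print(nulls_top)
```

In this sample, row "b" has a NULL visit: it receives the last row number in site A when Nulls First is unset, and row number 1 when it is set.  Site B has a single row, so it is numbered 1 in both cases.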


Configuration Dataset Settings Tab

Using the configuration dataset settings tab is described here:


Advanced Tab

The advanced tab is described here:


3) Click Save to save the settings, or Cancel to cancel your changes.