Create a New Aggregation Field with Examples

The new aggregation field function will provide an aggregated value for every record in an existing dataset.  This is a distinct feature as opposed to the “aggregate” function, which can change the number of fields and number of records in an existing dataset.  See https://support.lityxiq.com/192161-New-Field-Aggregations---Concept-and-Comparisons for a more complete overview of New Field Aggregations.

1) In Data Manager select the dataset you wish to edit, and under the Selected Dataset tab click Edit Settings.
 

2) Under the New Fields panel, select Create New Aggregation Field.  Or, if as in the screenshot below (notice the first new field definition listed), you already have defined a new aggregation field, click its Edit icon.

 

3) There are two tabs with options for the new aggregation field.  The concepts underlying the available options are explained in detail here: https://support.lityxiq.com/192161-New-Field-Aggregations---Concept-and-Comparisons.

 

Partition and Window Tab

  • Choose Partitioning Variable – Select any number of partition variables, including the option to not select any.  Each unique combination of values of the selected partition variables will be treated completely independently in the computations.  If no variables are selected, the entire dataset is considered a single partition.
  • Select Ordering Variable – This will change the sort order of your output dataset.  You can sort using up to three variables.  The first variable is the primary sorting variable, while the second and third if used will break ties.  Most aggregation functions on the second tab require a sorting variable to be selected.
  • Ascending Order – Check this box if you want to have the aggregations sorted in ascending order of your variable value.  Keep it unchecked if you want it sorted in descending order.
  • Nulls Ranked Low - Checking this box will consider null values as the smallest possible values when ordering by that variable.  Note that if Ascending Order is checked, this means they will be first in the sorted list of values.  If Ascending Order is not checked, they will appear last in sorted order.
  • Window – This designates how the aggregation is represented in the new aggregation variable by record, based on the distinction within the partitioning variable.  There are three choices:
    1. “Always Use All Records in Partition” – This will generate identical values for all records within the partition grouping
    2. “Up to and Including Current Row” – This will compute the aggregation function output for a row based all rows in the partition up to and including the current row, considering the rows as sorted according to the chosen ordering variables.
    3. “Custom” – This option allows you to designate custom aggregation windows relative to the "current row".

 

Aggregations Tab

  • Select Variables on which to Compute Summaries - this drop-down selection box allows you to designate the variable(s) to be summarized.  Select one or multiple variables.  Some aggregation types (like Row Number) do not require a variable to be selected.
  • Select Aggregations to Compute - this drop-down selection box allows you to select the aggregation function(s) to apply.  A list of the available functions and their meaning can be found in the article: https://support.lityxiq.com/081990-New-Field-Aggregations---Function-List.
  • If the Weighted Average Aggregation Function was selected, you will also be asked to select one or more weighting variables.
  • New Variable Name Template - Provide a name template for the variable that results from applying an aggregation function to a variable.  If you have selected multiple fields to aggregate, the template code [@var] must be used somewhere in this name. If you have selected multiple aggregation functions, the template code [@stat] must be used somewhere in this name. If you have selected multiple weight/correlation variables, the template code [@wtvar] must be used somewhere in this name.  These pieces of template code will get replaced by appropriate variables, functions, or weight variables in concocting the resulting variable names that will be created.
  • Some aggregation functions require additional settings, such as Nth Value and Percentile.  If any of these are selected, additional options become available.  The example above shows the use of the "Percentile-Continuous" function, which requires entry of the percentile.
  • Once set, click the Save These Changes button at the top of the panel.  The settings for this set of aggregation variables and functions will appear in the left side of the window along with any other sets you have created.  The left panel also allows you to delete or edit existing definitions.
  • When finished, click Save at the bottom of the panel to save everything and return to the main View settings dialog.
  • Note that all of the aggregation variables defined on this tab will be created based on the same partition, ordering, and window settings on the first tab.  To create new field aggregations based on a different set of partition, ordering, or window settings, just start with another click of the "Create New Field Aggregation" button in the New Fields area of the derived dataset.

 

EXAMPLES

Here is a sample input file we will use to illustrate the results of this Lityx function. 

 

If we are looking to generate the total of all sales by territory, ordered by territory and using all records in the partition as the aggregation “window” we get the following results: 

Note that the value for “Sales Count Summary” is the same for every distinct value of the “Territory” Variable. 

Now, we will change the aggregation window to “Up to and Including Current Row” in order to generate a “running count” until the Territory value changes. 

In the following example, we returned to “using all records in the partition” as the aggregation window, still using “Territory” as the partitioning variable.   We changed the ordering variable to “street”.   This just changes the order of the records in the output dataset, while the summary count computation level (Territory) remains the same since Territory was designated as the portioning variable:

 

Finally, here is an example of how the user needs to consider the importance of the portioning variable.  If we switch the portioning variable to “Store Street” – note how streets in different locations are now grouped, since, for example, there is a “State” street in both Chicago and Boston.