This video demonstrates the steps listed below: https://www.youtube.com/watch?v=fAr6mu81N2g
The Aggregation step of a Derived Dataset includes an option to "pivot" (or "transpose") data from rows into columns of the result set. An Aggregation definition may include any number of pivoted variables, restricted only by the limit of 1600 total variables in the result set.
As an example, consider the following dataset having variables STATE, domain_name, and ZIP.
This is a small snapshot of a dataset with many others combinations of states, domains and zip codes.
Suppose we would like to aggregate this dataset by State (result will have one row per state), and at the same time, create variables in the result set for each different value of domain_name that, for each domain_name, represent two quantities:
- a "one hot" encoding binary variable that codes whether or not that domain name existed in the dataset for the given state.
- a count of the number of unique zip codes associated with the domain name in the state.
You can create this result using the Aggregation step in LityxIQ with the following settings.
On the Aggregation Variables tab, select STATE. This is no different than any aggregation defined in LityxIQ. The result leads to one row per unique state in the datset:
On the Summaries tab, create a summary definition as shown below:
In this setup, notice:
- The Use Pivot Variables checkbox is checked, and we have selected domain_name as the pivot variable. Note that you can select more than one pivot variable for each definition.
- We have also checked the Create Pivot OneHot Binaries box. This will accomplish our goal (1) above to create one binary variable for each domain that will have the value "1" if the domain appears at all in a State, and a "0" if it does not.
- We have used the standard aggregation summary functionality to request that Unique Values be counted based on the ZIP field. This accomplishes our goal (2) above. Note that one or more typical aggregations of any variable(s) can be selected here (e.g., sums and averages of five numeric fields).
- The Variable Name Template defines how the resulting variables will be named. The placeholder [@pivotvar] is replaced with the name of the pivot variable (necessary in the case of two or more selected pivot variables), and [@pivotvalue] is replaced dynamically with each unique value of the associated pivot variable.
After executing this, we get a result that looks like: