Here is a simple example of how new field aggregations work. This will give a few different possibilities for defining new fields this way.
Let's say we have a simple dataset with three variables: Name, State, and Age. The State "MD" has 5 rows in the dataset, while the State "VA" has 3 rows in the dataset. See below:
We want to create four new fields on this dataset using the concepts of New Field Aggregation. Conceptually, we want the following new fields:
- "Average Age By State" - for each row, we want to create a field that has the average value of age for that row's State.We want to define four new field aggregations:
- "Cumulative Average Age By State" - for each row, we want to create a field that has the average value of age only considering ages less than or equal to that row's Age, and in the same State as the row.
- "Next Youngest Age" - for each row, we want to create a field that has the Age of the next youngest person, considered across the entire dataset (not within States). This is the lagged value of age.
- "Name of the Third Youngest Person in the State" - for each row, we want to record the name of the third youngest person in the state.
Translating this into concepts related to defining new field aggregations, we can think about these new fields as in below:
The resulting dataset after applying these New Field Aggregation definitions would look like the following:
Some explanations:
- Note that for Average Age By State and for Name of the Third Youngest Person in Each State, the resulting value is the same for all rows in the State. That is due to the fact that window of operation was "all rows in the partition".
- There are NULL values in the Next Youngest Age variable because for the first row in each State (the youngest person), there is no prior youngest person.
- Note that if there are ties in the ordering variable, there is no pre-defined way that the ties will be broken. To have more control in this case, we can specify additional ordering variables to break ties. For example, two people have Age=30 in MD. We could specify say Name as the second ordering variable, which would serve to have a unique way to break that tie which we can count on each time we execute this.
For help with defining new field aggregations in the LityxIQ interface, see https://support.lityxiq.com/777879-Create-a-New-Aggregation-Field.