Below is a list of functions currently available when creating a New Field Aggregation. These are grouped into categories shown below. See https://support.lityxiq.com/192161-New-Field-Aggregations---Concept-and-Comparisons for more information regarding the concept of New Field Aggregations.
Basic Aggregation Functions
- Average - the average of the variable for all rows in the window.
- Weighted Average - the weighted average of the variable for all rows in the window. This requires also selecting a weighting variable.
- Sum - the sum of the variable for all rows in the window.
- Count - the count of all rows in the window (regardless of variable or null vs non-null values). Note: This function does not require a variable to be selected, and if multiple variables are selected, the Count is computed just once.
- Count Non-NULL - the count of all non-null values of the variable for all rows in the window.
- Count NULLs - the count of null values of the variable for all rows in the window
- Minimum - the minimum value of the variable for all rows in the window.
- Maximum - the maximum value of the variable for all rows in the window.
- Median - the median of the selected variable. This function does not require an order variable, and does not make computations with respect to the window (all records in the partition are always considered). This is equivalent to using the Percentile function described below and the setting "50" for 50th percentile. Note: The Aggregation Window is not used when computing this function.
- Z-Value - the Z-value for the row, computed relative to the chosen variable, and using all rows in the window.
- Sample Variance - the sample variance of the variable computed using all rows in the window.
- Population Variance - the population variance of the variable computed using all rows in the window.
- Sample Standard Deviation - the sample deviation of the variable computed using all rows in the window.
- Population Standard Deviation - the population deviation of the variable computed using all rows in the window.
- First Value - the first value of the variable within the ordered window. This is equivalent to the "Nth" value and specifying N=1 as the setting. Note: This function requires at least one Order variable to have been chosen.
- Last Value - the last value of the variable within the ordered window. Note: This function requires at least one Order variable to have been chosen.
- Nth Value - the Nth value of the variable within the ordered window. It is necessary to specify the value of N when you use this operation. For example, setting N=4 would result in the 4th value in the ordered window being the output. Note: This function requires at least one Order variable to have been chosen.
- Pct of Total - the percent of the total of the variable for this row compared to all rows in the partition. This function does not require an order variable and does not make computations with respect to the window. Note: The Aggregation Window is not used when computing this function.
Computing Lags
- Lag - this function determines the value in a row before or after the current row (based on the ordering rows in the window) based on the entered Lag amount. If a positive lag value is entered, it looks behind the current row that many rows. If a negative lag value is entered, it looks ahead that many rows. For example, enter 2 to retrieve the value from the observation that is two prior to the current observation based on the specified ordering and window. Entering "-4" will retrieve the value 4 rows after the current row based on the specified ordering and window. Entering 0 represents the current observation. In addition, you have an option to Include Nulls or not when considering a lagged value. If you choose not to include nulls, then null values are ignored when looking for a lagged value. Note: This function requires at least one Order variable to have been chosen, and the Aggregation Window is not used when computing this function.
Row Numbers
- Row Number - determines the row number within the ordered window. Note that if the ordering variable(s) creates tied records, the ordering of those tied records does not have a definitive ordering from one run to the next. Note: This function does not require a variable to be selected, and if multiple variables are selected, the Row Number is computed just once. The Aggregation Window is not used when computing row numbers. In addition, this function requires at least one Order variable to have been chosen, as this is what determines the row ordering to create row numbers.
Ranking and Distributions
Note: The Aggregation Window is not used when computing these functions.
- Cumulative Distribution - computes the cumulative distribution (percentile) of the selected variable for each row in the partition. The ordering of the variable for the sake of computing the cumulative distribution is always ascending. Ordering variables specified in the Ordering Variables area will be used to break ties. Specifically, this is computed as (count rows prior to and including the current row / total rows in the partition). The result will be a value in the range 0 < CumDist <= 1.
- Rank - determines the ranked value of the selected variable for each row in the partition. The ordering of the variable for the sake of computing the rank is always ascending. Ordering variables specified in the Ordering Variables area will be used to break ties. The ranking starts with Rank=1. Tied values of the variable have the same rank, and the next rank after tied values increments by the number of tied values (there can be a gap in ranks). For example, if two tied values have a rank of 8, the next value in order will receive a rank of 10. See Dense Rank for a different way of dealing with tied values.
- Dense Rank - same as Rank, except that there is no gap in ranks. If two tied values have a rank of 8, the next value in order will receive a rank of 9.
- Pct Rank - computes the rank percentage of the selected variable for each row in the partition. The ordering of the variable for the sake of computing the percentage ranks is always ascending. Ordering variables specified in the Ordering Variables area will be used to break ties. This function first computes the Rank for a row (as described in the Rank function above) and uses the rank to compute the Pct Rank output. The calculation of Pct Rank for a row is (Rank - 1) / (Number Rows in Partition - 1).
- Ntile - determines which group ("ntile") the selected variable falls into for each row in the partition. The ordering of the variable for the sake of computing the Ntiles is always ascending. Ordering variables specified in the Ordering Variables area will be used to break ties. You will also specify how many groups (ntiles) to create. For example, choose 4 to group the data into quartiles, and each row in the partition will be assigned one of the integers 1, 2, 3, or 4. If you choose 10 to group the data into deciles, each row in the partition will be assigned an integer from 1 through 10 inclusive. Note that for this function, ties are broken randomly before assigning the Ntile. The end-result is that the count of records assigned to each Ntile is forced to be as equal as possible (never different by more than one within a partition).
Each of these functions allows for two additional parameters to be provided:
- Sort Ascending - By default, these functions will sort the analysis variable in ascending order. Uncheck this box to sort in descending order instead.
- Nulls Rank Low - By default, nulls in the variable are considered "large" values for the sake of ordering. Check this box to have them considered as "low" values when sorting.
Percentiles
Note - for each Percentile functions, the order variables and window settings described above are not used as a part of any calculations. These functions operate within partitions, as do all new field aggregation functions, but then they only consider the selected variable in performing the ordering and computation. In particular, the aggregation window is also not used in the computations.
- Percentile Continuous - Computes the specified percentile of the variable within the partition based on a continuous distribution. For example, if 25 is entered as the percentile, the result for each row in the partition will be the interpolated value of the variable that falls at the 25th percentile of the ordered records in the partition (ordered based on the variable selected)..
- Percentile Discrete - Computes the specified percentile of the variable within the partition based on a discrete distribution. For example, the 25th percentile is computed as the value of the selected variable for the record that is the first ordered record above the 25th percentile.