New Field Aggregations - Function List

Below is a list of functions currently available when creating a New Field Aggregation.  These are grouped into categories:

Basic Aggregation Functions

  • Average - the average of the variable for all rows in the window.
  • Weighted Average - the weighted average of the variable for all rows in the window.  This requires also selecting a weighting variable.
  • Sum - the sum of the variable for all rows in the window.
  • Count - the count of all rows in the window (regardless of null or non-null values). 
  • Count Non-NULL - the count of all non-null values of the variable for all rows in the window.
  • Count NULLs - the count of null values of the variable for all rows in the window
  • Minimum - the minimum value of the variable for all rows in the window.
  • Maximum - the maximum value of the variable for all rows in the window.
  • Median - the median of the selected variable.  This function does not require an order variable, and does not make computations with respect to the window (all records in the partition are always considered).  This is equivalent to using the Percentile function described below and the setting "50" for 50th percentile.
  • Z-Value - the Z-value for the row, computed relative to the chosen variable, and using all rows in the window.
  • Sample Variance - the sample variance of the variable computed using all rows in the window.
  • Population Variance - the population variance of the variable computed using all rows in the window.
  • Sample Standard Deviation - the sample deviation of the variable computed using all rows in the window.
  • Population Standard Deviation - the population deviation of the variable computed using all rows in the window.
  • First Value - the first value of the variable within the ordered window.  This is equivalent to the "Nth" value and specifying N=1 as the setting.
  • Last Value - the last value of the variable within the ordered window.
  • Nth Value - the Nth value of the variable within the ordered window.  It is necessary to specify the value of N when you use this operation.  For example, setting N=4 would result in the 4th value in the ordered window being the output.
  • Pct of Total - the percent of the total of the variable for this row compared to all rows in the partition.  This function does not require an order variable and does not make computations with respect to the window.

Function for Lagging

  • Lag - this function determines the value in a row before or after the current row (based on the ordering rows in the window) based on the entered Lag amount.  If a positive lag value is entered, it looks behind the current row that many rows.  If a negative lag value is entered, it looks ahead that many rows.  For example, enter 2 to retrieve the value from the observation that is two prior to the current observation based on the specified ordering and window.  Entering "-4" will retrieve the value 4 rows after the current row based on the specified ordering and window.  Entering 0 represents the current observation.

Ranking and Distributions.

  • Row Number - determines the row number within the ordered window.  Note that if the ordering variable(s) creates tied records, the ordering of those tied records does not have a definitive ordering from one run to the next.
  • Cumulative Distribution - computes the cumulative distribution (percentile) each row in the partition, based on the ordering variables specified.  Specifically, this is computed as (count rows prior to and including the current row / total rows in the partition).  The result will be a value in the range 0 < CumDist <= 1.
  • Rank - determines the ranked value of each row in the partition, based on the ordering variables specified.  The ranking starts with Rank=1.  Tied values of the variable have the same rank, and the next rank after tied values increments by the number of tied values (there can be a gap in ranks).  For example, if two tied values have a rank of 8, the next value in order will receive a rank of 10.  See Dense Rank for a different way of dealing with tied values.
  • Dense Rank - same as Rank, except that there is no gap in ranks.  If two tied values have a rank of 8, the next value in order will receive a rank of 9.
  • Pct Rank - computes the rank percentage of each row in the partition, based on the ordering variables specified.  This function first computes the Rank for a row (as described in the Rank function above) and uses the rank to compute the Pct Rank output.  The calculation of Pct Rank for a row is (Rank - 1) / (Number Rows in Partition - 1).
  • Ntile - determines which group ("ntile") each row in the partition falls into, based on the ordering variables specified.  You will also specify how many groups (ntiles) to create.  For example, choose 4 to group the data into quartiles, and each row in the partition will be assigned one of the integers 1, 2, 3, or 4.  If you choose 10 to group the data into deciles, each row in the partition will be assigned an integer from 1 through 10 inclusive.

Percentiles

Note - for each Percentile functions, the order variables and window settings described above are not considered.  These functions operate within partitions as do all new field aggregation functions, but it then they only considers the selected variable in performing the ordering and computation.

  • Percentile Continuous - Computes the specified percentile of the variable within the partition based on a continuous distribution.  For example, if 25 is entered as the percentile, the result for each row in the partition will be the interpolated value of the variable that falls at the 25th percentile of the ordered records in the partition (ordered based on the variable selected)..
  • Percentile Discrete - Computes the specified percentile of the variable within the partition based on a discrete distribution.  For example, the 25th percentile is computed as the value of the selected variable for the record that is the first ordered record above the 25th percentile.