This article describes the various mathematical calculations available to perform on tables and variables, as well as the options available.
In addition to all the in-built calculations - from tables to advanced analysis - you can create your own custom variables, calculations using tables, and calculations using cells of tables.
New variables can be inserted into data sets, allowing you to create bespoke variables. These variables will automatically update when you refresh the underlying data (e.g., by recoding input variables or updating your data file).
Calculations can be performed using tables as inputs. This means there is no need to export the data into Excel for basic calculations (for more information, see Bespoke Analyses).
Examples
Displayr: How to Perform Mathematical Calculations on Tables and How to Perform Mathematical Calculations on Variables
Q: All functions can be found within the Select Automate > Browse Online Library > Calculation menu. Worked examples for variables are in How to Compute Statistics for Variables (e.g., Average, Maximum, Number of Responses)
This article contains the following sections:
-
Mathematical calculation details
- Any of > Any of, Any of Each Column, Any of Each Row
- Average > Average, Average Each Column, Average Each Row
- Count > Count, Count Each Column, Count Each Row
- Divide
- First/Last
- Maximum > Maximum, Maximum Each Column, Maximum Each Row
- Minimum > Minimum, Minimum Each Column, Minimum Each Row
- Multiply
- None of > None of, None of Each Column, None of Each Row
- Standard Deviation > Standard Deviation, Standard Deviation Each Column, Standard Deviation Each Row
- Subtract
- Sum > Sum, Sum Each Column, Sum Each Row
- Variance > Variance > Variance Each Column, Variance Each Row
- Options
- Technical details
Mathematical calculation details
Any of > Any of, Any of Each Column, Any of Each Row
This tool is used to determine if any of the entries, cells within each column, or cells within each row in tables or variable values match certain conditions:
- Are greater than, or less than, a certain number.
- Are greater than or equal to, or less than or equal to, a certain number.
- Are within a specified range of numbers.
- Are missing values (NA).
- Are identical to a specific category label.
- Satisfy a combination of conditions or ranges such as those listed above.
If variables are selected then this tool will create a new variable that tells you if any of the selected variables match the condition, for each case.
Average > Average, Average Each Column, Average Each Row
This feature can be used to calculate the average (arithmetic mean) of values in a table, the average of values between two or more tables, the average of each column or row in a table, or the average value of two or more variables for each case.
Consider the following two tables of sales data.
After selecting the two tables and running this feature, the following output will be added to the page containing the elementwise average.
By default, the row for CAN is not included in the output because there is no matching entry in the second input table and the average is still computed for the AUS - 2018 cell even though there is missing data for one input. This can be changed using the settings in the Object Inspector when the output is selected.
Count > Count, Count Each Column, Count Each Row
This tool is used to count the number of cells, cells within each column, or cells within each row in a table whose values match certain conditions:
- Are greater than, or less than, a certain number.
- Are greater than or equal to, or less than or equal to, a certain number.
- Are within a specified range of numbers.
- Are missing values (NA).
- Are identical to a specific category label.
- Satisfy a combination of conditions or ranges such as those listed above.
If variables are selected then this tool will create a new variable that counts the number of matches for each case in the data set.
Divide
This tool divides the numbers in one table or output by those in another. You may also choose either the denominator or numerator of the division to be a single number. If variables are selected, then new variable(s) will be created by dividing values in the first set by values in the second for each case.
First/Last
The First and Last functions work exactly the same, it is just whether the information being subset is at the beginning or end of the data. The following information is for First, but can also apply to Last vice versa. First: Extracts the first rows or columns (e.g. first 2 rows or first 5 columns) of selected summary tables, crosstabs or R vectors, R matrices, R arrays, R data frames, R lists or R tables, and saves them as a new R outputs. Alternatively, if the input data is labeled with dates or date-times, this feature is able to extract the first date or time periods in the data, e.g., first 3 years, first week, first 10 hours etc. A new R output is created for each selected output and this R output contains controls which allow settings to be specified (see Options section below).
Shown below are the outputs from extracting the first and then last 3 rows from a summary table of a Pick One question for Age:
Shown below is the output from extracting the first 1 calendar month from an R vector that is labeled with dates starting from November 20, 2020. If instead a non-calendar period was specified (see Calendar periods in the Options section below), all dates up to and including December 19, 2020 would be extracted.
Maximum > Maximum, Maximum Each Column, Maximum Each Row
This tool can be used to:
- Find the maximum value in a table.
- Compute the maximum value for each column or row in a table.
- Find the elementwise maximum value among matching cells in two or more tables.
- Find the maximum value for each case among two or more variables.
Minimum > Minimum, Minimum Each Column, Minimum Each Row
This tool can be used to:
- Find the minimum value in a table.
- Compute the minimum value for each column or row in a table.
- Find the elementwise minimum value among matching cells in two or more tables.
- Find the minimum value for each case among two or more variables.
Multiply
This tool multiplies the numbers in one table or output by those in another, or to multiply by a single number. If variables are selected, it creates new variable(s) which contain the multiplied values for each case.
None of > None of, None of Each Column, None of Each Row
This tool is used to determine when none of the entries in cells, each column, or each row of tables or variables match certain conditions:
- Are greater than, or less than, a certain number.
- Are greater than or equal to, or less than or equal to, a certain number.
- Are within a specified range of numbers.
- Are missing values (NA).
- Are identical to a specific category label.
- Satisfy a combination of conditions or ranges such as those listed above.
If variables are selected then this tool will create a new variable that tells you when none of the selected variables match the condition, for each case.
Standard Deviation > Standard Deviation, Standard Deviation Each Column, Standard Deviation Each Row
This tool can be used to compute the standard deviation of the numbers in a table, the numbers in each column/row of a table or variable, the standard deviation of matching elements of several tables, or the case-wise standard deviation of several variables in your data set. The result will depend on which items in your document you select before using this tool:
- If a single table is selected, the result will show the standard deviation of the numbers in that table.
- If two or more tables are selected, the result will show the standard deviation of the matching parts of those tables. Matching is done based on common row and column labels.
- If a single variable is selected, the result will show the standard deviation of the numbers of the variable.
- If two or more variables are selected, this tool will create a new variable that contains the standard deviation of those variables for each case.
Subtract
This tool subtracts the numbers in one table or output from those in another. You may also choose to subtract a single number or to subtract from a single number. If variables are selected, new variable(s) are added to the data set which contain the subtracted values for each case.
Sum > Sum, Sum Each Column, Sum Each Row
This feature can be used to calculate the sum of values in a table, the sum of values between two or more tables, the sum of the values in two or more variables for each case, or the sum of each column or row in an input table of data.
Variance > Variance > Variance Each Column, Variance Each Row
This tool can be used to compute the variance of the numbers in a table, the variance of the numbers in each column, the variance of the numbers in each row of a table, the variance of matching elements of several tables, or the case-wise variance of several variables in your data set. The result will depend on which items in your document you select before using this tool:
- If a single table is selected, the result will show the variance of the numbers in that table.
- If two or more tables are selected, the result will show the variance of the matching parts of those tables. Matching is done based on common row and column labels.
- If two or more variables are selected, this tool will create a new variable that contains the variance of those variables for each case.
Options
Note that not all fields listed below are available or appropriate for all calculations.
For most calculations the following are available:
Input - The tables to be used in the calculation.
Divide the - This option is available for the divide function. Choose whether you wish the numerator of the division to be an output containing one or more numbers, or a single value. When Output is selected, you can choose an output in your document. When Single numeric value is selected you can then type in the desired value.
Multiply the - This option is available for the multiply function. Choose whether you wish the first term of the multiplication to be an output containing one or more numbers, or a single value. When Output is selected, you can choose an output in your document. When Single numeric value is selected you can then type in the desired value.
by the - This option is available for the divide and multiply functions. Choose whether you wish the denominator of the division/multiplication to be an output containing one or more numbers, or a single value. When Output is selected, you can choose an output in your document. When Single numeric value is selected you can then type in the desired value.
From the - This option is available for the subtract function. Choose whether you wish the first term in the subtraction to be an output containing one or more numbers, or a single value. When Output is selected, you can choose an output in your document. When Single numeric value is selected you can then type in the desired value.
subtract the - This option is available for the subtract function. Choose whether you wish the second term in the subtraction to be an output containing one or more numbers, or a single value. When Output is selected, you can choose an output in your document. When Single numeric value is selected you can then type in the desired value.
Values to count - This option is available for the any of and count functions. Enter ranges, single values, or inequalities to include in this calculation. Allowed options include:
- Single values, like 10 - include values exactly equal to a single value, e.g. values which are equal to 10.
- Sequences of values, like 1, 2, 3.5 - include values that are exactly equal to any of the entered values.
- Ranges of values, like [1-10] - include all values that fall within a given range. For example, the range listed to the left will match values between 1 and 10, inclusive.
- Inequalities, like >5.3 - include all values which satisfy the inequality For example, the inequality to the left will match all values greater than 5.3. This works for greater than (>), greater than or equal to (>=), less than (<), and less than or equal to (<=).
- Missing values, NA - match all values which are missing.
- Infinities, Inf, -Inf - include all values which are infinity or negative infinity.
Categorical Labels - When one or more of your inputs is a variable with categories, you can type in the labels of the categories that you wish to match.
Calculate for inputs with incomplete data - If this option is checked, then any missing values in any of the inputs will be ignored in the calculation. If unchecked, then missing values are not removed before calculation and will propagate as missing values in the output. If a dropdown is present:
- Yes (show warning) - Any missing values in any of the inputs will be ignored in the calculation. If a missing value is present in the input then a warning that missing values are being removed from the calculation will be presented to the user.
- Yes - The calculation will have the same behavior as above, except the warning about missing values being removed from the calculation will not be given.
- No - Missing values are not removed before calculation and will propagate as missing values in the output.
Variance formula / Standard Deviation formula - This option is shown for the variance or standard deviation functions. It allows you to choose whether the Population or Sample formula is used to compute the Variance or Standard Deviation (see technical details below).
Automatically match elements - Only shown when there are multiple inputs to Input. This controls how and whether matching is done between the labels of the inputs. The default, "Yes - hide unmatched", will look for matching labels in the rows and columns of the inputs before proceeding with the calculation, and any rows/columns that are not contained in all the inputs will not be included in the output. See the Example. For a full description of the matching algorithm, see the Technical Details. "Yes - show unmatched" will also perform matching, but any unmatched rows (columns) will appear in the output as rows (columns) of all missing values. Selecting "No" for this option will cause any labels in the data to be ignored and not perform any matching. Selecting "Custom" will bring up two additional controls that allow for specifying the matching behavior for rows and columns separately.
Match rows - Only shown if Automatically match elements is set to "Custom". Specifies the matching behavior when comparing row labels of the inputs. "Yes - show unmatched" and "Yes - hide unmatched" look for exact matches in the row labels in the inputs. "Fuzzy - show unmatched" and "Fuzzy - hide unmatched" perform fuzzy matching so that labels that differ only by a single character are considered to be a match.
Match columns - Only shown if Automatically match elements is set to "Custom". The options are the same as Match rows, but control the matching between columns.
Rows to exclude - Here you can type in row labels that should be excluded from the calculation.
Columns to exclude - As above, but for columns.
For First/Last calculations the following fields are available:
Data - The data output from which rows or columns are to be extracted. This can be a summary table, crosstab or R vector, R matrix, R array, R data frame, R list, or R table.
Unit - Specify whether to extract last rows, columns, or date-time periods (year, quarter, month, week, day, hour, minute, second) from the data. If the data is one-dimensional, "Row" should be selected for this option.
Keep - The number of units (specified in the previous control) to extract.
Calendar periods - Whether to consider calendar periods. When calendar periods are considered, a year is counted as the 12-month period from January to December; a quarter is one of the following 3-month periods: January-March, April-June, July-September, October-December; a month is the period from the first to last day in the 12 months in the calendar; a week is the 7 day period from Sunday to Saturday; a day is the 24 hour period from midnight to midnight; and hours, minutes and seconds are the fixed periods as indicated on a clock. For example, if the last 2 calendar years are to be kept and the latest date in the data is Nov 6, 2020, all data labeled in 2019 and 2020 will be kept. When non-calendar periods are considered instead, all periods are compared against a reference date-time, which is the latest date-time in the labels. For periods that do not have a fixed duration, i.e., years, quarters, and months, a period is defined using the day and month of the reference date-time. For example, if the reference date-time is Nov 6 2020 12:34:56pm, and the last 2 years of data are to be kept, all data down to (but not including) Nov 6 2018 12:34:56pm would be retained. If the reference date-time falls on a day for which the corresponding end-of-period date-time does not exist, then all actual date-times up to this mock end-of-period date-time are considered. For example, 1 year before Feb 29 2020 12:34:56pm includes all date-times down to and including March 1, 2019.
Technical details
When there are multiple inputs, inputs that contain only a single row (column) may be recycled to a matrix/table with the same number of rows (columns) as the other inputs. For example, if the supplied inputs are a table with three rows and two columns and another table with two rows and a single column, the single column will be expanded by rows into a table with three rows and two columns with each row identical to the original column.
When Automatically match elements is set to Yes - show unmatched or Yes - hide unmatched, both exact matches and fuzzy matches (as described above) are considered, and the order of elements may be permuted so that the names match. It also may transpose an input if, for example, the column names of one input match the row names of another input.
When Calculate for inputs with incomplete data is checked and the input to the calculation consists entirely of missing values, then the returned output value is set to missing data.
For variance or standard deviation functions, the default option is to compute the variance or standard deviation using the sample variance formula rather than the population variance formula. You have the option to choose between these two formulas so that you can apply whichever is relevant to your calculation. Please note that this default is different to Insert Ready-Made Formulas/Ready-Made New Variables > Variance and Insert Ready-Made Formulas/Ready-Made New Variables > Standard Deviation which always uses the population formula. In both cases, the standard deviation is the square root of the variance, and the two variance formulas are:
Sample Variance
\[\begin{align} \sigma_{sample}^2=\frac{\sum^n_{i=1}(x_i -\frac{\sum^n_{i=1}x_i }{n})^2}{n-1} \end{align}\]
Population Variance
\[\begin{align} \sigma_{population}^2=\frac{\sum^n_{i=1}(x_i -\frac{\sum^n_{i=1}x_i }{n})^2}{n} \end{align}\]