A Driver Analysis models the relationship between a dependent variable and one or more independent variables. It quantifies the importance of each independent variable in predicting the dependent variable, relative to the other independent variables.
Interpretation
Driver analysis computes an estimate of the importance of various independent variables in predicting a dependent variable. Most commonly, the dependent variable measures preference or usage of a particular brand (or brands), and the independent variables measure characteristics of this brand (or brands). For example, the dependent variable may be a measure of overall satisfaction and the independent variables may be measurements of satisfaction with bank fees, efficiency, friendliness, wait times, etc.
Variable statistics
Importance score The magnitude of the importance score indicates the contribution each independent variable makes to explaining the outcome variable, relative to the other independent variables in the model. Importance scores are rescaled as percentages (summing to 100) so that they are easier to interpret.
Raw score The raw importance contribution of the independent variable to the outcome variable, i.e. the share of the model's R-squared attributed to that variable relative to the other variables.
The coefficient is colored if the variable is statistically significant at the 5% level.
Standard Error Measures the precision of an estimate. The smaller the standard error, the more precise the estimate.
t-statistic The estimate divided by its standard error. The larger its magnitude (whether positive or negative), the more significant the variable. The values are highlighted based on their magnitude.
p-value Expresses the t-statistic as a probability. A p-value under 0.05 means that the variable is statistically significant at the 5% level; a p-value under 0.01 means that it is statistically significant at the 1% level. P-values under 0.05 are shown in bold.
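The following base R sketch makes the t-statistic and p-value definitions concrete. It fits an ordinary linear regression to simulated data and recomputes both columns by hand. The variables fees, wait and satisfaction are hypothetical. The sketch assumes a two-sided test against a t distribution with the residual degrees of freedom, which is what R's summary.lm() reports. The tests Displayr applies to importance scores may be computed differently. The last lines rescale some made-up raw scores so that they sum to 100, as described for the importance score.

```r
set.seed(1)
n <- 200
fees <- rnorm(n)
wait <- rnorm(n)
satisfaction <- 0.5 * fees + 0.2 * wait + rnorm(n)   # simulated outcome
fit <- lm(satisfaction ~ fees + wait)

coefs <- summary(fit)$coefficients
estimate <- coefs[, "Estimate"]
std_error <- coefs[, "Std. Error"]

# t-statistic: the estimate divided by its standard error
t_stat <- estimate / std_error

# Two-sided p-value from a t distribution with the residual degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df.residual(fit))
significant_5pct <- p_value < 0.05

# Hypothetical raw importance scores, rescaled so that they sum to 100
raw <- c(fees = 0.12, wait = 0.03)
importance <- 100 * raw / sum(raw)
importance
```

Here fees receives 80 and wait receives 20. The ratio between the two scores is unchanged by the rescaling.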
Overall statistics
n The sample size used to fit the model.
R-squared Assesses the goodness of fit of the model. A larger value indicates that the model captures more of the variation in the dependent variable.
See also Regression Diagnostics.
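The following base R sketch illustrates the R-squared statistic for a linear regression, using simulated data and hypothetical variable names. It computes R-squared as one minus the ratio of the residual sum of squares to the total sum of squares. The result is checked against the value reported by summary.lm(). This formula applies to linear regression; other Types would need a different measure.

```r
set.seed(2)
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- 1 + 2 * x1 - x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)

sse <- sum(residuals(fit)^2)        # residual (unexplained) sum of squares
sst <- sum((y - mean(y))^2)         # total sum of squares around the mean
r_squared <- 1 - sse / sst
n <- nobs(fit)                      # sample size used in the fit
```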
Create a Driver Analysis in Displayr
With unstacked data, the process is similar to a standard Regression model.
 1. Go to Anything > Advanced Analysis > Regression > Driver Analysis
 2. Under Inputs > Outcome, select your dependent variable
 3. Under Inputs > Predictor(s), select your independent variables
Stacked data can be handled as follows:
 1. Go to Anything > Advanced Analysis > Regression > Driver Analysis
 2. Check the 'Allow stacked data' control.
 3. Under Inputs > Outcome, select a single dependent variable; for stacked data this will have a Multi structure.
 4. Under Inputs > Predictor(s), select your set of independent variables; this should have a Grid structure that matches the outcome variable above.
See Question Types for more information on Grid and Multi-type structures.
Object Inspector Options
Outcome The variable to be predicted by the predictor variables.
Predictors The variable(s) to predict the outcome.
Algorithm The fitting algorithm. Defaults to Regression but may be changed to other machine learning methods.
Type: You can use this option to toggle between different types of regression models, but note that certain types are not appropriate for certain types of outcome variable.

 Linear Appropriate for a continuous outcome variable. See Regression - Linear Regression.
 Binary Logit Appropriate if the outcome is binary (i.e. falls in one of two categories). See Regression - Binary Logit.
 Ordered Logit Appropriate for a discrete outcome where the categories have a natural order (e.g. Low, Medium, High). See Regression - Ordered Logit.
 Multinomial Logit Appropriate for a discrete outcome with unordered categories. See Regression - Multinomial Logit.
 Poisson Appropriate for count outcomes (i.e. outcomes that take only non-negative integer values). See Regression - Poisson Regression.
 Quasi-Poisson Appropriate for count outcomes. See Regression - Quasi-Poisson Regression.
 NBD Appropriate for count outcomes. See Regression - NBD Regression.
Robust standard errors Computes standard errors that are robust to violations of the assumption of constant variance (i.e., heteroscedasticity). See Robust Standard Errors. This is only available when Type is Linear.
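The idea behind robust standard errors can be sketched in base R with the heteroscedasticity-consistent "sandwich" estimator. The HC0 variant used below is an assumption for illustration, since this page does not say which variant Displayr uses. In practice, a package such as sandwich provides several variants. The function name robust_se_hc0 is hypothetical.

```r
# Hypothetical helper: HC0 ("White") heteroscedasticity-consistent standard errors
robust_se_hc0 <- function(fit) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  bread <- solve(crossprod(X))       # (X'X)^-1
  meat <- crossprod(X * e)           # X' diag(e^2) X; each row of X scaled by its residual
  sqrt(diag(bread %*% meat %*% bread))
}

set.seed(3)
x <- runif(300)
y <- 1 + 2 * x + rnorm(300, sd = 0.2 + 2 * x)   # noise grows with x (heteroscedastic)
fit <- lm(y ~ x)
cbind(classical = summary(fit)$coefficients[, "Std. Error"],
      hc0 = robust_se_hc0(fit))
```

When the noise variance changes with x, the classical and robust standard errors can differ noticeably. The coefficient estimates themselves are unchanged.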
Missing data See Missing Data Options.
Output

 Summary The default; as shown in the example above.
 Detail Typical R output: some additional information compared to Summary, but without the formatting.
 ANOVA Analysis of variance table containing the results of Chi-squared likelihood ratio tests for each predictor.
 Relative Importance Analysis The results of a relative importance analysis (also known as Johnson's relative weights). See here and the references for more information. This option is not available for Multinomial Logit. Note that categorical predictors are not converted to be numeric, unlike in Driver (Importance) Analysis - Relative Importance Analysis.
 Shapley Regression See here and the references for more information. This option is only available for Linear Regression. Note that categorical predictors are not converted to be numeric, unlike in Driver (Importance) Analysis - Shapley.
 Jaccard Coefficient Computes the relative importance of the predictor variables against the outcome variable using Jaccard coefficients. See Driver (Importance) Analysis - Jaccard Coefficient. This option requires both the outcome variable and the predictor variables to be binary.
 Correlation Computes the relative importance of the predictor variables against the outcome variable via the bivariate Pearson product-moment correlations. See Driver (Importance) Analysis - Correlation and the references therein for more information.
 Effects Plot Plots the relationship between each of the Predictors and the Outcome. Not available for Multinomial Logit.
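The Jaccard Coefficient and Correlation options both start from simple pairwise statistics between each predictor and the outcome. The base R sketch below computes these statistics on a small made-up binary data set, in which all names are hypothetical. It shows only the raw coefficients, not how Displayr rescales or tests them.

```r
# Jaccard coefficient between two binary (0/1) variables:
# cases where both equal 1, divided by cases where at least one equals 1
jaccard <- function(a, b) {
  sum(a == 1 & b == 1) / sum(a == 1 | b == 1)
}

# Hypothetical binary data: an outcome and two predictors
outcome     <- c(1, 1, 0, 1, 0, 1, 0, 0)
likes_fees  <- c(1, 0, 0, 1, 0, 1, 0, 1)
likes_staff <- c(1, 1, 1, 1, 0, 1, 0, 0)

jaccard_scores <- c(fees  = jaccard(outcome, likes_fees),
                    staff = jaccard(outcome, likes_staff))

# Pearson product-moment correlation of each predictor with the outcome
correlation_scores <- c(fees  = cor(outcome, likes_fees),
                        staff = cor(outcome, likes_staff))
```

Here likes_staff shares four of its five positive cases with the outcome, so its Jaccard coefficient (0.8) exceeds that of likes_fees (0.6).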
Correction The multiple comparisons correction applied when computing the p-values of the post hoc comparisons.
Variable names Displays Variable Names in the output instead of labels.
Absolute importance scores Whether the absolute value of Relative Importance Analysis scores should be displayed.
Auxiliary variables Variables to be used when imputing missing values (in addition to all the other variables in the model).
Weight Where a weight has been set for the R Output, it is automatically applied when the model is estimated. By default, the weight is assumed to be a sampling weight, and the standard errors are estimated using Taylor series linearization (by contrast, in the Legacy Regression, weight calibration is used). See Weights, Effective Sample Size and Design Effects.
Filter The data is automatically filtered using any filters prior to estimating the model.
Crosstab Interaction Optional variable to test for interaction with other variables in the model. The interaction variable is treated as a categorical variable. Coefficients in the table are computed by fitting separate regressions for each level of the interaction variable. To evaluate whether a coefficient is significantly higher (blue) or lower (red), we perform a t-test comparing the coefficient with the coefficient estimated from the remaining data, as described in Driver Analysis. P-values are corrected for multiple comparisons across the whole table (excluding the NET column). The p-value in the subtitle is calculated using a likelihood ratio test between the pooled model with no interaction variable and a model where all predictors interact with the interaction variable.
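One common way to compare a coefficient across two groups is a Wald-type test on the difference between two independently estimated coefficients, (b1 - b2) / sqrt(se1^2 + se2^2). This approach is an assumption here, not taken from the Driver Analysis page. The hypothetical compare_coefficient() function below applies it to one level of an interaction variable versus the remaining data. It uses a normal approximation for the p-value and applies no multiple comparisons correction.

```r
# Hypothetical sketch: compare a predictor's coefficient within one level of an
# interaction variable with its coefficient estimated from the remaining data
compare_coefficient <- function(data, formula, group, level, term) {
  in_level <- data[group == level, , drop = FALSE]
  rest <- data[group != level, , drop = FALSE]
  c1 <- summary(lm(formula, data = in_level))$coefficients[term, ]
  c2 <- summary(lm(formula, data = rest))$coefficients[term, ]
  difference <- c1[["Estimate"]] - c2[["Estimate"]]
  se <- sqrt(c1[["Std. Error"]]^2 + c2[["Std. Error"]]^2)   # independent samples
  t_stat <- difference / se
  # Normal approximation for the two-sided p-value (an assumption)
  c(difference = difference, t = t_stat, p = 2 * pnorm(-abs(t_stat)))
}

set.seed(4)
n <- 300
region <- sample(c("North", "South", "West"), n, replace = TRUE)
fees <- rnorm(n)
satisfaction <- ifelse(region == "North", 1.0, 0.4) * fees + rnorm(n)
survey <- data.frame(satisfaction, fees)
res <- compare_coefficient(survey, satisfaction ~ fees, region, "North", "fees")
res
```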
Automated outlier removal percentage A numeric value between 0 and 50 (including 0 but not 50) that specifies the percentage of the data removed from the analysis as outliers. All regression types except Multinomial Logit support this feature. If zero is selected, no outlier removal is performed and a standard regression is fitted to the entire (possibly filtered) dataset. If a non-zero value is selected, the regression model is fitted twice. The first regression model uses the entire dataset (after filters have been applied) and identifies the observations that generate the largest residuals. The user-specified percentage of cases with the largest residuals is then removed. The regression model is refitted on this reduced dataset and its output is returned. The specific residual used depends on the regression Type.

 Linear: The studentized residual in an unweighted regression and the Pearson residual in a weighted regression. The Pearson residual in the weighted case adjusts appropriately for the provided survey weights.

 Binary Logit and Ordered Logit: A type of surrogate residual from the sure R package (see Greenwell, McCarthy, Boehmke and Liu (2018)^{[1]} for more details). In Binary Logit it uses the resids function with the jitter parametrization. In Ordered Logit it uses the resids function with the latent parametrization to exploit the ordered logit structure.

 NBD Regression, Poisson Regression: A studentized deviance residual in an unweighted regression and the Pearson residual in a weighted regression.

 Quasi-Poisson Regression: A type of quasi-deviance residual via the rstudent function in an unweighted regression and the Pearson residual in a weighted regression.
The studentized residual computes the distance between the observed and fitted value for each point. It then standardizes (adjusts) this distance based on the influence of the point and an externally adjusted variance calculation. The studentized deviance residual computes the contribution the fitted point makes to the likelihood and standardizes (adjusts) it in the same way. The Pearson residual in the weighted case computes the distance between the observed and fitted value and adjusts appropriately for the provided survey weights. See the rstudent function in R and Davison and Snell (1991)^{[2]} for the specifics of the calculations.
Note that this feature is not supported when using the Multiple imputation option for handling Missing data.
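For the unweighted Linear case, the two-pass procedure can be sketched in base R as below. The function remove_outliers_and_refit() is hypothetical. It assumes the data contain no missing values, so that residuals line up with rows, and rounding the number of removed cases down is also an assumption. The hand computation shows what rstudent() returns: e_i / (s_(i) sqrt(1 - h_i)). Here h_i is the leverage and s_(i) is the residual standard error with case i left out.

```r
# Hypothetical sketch of two-pass outlier removal (unweighted linear model,
# no missing values, so residual positions match row positions)
remove_outliers_and_refit <- function(formula, data, percent) {
  stopifnot(percent >= 0, percent < 50)
  fit <- lm(formula, data = data)
  if (percent == 0) return(fit)
  r <- abs(rstudent(fit))                        # externally studentized residuals
  n_drop <- floor(nrow(data) * percent / 100)    # rounding down is an assumption
  if (n_drop == 0) return(fit)
  drop <- order(r, decreasing = TRUE)[seq_len(n_drop)]
  lm(formula, data = data[-drop, , drop = FALSE])
}

set.seed(5)
d <- data.frame(x = rnorm(50))
d$y <- 2 * d$x + rnorm(50)
d$y[c(3, 17)] <- d$y[c(3, 17)] + 8              # two planted outliers
fit <- lm(y ~ x, data = d)

# Externally studentized residual computed by hand
h <- hatvalues(fit)
e <- residuals(fit)
p <- length(coef(fit))
n <- nobs(fit)
s2 <- sum(e^2) / (n - p)
s2_without_i <- ((n - p) * s2 - e^2 / (1 - h)) / (n - p - 1)
manual <- e / sqrt(s2_without_i * (1 - h))

refit <- remove_outliers_and_refit(y ~ x, d, percent = 4)   # removes 2 of 50 cases
```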
Stack data Whether the input data should be stacked before analysis. Stacking can be desirable when each individual in the data set has multiple cases and an aggregate model is desired. More information is available at Stacking Data Files. If this option is chosen, the Outcome needs to be a single Question that has a Multi-type structure suitable for regression, such as a Pick One - Multi, Pick Any, or Number - Multi. Similarly, the Predictor(s) need to be a single Question that has a Grid type structure, such as a Pick Any - Grid or a Number - Grid. In the process of stacking, the data reduction is inspected. Any constructed NETs are removed unless they are composed of source values that are mutually exclusive of other codes, such as the result of merging two categories.
Random seed Seed used to initialize the (pseudo-)random number generator for the model fitting algorithm. Different seeds may lead to slightly different answers, but should normally not make a large difference.
Increase allowed output size Check this box if you encounter a warning message "The R output had size XXX MB, exceeding the 128 MB limit..." and you need to reference the output elsewhere in your document; e.g., to save predicted values to a Data Set or examine diagnostics.
Maximum allowed size for output (MB). This control only appears if Increase allowed output size is checked. Use it to set the maximum allowed size for the regression output in megabytes. The warning about the R output size referred to above states the minimum size needed to return the full output. Note that having many large outputs in one document or page may slow down the performance of your document and increase load times.
Additional options are available by editing the code.
DIAGNOSTICS
Plot - Cook's Distance Creates a line/rug plot showing Cook's distance for each observation.
Plot - Cook's Distance vs Leverage Creates a scatterplot showing Cook's distance vs leverage for each observation.
Plot - Influence Index Creates index plots of studentized residuals, hat values, and Cook's distance.
Multicollinearity Table (VIF) Creates a table containing variance inflation factors (VIF) to diagnose multicollinearity.
Plot - Normal Q-Q Creates a normal Quantile-Quantile (Q-Q) plot to reveal departures of the residuals from normality.
Prediction-Accuracy Table Creates a table showing the observed and predicted values, as a heatmap.
Test Residual Heteroscedasticity Conducts a heteroscedasticity test on the residuals.
Test Residual Normality (Shapiro-Wilk) Conducts a Shapiro-Wilk test of normality on the (deviance) residuals.
Plot - Residuals vs Fitted Creates a scatterplot of residuals versus fitted values.
Plot - Residuals vs Leverage Creates a plot of residuals versus leverage values.
Plot - Scale-Location Creates a plot of the square root of the absolute standardized residuals by fitted values.
Test Residual Serial Correlation (Durbin-Watson) Conducts a Durbin-Watson test of serial correlation (autocorrelation) on the residuals.
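Several of these diagnostics reduce to short formulas on the residuals and leverage values of a fitted model. The base R sketch below uses simulated data to compute three of them. It computes Cook's distance by hand and checks it against cooks.distance(). It computes the Durbin-Watson statistic. It also computes a variance inflation factor as 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others. It does not reproduce Displayr's plots or tests.

```r
set.seed(6)
x <- rnorm(60)
y <- 1 + 0.5 * x + rnorm(60)
fit <- lm(y ~ x)

e <- residuals(fit)
h <- hatvalues(fit)                      # leverage of each observation
p <- length(coef(fit))
s2 <- sum(e^2) / df.residual(fit)

# Cook's distance: combines the size of the residual with the leverage
cooks_manual <- (e^2 / (p * s2)) * h / (1 - h)^2

# Durbin-Watson statistic: close to 2 when there is no first-order serial correlation
dw <- sum(diff(e)^2) / sum(e^2)

# VIF for x in a model that also contains a correlated predictor x2
x2 <- 0.8 * x + rnorm(60, sd = 0.6)
vif_x <- 1 / (1 - summary(lm(x ~ x2))$r.squared)
```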
SAVE VARIABLE(S)
Fitted Values Creates a new variable containing fitted values for each case in the data.
Predicted Values Creates a new variable containing predicted values for each case in the data.
Residuals Creates a new variable containing residual values for each case in the data.
Additional Properties
When using this feature you can obtain additional information that is stored by the R code which produces the output.
 To do so, select Create > R Output.
 In the R CODE, paste: item = YourReferenceName
 Replace YourReferenceName with the reference name of your item. Find this in the Report tree or by selecting the item and then going to Properties > General > Name from the object inspector on the right.
 Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.
For a more in-depth discussion on extracting information from objects in R, check out our blog post here.
Properties which may be of interest are:
 Summary outputs from the regression model:

 item$summary$coefficients # summary regression outputs
More information
Acknowledgements
See Regression - Generalized Linear Model.
Technical Details
There are two main approaches offered to determine the importance of variables in a Driver Analysis: Shapley regression and Relative Importance Analysis. Both techniques decompose the contribution each predictor variable (driver) makes towards the outcome variable. Shapley regression^{[3]} is restricted to linear regression. It uses an exhaustive search of all possible linear regression models to compute the contribution each predictor variable makes to the R-squared statistic. Relative Importance Analysis^{[4]} takes a different approach. It uses an orthogonal representation of the predictor variables and reconciles this with the original variables to determine their contribution. An overview comparing the two approaches is given in two blog posts ^{[5]} and ^{[6]}.
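The exhaustive search behind Shapley regression can be sketched in base R as follows. Each predictor's score averages the gain in R-squared from adding it to every subset S of the other predictors, weighted by |S|!(p - |S| - 1)!/p!. As a result, the scores sum to the R-squared of the full model. The function names are hypothetical. The number of models fitted grows as 2^p, which is why this approach becomes slow with many predictors.

```r
# R-squared of a linear regression of y on the columns `cols` of X (0 for no predictors)
r_squared <- function(X, y, cols) {
  if (length(cols) == 0) return(0)
  summary(lm(y ~ X[, cols, drop = FALSE]))$r.squared
}

# Shapley decomposition of R-squared by exhaustive search over predictor subsets
shapley_r2 <- function(X, y) {
  p <- ncol(X)
  shapley <- setNames(numeric(p), colnames(X))
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (k in 0:length(others)) {
      # All subsets of size k drawn from the other predictors
      # (combn on indices avoids R treating a single number as seq_len(n))
      subsets <- if (k == 0) list(integer(0)) else
        lapply(combn(length(others), k, simplify = FALSE), function(idx) others[idx])
      weight <- factorial(k) * factorial(p - k - 1) / factorial(p)
      for (S in subsets) {
        gain <- r_squared(X, y, c(S, j)) - r_squared(X, y, S)
        shapley[j] <- shapley[j] + weight * gain
      }
    }
  }
  shapley
}

set.seed(8)
n <- 200
X <- cbind(fees = rnorm(n), wait = rnorm(n), staff = rnorm(n))
y <- drop(X %*% c(0.5, 0.3, 0.1)) + rnorm(n)
sv <- shapley_r2(X, y)
```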
In Relative Importance Analysis, the predictors are first standardized to have mean zero and variance one (z-score standardized) and then transformed to be orthogonal. The orthogonal transformation is based on the singular value decomposition. Suppose the standardized predictor variables are represented by a matrix \(\small X\) with singular value decomposition \(\small X = U\Sigma V^T\). The orthogonal representation of \(\small X\) is defined as \(\small Z = UV^T\). In the case of linear regression, the importance of these orthogonal predictors is measured by regressing the outcome variable \(\small Y\) on \(\small Z\), giving the estimates \(\small \beta = (Z^TZ)^{-1} Z^TY\). Because the columns of \(\small Z\) are orthogonal and thus uncorrelated, the squared \(\small\beta\) values measure the relative importance of each of the predictors in the \(\small Z\) representation. To link the orthogonal \(\small Z\) variables back to the original \(\small X\) variables, define \(\small \Lambda = (Z^TZ)^{-1} Z^TX\). The relative importance of the original predictors is then given by \(\small \epsilon = \Lambda^2\beta^2\), where the square operator is applied elementwise. The significance of the relative importance values is examined with a t-test, which requires the standard error of \(\small\epsilon\). The standard error of the \(\small i\)th element of \(\small\epsilon\) is estimated by \(\small \widehat\sigma_{\epsilon,i} = \sqrt{\sum_{j=1}^{p}\Lambda_{ij}^4\sigma_j^4 \left(2+4 \frac{\beta_j^2}{\sigma_j^2}\right)}\)
where \(\small p\) is the number of predictors and \(\small\sigma_j\) denotes the standard error of the \(\small j\)th regression coefficient. The above approach assumes a standard multiple regression. For a Generalized Linear Model, the same approach is used, but the \(\small\beta\) coefficients are estimated by fitting the appropriate GLM to the orthogonal \(\small Z\) variables.
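The following is a minimal base R sketch of the Relative Importance Analysis steps for the linear regression case. It standardizes the predictors, takes the SVD, forms Z = UV^T, regresses Y and X on Z, and combines the squared coefficients. Scaling the columns to unit length (rather than unit variance) is an assumption made here so that the raw weights sum to the model R-squared; it changes only the overall scale. The standard errors and the GLM case are not shown, and relative_weights() is a hypothetical name.

```r
# Hypothetical sketch of Johnson's relative weights for a linear regression
relative_weights <- function(X, y) {
  n <- nrow(X)
  X <- scale(X) / sqrt(n - 1)                  # centred, unit-length columns (assumption)
  y <- scale(y) / sqrt(n - 1)
  s <- svd(X)                                  # X = U Sigma V^T
  Z <- s$u %*% t(s$v)                          # orthogonal counterpart Z = U V^T
  ZtZ_inv <- solve(crossprod(Z))
  beta <- ZtZ_inv %*% crossprod(Z, y)          # regress Y on Z
  Lambda <- ZtZ_inv %*% crossprod(Z, X)        # regress X on Z (rows: Z, columns: X)
  # epsilon_j = sum_k Lambda[k, j]^2 * beta[k]^2; Lambda = V Sigma V^T is symmetric,
  # so this matches Lambda^2 beta^2 in the text
  raw <- as.vector(crossprod(Lambda^2, beta^2))
  names(raw) <- colnames(X)
  raw
}

set.seed(7)
n <- 250
fees <- rnorm(n)
wait <- 0.6 * fees + rnorm(n, sd = 0.8)        # correlated with fees
staff <- rnorm(n)
X <- cbind(fees = fees, wait = wait, staff = staff)
y <- 0.5 * fees + 0.3 * wait + 0.1 * staff + rnorm(n)

raw <- relative_weights(X, y)
importance <- 100 * raw / sum(raw)             # rescaled to sum to 100
```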
References
 Greenwell, B. M., McCarthy, A. J., Boehmke, B. C., & Liu, D. (2018). Residuals and Diagnostics for Binary and Ordinal Regression Models: An Introduction to the sure Package. The R Journal, 10(1), 381. https://doi.org/10.32614/RJ-2018-004
 Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In: Statistical Theory and Modelling. In Honour of Sir David Cox, FRS, eds. Hinkley, D. V., Reid, N. and Snell, E. J., Chapman & Hall.
 Bock, T., "What is Shapley Value Regression?" [Blog post]. Accessed from [1]
 Johnson, J. W. (2000). A Heuristic Method for Estimating the Relative Weight of Predictor Variables in Multiple Regression. Multivariate Behavioral Research, 35(1), 1–19. https://doi.org/10.1207/S15327906MBR3501_1
 Yap, J., "The Difference Between Shapley Regression and Relative Weights" [Blog post]. Accessed from [2]
 Yap, J., "When to Use Relative Weights Over Shapley" [Blog post]. Accessed from [3]
Next
How to Do Driver Analysis in Displayr
How To Do Driver Analysis in Q