Note: while the following article uses Q terminology with regard to Experiment questions, the content applies to Experiments in Displayr as well.
Q has precise specifications for setting up the data for Experiment questions. Generally, the data will need to be set up in one of three ways: by using advanced variable creation routines in Q (e.g., JavaScript), by setting up the variables in Excel and importing them, or by providing precise specifications to the company generating the original data file.
Usually an Experiment question requires dozens or perhaps hundreds of variables. The data is set up with the variables that contain measures of preference (e.g., choices, ratings, rankings) at the top. Each attribute then appears one after another. Examples of the setup and analysis of a number of common experimental designs are described in the Experiment Case Studies.
Once the variables have been prepared, an experiment is set up by selecting the relevant variables, right-clicking and selecting Set Question, changing the Question Type to Experiment and pressing OK. The rest of this page describes how the variables need to be set up.
Note that you should ensure that the variables created for an Experiment have consistent source values, as recoding them using the Value Attributes will have no effect. For example, if you have an attribute Package Size which has a level 2 Litres with a source value of 3, then this level should always have a source value of 3 in other variables relating to that attribute.
Scale and distribution
As elsewhere in Q, the program automatically selects the appropriate statistical distributions based upon the type of data. In the case of an Experiment, the key issue is what Question Type the dependent variables would have had if they were not modeled as a part of an Experiment question.
Pick One (Categorical): Choice-Based Conjoint / Discrete Choice Analysis
If the dependent variables’ Variable Type is Categorical, Q models the data using the multinomial distribution (i.e., mixed logit).
A single variable is required for each task that is presented to consumers (a task is equivalent to the concept of a choice set in choice modeling). This variable needs to have values that correspond to the alternatives in the choice set, where a 1 represents the first alternative, a 2 the second alternative, etc.
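For illustration, the following is a minimal sketch (in Python with pandas; the column names and values are hypothetical) of how Pick One choice data for three tasks might be laid out in a file for importing into Q:

```python
import pandas as pd

# Hypothetical raw choices: one column per task, coded 1..k, where k is
# the number of alternatives in the task's choice set.
choices = pd.DataFrame({
    "task1_choice": [1, 3, 2],  # respondent 1 chose alternative 1, etc.
    "task2_choice": [2, 2, 1],
    "task3_choice": [3, 1, 3],
})

# Each column becomes one Categorical variable in Q, with the attribute
# variables following underneath, one attribute after another.
choices.to_csv("choices.csv", index=False)
```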
Ranking: Traditional Conjoint Analysis
For an example, see Ranking-Based Conjoint in Discount.Q.
If the dependent variables have a Variable Type of Ordered Categorical, Q estimates the exploded logit model, adjusting for ties. Rankings are determined from the values of the variable; higher values indicate greater preference. These values can be modified by editing the variables’ Value Attributes (i.e., recoding) or directly changing values in the Data tab. Rankings may be tied. The values used to rank do not need to be consistent across respondents (e.g., one respondent may have ranks of 10, 9 and 8, while another may have ranks of 4.2, 3.2 and 1.1).
The interpretation of the parameters from the ranking model is identical to that of a standard choice model. The choice probabilities that can be computed from the utilities are the probabilities that an alternative will be ranked first. A separate variable is required for each alternative that is ranked (e.g., if five alternatives are ranked, then they must be represented by five variables).
Each variable requires a different label (e.g., A, B, C, D and E). Where multiple ranking tasks are required, common variable labels must be used across the variables. For example, if there are six tasks of three alternatives and respondents are asked to rank the alternatives in each task, the labels would need to be something like A, B, C, A, B, C, A, B, C, A, B, C, A, B, C, A, B, C.
The variables corresponding to alternatives’ attributes need to be laid out in the same order, with one attribute after another, and the same labels for all the variables within each attribute.
Pick One data can be modeled as Ranking data by recoding the Pick One question into multiple variables, one for each alternative, where the chosen alternative is given a higher value than the value assigned to the other alternatives (i.e., so that it is a ranking with ties, where one alternative is ranked highest). This approach will produce the same results as if analyzed as a Pick One question (i.e., it is only an issue of data layout; the statistical models are the same). This approach is useful where experimental data contains both Pick One and Ranking questions.
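A minimal sketch of this recode, assuming the data is being prepared in Python with pandas before import (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical Pick One data: which of three alternatives was chosen in a task.
df = pd.DataFrame({"task1_choice": [1, 3, 2]})

# One variable per alternative: the chosen alternative is ranked above the
# rest (2 = chosen, 1 = not chosen), i.e., a ranking with ties.
for alt in (1, 2, 3):
    df[f"task1_alt{alt}"] = (df["task1_choice"] == alt).astype(int) + 1

print(df)
#    task1_choice  task1_alt1  task1_alt2  task1_alt3
# 0             1           2           1           1
# 1             3           1           1           2
# 2             2           1           2           1
```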
Choice Experiment with Rankings
This is for an experiment which is identical to a choice-based conjoint, except that people are asked to rank the alternatives rather than choose their preferred alternative.
The attributes are set up as with a standard choice-based conjoint experiment. For the outcome (dependent) variables:
- There is one variable for each alternative in each question. That is, if there are 10 choice questions with four alternatives, there are 40 variables.
- The variable labels need to be consistent for the alternatives. E.g., if there were three alternatives and four questions, their labels, in order, would be A, B, C, A, B, C, A, B, C, A, B, C. For an example, see Repeated Ranking-Based Conjoint in Discount.Q.
- The values are the rankings (a higher number indicates higher preference).
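As a sketch of this layout (Python with pandas; the data and column names are hypothetical), four ranking questions with three alternatives each become 12 variables, with labels repeating A, B, C:

```python
import pandas as pd

# Hypothetical rankings for four questions of three alternatives each:
# one variable per alternative per question (12 in total). The values are
# ranks, with a higher number indicating higher preference.
columns = [f"q{q}_{alt}" for q in range(1, 5) for alt in ("A", "B", "C")]
rankings = pd.DataFrame(
    [[3, 2, 1] * 4,   # respondent 1 ranks A > B > C in every question
     [1, 3, 2] * 4],  # respondent 2 ranks B > C > A in every question
    columns=columns,
)
rankings.to_csv("rankings.csv", index=False)
# After importing, set the variable labels in Q to A, B, C, A, B, C, ...
```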
Best-Worst Choice Experiments
Best-worst conjoint experiments, which present consumers with choice sets containing hypothetical alternatives constructed from attributes, are set up in the same way as for Choice Experiment with Rankings, except that a 1 is used for the best, -1 for the worst, and 0 for the rest.
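As a sketch (Python with pandas and NumPy; the column names are hypothetical), best and worst selections can be converted into this coding as follows:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: which of four alternatives was picked as best
# and which as worst in a single task.
df = pd.DataFrame({"best": [1, 4, 2], "worst": [3, 2, 4]})

# One variable per alternative: 1 = best, -1 = worst, 0 = the rest.
for alt in range(1, 5):
    df[f"task1_alt{alt}"] = np.select(
        [df["best"] == alt, df["worst"] == alt], [1, -1], default=0
    )

print(df.filter(like="task1_"))
```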
Numbers (numeric): Metric Conjoint Analysis / Ratings-Based Conjoint
For an example, see Rating-based Conjoint in Discount.Q.
Where the preference measures have a Variable Type of Numeric, Q estimates models using the general linear model (i.e., linear regression), where the residuals are assumed to be normally distributed, and intercepts and parameters representing the standard error of the residuals are estimated.
The labels of the dependent variables and independent variables need to follow the same structure as for rankings. The intercept is automatically computed (i.e., an attribute of constants is not required).
With numeric experiments, there must be only a single choice set, and all respondents must provide valid data for all alternatives (i.e., no missing data is permitted).
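As a sketch of this layout (Python with pandas; the ratings are hypothetical), five alternatives rated by every respondent in a single choice set:

```python
import pandas as pd

# Hypothetical ratings-based conjoint: every respondent rates all five
# alternatives (labeled A to E) in a single choice set.
ratings = pd.DataFrame({
    "rating_A": [7, 4, 9],
    "rating_B": [5, 6, 2],
    "rating_C": [8, 3, 6],
    "rating_D": [2, 9, 5],
    "rating_E": [6, 5, 7],
})
assert ratings.notna().all().all()  # no missing data is permitted
ratings.to_csv("ratings.csv", index=False)
```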
It is sometimes advisable to employ the ranking model even if the data might ordinarily be interpreted as being numeric (e.g., purchase probability scales, purchase quantities, 10-point ratings), as:
- The ratings may not possess interval-scale properties within and between respondents (which is an assumption of interpreting the data as numeric).
- There may be a need to create a choice simulator.
- If estimating mixture models (e.g., latent class analysis), treating the data as ranks ensures that the resulting model is not primarily driven by the intercept.
- The ranking model is more parsimonious.
Choice Experiments with Availability Designs
Availability designs are labeled experiments where respondents see different subsets of alternatives from task to task. For example, a paired comparison experiment showing three alternatives compared pairwise in three choice sets (i.e., A vs B, B vs C and A vs C) is set up in the same way as with ranking experiments. Alternatives that are not shown still need to be represented by variables for both the dependent variables and the attributes. For example, the variables representing the pairwise comparisons of the three alternatives would be presented as A, B, C, A, B, C, A, B and C, even though one alternative in each choice set (C in the first, A in the second, and B in the third) is never shown to respondents (and the corresponding variables should contain only NaNs). Availability designs involving categorical (i.e., Pick One) dependent variables need to be converted into rankings using the method described in the earlier section on rankings.
Constant Sum Choice Experiments
Constant sum experiments present consumers with choice sets containing hypothetical alternatives and require the respondents to allocate objects (usually tokens, future purchases, or dollars) to the alternatives. In Q, these are set up in the same way as for repeated rankings (described previously), except that the Variable Type should be set up as Money. Where a respondent does not see an alternative, it should have a value of NaN. Respondents are not required to have allocated the same number of objects (and thus this model can be used to model purchasing behavior).
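A sketch of this layout (Python with pandas and NumPy; the allocations are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical constant-sum allocations across three alternatives in one
# task. NaN marks an alternative the respondent did not see, and totals
# do not need to match across respondents.
allocations = pd.DataFrame({
    "task1_A": [6.0, 10.0, np.nan],
    "task1_B": [4.0, np.nan, 3.0],
    "task1_C": [np.nan, 0.0, 7.0],
})
allocations.to_csv("allocations.csv", index=False)
# After importing, set the Variable Type of these variables to Money in Q.
```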
Attributes
Each attribute is presented as a series of variables, in the order of the alternatives of the choice sets. For example, in Eggs.Q, rows 84 to 107 contain the variables for the attribute called Alternative. Variable 84 contains the attribute level for the first alternative in the first choice set, variable 85 for the second alternative in the first choice set, etc.
All variables for an attribute must have the same Label in the Variables and Questions tab.
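As a sketch (Python with pandas; the attribute, levels and column names are hypothetical), a categorical attribute for two choice sets of three alternatives each would be laid out as six variables in alternative order:

```python
import pandas as pd

# Hypothetical attribute "Package Size": one variable per alternative per
# choice set, in order. All six variables must share the same Label in the
# Variables and Questions tab, and each source value (1, 2, 3) must refer
# to the same level in every variable.
package_size = pd.DataFrame({
    "set1_alt1_size": [1, 2, 3],
    "set1_alt2_size": [2, 3, 1],
    "set1_alt3_size": [3, 1, 2],
    "set2_alt1_size": [2, 1, 3],
    "set2_alt2_size": [3, 2, 1],
    "set2_alt3_size": [1, 3, 2],
})
package_size.to_csv("package_size.csv", index=False)
```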
Categorical attributes
If the variables for an attribute have Variable Type of Categorical, Q analyses the attribute as categorical, using dummy coding. For a given attribute variable, a separate category is estimated for each Value in the variable’s Value Attributes. The name of the attribute level is taken from the Label.
Attribute levels can be collapsed by dragging and dropping in the Outputs Tab.
Numeric attributes
If the Variable Type is Numeric, Q estimates a single numeric coefficient for the attribute. When evaluating whether or not to treat a variable as Numeric, Q only uses the Variable Type of the first variable of the attribute.
It is highly desirable to code any numeric attributes so that the resulting utility estimates are broadly on the same scale. This is for two reasons. First, it avoids numerical precision issues. For example, if your price data is in thousands, you will avoid numerical precision issues by coding numeric attributes in tens-of-thousands (e.g., a price of 5,000 would be coded as 0.5). Second, it makes it easier to compare coefficients between numeric and categorical attributes; perhaps the best way of achieving this is to scale variables so that they have a mean of 0 and a standard deviation of 0.5 (i.e., center each variable and divide it by two standard deviations), as this makes categorical variables and numeric variables have approximately the same variance (see Andrew Gelman and Jennifer Hill (2007): Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press).
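A minimal sketch of this rescaling (Python with pandas; the price values are hypothetical):

```python
import pandas as pd

# Hypothetical numeric price attribute, in dollars.
price = pd.Series([3000.0, 5000.0, 7000.0, 5000.0])

# Center and divide by two standard deviations, giving a mean of 0 and a
# standard deviation of 0.5 (Gelman & Hill, 2007).
price_scaled = (price - price.mean()) / (2 * price.std())

print(price_scaled.mean(), price_scaled.std())  # approximately 0.0 and 0.5
```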
Attributes can be changed between categorical and numeric by right-clicking on the attribute name in the Outputs Tab and selecting Convert to Categorical Attribute or Convert to Numeric Attribute.
‘None of these’ alternative(s) and alternative-specific designs
Often experimental designs contain an alternative with no attributes, or contain different attributes for different alternatives. When setting up questions containing attributes in Q, you need to assign a variable for every attribute for each alternative, even if the attribute is irrelevant to the alternative. Where an attribute is irrelevant for an alternative, create an extra level for the attribute with a Value of NaN for all respondents. Ensure that this value is also marked as missing data in the Value Attributes dialog box. For an example, see Brand Price Trade-Off Experiment.
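A sketch of this setup (Python with pandas and NumPy; the attribute and column names are hypothetical), where the third alternative is a 'None of these' option with no price:

```python
import numpy as np
import pandas as pd

# Hypothetical Price attribute across three alternatives. The third
# alternative is 'None of these', so its Price variable is NaN for all
# respondents; in Q, this value must also be marked as missing data in
# the Value Attributes dialog box.
price = pd.DataFrame({
    "set1_alt1_price": [2.5, 3.0, 2.5],
    "set1_alt2_price": [3.5, 2.0, 3.0],
    "set1_alt3_price": [np.nan] * 3,  # attribute irrelevant to this alternative
})
price.to_csv("price_attribute.csv", index=False)
```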
Alternatives (Alternative Specific Constants)
Alternative specific constants are entered as another attribute. An example of this is in the Eggs.Q study (i.e., A, B and C represent the alternative specific constants).
Missing values in Experiments
Missing values in the dependent variable result in observations from experiments being excluded. Missing values in the independent variables are treated as having coefficients of 0 (note: this makes it relatively easy to set up experiments with different attributes for different alternatives, but also means that additional care is required when setting up experiments).