A key property of a Variable Set or Question is its structure. The Variable Set or Question Structure determines:
- How Tables are created and manipulated (see Reference Table below).
- How variables appear when used in R code.
The Variable Set/Question Structure is automatically inferred when the data is imported. It can be modified in Displayr by selecting a Variable Set in the Data Sets tree and either:
- Combining or splitting variables into/from a Variable Set (Split or Combine).
- Modifying it in the object inspector under Properties > GENERAL > Structure.
It can be modified in Q on the Variables and Questions tab by:
- Combining or splitting variables into/from a Question using Set Question.
- Modifying the Variable Type or Question Type drop-downs.
Variable Set/Question Structures vary, non-exhaustively, on the following dimensions:
- The properties of the variables: Text, Binary, Nominal, Ordinal, Numeric, Date.
- The number of variables in the set/question: one or more than one.
- Whether the variables within a set/question are organized in a two-dimensional structure (i.e., a grid) or not.
- Whether the variables contain structural dependencies (i.e., where the meaning of the values in one of the variables is structurally related to the meanings of another of the variables).
Examples of the appropriate structure for each type of survey question are given below. Where names in Displayr and Q differ, Displayr's name is listed first and Q's second.
An overview of how each structure appears in a Displayr table is under Reference Table below.
Single variable
Text
A single variable containing text (or numeric data that is interpreted as text). For example, data obtained from a question like:
Please enter the name of the last soft drink you bought.
_____________
Nominal / Pick One
A single variable that contains unordered, mutually exclusive, and exhaustive categories (i.e., has a nominal measurement scale). For example, data generated by the following question:
Are you...
o Male
o Female
Whereas a Text Variable Set stores the data as text, a Nominal Variable Set has both Value Attributes and Data Reductions.
Ordinal / Pick One
A single variable that contains ordered, mutually exclusive, and exhaustive categories (i.e., has an ordinal measurement scale). For example, data generated by the following question:
How old are you?
o Under 30
o 30 to 49
o 50 or more
For most purposes, an Ordinal Variable Set is identical to a Nominal Variable Set. The only difference is that some statistical tests will take the ordering into account. In Q, both question types are Pick One questions with different Variable Type settings.
Numeric / Number
A numeric variable (i.e., it has an interval or ratio measurement scale). For example, data that represents the temperature at a given point in time.
Date/Time
A numeric variable where the values represent times and/or dates. It contains the number of milliseconds since 1/1/1970.
In JavaScript variables, special built-in functions are available for manipulating dates (e.g., use Q.Year/Month/Day/Hour/Second() to extract parts of a date or time, and Q.YearDif/MonthDif/WeekDif/DayDif/HourDif/MinuteDif/SecondDif() to compare two dates or times).
Date/Time variables can be converted to different time scales (e.g., months, weeks, minutes) by clicking on the variable and pressing Date/Time in the Object Inspector in Displayr or clicking the Values button on the Variables and Questions tab in Q.
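To make the storage format concrete, here is a minimal Python sketch of reading a raw Date/Time value as a count of milliseconds since 1/1/1970 and aggregating it to a monthly time scale. It runs outside Displayr and Q; the function names and the assumption that values are in UTC are illustrative and do not come from the product.

```python
from datetime import datetime, timedelta, timezone

# Assumption: raw values count milliseconds from midnight on 1 January 1970, UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ms_to_datetime(ms):
    """Convert milliseconds since 1970-01-01 to a timezone-aware datetime."""
    return EPOCH + timedelta(milliseconds=ms)

def month_bucket(ms):
    """Aggregate a raw value to a 'YYYY-MM' label (one possible time scale)."""
    return ms_to_datetime(ms).strftime("%Y-%m")

# Hypothetical raw values for three rows of a Date/Time variable.
raw_values = [0, 86_400_000, 1_700_000_000_000]
dates = [ms_to_datetime(v) for v in raw_values]
months = [month_bucket(v) for v in raw_values]
```

Grouping rows by a label such as `months` is one way to think about what a table of "proportion in each aggregated date" is counting.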
Multiple variables
Text - Multi
Multiple related variables that contain text, e.g. generated from a question like:
Please type in the names of your three favorite soft drinks
1.____________
2.____________
3.____________
Binary - Multi / Pick Any
Multiple variables, each of which has only two non-missing values. Where a variable originally contains more than two categories, they are combined into two (see Value Attributes). This is the main way that non-mutually exclusive categories are represented in a Data Set (see also Binary - Multi (Compact) below). Common examples of Binary - Multi Variable Sets / Pick Any Questions are lists of products purchased by people in a customer database, and responses to multiple response questions in surveys, such as:
Which of the following have you bought in the past week? Tick all that apply.
[] Coke
[] Pepsi
[] Fanta
[] None of these
Note that a row in a Data Set can have three possible values in a variable in a Binary - Multi Variable Set / Pick Any Question: the value that corresponds to a category being applicable or selected (1), the value that corresponds to it not being selected (0), and a missing value, which is represented as NaN in the data.
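The three-valued coding described above can be sketched in Python as follows. The data are made up, and the choice to exclude NaN rows from the base when computing a proportion is an assumption made for illustration, not a statement of how Displayr or Q computes its tables.

```python
import math

NAN = float("nan")

# Hypothetical data: one list per variable, one entry per respondent.
# 1 = selected, 0 = not selected, NaN = missing (e.g., question not shown).
binary_multi = {
    "Coke":  [1, 0, 1, NAN],
    "Pepsi": [0, 0, 1, NAN],
    "Fanta": [1, 1, 0, NAN],
}

def proportion_selected(values):
    """Share of non-missing rows coded 1 (missing rows are left out of the base)."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return NAN
    return sum(valid) / len(valid)

proportions = {name: proportion_selected(vals) for name, vals in binary_multi.items()}
```

Because the categories are not mutually exclusive, the proportions across variables do not need to sum to 1.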
Nominal - Multi / Pick One - Multi
A set of categorical variables sharing the same scale points, where the scale points are mutually exclusive and unordered.
Which meal did you eat most recently at each of these restaurants?
| | Breakfast | Lunch | Dinner |
|---|---|---|---|
| McDonald's | o | o | o |
| Burger King | o | o | o |
| Wendy's | o | o | o |
Ordinal - Multi / Pick One - Multi
A set of categorical variables sharing the same scale points, where the scale points are mutually exclusive and ordered.
In the vast majority of instances, Ordinal - Multi data is analyzed in the same way as Nominal - Multi data. In Q, both question types are Pick One - Multi questions with different Variable Type settings.
How would you rate your satisfaction with your most recent meal at each of these restaurants?
| | Low | Medium | High |
|---|---|---|---|
| McDonald's | o | o | o |
| Burger King | o | o | o |
| Wendy's | o | o | o |
Numeric - Multi / Number - Multi
A series of numeric variables measured on the same scale. For example:
Next to the brands below, please indicate how many times you have purchased them in the past week.
Coke ___
Pepsi ___
Fanta ___
Grid
Binary - Grid / Pick Any - Grid
This is a generalization of a Binary - Multi Variable Set / Pick Any Question where the variables can be thought of as being ordered in two dimensions. For example, the data generated from a series of related questions such as:
Which of these brands are fun?
[] Coke | [] Pepsi | [] Fanta |
Which of these brands are sexy?
[] Coke | [] Pepsi | [] Fanta |
Which of these brands are masculine?
[] Coke | [] Pepsi | [] Fanta |
Displayr and Q infer the structure of the grid by inspecting the variables' labels at the time of importing the data. Where Displayr or Q cannot discern the structure of the data, this can be set when changing the Variable Set structure / Question type.
Numeric - Grid / Number - Grid
This is a generalization of a Numeric - Multi Variable Set / Number - Multi Question, where the variables can be ordered in two dimensions. For example, the data generated by:
In the past month, how many economy flights did you take on...
Qantas ___ United ___ SAS ___
In the past month, how many business class flights did you take on...
Qantas ___ United ___ SAS ___
Displayr and Q infer the structure of the grid by inspecting the variables' labels at the time of importing the data. Where Displayr or Q cannot discern the structure of the data, this can be set by changing the Variable Set structure.
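As a rough illustration of what a two-dimensional arrangement means, the Python sketch below places a flat list of numeric variables into rows and columns by splitting each label on a separator, then averages each cell. The label format, the separator, and the function name are hypothetical; this is not the inference procedure that Displayr or Q uses.

```python
from statistics import mean

# Hypothetical variables: label -> values for three respondents.
variables = {
    "Economy: Qantas": [2, 0, 1],
    "Economy: United": [0, 1, 1],
    "Business: Qantas": [0, 0, 3],
    "Business: United": [1, 1, 1],
}

def build_grid(variables, separator=": "):
    """Arrange variables as grid[row][column] = average, using the label parts."""
    grid = {}
    for label, values in variables.items():
        row, column = label.split(separator, 1)
        grid.setdefault(row, {})[column] = mean(values)
    return grid

grid = build_grid(variables)
```

Each cell of `grid` corresponds to one variable, which matches the idea that a Numeric - Grid table shows the average for each pair of attributes.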
Structural dependencies
Binary - Multi (Compact) / Pick Any - Compact
The same underlying data as a Binary - Multi Variable Set / Pick Any Question, except that it is stored in a max-multi format. That is, the first variable contains the first response, the second variable contains the second response, etc. This format should only be used to represent multiple response data when there are truly huge code frames (e.g., thousands of options). It is generally inferior to a Nominal structure as it is unwieldy for data manipulation (e.g., for use in formulas) and it cannot accommodate the notion of missing data.
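The relationship between the max-multi format and one-indicator-per-option data can be sketched in Python as below. The data and the function name are hypothetical, and empty response slots are represented with `None` purely for illustration.

```python
def compact_to_binary(compact_rows, codes):
    """Expand max-multi rows into one 0/1 indicator per code."""
    binary_rows = []
    for row in compact_rows:
        chosen = {value for value in row if value is not None}
        binary_rows.append({code: int(code in chosen) for code in codes})
    return binary_rows

codes = ["Coke", "Pepsi", "Fanta"]

# Each row holds the first, second, and third response; None marks an unused slot.
compact = [
    ["Coke", "Fanta", None],
    ["Pepsi", None, None],
    [None, None, None],
]

binary = compact_to_binary(compact, codes)
```

The last row shows the limitation noted above: a respondent with no responses recorded expands to all zeros, so "selected nothing" cannot be told apart from "data missing".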
Ranking
Multiple numeric variables that represent a ranking, where the highest number is most preferred and ties are permitted. For example:
Rank the following brands according to how much you like them... Place a 3 next to the brand you like most, a 2 next to your next preferred brand and a 1 next to your least preferred brand.
Coke ____
Pepsi ____
Fanta ____
Note that if your question uses the lowest numbers to indicate the most preferred alternatives, you will need to reverse the values assigned to each rank.
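A minimal Python sketch of reversing the values, assuming ranks run from 1 to the number of items and that unranked items are stored as `None` (both assumptions made for illustration):

```python
def reverse_ranks(ranks, n_items=None):
    """Flip ranks so that the highest number becomes the most preferred.

    ranks: dict of item -> rank where 1 is most preferred; None means unranked.
    n_items: number of items ranked; defaults to the number of entries in ranks.
    """
    n = n_items if n_items is not None else len(ranks)
    return {
        item: (None if rank is None else n + 1 - rank)
        for item, rank in ranks.items()
    }

# Hypothetical response where 1 = most preferred.
original = {"Coke": 1, "Pepsi": 2, "Fanta": 3}
reversed_ranks = reverse_ranks(original)
```

Tied ranks stay tied after the reversal because every rank is mapped through the same formula.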
Experiment
This question type is used to represent various types of experiments, from fully randomized experiments through to Conjoint Analysis and Choice Modeling (see Experiments in Q).
Which of these would you buy?
Coke | Pepsi | Fanta |
$2.00 | $2.10 | $1.80 |
o | o | o |
Reference Table
Structure | What is shown in a Table |
---|---|
Nominal / Pick One | Category proportions |
Ordinal / Pick One | Ordered category proportions |
Numeric / Number | Average |
Text | Raw text |
Date/Time (stored in a YYYY/MM/DD or similar format) | Proportion in each aggregated date |
Binary - Multi / Pick Any (commonly used for multi-select questions and Top 2 boxes) | Proportion who selected a particular response or responses for a variable (such as Aware) |
Binary - Multi (Compact) / Pick Any - Compact (multi-select data in max-multi format where each variable is a selection number) | Proportion who selected a response |
Nominal - Multi / Pick One - Multi (commonly used to group brands to show in the same table) | Proportion of category selected for each variable |
Ordinal - Multi / Pick One - Multi (commonly used for ratings across brands) | Proportion of category selected for each variable |
Numeric - Multi / Number - Multi (commonly used for numeric answers across brands) | Average of each variable |
Binary - Grid / Pick Any - Grid (commonly used to group multi-selects across brands) | Proportion selected for each pair of attributes |
Numeric - Grid / Number - Grid | Average of each pair of attributes |
Ranking | Probability % of item being chosen as first (based on coefficient from logit model) |
Experiment | Coefficient from Experiment |
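For the Ranking structure, the table reports a probability derived from logit coefficients. The source does not give the formula; the Python sketch below assumes the standard multinomial logit form, in which the probability of an item being chosen first is its exponentiated coefficient divided by the sum of exponentiated coefficients. The coefficients and the function name are made up for illustration.

```python
import math

def first_choice_probabilities(coefficients):
    """Convert logit coefficients to first-choice probabilities (softmax).

    Assumes a multinomial logit form; subtracting the maximum keeps exp() stable.
    """
    top = max(coefficients.values())
    weights = {item: math.exp(value - top) for item, value in coefficients.items()}
    total = sum(weights.values())
    return {item: weight / total for item, weight in weights.items()}

# Hypothetical coefficients for three brands.
coefs = {"Coke": 0.5, "Pepsi": 0.0, "Fanta": -0.5}
probs = first_choice_probabilities(coefs)
```

Under this assumption the probabilities sum to 1 and preserve the ordering of the coefficients.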