How Variable Sets/Questions are Automatically Grouped and Labeled – Technical Documentation

When a data file is loaded, Q and Displayr will automatically identify the structure of the variables, group them, and assign variables to the correct variable set type (or question type). This includes automatically fixing a number of common problems in the data file and adding labels to variable sets if appropriate. This article goes into detail about how that is done.

If you have AI enabled in Displayr, it will name variable sets in a smarter, more sensible way when combining variables. See Displayr AI for more information.

Default settings

The starting assumption is that variables are categorical (Nominal/Pick One/single response). If the variable contains text information, it's assumed that the variable set should have a Text structure. Otherwise, it's assumed to have a Numeric structure.

Data file metadata

Better-quality data files contain metadata indicating certain properties of the data. For example, an SPSS .sav data file can contain Multiple Response Sets, which Q/Displayr interprets as instructions for grouping variables into questions or variable sets.

Pattern matching

When an SPSS data file is imported, internal checks run to see whether the file has been set up in the best possible way. The better a data file is set up, the less time is needed to manipulate variable sets, and the easier and faster the analysis will be. Sticking to the data file specifications is strongly recommended: it minimizes the work needed to get started with your analysis and helps the imported variables be grouped and structured correctly from the outset.

Labeling

At the same time as the variable sets are analyzed, their associated labels are also identified. Labels are determined using the following criteria:

The label is taken directly from your data file if a variable is determined not to be related to other variables (e.g., Nominal, Number, or Pick One variable sets).
If your data file contains information about variable sets consisting of multiple variables (e.g., Binary - Multi or Pick Any), then the settings in the data file will be honored. This is particularly relevant to SPSS .sav files that contain information about multiple response sets (as SPSS calls them), which can have a defined label.
Failing the above, the name is inferred from common structures in the labels. For example, if two variables are grouped into a variable set, and one has a label:

Q5a. Bank of America

and the other

Q5b. Citibank

Then "Q5" is inferred as the common element and is used in the variable set label. If it can also be inferred that the variables follow a numeric sequence, e.g., "Q5_1" and "Q5_2", this is indicated by "Q5X", where X stands in for the sequence of numbers.
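The inference described above can be illustrated with a short Python sketch. This is not the algorithm Q or Displayr actually uses, which is not published here. The function name infer_common_label, the set of separator characters, and the order of the two checks are all assumptions made for illustration.

```python
import os
import re

# Hypothetical pattern: a stem, optional separators, then a trailing number.
# The separator set (space, underscore, dot, hyphen) is an assumption.
_TRAILING_NUMBER = re.compile(r"^(.*?)[\s_.\-]*(\d+)$")


def infer_common_label(labels):
    """Guess a shared label for a group of variable labels or names.

    Illustrative only: mimics the behavior described in the article,
    not the real Q/Displayr implementation.
    """
    if not labels:
        return ""

    # Case 1: every entry is the same stem followed by a number,
    # e.g. "Q5_1", "Q5_2" -> "Q5X" (X stands in for the numbers).
    matches = [_TRAILING_NUMBER.match(label) for label in labels]
    if len(labels) > 1 and all(matches):
        stems = {m.group(1) for m in matches}
        if len(stems) == 1:
            stem = stems.pop()
            if stem:
                return stem + "X"

    # Case 2: fall back to the longest shared leading text,
    # e.g. "Q5a. Bank of America", "Q5b. Citibank" -> "Q5".
    prefix = os.path.commonprefix(labels)
    return prefix.rstrip(" _.-:")


print(infer_common_label(["Q5a. Bank of America", "Q5b. Citibank"]))
print(infer_common_label(["Q5_1", "Q5_2"]))
```

With the two examples from this article, the sketch yields "Q5" for the bank labels and "Q5X" for the numbered variables. A real implementation would need to handle more label conventions than this character-wise comparison covers.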
