A model that has been fitted to a set of data can be used to predict the outcome variable for either the same data set or a different data set, provided that the data include the same predictor variables that were used to fit the model. See How to Save the Predicted Values from a Machine Learning Model.
The built-in automations do this with a predict() function, which performs two basic steps:
- A CheckPredictionVariables function tests whether all fitted variables are included in the prediction data set and returns an error if they are not. It also generates a warning when a categorical prediction variable takes a new class (known as a factor level in R) that was not used for fitting, and sets the new classes to NA.
- The predict method of the underlying R package is called, and predictions are returned for the full set of data, before any subset filter is applied. The predictions may be NA because of new factor levels and depending on the treatment of missing data.
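The two steps can be illustrated with a minimal sketch in base R. This is not the actual CheckPredictionVariables code: the helper check_prediction_variables, the example data, and the use of lm() as the underlying model are all assumptions made for illustration, and the helper relies on the xlevels component that lm() stores on the fitted object.

```r
# Hypothetical helper, NOT the real CheckPredictionVariables implementation.
# Assumes an lm() fit, which stores the factor levels used for fitting in fit$xlevels.
check_prediction_variables <- function(fit, newdata) {
    # Names of the predictor variables used when fitting (outcome excluded).
    predictors <- all.vars(delete.response(terms(fit)))
    missing.vars <- setdiff(predictors, names(newdata))
    if (length(missing.vars) > 0)
        stop("Prediction data is missing: ", paste(missing.vars, collapse = ", "))

    # Warn about factor levels that were not used for fitting and set them to NA.
    for (v in names(fit$xlevels)) {
        values <- as.character(newdata[[v]])
        is.new <- !is.na(values) & !(values %in% fit$xlevels[[v]])
        if (any(is.new)) {
            warning("Variable '", v, "' has levels not used for fitting: ",
                    paste(unique(values[is.new]), collapse = ", "))
            values[is.new] <- NA
        }
        newdata[[v]] <- factor(values, levels = fit$xlevels[[v]])
    }
    newdata
}

# Illustrative training data and model.
train <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                    x = c(1, 2, 3, 4, 5, 7),
                    g = factor(c("a", "b", "a", "b", "a", "b")))
fit <- lm(y ~ x + g, data = train)

# New data in which g takes a level ("c") that was not used for fitting.
new.data <- data.frame(x = c(1.5, 2.5, 3.5),
                       g = c("a", "b", "c"))

# Step 1: check the variables (warns about level "c" and sets it to NA).
checked <- check_prediction_variables(fit, new.data)

# Step 2: call the predict method of the underlying model on the full data set.
predictions <- predict(fit, newdata = checked, na.action = na.pass)
```

In this sketch the third case has no usable value for g after the check, so its prediction is NA, while the first two cases receive numeric predictions. Passing a data set without the x column would instead stop with an error at the first step.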
See Also
How to Assign Respondents to Clusters/Segments in a New Data File