Create dummy variables

`D = dummyvar(group)` returns a matrix `D` containing zeros and ones, whose columns are dummy variables for the grouping variables in `group`. Each column of `group` is a single grouping variable, with values indicating category levels. The rows of `group` represent observations across all variables.

Use dummy variables in regression analysis and ANOVA to indicate values of categorical predictors.
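As a minimal sketch of the syntax described above, the following example builds dummy variables from a small grouping vector and from a two-column grouping matrix. The variable names and data values are hypothetical and chosen only for illustration.

```matlab
% Hypothetical grouping vector with levels 1, 2, and 3 (positive integers)
group = [1; 3; 2; 3; 1];

% One column per level; each row has a single 1 marking that observation's level
D = dummyvar(group);
disp(D)

% Hypothetical two-column grouping matrix: the first variable has 3 levels,
% the second has 2 levels, so the result has 3 + 2 = 5 columns
group2 = [1 2; 2 1; 3 2];
D2 = dummyvar(group2);
disp(size(D2))
```

Here `D` is a 5-by-3 matrix and `D2` is a 3-by-5 matrix, with the dummy variables for each grouping variable placed side by side.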

`dummyvar` treats `NaN` values and undefined categorical levels in `group` as missing data and returns `NaN` values in `D`.

If a column of ones is introduced in the matrix `D`, then the resulting matrix `X = [ones(size(D,1),1) D]` is rank deficient. If `group` has multiple columns, then the matrix `D` itself is rank deficient because dummy variables produced from any column of `group` always sum to a column of ones. Regression and ANOVA calculations often address this issue by eliminating one dummy variable (implicitly setting the coefficients for dropped columns to zero) from each group of dummy variables produced by a column of `group`.

If `group` is a numeric vector with levels that do not correspond exactly to the integers `1:max(group)`, first convert the data to a categorical vector by using `categorical`. You can then pass the result to `dummyvar`. For an example, see Create Dummy Variables from Multiple Grouping Variables.
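The three notes above can be sketched together as follows. This is an illustrative example under stated assumptions, not a recommended workflow: the data are hypothetical, the missing-value behavior is taken from the description above, and dropping the first dummy column is only one of several possible choices of reference level.

```matlab
% Hypothetical numeric grouping vector with one missing observation
g = [1; 2; NaN; 3; 2];
D = dummyvar(g);                 % per the description above, missing data yields NaN in D

% Keep only complete rows before checking the rank
ok = ~any(isnan(D), 2);
Dok = D(ok, :);

% Adding an intercept column makes the design matrix rank deficient,
% because the dummy columns already sum to a column of ones
X = [ones(size(Dok,1),1) Dok];
r = rank(X);                     % smaller than size(X,2)

% One common remedy: drop one dummy column (here, the first) per grouping variable
Xfull = [ones(size(Dok,1),1) Dok(:, 2:end)];
rfull = rank(Xfull);             % equals size(Xfull,2)

% Hypothetical levels 10, 20, and 40 do not match 1:max(y),
% so convert to categorical before calling dummyvar
y = [10; 20; 10; 40];
Dy = dummyvar(categorical(y));   % one column per distinct level
```

Converting `y` with `categorical` gives three columns, one for each distinct level that occurs in the data.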

Alternatively, use `onehotencode` to encode data labels. Consider using `onehotencode` instead of `dummyvar` in these cases:

- To encode a table of categorical data labels
- To specify the dimension to expand for encoding the data labels
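The two cases above might look like the following sketch. It assumes that `onehotencode` is available in your installation and that it accepts a categorical array with a dimension argument, or a single-variable table of categorical labels. The label values and variable names are hypothetical.

```matlab
% Hypothetical categorical labels stored as a column vector
labels = categorical(["red"; "blue"; "red"; "green"]);

% Expand along dimension 2: one row per observation, one column per category
% (column order follows categories(labels))
B = onehotencode(labels, 2);

% Encode a table whose single variable holds the categorical labels
T = table(labels);
TB = onehotencode(T);
```

Choosing the dimension matters when observations run along columns rather than rows; in that layout, expanding along dimension 1 keeps one column per observation.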

See Also

`regress` | `anova1` | `grp2idx` | `categories` | `onehotencode` | `onehotdecode`