dummy

A Stata Command to Create Dummy Variables and Interactions from Categorical Variables

The file dummy.ado implements a command that creates dummy variables from categorical variables and interactions from categorical and continuous variables. Place the file in Stata’s working directory (folder), or in the folder that contains Stata’s ado files.

The command can be used to:

1. Create a set of dummy variables from a categorical variable.
2. Create two or three way interaction dummy variables from categorical variables.
3. Create two, three, or four way interactions from a single continuous variable and one, two, or three categorical variables.

The dummy command has the following features, which must be understood in order to use it successfully.

1. The command uses the reference category method, and does not create dummy variables for reference categories or their interactions. Thus, the full list of created variables can be used as independent variables in a regression.
2. By default, the reference category is the one with the lowest coded value. The reference category can be changed with the omit characteristic, described below.
3. Stata must know whether a variable being operated on by the command is a categorical variable or a continuous variable. By default, the command assumes that variables are continuous. To inform the command that a variable is categorical, its factor characteristic must be specified as category, as described below.
4. Names for dummy variables and interactions are formed from abbreviations of the variables and, for categorical variables, their coded values. By default, the abbreviation for a variable is its first letter. You can change the abbreviation used with the abrev characteristic, described below. You must use this characteristic if two variables operated on by the command begin with the same letter, or if the name that would be created by the command is already taken by an existing variable.
5. Optionally, you can specify a minimum frequency for a category that is required for a dummy variable to be created for that category. This can be useful if you want to lump categories with very few observations together with the reference category.
6. The command creates labels for the variables it creates from the value labels of the categorical variables, if present. Thus, providing value labels for categorical variables is a good way to document the variables that the command creates.

Command Syntax for Variable Characteristics

Before giving Stata the dummy command, you must first specify the characteristics of the variables on which it operates. To learn more about characteristics of variables, type help char at the command line.

For categorical variables, it is required that you specify the “category” factor characteristic for each categorical variable. This is done by:

    char varname[factor]category

where varname is the categorical variable. For example, if drug is a categorical variable, type:

    char drug[factor]category

If you want to change the default reference category, use the omit characteristic.

    char varname[omit]#

where varname is the categorical variable, and # is the value of the category that you want to make the reference category. For example, if drug is a categorical variable that takes the values 1, 2, 3, 4, and you want to make the category with the value 3 the reference category, type:

    char drug[omit]3

If you want to change the default abbreviation for a variable, you must supply one or more abbreviation characteristics for the variable. The syntax of the abbreviation characteristic is:

    char varname[abrev#]name

where varname is the variable you want to make the abbreviation for, name is the abbreviation, and # is the number of characters in name. name must conform to the requirements of a variable name in Stata, with one additional consideration, that it should be short.
This is because a category number will be appended to it and, if the variable is involved in an interaction, the abbreviations and category numbers of the other variables as well. The full name must not exceed the maximum length of eight characters. You can supply additional abbreviations. The dummy command will attempt to use the longest abbreviation it can. For example,

    char drug[abrev1]g
    dummy drug

will result in dummy variables with names g1, g2, and g4 being created, given that we specified that category number three would be the reference category.

Command Syntax for the dummy Command

The syntax for the dummy command is:

    dummy varlist, minfreq(#) 1 2 3

varlist is a variable list of up to four existing categorical and continuous variable names, with a maximum of one continuous variable and a maximum of three categorical variables. If you specify only one variable, it must be a categorical variable, and a set of dummies will be created for it. If you specify two variable names, a set of variables for the two way interaction will be created. If you specify three variable names, a set of variables for the three way interaction will be created, and so on.

In the minimum frequency option, minfreq, “#” is a number that specifies the minimum frequency a category must have in order for a dummy variable to be created for it. The option can be abbreviated to “m(#)”. Omitting the option is identical to specifying minfreq(1).

The numbers “1 2 3” may be specified if interactions are requested. For example, when three way interactions are created, the command must first create main effects dummies, and then two way interactions, before the three way interaction can be created. Normally, the command would then delete the main effects and two way interaction variables. However, specifying the “1” option keeps the main effects dummies that were created, and specifying the “2” option keeps the two way interaction variables that were created.
The “3” option would only be specified if varlist contained four variables.

Example

As an example, consider the “fully” interactive model estimated with the sysage.dta data set in Lab 7. The dummy command could be used to more easily construct the variables needed in the regression model in the following manner.

First, although it is not necessary, provide the drug and disease categorical variables with value labels. (Type help label for information on value labels.)

    label define drug 1 "drug 1" 2 "drug 2" 3 "drug 3" 4 "drug 4"
    label define disease 1 "disease 1" 2 "disease 2" 3 "disease 3"
    label values drug drug
    label values disease disease

Next, specify the characteristics of the variables. Since the variables drug and disease begin with the same letter, one of them must be abbreviated.

    char drug[factor]category
    char disease[factor]category
    char drug[abrev1]g

Lastly, create the dummy variables and interactions in one fell swoop.

    dummy drug disease age, 1 2

Troubleshooting

Here are some common error messages and their “fixes”.

    dummy drug
    Error: This combination not allowed

Fix: Stata thinks drug is a continuous, rather than a categorical variable. You forgot to specify the category factor characteristic for drug, i.e., “char drug[factor]category”.

    dummy drug
    Error: d2 already exists.

Fix: When Stata attempted to create the dummy variable d2 for the second drug category, it found a pre-existing variable with the name “d2”. Either drop the variable d2 and issue the dummy command again, or use the abbreviation characteristic to give drug an abbreviation other than “d”, e.g., “char drug[abrev1]g” or “char drug[abrev2]dr”.
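Illustration of the naming and reference category rules

For readers who want to check which variables to expect, the rules described in this handout (reference category, omit, abbreviation, and minimum frequency) can be sketched outside Stata. The Python code below is only an illustration of that logic under stated assumptions and is not the code in dummy.ado. The function names make_dummies and interact, the sample data, and the interaction name pattern (dummy name followed by the continuous variable's abbreviation, e.g. g1a) are hypothetical. How the actual command names interactions and treats missing values is not stated here.

```python
from collections import Counter


def make_dummies(values, abbrev, omit=None, minfreq=1):
    """Reference-category dummy coding for one categorical variable.

    values  : list of integer category codes, one per observation
    abbrev  : name prefix (the handout's default is the first letter)
    omit    : code of the reference category; defaults to the lowest code
    minfreq : categories with fewer observations get no dummy, so they
              are lumped together with the reference category
    """
    counts = Counter(values)
    reference = min(counts) if omit is None else omit
    dummies = {}
    for code in sorted(counts):
        if code == reference or counts[code] < minfreq:
            continue  # no dummy for the reference or for rare categories
        dummies[f"{abbrev}{code}"] = [1 if v == code else 0 for v in values]
    return dummies


def interact(dummies, continuous, abbrev):
    """Two way interaction: each dummy times one continuous variable.

    The name pattern (dummy name + abbrev) is an assumption made for
    this sketch, not the documented behaviour of the command.
    """
    return {
        f"{name}{abbrev}": [d * x for d, x in zip(column, continuous)]
        for name, column in dummies.items()
    }


# Hypothetical data: eight observations of drug (codes 1-4) and age.
drug = [1, 2, 3, 4, 3, 2, 1, 3]
age = [30, 40, 50, 60, 35, 45, 55, 65]

main = make_dummies(drug, "g", omit=3)
print(sorted(main))                      # ['g1', 'g2', 'g4']
print(sorted(interact(main, age, "a")))  # ['g1a', 'g2a', 'g4a']
```

With omit=3 and the abbreviation g, the sketch yields g1, g2, and g4, matching the names given earlier for char drug[abrev1]g. Raising minfreq to 2 would also drop g4, because category 4 occurs only once in the sample data.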