STATA MULTIPLE-IMPUTATION
REFERENCE MANUAL
RELEASE 18
A Stata Press Publication
StataCorp LLC
College Station, Texas
Copyright © 1985-2023 StataCorp LLC
All rights reserved
Version 18
Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845
Typeset in TeX
ISBN-10: 1-59718-391-1
ISBN-13: 978-1-59718-391-8
This manual is protected by copyright. All rights are reserved. No part of this manual may be reproduced, stored
in a retrieval system, or transcribed, in any form or by any means, electronic, mechanical, photocopy, recording, or
otherwise, without the prior written permission of StataCorp LLC unless permitted subject to the terms and conditions
of a license granted to you by StataCorp LLC to use the software and documentation. No license, express or implied,
by estoppel or otherwise, to any intellectual property rights is granted by this document.

StataCorp provides this manual "as is" without warranty of any kind, either expressed or implied, including, but
not limited to, the implied warranties of merchantability and fitness for a particular purpose. StataCorp may make
improvements and/or changes in the product(s) and the program(s) described in this manual at any time and without
notice.

The software described in this manual is furnished under a license agreement or nondisclosure agreement. The software
may be copied only in accordance with the terms of the agreement. It is against the law to copy the software onto
DVD, CD, disk, diskette, tape, or any other medium for any purpose other than backup or archival purposes.
The automobile dataset appearing on the accompanying media is Copyright © 1979 by Consumers Union of U.S.,
Inc., Yonkers, NY 10703-1057 and is reproduced by permission from CONSUMER REPORTS, April 1979.

Stata, Stata Press, Mata, and NetCourse are registered trademarks of StataCorp LLC.

Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the United Nations.
NetCourseNow is a trademark of StataCorp LLC.
Other brand and product names are registered trademarks or trademarks of their respective companies.

For copyright information about the software, type help copyright within Stata.

The suggested citation for this software is

StataCorp. 2023. Stata: Release 18. Statistical Software. College Station, TX: StataCorp LLC.

Contents
Intro substantive - Introduction to multiple-imputation analysis
Intro - Introduction to mi
Estimation - Estimation commands for use with mi estimate
mi add - Add imputations from another mi dataset
mi append - Append mi data
mi convert - Change style of mi data
mi copy - Copy mi flongsep data
mi describe - Describe mi data
mi erase - Erase mi datasets
mi estimate - Estimation using multiple imputations
mi estimate using - Estimation using previously saved estimation results
mi estimate postestimation - Postestimation tools for mi estimate
mi expand - Expand mi data
mi export - Export mi data
mi export ice - Export mi data to ice format
mi export nhanes1 - Export mi data to NHANES format
mi extract - Extract original or imputed data from mi data
mi import - Import data into mi
mi import flong - Import flong-like data into mi
mi import flongsep - Import flongsep-like data into mi
mi import ice - Import ice-format data into mi
mi import nhanes1 - Import NHANES-format data into mi
mi import wide - Import wide-like data into mi
mi impute - Impute missing values
mi impute chained - Impute missing values using chained equations
mi impute intreg - Impute using interval regression
mi impute logit - Impute using logistic regression
mi impute mlogit - Impute using multinomial logistic regression
mi impute monotone - Impute missing values in monotone data
mi impute mvn - Impute using multivariate normal regression
mi impute nbreg - Impute using negative binomial regression
mi impute ologit - Impute using ordered logistic regression
mi impute pmm - Impute using predictive mean matching
mi impute poisson - Impute using Poisson regression
mi impute regress - Impute using linear regression
mi impute truncreg - Impute using truncated regression
mi impute usermethod - User-defined imputation methods
mi merge - Merge mi data
mi misstable - Tabulate pattern of missing values
mi passive - Generate/replace and register passive variables
mi predict - Obtain multiple-imputation predictions
mi ptrace - Load parameter-trace file into Stata
mi rename - Rename variable
mi replace0 - Replace original data
mi reset - Reset imputed or passive variables
mi reshape - Reshape mi data
mi select - Programmer's alternative to mi extract
mi set - Declare multiple-imputation data
mi stsplit - Split and join time-span records for mi data
mi test - Test hypotheses after mi estimate
mi update - Ensure that mi data are consistent
mi varying - Identify variables that vary across imputations
mi xeq - Execute command(s) on individual imputations
mi XXXset - Declare mi data to be svy, st, ts, xt, etc.
noupdate option - The noupdate option
Styles - Dataset styles
Technical - Details for programmers
Workflow - Suggested workflow
Glossary
Subject and author index
Cross-referencing the documentation
When reading this manual, you will find references to other Stata manuals, for example, [U] 27 Overview of Stata estimation commands; [R] regress; and [D] reshape. The first example is a reference to chapter 27, Overview of Stata estimation commands, in the User's Guide; the second is a reference to the regress entry in the Base Reference Manual; and the third is a reference to the reshape entry in the Data Management Reference Manual.

All the manuals in the Stata Documentation have a shorthand notation:

[GSM] Getting Started with Stata for Mac
[GSU] Getting Started with Stata for Unix
[GSW] Getting Started with Stata for Windows
[U] Stata User's Guide
[R] Stata Base Reference Manual
[ADAPT] Stata Adaptive Designs: Group Sequential Trials Reference Manual
[BAYES] Stata Bayesian Analysis Reference Manual
[BMA] Stata Bayesian Model Averaging Reference Manual
[CAUSAL] Stata Causal Inference and Treatment-Effects Estimation Reference Manual
[CM] Stata Choice Models Reference Manual
[D] Stata Data Management Reference Manual
[DSGE] Stata Dynamic Stochastic General Equilibrium Models Reference Manual
[ERM] Stata Extended Regression Models Reference Manual
[FMM] Stata Finite Mixture Models Reference Manual
[FN] Stata Functions Reference Manual
[G] Stata Graphics Reference Manual
[IRT] Stata Item Response Theory Reference Manual
[LASSO] Stata Lasso Reference Manual
[XT] Stata Longitudinal-Data/Panel-Data Reference Manual
[META] Stata Meta-Analysis Reference Manual
[ME] Stata Multilevel Mixed-Effects Reference Manual
[MI] Stata Multiple-Imputation Reference Manual
[MV] Stata Multivariate Statistics Reference Manual
[PSS] Stata Power, Precision, and Sample-Size Reference Manual
[P] Stata Programming Reference Manual
[RPT] Stata Reporting Reference Manual
[SP] Stata Spatial Autoregressive Models Reference Manual
[SEM] Stata Structural Equation Modeling Reference Manual
[SVY] Stata Survey Data Reference Manual
[ST] Stata Survival Analysis Reference Manual
[TABLES] Stata Customizable Tables and Collected Results Reference Manual
[TS] Stata Time-Series Reference Manual
[I] Stata Index
[M] Mata Reference Manual

Intro substantive - Introduction to multiple-imputation analysis

Description    Remarks and examples    References    Also see
Description
Missing data arise frequently. Various procedures for dealing with missing data have been suggested in the literature over the last several decades (for example, Anderson [1957]; Hartley and Hocking [1971]; Rubin [1972, 1987]; and Dempster, Laird, and Rubin [1977]). The technique of multiple imputation, which originated in the early 1970s in applications to survey nonresponse (Rubin 1976), has become increasingly popular over the years, as indicated by the literature (for example, Rubin [1976, 1987, 1996]; Little [1992]; Meng [1994]; Schafer [1997]; van Buuren, Boshuizen, and Knook [1999]; Little and Rubin [2020]; Carlin et al. [2003]; Royston [2004, 2005a, 2005b, 2007, 2009]; Reiter and Raghunathan [2007]; Carlin, Galati, and Royston [2008]; Royston, Carlin, and White [2009]; White, Royston, and Wood [2011]; and Carpenter and Kenward [2013]).

This entry presents a general introduction to multiple imputation and describes relevant statistical terminology used throughout the manual. The discussion here, as well as other statistical entries in this manual, is based on the concepts developed in Rubin (1987) and Schafer (1997).

Remarks and examples
Remarks are presented under the following headings:

Motivating example
What is multiple imputation?
Theory underlying multiple imputation
How large should M be?
Assumptions about missing data
Patterns of missing data
Proper imputation methods
Analysis of multiply imputed data
A brief introduction to MI using Stata
Summary
We will use the following definitions and notation. An imputation represents one set of plausible values for missing data, and so multiple imputations represent multiple sets of plausible values. With a slight abuse of terminology, we will use the term imputation to mean the data where missing values are replaced with one set of plausible values. We use M to refer to the number of imputations and m to refer to each individual imputation; that is, m = 1 means the first imputation, m = 2 means the second imputation, and so on.

Motivating example
Consider a fictional case-control study examining the relationship between smoking and heart attacks.
. use https://www.stata-press.com/data/r18/mheart0
(Fictional heart attack data; BMI missing)

. describe

Contains data from https://www.stata-press.com/data/r18/mheart0.dta
 Observations:           154                  Fictional heart attack data;
                                                BMI missing
    Variables:             9                  19 Jun 2022 10:50

Variable      Storage   Display    Value
    name         type    format    label      Variable label

attack          byte    %9.0g                 Outcome (heart attack)
smokes          byte    %9.0g                 Current smoker
age             float   %9.0g                 Age, in years
bmi             float   %9.0g                 Body mass index, kg/m^2
female          byte    %9.0g                 Gender
hsgrad          byte    %9.0g                 High school graduate
marstatus       byte    %9.0g      mar        Marital status: single, married, divorced
alcohol         byte    %24.0g     alc        Alcohol consumption: none, <2 drinks/day, >=2 drinks/day
hightar         byte    %9.0g                 Smokes high tar cigarettes

Sorted by:

In addition to the primary variables attack and smokes, the dataset contains information about subjects' ages, body mass indexes (BMIs), genders, educational statuses, marital statuses, alcohol consumption, and the types of cigarettes smoked (low/high tar).

We will use logistic regression to study the relationship between attack, recording heart attacks, and smokes:

. logit attack smokes age bmi hsgrad female

Iteration 0: Log likelihood = -91.359017
Iteration 1: Log likelihood = -79.374749
Iteration 2: Log likelihood = -79.342218
Iteration 3: Log likelihood = -79.34221
Logistic regression Number of obs = 132
LR chi2(5) = 24.03
Prob > chi2 = 0.0002
Log likelihood = -79.34221                      Pseudo R2         = 0.1315

------------------------------------------------------------------------------
      attack | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      smokes |   1.544053   .3998329     3.86   0.000     .7603945    2.327711
         age |    .026112    .017042     1.53   0.125    -.0072898    .0595137
         bmi |   .1129938   .0500061     2.26   0.024     .0149837     .211004
      hsgrad |   .4048251   .4446019     0.91   0.363    -.4665786    1.276229
      female |   .2255301   .4527558     0.50   0.618    -.6618549    1.112915
       _cons |  -5.408398   1.810603    -2.99   0.003    -8.957115    -1.85968
------------------------------------------------------------------------------

The above analysis used 132 observations out of the available 154 because some of the covariates contain missing values. Let's examine the data for missing values, something we could have done first:

. misstable summarize
                                                               Obs<.
                                                +------------------------------
               |                                | Unique
      Variable |     Obs=.     Obs>.     Obs<.  | values        Min         Max
  -------------+--------------------------------+------------------------------
           bmi |        22                 132  |    132   17.22643    38.24214
  -----------------------------------------------------------------------------

We discover that bmi is missing in 22 observations. Our analysis ignored the information about the other covariates in these 22 observations. Can we somehow preserve this information in the analysis? The answer is yes, and one solution is to use multiple imputation.

What is multiple imputation?
Multiple imputation (MI) is a flexible, simulation-based statistical technique for handling missing data. Multiple imputation consists of three steps:

1. Imputation step. M imputations (completed datasets) are generated under some chosen imputation model.

2. Completed-data analysis (estimation) step. The desired analysis is performed separately on each imputation m = 1, ..., M. This is called completed-data analysis and is the primary analysis to be performed once missing data have been imputed.

3. Pooling step. The results obtained from M completed-data analyses are combined into a single multiple-imputation result.

The completed-data analysis step and the pooling step can be combined and thought of generally as the analysis step.

MI as a missing-data technique has two main appealing features: 1) the ability to perform a wide variety of completed-data analyses using existing statistical methods; and 2) separation of the imputation step from the analysis step. We discuss these two features in more detail in what follows.
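The pooling step can be illustrated with Rubin's (1987) combining rules for a single scalar parameter. The following is a minimal Python sketch, not Stata's implementation of mi estimate: it assumes each completed-data analysis has already produced a point estimate and its squared standard error, and the function name pool_rubin and the input numbers below are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M completed-data results for one scalar parameter
    using Rubin's (1987) combining rules (sketch; hypothetical helper).

    estimates: point estimates q_m, one per imputation m = 1..M
    variances: squared standard errors u_m, one per imputation
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    M = len(estimates)
    if M < 2 or len(variances) != M:
        raise ValueError("need M >= 2 estimates with matching variances")

    q_bar = mean(estimates)          # MI point estimate: average of the q_m
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor M - 1)
    t = w + (1 + 1 / M) * b          # total variance

    # Rubin's large-sample degrees of freedom for the reference t distribution;
    # r is the relative increase in variance due to missing data.
    r = (1 + 1 / M) * b / w
    df = (M - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return q_bar, math.sqrt(t), df


# Made-up results from M = 3 completed-data analyses
q, se, df = pool_rubin([1.50, 1.60, 1.55], [0.16, 0.15, 0.17])
print(q, se, df)
```

With these made-up inputs, the pooled estimate is 1.55, the pooled standard error is roughly 0.404, and the degrees of freedom are about 4,800. The total variance adds the between-imputation variance, inflated by 1 + 1/M to account for using a finite number of imputations, to the average within-imputation variance; this is how the pooled standard error reflects the extra uncertainty due to missing data.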
Other commonly used missing-data techniques that allow a variety of completed-data analyses include complete-case analysis, also known as listwise (casewise) deletion; available-case analysis; and single-imputation methods. Although these procedures share one of MI's appealing properties, they lack some of MI's statistical properties.
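To make the distinction between the first two techniques concrete, here is a minimal Python sketch contrasting complete-case analysis with available-case analysis for simple means. The toy data and the function names complete_case_means and available_case_means are hypothetical and not part of Stata or the mheart0 dataset. Complete-case analysis drops every observation with any missing value, as logit did when it used 132 of the 154 observations, whereas available-case analysis uses all observed values of each variable separately.

```python
from statistics import mean

# Hypothetical toy data: each dict is one observation; None marks a missing value.
rows = [
    {"age": 50.0, "bmi": 25.0},
    {"age": 60.0, "bmi": None},
    {"age": 40.0, "bmi": 30.0},
    {"age": 70.0, "bmi": None},
]


def complete_case_means(rows, variables):
    """Listwise deletion: keep only rows observed on every variable."""
    kept = [r for r in rows if all(r[v] is not None for v in variables)]
    means = {v: mean(r[v] for r in kept) for v in variables}
    return means, len(kept)


def available_case_means(rows, variables):
    """Available-case analysis: use every observed value of each variable."""
    return {v: mean(r[v] for r in rows if r[v] is not None) for v in variables}


print(complete_case_means(rows, ["age", "bmi"]))   # uses 2 of the 4 rows
print(available_case_means(rows, ["age", "bmi"]))  # age uses all 4 rows
```

Here the complete-case mean of age (45) ignores the two older subjects whose bmi is missing, while the available-case mean of age (55) keeps them; neither approach fills in the missing bmi values, which is the gap multiple imputation addresses.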