Stata Multiple-Imputation Reference Manual PDF


STATA MULTIPLE-IMPUTATION REFERENCE MANUAL

RELEASE 18

A Stata Press Publication

StataCorp LLC

College Station, Texas

Copyright © 1985-2023 StataCorp LLC

All rights reserved

Version 18

Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845

Typeset in TeX

ISBN-10: 1-59718-391-1

ISBN-13: 978-1-59718-391-8

This manual is protected by copyright. All rights are reserved. No part of this manual may be reproduced, stored in a retrieval system, or transcribed, in any form or by any means, electronic, mechanical, photocopy, recording, or otherwise, without the prior written permission of StataCorp LLC unless permitted subject to the terms and conditions of a license granted to you by StataCorp LLC to use the software and documentation. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document.

StataCorp provides this manual "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. StataCorp may make improvements and/or changes in the product(s) and the program(s) described in this manual at any time and without notice.

The software described in this manual is furnished under a license agreement or nondisclosure agreement. The software may be copied only in accordance with the terms of the agreement. It is against the law to copy the software onto DVD, CD, disk, diskette, tape, or any other medium for any purpose other than backup or archival purposes.

The automobile dataset appearing on the accompanying media is Copyright © 1979 by Consumers Union of U.S., Inc., Yonkers, NY 10703-1057 and is reproduced by permission from CONSUMER REPORTS, April 1979.

Stata, Stata Press, Mata, and NetCourse are registered trademarks of StataCorp LLC.

Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the United Nations.

NetCourseNow is a trademark of StataCorp LLC.

Other brand and product names are registered trademarks or trademarks of their respective companies.

For copyright information about the software, type help copyright within Stata.

The suggested citation for this software is

StataCorp. 2023. Stata: Release 18. Statistical Software. College Station, TX: StataCorp LLC.

Contents

Intro substantive - Introduction to multiple-imputation analysis
Intro - Introduction to mi
Estimation - Estimation commands for use with mi estimate
mi add - Add imputations from another mi dataset
mi append - Append mi data
mi convert - Change style of mi data
mi copy - Copy mi flongsep data
mi describe - Describe mi data
mi erase - Erase mi datasets
mi estimate - Estimation using multiple imputations
mi estimate using - Estimation using previously saved estimation results
mi estimate postestimation - Postestimation tools for mi estimate
mi expand - Expand mi data
mi export - Export mi data
mi export ice - Export mi data to ice format
mi export nhanes1 - Export mi data to NHANES format
mi extract - Extract original or imputed data from mi data
mi import - Import data into mi
mi import flong - Import flong-like data into mi
mi import flongsep - Import flongsep-like data into mi
mi import ice - Import ice-format data into mi
mi import nhanes1 - Import NHANES-format data into mi
mi import wide - Import wide-like data into mi
mi impute - Impute missing values
mi impute chained - Impute missing values using chained equations
mi impute intreg - Impute using interval regression
mi impute logit - Impute using logistic regression
mi impute mlogit - Impute using multinomial logistic regression
mi impute monotone - Impute missing values in monotone data
mi impute mvn - Impute using multivariate normal regression
mi impute nbreg - Impute using negative binomial regression
mi impute ologit - Impute using ordered logistic regression
mi impute pmm - Impute using predictive mean matching
mi impute poisson - Impute using Poisson regression
mi impute regress - Impute using linear regression
mi impute truncreg - Impute using truncated regression
mi impute usermethod - User-defined imputation methods
mi merge - Merge mi data
mi misstable - Tabulate pattern of missing values
mi passive - Generate/replace and register passive variables
mi predict - Obtain multiple-imputation predictions
mi ptrace - Load parameter-trace file into Stata
mi rename - Rename variable
mi replace0 - Replace original data
mi reset - Reset imputed or passive variables
mi reshape - Reshape mi data
mi select - Programmer's alternative to mi extract
mi set - Declare multiple-imputation data
mi stsplit - Split and join time-span records for mi data
mi test - Test hypotheses after mi estimate
mi update - Ensure that mi data are consistent
mi varying - Identify variables that vary across imputations
mi xeq - Execute command(s) on individual imputations
mi XXXset - Declare mi data to be svy, st, ts, xt, etc.
noupdate option - The noupdate option
Styles - Dataset styles
Technical - Details for programmers
Workflow - Suggested workflow
Glossary
Subject and author index

Cross-referencing the documentation

When reading this manual, you will find references to other Stata manuals, for example, [U] 27 Overview of Stata estimation commands; [R] regress; and [D] reshape. The first example is a reference to chapter 27, Overview of Stata estimation commands, in the User's Guide; the second is a reference to the regress entry in the Base Reference Manual; and the third is a reference to the reshape entry in the Data Management Reference Manual.

All the manuals in the Stata Documentation have a shorthand notation:

[GSM] Getting Started with Stata for Mac
[GSU] Getting Started with Stata for Unix
[GSW] Getting Started with Stata for Windows
[U] Stata User's Guide
[R] Stata Base Reference Manual
[ADAPT] Stata Adaptive Designs: Group Sequential Trials Reference Manual
[BAYES] Stata Bayesian Analysis Reference Manual
[BMA] Stata Bayesian Model Averaging Reference Manual
[CAUSAL] Stata Causal Inference and Treatment-Effects Estimation Reference Manual
[CM] Stata Choice Models Reference Manual
[D] Stata Data Management Reference Manual
[DSGE] Stata Dynamic Stochastic General Equilibrium Models Reference Manual
[ERM] Stata Extended Regression Models Reference Manual
[FMM] Stata Finite Mixture Models Reference Manual
[FN] Stata Functions Reference Manual
[G] Stata Graphics Reference Manual
[IRT] Stata Item Response Theory Reference Manual
[LASSO] Stata Lasso Reference Manual
[XT] Stata Longitudinal-Data/Panel-Data Reference Manual
[META] Stata Meta-Analysis Reference Manual
[ME] Stata Multilevel Mixed-Effects Reference Manual
[MI] Stata Multiple-Imputation Reference Manual
[MV] Stata Multivariate Statistics Reference Manual
[PSS] Stata Power, Precision, and Sample-Size Reference Manual
[P] Stata Programming Reference Manual
[RPT] Stata Reporting Reference Manual
[SP] Stata Spatial Autoregressive Models Reference Manual
[SEM] Stata Structural Equation Modeling Reference Manual
[SVY] Stata Survey Data Reference Manual
[ST] Stata Survival Analysis Reference Manual
[TABLES] Stata Customizable Tables and Collected Results Reference Manual
[TS] Stata Time-Series Reference Manual
[I] Stata Index
[M] Mata Reference Manual

Intro substantive - Introduction to multiple-imputation analysis

Description    Remarks and examples    References    Also see

Description

Missing data arise frequently. Various procedures have been suggested in the literature over the last several decades to deal with missing data (for example, Anderson [1957]; Hartley and Hocking [1971]; Rubin [1972, 1987]; and Dempster, Laird, and Rubin [1977]). The technique of multiple imputation, which originated in the early 1970s in application to survey nonresponse (Rubin 1976), has steadily gained popularity over the years, as the literature indicates (for example, Rubin [1976, 1987, 1996]; Little [1992]; Meng [1994]; Schafer [1997]; van Buuren, Boshuizen, and Knook [1999]; Little and Rubin [2020]; Carlin et al. [2003]; Royston [2004, 2005a, 2005b, 2007, 2009]; Reiter and Raghunathan [2007]; Carlin, Galati, and Royston [2008]; Royston, Carlin, and White [2009]; White, Royston, and Wood [2011]; and Carpenter and Kenward [2013]).

This entry presents a general introduction to multiple imputation and describes relevant statistical terminology used throughout the manual. The discussion here, as well as other statistical entries in this manual, is based on the concepts developed in Rubin (1987) and Schafer (1997).

Remarks and examples

Remarks are presented under the following headings:

Motivating example

What is multiple imputation?

Theory underlying multiple imputation

How large should M be?

Assumptions about missing data

Patterns of missing data

Proper imputation methods

Analysis of multiply imputed data

A brief introduction to MI using Stata

Summary

We will use the following definitions and notation. An imputation represents one set of plausible values for missing data, and so multiple imputations represent multiple sets of plausible values. With a slight abuse of the terminology, we will use the term imputation to mean the data where missing values are replaced with one set of plausible values. We use M to refer to the number of imputations and m to refer to each individual imputation; that is, m = 1 means the first imputation, m = 2 means the second imputation, and so on.

Motivating example

Consider a fictional case-control study examining a relationship between smoking and heart attacks.

    . use https://www.stata-press.com/data/r18/mheart0
    (Fictional heart attack data; BMI missing)

    . describe

    Contains data from https://www.stata-press.com/data/r18/mheart0.dta
     Observations:   154          Fictional heart attack data; BMI missing
        Variables:     9          19 Jun 2022 10:50

    Variable    Storage  Display  Value
    name        type     format   label   Variable label
    attack      byte     %9.0g            Outcome (heart attack)
    smokes      byte     %9.0g            Current smoker
    age         float    %9.0g            Age, in years
    bmi         float    %9.0g            Body mass index, kg/m^2
    female      byte     %9.0g            Gender
    hsgrad      byte     %9.0g            High school graduate
    marstatus   byte     %9.0g    mar     Marital status: single, married, divorced
    alcohol     byte     %24.0g   alc     Alcohol consumption: none, <2 drinks/day, >=2 drinks/day
    hightar     byte     %9.0g            Smokes high tar cigarettes

    Sorted by:

In addition to the primary variables attack and smokes, the dataset contains information about subjects' ages, body mass indexes (BMIs), genders, educational statuses, marital statuses, alcohol consumptions, and the types of cigarettes smoked (low/high tar).

We will use logistic regression to study the relationship between attack, recording heart attacks, and smokes:

    . logit attack smokes age bmi hsgrad female

    Iteration 0:  Log likelihood = -91.359017
    Iteration 1:  Log likelihood = -79.374749
    Iteration 2:  Log likelihood = -79.342218
    Iteration 3:  Log likelihood =  -79.34221

    Logistic regression                          Number of obs =    132
                                                 LR chi2(5)    =  24.03
                                                 Prob > chi2   = 0.0002
    Log likelihood = -79.34221                   Pseudo R2     = 0.1315

    attack   Coefficient  Std. err.      z    P>|z|   [95% conf. interval]
    smokes      1.544053   .3998329    3.86   0.000    .7603945   2.327711
    age          .026112    .017042    1.53   0.125   -.0072898   .0595137
    bmi         .1129938   .0500061    2.26   0.024    .0149837    .211004
    hsgrad      .4048251   .4446019    0.91   0.363   -.4665786   1.276229
    female      .2255301   .4527558    0.50   0.618   -.6618549   1.112915
    _cons      -5.408398   1.810603   -2.99   0.003   -8.957115   -1.85968

The above analysis used 132 observations out of the available 154 because some of the covariates contain missing values. Let's examine the data for missing values, something we could have done first:

    . misstable summarize
                                          Obs<.
                                          Unique
    Variable   Obs=.   Obs>.   Obs<.      values        Min        Max
    bmi           22             132         132   17.22643   38.24214

We discover that bmi is missing in 22 observations. Our analysis ignored the information about the other covariates in these 22 observations. Can we somehow preserve this information in the analysis? The answer is yes, and one solution is to use multiple imputation.
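The drop from 154 to 132 observations can be checked directly. The following is a minimal sketch, not part of the original session, that counts the complete cases over the variables in the logit model. It assumes the mheart0 dataset is reachable at the URL used above; the choice of count and missing() is ours.

```stata
* Load the example dataset used in the text
use https://www.stata-press.com/data/r18/mheart0, clear

* Observations in which bmi is missing
count if missing(bmi)

* Complete cases: no model variable is missing.
* logit silently restricts estimation to these observations.
count if !missing(attack, smokes, age, bmi, hsgrad, female)
```

With only bmi incomplete, the second count should match the "Number of obs" reported by logit.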

What is multiple imputation?

Multiple imputation (MI) is a flexible, simulation-based statistical technique for handling missing data. Multiple imputation consists of three steps:

1. Imputation step. M imputations (completed datasets) are generated under some chosen imputation model.

2. Completed-data analysis (estimation) step. The desired analysis is performed separately on each imputation m = 1, ..., M. This is called completed-data analysis and is the primary analysis to be performed once missing data have been imputed.

3. Pooling step. The results obtained from M completed-data analyses are combined into a single multiple-imputation result.

The completed-data analysis step and the pooling step can be combined and thought of generally as the analysis step.

MI as a missing-data technique has two appealing main features: 1) the ability to perform a wide variety of completed-data analyses using existing statistical methods; and 2) separation of the imputation step from the analysis step. We discuss these two features in more detail in what follows.

Among other commonly used missing-data techniques that allow a variety of completed-data analyses are complete-case analysis or listwise (casewise) deletion, available-case analysis, and single-imputation methods. Although these procedures share one of MI's appealing properties, they lack some of MI's statistical properties.
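As a rough illustration of how the three steps map onto Stata commands, here is a minimal sketch using the heart attack data from the motivating example. It rests on several assumptions that the text above does not fix: the mlong style, a linear regression imputation model for bmi, M = 20 imputations, and an arbitrary random-number seed are all illustrative choices.

```stata
* Load the example dataset used in the motivating example
use https://www.stata-press.com/data/r18/mheart0, clear

* Declare the data to be mi data (style chosen for illustration)
* and register bmi as the variable to be imputed
mi set mlong
mi register imputed bmi

* Step 1, imputation: create M = 20 imputations of bmi from a
* linear regression on the other model variables (assumed model)
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)

* Steps 2 and 3, completed-data analysis and pooling: fit the logit
* model on each imputation and combine the M sets of estimates
mi estimate: logit attack smokes age bmi hsgrad female
```

Here mi impute carries out the imputation step, and mi estimate performs the completed-data analysis and pooling steps together, which is the combined "analysis step" described above. All 154 observations contribute to the pooled estimates.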
