Multiple Imputation for Missing Data: Concepts and New Development (Version 9.0)

Yang C. Yuan, SAS Institute Inc., Rockville, MD

Abstract

Multiple imputation provides a useful strategy for dealing with data sets with missing values. Instead of filling in a single value for each missing value, Rubin's (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard procedures for complete data and combining the results from these analyses. No matter which complete-data analysis is used, the process of combining results from different imputed data sets is essentially the same. This results in valid statistical inferences that properly reflect the uncertainty due to missing values.

This paper reviews methods for analyzing missing data, including basic concepts and applications of multiple imputation techniques. The paper presents SAS® procedures, PROC MI and PROC MIANALYZE, for creating multiple imputations for incomplete multivariate data and for analyzing results from multiply imputed data sets.

The MI and MIANALYZE procedures, which were introduced as experimental software in Releases 8.1 and 8.2, are production software in Version 9.0. The syntax and examples in this paper apply to Version 9.0. The following enhancements have been made to the MI procedure in Version 9.0:

• A new REGPMM option in the MONOTONE statement and a new PMM option in the MCMC statement request the predictive mean matching method for imputation. For each missing value, this method imputes the observed value that is closest to the predicted value from the simulated regression model.
• A flexible model specification in the MONOTONE statement allows a different set of covariates to be specified for each imputed variable.

The following changes and enhancements have been made to the MIANALYZE procedure in Version 9.0:

• A new MODELEFFECTS statement allows you to specify the effects in the data set to be analyzed. This statement replaces the VAR statement, which was used in Releases 8.1 and 8.2.
• A new STDERR statement provides standard errors associated with effects in the MODELEFFECTS statement. The statement can be used for univariate inference when the input DATA= data set contains both parameter estimates and standard errors as variables.
• A new TEST statement tests linear hypotheses about the parameters.

This paper also describes new experimental features in Version 9.0 for specification of classification variables in the MI and MIANALYZE procedures.

Introduction

Most SAS statistical procedures exclude observations with any missing variable values from the analysis. These observations are called incomplete cases. Although using only complete cases is simple, you lose the information in the incomplete cases. This approach also ignores the possible systematic difference between the complete cases and incomplete cases, and the resulting inference may not be applicable to the population of all cases, especially with a smaller number of complete cases.

Some SAS procedures use all the available cases in an analysis, that is, cases with available information. For example, PROC CORR estimates a variable mean by using all cases with nonmissing values on this variable, ignoring the possible missing values in other variables. PROC CORR also estimates a correlation by using all cases with nonmissing values for this pair of variables. This may make better use of the available data, but the resulting correlation matrix may not be positive definite.

Another strategy is single imputation, in which you substitute a value for each missing value. Standard statistical procedures for complete data analysis can then be used with the filled-in data set. For example, each missing value can be imputed from the variable mean of the complete cases. This approach treats missing values as if they were known in the complete-data analyses.
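To make mean imputation concrete, here is a minimal Python sketch (standard library only). The values and the function name `mean_impute` are hypothetical and are not taken from the paper. The sketch fills each missing value with the mean of the observed values and reports the sample variance before and after filling.

```python
from statistics import mean, variance

def mean_impute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical measurements; None marks a missing value.
y = [23.2, 24.0, None, 26.3, None, 26.8, 27.6, None, 28.5, 29.1]

observed = [v for v in y if v is not None]
filled = mean_impute(y)

var_observed = variance(observed)  # sample variance of the 7 observed values
var_filled = variance(filled)      # sample variance after filling in the mean

print(f"observed: n={len(observed)}, variance={var_observed:.3f}")
print(f"filled:   n={len(filled)}, variance={var_filled:.3f}")
```

The filled-in values sit exactly at the mean, so they add nothing to the sum of squared deviations while the divisor grows from 6 to 9. The variance of the filled-in data is therefore the observed variance multiplied by 6/9.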
Single imputation does not reflect the uncertainty about the predictions of the unknown missing values, and the resulting estimated variances of the parameter estimates will be biased toward zero.

Instead of filling in a single value for each missing value, a multiple imputation procedure (Rubin 1987) replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. The multiply imputed data sets are then analyzed by using standard procedures for complete data and combining the results from these analyses. No matter which complete-data analysis is used, the process of combining results from different data sets is essentially the same.

Multiple imputation does not attempt to estimate each missing value through simulated values but rather to represent a random sample of the missing values. This process results in valid statistical inferences that properly reflect the uncertainty due to missing values; for example, valid confidence intervals for parameters.
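The combining step mentioned above can be sketched in a few lines. The combining rules are not stated in this part of the paper, so the Python sketch below uses the standard rules from Rubin (1987) for a single scalar parameter. The function name `combine_rubin` and the input numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance

def combine_rubin(estimates, variances):
    """Pool m complete-data estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    # Degrees of freedom of the t reference distribution; infinite when b == 0.
    df = float("inf") if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df

# Hypothetical results from m = 5 imputed data sets:
# one point estimate and one squared standard error per analysis.
estimates = [10.0, 10.4, 9.8, 10.2, 10.1]
variances = [0.25, 0.24, 0.26, 0.25, 0.25]

q_bar, total_var, df = combine_rubin(estimates, variances)
print(f"estimate={q_bar:.3f}  total variance={total_var:.3f}  df={df:.1f}")
```

The same function applies whatever complete-data analysis produced the estimates, because it needs only a point estimate and a variance from each imputed data set.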

Multiple imputation inference involves three distinct phases:

• The missing data are filled in $m$ times to generate $m$ complete data sets.
• The $m$ complete data sets are analyzed by using standard procedures.
• The results from the $m$ complete data sets are combined for the inference.

The MI procedure in SAS/STAT software is a multiple imputation procedure that creates multiply imputed data sets for incomplete $p$-dimensional multivariate data. It uses methods that incorporate appropriate variability across the $m$ imputations. Once the $m$ complete data sets are analyzed by using standard procedures, the MIANALYZE procedure can be used to generate valid statistical inferences about the parameters of interest by combining results from the $m$ complete data sets.

Ignorable Missing-Data Mechanism

Let $Y$ be the $n \times p$ matrix of complete data, which is not fully observed, and denote the observed part of $Y$ by $Y_{obs}$ and the missing part by $Y_{mis}$. The SAS multiple imputation procedures assume that the missing data are missing at random (MAR), that is, the probability that an observation is missing may depend on $Y_{obs}$, but not on $Y_{mis}$ (Rubin 1976; 1987, p. 53).
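In symbols, the MAR condition can be written as follows. The indicator matrix $R$ is notation introduced here for illustration and does not appear in the text above: its entry $r_{ij}$ is 1 when $y_{ij}$ is observed and 0 when it is missing.

```latex
\Pr(R \mid Y_{obs}, Y_{mis}) = \Pr(R \mid Y_{obs})
\quad \text{for all values of } Y_{mis}
```

That is, once the observed data are taken into account, the missing values themselves carry no further information about which entries are missing.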

For example, consider a trivariate data set with variables $Y_1$ and $Y_2$ fully observed, and a variable $Y_3$ that has missing values. MAR assumes that the probability that $Y_3$ is missing for an individual may be related to the individual's values of variables $Y_1$ and $Y_2$, but not to its value of $Y_3$. On the other hand, if a complete case and an incomplete case for $Y_3$ with exactly the same values for variables $Y_1$ and $Y_2$ have systematically different values, then there exists a response bias for $Y_3$ and it is not MAR.

The MAR assumption is not the same as missing completely at random (MCAR), which is a special case of MAR. With MCAR, the missing data values are a simple random sample of all data values; the missingness does not depend on the values of any variables in the data set.

Furthermore, these SAS procedures also assume that the parameters $\theta$ of the data model and the parameters $\phi$ of the missing data indicators are distinct. That is, knowing the values of $\theta$ does not provide any additional information about $\phi$, and vice versa. If both MAR and distinctness assumptions are satisfied, the missing-data mechanism is said to be ignorable.

Imputation Mechanisms

This section describes three methods that are available in the MI procedure. The method of choice depends on the type of missing data pattern. For monotone missing data patterns, either a parametric regression method that assumes multivariate normality or a nonparametric method that uses propensity scores is appropriate. For an arbitrary missing data pattern, a Markov chain Monte Carlo (MCMC) method (Schafer 1997) that assumes multivariate normality can be used.

A data set is said to have a monotone missing pattern when the event that a variable $Y_j$ is missing for the individual $i$ implies that all subsequent variables $Y_k$, $k > j$, are all missing for the individual $i$. When you have a monotone missing data pattern, you have greater flexibility in your choice of strategies. For example, you can implement a regression model without involving iterations as in MCMC.

When you have an arbitrary missing data pattern, you can often use the MCMC method, which creates multiple imputations by using simulations from a Bayesian prediction distribution for normal data. Another way to handle a data set with an arbitrary missing data pattern is to use the MCMC approach to impute enough values to make the missing data pattern monotone. Then, you can use a more flexible imputation method.

Regression Method

In the regression method, a regression model is fitted for each variable with missing values. Based on the resulting model, a new regression model is then drawn and is used to impute the missing values for the variable (Rubin 1987, pp. 166-167). Since the data set has a monotone missing data pattern, the process is repeated sequentially for variables with missing values. That is, for a variable $Y_j$ with missing values, a model

$$Y_j = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k$$

is fitted using observations with observed values for the variable $Y_j$ and its covariates $X_1, X_2, \ldots, X_k$.

The fitted model includes the regression parameter estimates $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)$ and the associated covariance matrix $\hat{\sigma}_j^2 V_j$, where $V_j$ is the usual $(X'X)^{-1}$ matrix derived from the intercept and covariates $X_1, X_2, \ldots, X_k$.

The following steps are used to generate imputed values for each imputation:

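1. New parameters $\beta_* = (\beta_{*0}, \beta_{*1}, \ldots, \beta_{*(k)})$ and $\sigma_{*j}^2$ are drawn from the posterior predictive distribution of the parameters. That is, they are simulated from $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)$, $\hat{\sigma}_j^2$, and $V_j$. The variance is drawn as

$$\sigma_{*j}^2 = \hat{\sigma}_j^2 \, (n_j - k - 1) / g$$

where $g$ is a $\chi^2_{n_j - k - 1}$ random variate and $n_j$ is the number of nonmissing observations for $Y_j$. The regression coefficients are drawn as

$$\beta_* = \hat{\beta} + \sigma_{*j} V_{hj}' Z$$

where $V_{hj}$ is the upper triangular matrix in the Cholesky decomposition, $V_j = V_{hj}' V_{hj}$, and $Z$ is a vector of $k + 1$ independent random normal variates.

2. The missing values are then replaced by

$$\beta_{*0} + \beta_{*1} x_1 + \beta_{*2} x_2 + \ldots + \beta_{*(k)} x_k + z_i \sigma_{*j}$$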
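The two steps can be sketched in Python as follows. This is a simplified illustration, not PROC MI's implementation. It assumes a single covariate ($k = 1$) so that the $2 \times 2$ matrix algebra can be written out with the standard library alone. The function name `impute_regression` and the seed are hypothetical. The data are the first 13 (Length1, Length2) pairs of the paper's Fish1 data set, with Length1 as the covariate $x$ and Length2 as the variable to impute.

```python
import math
import random

def impute_regression(pairs, rng):
    """One regression-method imputation of y given a single covariate x (k = 1)."""
    obs = [(x, y) for x, y in pairs if y is not None]
    n = len(obs)
    sx = sum(x for x, _ in obs)
    sy = sum(y for _, y in obs)
    sxx = sum(x * x for x, _ in obs)
    sxy = sum(x * y for x, y in obs)
    det = n * sxx - sx * sx

    # V = (X'X)^{-1} for the design matrix with columns (1, x).
    v00, v01, v11 = sxx / det, -sx / det, n / det
    # Least-squares estimates and residual variance on n - k - 1 = n - 2 df.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in obs)
    s2 = sse / (n - 2)

    # Step 1, variance: sigma*^2 = s2 * (n - 2) / g with g ~ chi-square(n - 2),
    # drawn here as a gamma variate with shape (n - 2) / 2 and scale 2.
    g = rng.gammavariate((n - 2) / 2, 2.0)
    sigma = math.sqrt(s2 * (n - 2) / g)

    # Step 1, coefficients: beta* = beta_hat + sigma* U'Z, where V = U'U
    # and U is the upper triangular Cholesky factor.
    u00 = math.sqrt(v00)
    u01 = v01 / u00
    u11 = math.sqrt(v11 - u01 * u01)
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    b0_star = b0 + sigma * (u00 * z0)
    b1_star = b1 + sigma * (u01 * z0 + u11 * z1)

    # Step 2: replace each missing y by the drawn line plus normal noise.
    return [
        (x, y if y is not None else b0_star + b1_star * x + rng.gauss(0, 1) * sigma)
        for x, y in pairs
    ]

# First 13 (Length1, Length2) pairs of Fish1; None marks a missing Length2.
fish = [
    (23.2, 25.4), (24.0, 26.3), (23.9, 26.5), (26.3, 29.0), (26.5, 29.0),
    (26.8, 29.7), (26.8, None), (27.6, 30.0), (27.6, 30.0), (28.5, 30.7),
    (28.4, 31.0), (28.7, None), (29.1, 31.5),
]

completed = impute_regression(fish, random.Random(2003))
for (x, y_old), (_, y_new) in zip(fish, completed):
    if y_old is None:
        print(f"Length1={x}: imputed Length2={y_new:.2f}")
```

Calling the function $m$ times with different random streams would give $m$ completed data sets. Each call draws a new variance and new coefficients, which is how the imputations reflect uncertainty about the regression model as well as the residual noise.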

In this expression, $x_1, x_2, \ldots, x_k$ are the values of the covariates and $z_i$ is a simulated normal deviate.

Note that the predictive mean matching method can also be used for imputation. It is similar to the regression method except that for each missing value, it imputes an observed value which is closest to the predicted value from the simulated regression model (Rubin 1987, p. 168). The predictive mean matching method ensures that imputed values are plausible and may be more appropriate than the regression method if the normality assumption is violated (Horton and Lipsitz 2001, p. 246).

Example: Regression Method

This example uses the regression method to impute missing values for all variables in a data set with a monotone missing pattern. The data set Fish1 used here is a modified version of the Fish data set described in the SAS/STAT documentation for the STEPDISC procedure.

The Fish1 data set contains three measurements for a single species of fish: the length from the nose of the fish to the beginning of the tail (Length1), the length from the nose to the notch of the tail (Length2), and the length from the nose to the end of the tail (Length3). Some values have been set to missing, so that the data set has a monotone missing pattern in variables Length1, Length2, and Length3.

    *---------------Data on Fish Measurements-------------*
    | The Fish1 data set contains only one species of     |
    | fish and the three length measurements. Some values |
    | have been set to missing and the resulting data set |
    | has a monotone missing pattern in variables         |
    | Length1, Length2, and Length3.                      |
    *-----------------------------------------------------*;
    data Fish1;
       input Length1 Length2 Length3 @@;
       datalines;
    23.2 25.4 30.0 24.0 26.3 31.2
    23.9 26.5 31.1 26.3 29.0 33.5
    26.5 29.0 .    26.8 29.7 34.7
    26.8 .    .    27.6 30.0 35.0
    27.6 30.0 35.1 28.5 30.7 36.2
    28.4 31.0 36.2 28.7 .    .
    29.1 31.5 .    29.5 32.0 37.3
    29.4 32.0 37.2 29.4 32.0 37.2
    30.4 33.0 38.3 30.4 33.0 38.5
    30.9 33.5 38.6 31.0 33.5 38.7
    31.3 34.0 39.5 31.4 34.0 39.2
    31.5 34.5 .    31.8 35.0 40.6
    31.9 35.0 40.5 31.8 35.0 40.9
    32.0 35.0 40.6 32.7 36.0 41.5
    32.8 36.0 41.6 33.5 37.0 42.6
    35.0 38.5 44.1 35.0 38.5 44.0
    36.2 39.5 45.3 37.4 41.0 45.9
    38.0 41.0 46.5
    ;

The following statements invoke the MI procedure and re-
