Influence of Outliers on Some Multiple Imputation Methods
QUINTANO, Claudio;CASTELLANO, Rosalia;ROCCA, Antonella
2010-01-01
Abstract
In the field of data quality, imputation is the most widely used method for handling missing data. The performance of imputation techniques is influenced by various factors, for example the survey design characteristics, especially when the data represent only a sample of the population. In this paper, we compare the results of different multiple imputation methods in terms of final estimates when outliers occur in a dataset. To evaluate the influence of outliers on the performance of these methods, the procedure is applied both before and after the outliers have been identified and removed. For this purpose, missing data were simulated on data from the ISTAT annual sample survey on Small and Medium Enterprises. A missing at random (MAR) mechanism is assumed for the missing data. The methods considered are multiple imputation through Markov Chain Monte Carlo (MCMC), the propensity score, and mixture models. The results highlight the strong influence of data characteristics on the final estimates.
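The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic rather than ISTAT survey data, the outlier rule (a median/MAD threshold) is an assumption, and a simplified stochastic regression imputation stands in for the MCMC, propensity score, and mixture model methods. All function and variable names are hypothetical. The regression parameters are held fixed across imputations instead of being drawn from their posterior, so the pooled variance is likely understated.

```python
import numpy as np

def simulate_mar(x, y, rng, rate=0.3):
    """Delete y with a probability that depends only on the observed x (MAR)."""
    z = (x - x.mean()) / x.std()
    p = 1.0 / (1.0 + np.exp(-z))             # larger x -> more likely missing
    p = np.clip(p * rate / p.mean(), 0, 1)   # rescale to the target missing rate
    y_miss = y.copy()
    y_miss[rng.random(len(y)) < p] = np.nan
    return y_miss

def multiply_impute_mean(x, y_miss, rng, m=20):
    """Impute y from x m times and pool the mean of y with Rubin's rules."""
    obs = ~np.isnan(y_miss)
    X = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta, *_ = np.linalg.lstsq(X, y_miss[obs], rcond=None)
    sigma = (y_miss[obs] - X @ beta).std(ddof=2)
    n, n_mis = len(y_miss), int((~obs).sum())
    estimates, variances = np.empty(m), np.empty(m)
    for j in range(m):
        y_imp = y_miss.copy()
        # Simplification: beta and sigma are fixed; only the noise is redrawn.
        y_imp[~obs] = beta[0] + beta[1] * x[~obs] + rng.normal(0, sigma, n_mis)
        estimates[j] = y_imp.mean()
        variances[j] = y_imp.var(ddof=1) / n
    within = variances.mean()
    between = estimates.var(ddof=1)
    total = within + (1 + 1 / m) * between   # Rubin's total variance
    return estimates.mean(), total

def is_outlier(v, k=3.0):
    """Assumed rule: flag values more than k scaled MADs from the median."""
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))
    return np.abs(v - med) > k * mad

rng = np.random.default_rng(0)
n = 500
x = rng.normal(50, 10, n)                    # fully observed auxiliary variable
y = 2.0 * x + rng.normal(0, 5, n)            # target variable
y[:10] *= 10                                 # inject ten artificial outliers

keep = ~is_outlier(y)
results = {}
for label, mask in [("with outliers", np.ones(n, dtype=bool)),
                    ("outliers removed", keep)]:
    y_miss = simulate_mar(x[mask], y[mask], rng)
    results[label] = multiply_impute_mean(x[mask], y_miss, rng)
    est, var = results[label]
    print(f"{label}: pooled mean = {est:.2f}, total variance = {var:.3f}")
```

Comparing the two pooled means and variances gives a rough sense of how a handful of extreme values can shift the final estimate, which is the kind of effect the paper studies on real survey data.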