Complex Surveys and Economic Data Quality: An Application of Multiple Imputation to Small and Medium Enterprises Survey
ROCCA, Antonella
2010-01-01
Abstract
In recent years, attention to quality in statistical surveys has increased. With regard to nonresponse, various imputation techniques have been proposed to fill in the dataset, and they provide a useful strategy for dealing with data sets with missing values. In this work, after a broad overview of data quality concerns and dimensions, the author presents the results of an experiment in which missing data are simulated in sample data from the ISTAT survey on Small and Medium Enterprises, Arts and Professions. In particular, three multiple imputation methods are applied and compared on their ability to estimate the population mean, median and variance: the Markov Chain Monte Carlo (MCMC) method, the propensity score method and mixture models. The application of these methods is then repeated after removing outliers from the dataset, and the new results are compared with the previous ones. The comparison highlights how data characteristics influence the final estimates: for normally distributed data, simpler methods, such as MCMC, are sufficient to obtain good results; otherwise, more complex methods, such as those based on mixture models, prove more adequate.