Complex Surveys and Economic Data Quality: An Application of Multiple Imputation to Small and Medium Enterprises Survey

ROCCA, Antonella
2010

Abstract

In recent years, attention to quality in statistical surveys has increased. With regard to nonresponse, various imputation techniques have been proposed to fill in the data set, since they provide a useful strategy for dealing with missing values. In this work, after a broad overview of data quality concerns and dimensions, the author presents the results of an experiment in which missing data are simulated in sample data from the ISTAT survey on Small and Medium Enterprises, Arts and Professions. In particular, three multiple imputation methods are applied and compared on their ability to estimate the population mean, median and variance: the Markov Chain Monte Carlo (MCMC) method, the propensity score method and mixture models. The methods are then applied again after outliers are removed from the data set, and the new results are compared with the previous ones. The comparison highlights how the characteristics of the data influence the final estimates: for normally distributed data, simpler methods such as MCMC are sufficient to obtain good results; otherwise, more complex approaches, such as those based on mixture models, are more adequate.


Use this identifier to cite or link to this document: http://hdl.handle.net/11367/24218