Aggregate administrative data to adjust selection bias in estimates from nonprobability samples
- Modesto Escobar Mercado (supervisor)
University of defense: Universidad de Salamanca
Date of defense: 14 October 2021
- Vidal Díaz de Rada Igúzquiza (committee chair)
- Araceli Mateos Díaz (committee secretary)
- María José Hierro Hernández (committee member)
Type: Thesis
Abstract
In recent years, the concurrence of two phenomena has revitalised the methodological debate about inference from nonprobability samples. On the one hand, probability samples increasingly suffer from nonresponse and noncoverage errors, which raise survey costs and lead to biased estimates. On the other hand, the emergence and expansion of the Internet have led to exponential growth in the use of web surveys with samples recruited using nonprobability methods. Inference from nonprobability samples requires an explicit or implicit model that explains the selection mechanism with respect to the target variable. This thesis explores the intersection between the need to reduce selection bias in estimates from nonprobability samples and the opportunity to explain the selection mechanism that newly available aggregate administrative data provide. To this end, the thesis comprises three papers that present statistical simulations and two methodological applications, using a set of face-to-face surveys and two web surveys conducted in Spain. The first paper uses statistical simulations to explore the conditions under which aggregate data, used as contextual variables and as population totals, can reduce or remove selection bias from the estimates. The second paper explores adding sociodemographic and past-vote auxiliary variables to the weighting, as well as using multiple imputation, to improve the quality of the estimates from the pre- and post-election surveys of the Centro de Investigaciones Sociológicas (CIS), which combine probability selection methods with quotas. The third paper tests the effect of including aggregate administrative data at the municipality level to tackle selection bias and improve the quality of the survey estimates, using two surveys from an experimental panel of internet users sponsored by the Association for Media Research (AIMC).
The results show that aggregate administrative data is insufficient to correct selection bias in survey estimates, especially when used as contextual variables. The results also suggest that the aggregate nature of the data is the main impediment to controlling for selection bias in the estimates.
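
As an illustration of what weighting a sample to known population totals can look like, the following is a minimal sketch of raking (iterative proportional fitting), a common calibration technique. It is not taken from the thesis, which may use different weighting procedures. The function name `rake`, the variables `sex` and `age`, and all totals are hypothetical values chosen for the example.

```python
import numpy as np

def rake(sample_cats, targets, n_iter=50, tol=1e-10):
    """Adjust unit weights so weighted margins match population totals.

    sample_cats: dict mapping variable name -> integer category code per respondent
    targets:     dict mapping variable name -> population total per category
    Assumes every category appears at least once in the sample.
    """
    n = len(next(iter(sample_cats.values())))
    w = np.ones(n)  # start from equal weights
    for _ in range(n_iter):
        max_change = 0.0
        for name, codes in sample_cats.items():
            # Current weighted total in each category of this variable
            current = np.bincount(codes, weights=w, minlength=len(targets[name]))
            # Scale each respondent's weight by target / current for their category
            factors = targets[name] / current
            w = w * factors[codes]
            max_change = max(max_change, np.max(np.abs(factors - 1.0)))
        if max_change < tol:  # all margins already match, so stop
            break
    return w

# Hypothetical sample of six respondents and made-up population totals
sex = np.array([0, 0, 1, 1, 0, 1])
age = np.array([0, 1, 0, 1, 1, 1])
targets = {"sex": np.array([50.0, 50.0]), "age": np.array([40.0, 60.0])}

w = rake({"sex": sex, "age": age}, targets)
print(np.bincount(sex, weights=w))  # weighted totals by sex
print(np.bincount(age, weights=w))  # weighted totals by age
```

Matching the margins in this way only removes selection bias if selection is unrelated to the target variable within the weighting cells, which is the kind of condition the simulations described above examine.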