Combining Probability and Nonprobability Samples: Methods and Comparisons | Lecture notes Literature

Estimation Methods for Nonprobability Samples

with a Companion Probability Sample

Michael Yang, Nada Ganesh, Edward Mulrow, and Vicki Pineau

NORC at the University of Chicago, 4350 East-West Highway, 8th Floor, Bethesda, MD 20814

Abstract

Probability sampling has been the standard basis for inference from a sample to a target population. In the

era of big data and increasing data collection costs, however, there has been growing demand for estimation

methods to combine probability and nonprobability samples in order to improve the cost efficiency of

survey estimation without loss of statistical accuracy (or perhaps even with improvements in statistical

accuracy). An array of methods for combining probability and nonprobability samples is found in the

literature, which we have classified into the following methodological groups: calibration, statistical

matching, super-population modeling, and propensity-based weighting. In addition, NORC researchers

have developed a hybrid calibration method that incorporates “borrowed strength” methods from small area

estimation in order to explicitly account for bias associated with the nonprobability sample. We compare

and contrast the nonprobability weights and estimates derived under each method using food allergy

survey data, which were collected via both a probability sample and a nonprobability sample.

Key Words: probability sample, nonprobability sample, fit-for-purpose

1. Introduction

While probability sampling remains the gold standard for survey estimation, often the incidence rate for a

study’s target population is so low that complete sampling frames are not available, or probability sampling

methods for surveying the target population are too expensive. Thus, there has been growing demand for

methods that use nonprobability samples and methods that combine probability and nonprobability samples

in order to improve the statistical and cost efficiency of survey estimation.

Nonprobability samples may provide a lower cost alternative to probability samples; however, the quality

of the data is oftentimes low, and in particular estimates based on nonprobability samples may be biased.

A well-thought-out approach to using nonprobability samples, alone or in conjunction with probability

samples, should provide a way to assess the quality of the data and determine its fitness for use. This paper

reports some preliminary results from our research on estimation methods based on both a probability

and a nonprobability sample. Specifically, using data collected by NORC, this paper compares the

distribution of the nonprobability sampling weights (all of which are effectively modeled weights) and the

weighted estimates under five different estimation methods. Although the probability sample may be used

in modeling the nonprobability sample weights, the comparisons reported here are based on the

nonprobability sample alone. Further results based on the combination of probability and nonprobability

samples will be reported in a subsequent paper.
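
As a rough illustration of the kind of comparison described above, the sketch below computes a weighted estimate of a proportion and one common summary of weight variability for a single set of modeled weights. It is a minimal sketch under stated assumptions, not the procedure used in this paper: the function names, the toy data, and the choice of Kish's approximate design effect as the summary of the weight distribution are all introduced here for illustration only.

```python
# Illustrative sketch only: the helper names, toy data, and the use of Kish's
# approximate design effect are assumptions, not taken from the paper.

def weighted_proportion(y, w):
    """Ratio-type weighted estimate of a proportion: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def kish_deff(w):
    """Kish's approximate design effect from unequal weights:
    n * sum(w^2) / (sum(w))^2. Equals 1 when all weights are equal."""
    n = len(w)
    return n * sum(wi * wi for wi in w) / sum(w) ** 2


# Toy data (hypothetical): y is a 0/1 outcome indicator, w is one set of
# modeled weights for the nonprobability sample.
y = [1, 0, 0, 1, 0]
w = [2.0, 1.0, 1.0, 4.0, 2.0]

est = weighted_proportion(y, w)
deff = kish_deff(w)
print(f"weighted estimate: {est:.3f}, approximate design effect: {deff:.3f}")
```

Running the same two summaries on the weights produced by each estimation method would give one simple side-by-side view of how much the methods differ in both the estimates and the variability of the weights.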

2. Methods Investigated

Researchers have proposed and experimented with a range of estimation methods based on nonprobability

samples for decades. More recently, there has been increased interest in estimation methods that use both

probability and nonprobability samples. We conducted a literature review to identify and delineate methods

reported in journals, workshops, and conferences. Our focus was on reported studies in recent years that
