SSRS Encipher Nonprob

Improving Weighting for Nonprobability-only Samples

By leveraging the SSRS Opinion Panel and advanced predictive modeling techniques, Encipher Nonprob allows the SSRS Calibration Item Bank and a version of the SSRS Stepwise Calibration algorithm to be applied to nonprobability-only samples.

As with Encipher Hybrid, Encipher Nonprob requires that a small selection of calibrators from the Calibration Item Bank—customized to the topic of the study—be included on the study questionnaire.

The Encipher Nonprob solution adjusts for unmeasured bias in the nonprobability sample in two ways:

Benchmarking Surveys

First, SSRS periodically administers “benchmarking surveys” to produce probability-based benchmarks for the calibrators in our Encipher Calibration Item Bank. These surveys are administered to samples from the SSRS Opinion Panel, a mixed-mode probability panel recruited via nationally representative address-based sampling (ABS) and random digit dialing (RDD). The resulting benchmarks serve as weighting targets for nonprobability-only studies that do not themselves include a parallel probability sample.
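As a rough illustration of how benchmark targets enter the weighting, the sketch below rakes a toy nonprobability sample to a probability-based target via iterative proportional fitting. The variable, categories, and target shares are hypothetical, and this is a simplification for exposition, not SSRS's production weighting code.

```python
# Raking (iterative proportional fitting): adjust case weights until the
# weighted category shares match probability-based benchmark targets.
def rake(rows, margins, iters=50):
    """rows    : list of dicts, each with a 'weight' key plus categorical vars
       margins : {variable: {category: target_share}} benchmark targets"""
    for _ in range(iters):
        for var, targets in margins.items():
            total = sum(r["weight"] for r in rows)
            # current weighted share of each category
            shares = {c: 0.0 for c in targets}
            for r in rows:
                shares[r[var]] += r["weight"] / total
            # scale each case so its category's share hits the target
            for r in rows:
                r["weight"] *= targets[r[var]] / shares[r[var]]
    return rows

# Toy sample that is 70% vapers, raked to a hypothetical 50% benchmark.
sample = ([{"vape": "yes", "weight": 1.0} for _ in range(7)]
          + [{"vape": "no", "weight": 1.0} for _ in range(3)])
rake(sample, {"vape": {"yes": 0.5, "no": 0.5}})
```

In practice many margins are raked at once, and the loop alternates among them until the weights stabilize; with a single margin, as here, one pass suffices.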

Model-Assisted Stepwise Calibration (MASC)

The second component of Encipher Nonprob is a modified version of Stepwise Calibration referred to as Model-Assisted Stepwise Calibration (MASC). MASC uses random forest models to predict the direction and magnitude of selection bias in key outcomes, then applies the Stepwise Calibration algorithm to choose a calibration model that minimizes this predicted bias. For additional information on MASC and how it differs from the hybrid version of Stepwise Calibration, see our whitepaper.
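The greedy selection at the heart of this idea can be sketched as follows. Here `predict_bias` stands in for the random-forest bias prediction described above; the function names and the toy bias model in the usage example are hypothetical simplifications, not the actual MASC implementation.

```python
# Greedy stepwise selection: starting from demographics only, repeatedly add
# the candidate calibrator that most reduces the model-predicted bias, and
# stop when no remaining candidate helps.
def stepwise_calibration(candidates, predict_bias):
    """candidates   : iterable of calibrator names
       predict_bias : function(frozenset of chosen calibrators) -> predicted |bias|"""
    chosen = set()
    best = predict_bias(frozenset(chosen))
    improved = True
    while improved:
        improved = False
        pick = None
        for c in sorted(set(candidates) - chosen):
            trial = predict_bias(frozenset(chosen | {c}))
            if trial < best:
                best, pick, improved = trial, c, True
        if improved:
            chosen.add(pick)
    return chosen, best

# Toy stand-in for the random forest: two attitudinal calibrators reduce the
# predicted bias; an irrelevant one does not.
def toy_predict_bias(cals):
    return 6.0 - 2.0 * ("news_attention" in cals) - 1.0 * ("volunteering" in cals)

chosen, bias = stepwise_calibration(
    ["news_attention", "volunteering", "shoe_size"], toy_predict_bias)
```

The greedy loop selects only the two informative calibrators and ignores the irrelevant one, mirroring how the algorithm keeps the calibration model as small as the predicted bias allows.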

Validation Study of the SSRS Encipher Nonprob Approach

Study Design

To validate the Encipher Nonprob methodology, we used several surveys that were fielded simultaneously to (1) probability samples selected from the SSRS Opinion Panel and (2) nonprobability samples obtained from an opt-in panel vendor.

These surveys included calibrators from the Calibration Item Bank as well as some selected outcome items covering various topic areas. All items were administered to both the probability and nonprobability samples, allowing us to assess the effectiveness of calibration at reducing selection bias in the outcomes.

In the results that follow, we compare estimated outcomes across four hypothetical designs:

  • Probability: using only the probability sample from the SSRS Opinion Panel, weighted on standard demographics. This provides the benchmark against which we measure selection bias in the nonprobability designs.
  • Nonprobability – demo: using only the nonprobability sample, weighted only on standard demographics (i.e., not using Encipher Nonprob).
  • Nonprobability – Encipher: using only the nonprobability sample, applying Encipher Nonprob to calibrate both on standard demographics and on non-demographic calibrators.
  • Pseudo-hybrid: using only the nonprobability sample but applying the hybrid version of Stepwise Calibration—that is, optimizing on the actual bias in outcomes rather than on the model-based prediction of bias. We call this a “pseudo-hybrid” design because, unlike in a true hybrid design, we still produce the final estimates using only the nonprobability sample; but we use the probability-based estimates to help identify the calibration model that minimizes bias in the nonprobability estimates. We include this design as a point of comparison to assess how well the Encipher Nonprob solution (which assumes that we do not have probability-based data for outcomes) replicates the informational advantage afforded by a hybrid design (in which we do have probability-based data for outcomes).

Results: Example Outcome

Example Outcome: E-cigarette Use

Figure 1 shows one of the outcome estimates—the percent of adults who use e-cigarettes or other vaping products—under each of the four designs.

If we relied solely on standard demographic weighting, a nonprobability sample (Nonprobability – demo) would overestimate this percentage by about 6 percentage points, relative to the probability-based benchmark. Encipher Nonprob reduces this selection bias to about 3 percentage points, offering a substantial improvement over simple demographic-only weighting of the nonprobability sample.

Figure 1: Percent using e-cigarettes, by sampling/weighting methodology

NOTE: Error bars show the 95% confidence interval around the Probability estimate.

As illustrated by the Pseudo-hybrid results, we could further reduce selection bias by using the hybrid version of our calibration procedure, which optimizes on the actual rather than predicted bias. This additional bias reduction illustrates the value of the parallel probability sample within a hybrid design (relative to a nonprobability-only design). Specifically, we can develop a more effective weighting model when we have a probability-based benchmark that allows us to measure the actual selection bias in the outcomes. However, if this is not an option, the Encipher Nonprob solution, optimizing on the model-based prediction of bias, still improves upon simple demographic-only weighting.

Results: All Outcomes

Figure 2 generalizes these results, plotting the observed selection bias (i.e., the difference from the probability-based benchmark) for all outcomes collected in the validation study. The figure reports the mean and (due to the presence of one outlier) median selection bias across all outcomes. Overall, although not quite as effective as a hybrid design, Encipher Nonprob reduces the median selection bias by over 30% relative to standard demographic-only weighting.
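The summary in Figure 2 amounts to the following computation, taking the absolute difference from the benchmark as the measure of bias for each outcome. The helper and the example numbers below are illustrative only, not the study's actual estimates.

```python
from statistics import mean, median

# Bias for each outcome = |weighted estimate - probability benchmark|;
# Figure 2 summarizes these with the mean and median across outcomes.
def bias_summary(estimates, benchmarks):
    biases = [abs(e - b) for e, b in zip(estimates, benchmarks)]
    return mean(biases), median(biases)

# Made-up numbers: one outlier outcome pulls the mean above the median,
# which is why Figure 2 reports both.
mean_bias, median_bias = bias_summary([14.0, 9.5, 12.0], [8.0, 9.0, 10.0])
```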

Figure 2: Selection bias in validation study outcomes, by sampling/weighting methodology

NOTE: Vertical line shows the mean bias across outcomes. Data labels report the mean and (in parentheses) median bias across outcomes. Shading indicates statistical significance: light-shaded estimates are within the 95% confidence bounds of the corresponding Probability estimate, while dark-shaded estimates are outside the 95% confidence bounds.

Conclusion

The validation study reported here demonstrates that, by leveraging SSRS benchmarking samples and advanced predictive modeling techniques, Encipher Nonprob can improve upon standard demographic-only weighting for studies that rely entirely on nonprobability samples.

That said, this study also reinforces that hybrid designs should usually be preferred over nonprobability-only designs where feasible, since the availability of probability-based estimates for key study outcomes allows for the development of more effective calibration models. In a real-world study, without a parallel probability sample in which outcome items were asked, we ultimately cannot know whether the selected calibration model succeeded at meaningfully reducing selection bias.

SSRS’s team of methodologists and data scientists can provide consultation to help researchers weigh the tradeoffs among the available options for working with nonprobability samples. For many studies, the SSRS Opinion Panel is an affordable source of probability-based samples, making a full hybrid design accessible at a reasonable price point. For researchers who do not wish to administer a full questionnaire to a parallel probability sample but still want the benefits of a hybrid design, another option is to run a small subset of critical outcomes and topic-customized calibration items on the SSRS Opinion Panel Omnibus.

For more information about how Encipher Hybrid or Encipher Nonprob could be useful for your study, contact us today.

Get all the details of Encipher Nonprob in our whitepaper.

Learn more about how Encipher Nonprob reduces the median selection bias by over 30% relative to standard demographic-only weighting of nonprobability-only samples.


Another Option: Encipher Hybrid

A Validated Methodology for Blending Probability and Nonprobability Samples

Encipher Hybrid is a unique and sophisticated method that leverages study-specific outcomes, advanced modeling techniques, and customized non-demographic measures to produce weighting margins that are optimized for reducing selection bias in key study outcomes.


We're all a little MADS here. Meet the minds behind Encipher.

The SSRS Methods, Analytics, and Data Science (MADS) team created Encipher. They conceptualize and manage survey data collection from design to dissemination, and have extensive expertise in methodological experimentation, sampling, weighting, data collection planning, monitoring, adaptive and tailored design, data editing and imputation, documentation, reporting, and quantitative analysis. Get to know the MADS team.


Methods, analytics, and data science is our thing.

Our highly experienced SSRS Methods, Analytics, and Data Science (MADS) team conceptualizes and manages survey data collection from design to dissemination. Let's start a conversation about how we can help with your next project!
