Capture-recapture


Capture-recapture techniques are widely used in field epidemiology to estimate population sizes and evaluate the sensitivity of surveillance systems. These techniques, initially developed for wildlife population studies, have been successfully applied in various public health settings to assess the completeness of data collection and monitor disease incidence. In this article, we will discuss the basics of capture-recapture methods, along with examples illustrating their application in estimating population size and evaluating surveillance system sensitivity.

The Basics

Capture-recapture methods use two or more independent data sources (the 'capture' and 'recapture' sources) that record the same target population or events. Records are matched across the sources, and the size of the overlap between them is used to estimate the total population size or event count, including the individuals or events that every source missed.
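Counting the overlap requires linking records across sources. The minimal sketch below assumes that each person already carries a unique identifier shared by both sources. The identifiers and the function name are hypothetical. In practice, matching on names, dates of birth or other fields is often the hardest part of the analysis.

```python
def overlap_counts(source_a, source_b):
    """Return (n1, n2, m) for two lists of record identifiers.

    n1 and n2 are the numbers of distinct individuals in each source,
    m is the number found in both. Duplicates within a source count once.
    """
    a = set(source_a)
    b = set(source_b)
    return len(a), len(b), len(a & b)


# Hypothetical identifiers, for illustration only
clinic = ["p01", "p02", "p03", "p04", "p04"]  # "p04" registered twice
food = ["p03", "p04", "p05", "p06"]

n1, n2, m = overlap_counts(clinic, food)
print(n1, n2, m)  # 4 4 2
```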

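The simplest approach is the two-source model, whose estimator is known as the Lincoln-Petersen estimator (or index). When three or more sources are available, multi-source models such as log-linear models can be used. The two-source model relies on assumptions that may not hold in practice:

  • The population is closed during the study period.
  • The sources are independent, so appearing in one source does not change the probability of appearing in the other.
  • Every individual has the same probability of being captured by a given source.
  • Records can be matched accurately across sources.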
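The independence assumption matters because correlated sources inflate the overlap. The following sketch plugs expected counts into the two-source estimator for a hypothetical population of 1,000 people. Each person appears in Source A with probability 0.5. All probabilities and function names are illustrative assumptions, not data. When Source B is independent of A, the estimator recovers the true size. When people in A are more likely to also appear in B (positive dependence), the overlap m is inflated and N is underestimated.

```python
def lincoln_petersen(n1, n2, m):
    """Two-source (Lincoln-Petersen) estimate of total size: N = n1 * n2 / m."""
    return n1 * n2 / m


def expected_estimate(N, p_a, p_b_given_a, p_b_given_not_a):
    """Apply the estimator to the *expected* counts for a population of size N.

    p_a: probability of appearing in Source A (illustrative).
    p_b_given_a / p_b_given_not_a: probability of appearing in Source B
    for people who are / are not in Source A (illustrative).
    """
    n1 = N * p_a                                  # expected size of Source A
    m = n1 * p_b_given_a                          # expected overlap
    n2 = m + N * (1 - p_a) * p_b_given_not_a      # expected size of Source B
    return lincoln_petersen(n1, n2, m)


# Independent sources: P(B | A) equals P(B | not A)
print(expected_estimate(1000, 0.5, 0.5, 0.5))  # 1000.0
# Positive dependence: being in A makes being in B more likely
print(expected_estimate(1000, 0.5, 0.8, 0.2))  # 625.0
```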

Example: Estimating Population Size

Suppose a large number of refugees have arrived at a temporary settlement after fleeing an ongoing conflict in their home country. The camp authorities and humanitarian organizations need to estimate the size of the refugee population to ensure adequate distribution of resources such as food, water, and medical supplies. Because the situation is chaotic and there is no centralized registration system, an accurate count of the refugee population is difficult to obtain. Epidemiologists and aid organizations therefore decide to estimate the population size with capture-recapture techniques, using two independent data sources:

  • Source A: Health clinic registration data
  • Source B: Food distribution registration data

These data sources are chosen because they are likely to capture a significant proportion of the refugee population, as most individuals will require healthcare and food at some point. However, each data source may miss some individuals, making the capture-recapture technique a suitable method for estimating the total population.

Data Collection

  • Source A (health clinic registration) has recorded 8,000 individuals.
  • Source B (food distribution registration) has recorded 10,000 individuals.
  • Among these individuals, 5,000 are present in both registration systems.

Estimating the Refugee Population Size

Using the basic two-source capture-recapture model, the total refugee population size (N) can be estimated with the formula:

N = (n1 * n2) / m

where n1 is the number of individuals identified in Source A, n2 is the number of individuals identified in Source B, and m is the number of individuals common to both sources.
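One way to see where the formula comes from is to assume the two sources are independent. Under that assumption, the fraction of Source B records that also appear in Source A should approximate the fraction of the whole population captured by Source A. Setting the two fractions equal and solving for N gives the estimator:

```latex
\frac{m}{n_2} \approx \frac{n_1}{N}
\quad\Longrightarrow\quad
\hat{N} = \frac{n_1 \, n_2}{m}
```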

N = (8,000 * 10,000) / 5,000 = 16,000

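Thus, the estimated total size of the refugee population is 16,000. Of these, 13,000 (8,000 + 10,000 - 5,000) appear in at least one registry, so an estimated 3,000 refugees were missed by both.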
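The calculation can be reproduced with a small helper. This is a minimal sketch of the two-source estimator, and the function name is our own. Besides the estimate, it reports how many people appear in at least one source and how many are estimated to have been missed by both.

```python
def two_source_estimate(n1, n2, m):
    """Lincoln-Petersen two-source estimate.

    n1: individuals in Source A, n2: individuals in Source B,
    m: individuals found in both sources (must be > 0).
    Returns (estimated total N, observed in at least one source,
    estimated number missed by both sources).
    """
    if m <= 0:
        raise ValueError("the sources must share at least one individual")
    n_hat = n1 * n2 / m
    observed = n1 + n2 - m
    return n_hat, observed, n_hat - observed


# Refugee settlement example: clinic (A) and food distribution (B) registries
n_hat, observed, missed = two_source_estimate(8_000, 10_000, 5_000)
print(f"N = {n_hat:,.0f}; observed = {observed:,}; missed by both = {missed:,.0f}")
# N = 16,000; observed = 13,000; missed by both = 3,000
```

The helper refuses m = 0, because with no overlap the two sources carry no information about how many individuals both of them missed.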

Example: Evaluating the Sensitivity of a Surveillance System

Capture-recapture methods can also be used to evaluate the sensitivity of a surveillance system. Here, sensitivity is the proportion of all true cases of a disease or event in the population that the system detects and reports.

Suppose a public health agency wants to evaluate the sensitivity of its tuberculosis (TB) surveillance system. The agency has two independent sources of TB case reports: the mandatory reporting system (Source A) and a laboratory-based reporting system (Source B).

  • Source A reports 1,000 TB cases.
  • Source B reports 800 TB cases.
  • Among these cases, 700 are common to both sources.

Using the two-source capture-recapture model, the estimated total number of TB cases in the population (N) is:

N = (n1 * n2) / m = (1,000 * 800) / 700 = 1,142.86 (approximately 1,143 cases)
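An equivalent view of the same estimate uses a 2×2 table that cross-classifies cases by whether they appear in Source A and in Source B. Three cells are observed. The fourth cell, cases missed by both sources, is not observed. Under independence it is estimated as (n1 - m)(n2 - m) / m. The sketch below, with names of our own choosing, fills in that cell for the TB example.

```python
def complete_two_by_two(n1, n2, m):
    """Return the four cells of the two-source table plus the total.

    The 'missed by both' cell is not observed; it is estimated by
    assuming independence (cross-product ratio of 1), which gives
    (n1 - m) * (n2 - m) / m.
    """
    both = m
    a_only = n1 - m
    b_only = n2 - m
    neither = a_only * b_only / m
    total = both + a_only + b_only + neither
    return {"both": both, "A only": a_only, "B only": b_only,
            "neither (estimated)": neither, "total": total}


# TB example: mandatory reporting (A) = 1,000, laboratory reports (B) = 800, both = 700
cells = complete_two_by_two(1_000, 800, 700)
for name, count in cells.items():
    print(f"{name:>20}: {count:,.2f}")
```

For the TB data this gives 300 cases reported only by A and 100 only by B. About 42.86 cases are estimated to be missed by both, for a total of about 1,142.86, which is the same value as n1 * n2 / m.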

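The sensitivity of the mandatory reporting system (Source A) is the proportion of the estimated total number of cases that it reports:

Sensitivity = n1 / N = 1,000 / 1,142.86 = 0.875 (87.5%)

Equivalently, Sensitivity = m / n2 = 700 / 800 = 0.875. This is the proportion of laboratory-reported cases that were also captured by mandatory reporting.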

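This analysis indicates that the mandatory reporting system (Source A) has an estimated sensitivity of 87.5%, assuming the two sources are independent. By that estimate, roughly 143 of the estimated 1,143 TB cases are not captured by mandatory reporting.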
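The same estimate can be used to gauge each source and the combined system. The sketch below computes sensitivity as the number of cases a source reports divided by the estimated total; the function name is illustrative. For Source A this simplifies to m / n2, and for Source B to m / n1. All values inherit the independence assumption of the two-source model.

```python
def sensitivities(n1, n2, m):
    """Estimated sensitivity of Source A, Source B and the two combined."""
    n_hat = n1 * n2 / m                 # two-source estimate of total cases
    return {
        "Source A": n1 / n_hat,         # equals m / n2
        "Source B": n2 / n_hat,         # equals m / n1
        "A or B": (n1 + n2 - m) / n_hat,  # cases found by at least one source
    }


for source, sens in sensitivities(1_000, 800, 700).items():
    print(f"{source}: {sens:.2%}")
```

With the TB figures this gives 87.5% for Source A and 70% for Source B. Together the two systems capture an estimated 96.25% of cases.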

Contributors