Quality checking

The first step in analyzing surveillance data is to assess its quality by detecting entry errors, inconsistent data, and incomplete reporting. This is done by computing the frequency distribution of each variable in the data set. Reviewing these frequency distributions makes it possible to detect and correct data entry errors and missing fields.
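As a minimal sketch of this review, the example below tabulates the values of each field in a small line list. The records, field names, and the `frequency` helper are hypothetical and only illustrate how inconsistent codes and missing values show up in a frequency table.

```python
from collections import Counter

# Hypothetical line list; field names and values are illustrative only.
records = [
    {"sex": "F", "district": "North"},
    {"sex": "M", "district": "North"},
    {"sex": "f", "district": "South"},  # inconsistent coding of "F"
    {"sex": "", "district": "South"},   # missing value
    {"sex": "M", "district": None},     # missing value
]


def frequency(records, field):
    """Count each distinct value of a field, grouping empty values as '<missing>'."""
    return Counter(
        r.get(field) if r.get(field) not in (None, "") else "<missing>"
        for r in records
    )


sex_counts = frequency(records, "sex")
district_counts = frequency(records, "district")
print(sex_counts)
print(district_counts)
```

Unexpected categories (here the lowercase "f") and the size of the "<missing>" group are the items to review and, where possible, correct at the source.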

It is not uncommon to notice an attraction to round digits in numeric fields such as age (ages ending in 0 and 5 being more represented than expected) or dates (days 01, 10, 15, and 20 being overrepresented compared with other days of the month). Such a lack of precision in the data cannot be corrected at the time of analysis, but it needs to be considered when interpreting data plotted by age or date.
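One rough way to quantify the attraction to round ages is to compute the share of ages ending in 0 or 5. The sketch below uses hypothetical ages and a hypothetical helper name. It assumes that, without digit attraction, terminal digits would be roughly uniform, so about 20% of ages would end in 0 or 5.

```python
def round_age_share(ages):
    """Return the proportion of ages that end in 0 or 5."""
    ages = list(ages)
    if not ages:
        raise ValueError("no ages supplied")
    rounded = sum(1 for age in ages if age % 5 == 0)
    return rounded / len(ages)


# Hypothetical ages in years, for illustration only.
ages = [20, 25, 30, 31, 35, 40, 42, 45, 50, 57]
share = round_age_share(ages)

# Assumed benchmark: 2 terminal digits out of 10, i.e. 0.2 if digits were uniform.
EXPECTED_SHARE = 0.2
print(f"share ending in 0 or 5: {share:.2f} (expected about {EXPECTED_SHARE:.2f})")
```

A share well above the benchmark suggests that ages were estimated rather than recorded precisely, which matters when choosing age groups for analysis.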

When several date fields are part of the data set, such as date of onset, admission, confirmation or notification, calculation of delays between these sequential steps may highlight data entry errors (e.g. large delays due to an error on the year) or inconsistencies (e.g. negative delays due to confirmation occurring before onset).
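The delay check described above can be sketched as follows. The records, the field names, and the 60-day plausibility limit are assumptions chosen for illustration. An appropriate limit depends on the disease and the surveillance system.

```python
from datetime import date

# Hypothetical cases; field names and dates are illustrative only.
cases = [
    {"id": 1, "onset": date(2007, 3, 1), "notification": date(2007, 3, 4)},
    {"id": 2, "onset": date(2007, 3, 10), "notification": date(2007, 3, 8)},
    {"id": 3, "onset": date(2006, 3, 12), "notification": date(2007, 3, 14)},
]

MAX_PLAUSIBLE_DAYS = 60  # assumed limit, to be adapted to the disease


def flag_delays(cases, start, end, max_days=MAX_PLAUSIBLE_DAYS):
    """Return {case id: (reason, delay in days)} for negative or very long delays."""
    flags = {}
    for case in cases:
        delay = (case[end] - case[start]).days
        if delay < 0:
            flags[case["id"]] = ("negative", delay)
        elif delay > max_days:
            flags[case["id"]] = ("too long", delay)
    return flags


flags = flag_delays(cases, "onset", "notification")
print(flags)
```

Case 2 is flagged because notification precedes onset, and case 3 because the delay exceeds a year, which is the pattern expected from an error in the year.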

Frequency distributions by disease and by age or sex may help detect additional errors (e.g. neonatal tetanus among adults).
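A cross-tabulation of disease by age group makes such implausible combinations visible. The sketch below uses hypothetical records and hypothetical age bands. The choice of bands is an assumption and should follow the conventions of the surveillance system.

```python
from collections import Counter


def age_group(age_years):
    """Assign an age in years to one of three illustrative age bands."""
    if age_years < 1:
        return "<1"
    if age_years < 15:
        return "1-14"
    return "15+"


# Hypothetical (disease, age in years) pairs, for illustration only.
records = [
    ("neonatal tetanus", 0),
    ("neonatal tetanus", 34),  # implausible: neonatal disease in an adult
    ("measles", 3),
    ("measles", 7),
    ("measles", 0),
]

crosstab = Counter((disease, age_group(age)) for disease, age in records)
for (disease, group), count in sorted(crosstab.items()):
    print(f"{disease:18s} {group:5s} {count}")
```

Any non-zero cell for neonatal tetanus outside the youngest band points to an error in either the diagnosis or the age field.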

Not all errors can be corrected at the time of analysis. However, it is crucial to get a good understanding of the quality of the data and its limitations before analyzing and interpreting results.

To design an effective surveillance system, it is necessary to define, for each disease, the surveillance indicators best suited to trigger signals and the value of each indicator (threshold) that is considered abnormal or unusual.

Indicators can be expressed as absolute numbers (usually appropriate for rare diseases with immediate notification), as proportions of notifications for a disease (proportional morbidity, used in the absence of denominators) or as incidence rates (weekly number of notified cases using the population as the denominator, for common diseases).
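The three forms of indicator can be illustrated with a short calculation. All figures below are hypothetical, and expressing the rate per 100,000 population is an assumed convention rather than a requirement.

```python
def proportional_morbidity(disease_cases, all_notifications):
    """Share of all notifications that are due to the disease of interest."""
    return disease_cases / all_notifications


def incidence_rate(cases, population, per=100_000):
    """Cases per `per` population for the reporting period."""
    return cases * per / population


# Hypothetical weekly figures for one district.
cases = 30                 # absolute number of notified cases
all_notifications = 600    # notifications for all diseases that week
population = 150_000       # district population

pm = proportional_morbidity(cases, all_notifications)
rate = incidence_rate(cases, population)
print(f"absolute number: {cases}")
print(f"proportional morbidity: {pm:.1%}")
print(f"incidence rate: {rate:.1f} per 100,000 per week")
```

Proportional morbidity needs no population figure, but it changes whenever notifications for other diseases change, which is why incidence rates are preferred when a denominator is available.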

Indicators must be defined in terms of time and place (e.g. number of cases/week/district).
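As a sketch of an indicator defined by time and place, the example below counts cases per district and per week. The notifications are hypothetical, and the use of ISO week numbering is an assumption. Some systems use other epidemiological week definitions.

```python
from collections import Counter
from datetime import date

# Hypothetical notifications as (district, date of onset) pairs.
notifications = [
    ("North", date(2007, 3, 5)),
    ("North", date(2007, 3, 7)),
    ("South", date(2007, 3, 9)),
    ("North", date(2007, 3, 12)),
]


def weekly_counts(notifications):
    """Count cases by (district, ISO year, ISO week)."""
    counts = Counter()
    for district, onset in notifications:
        iso_year, iso_week, _ = onset.isocalendar()
        counts[(district, iso_year, iso_week)] += 1
    return counts


counts = weekly_counts(notifications)
for key, n in sorted(counts.items()):
    print(key, n)
```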

Thresholds are values of indicators above which the disease pattern is considered abnormal or unusual and may require a public health intervention. For most epidemic-prone diseases under immediate notification, the threshold is set to 1, as a single case requires a public health intervention (e.g. AFP, rabies, plague...). For more common diseases, thresholds can be set on the rate observed over a given time period (e.g. meningitis in Africa), or based on an increase in comparison with baseline data (e.g. influenza-like illness). Methods for setting thresholds are presented in the chapter Methods for setting thresholds in time series analysis.
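The sketch below illustrates two of these approaches: a fixed threshold of 1 case, and a threshold derived from baseline data. The baseline rule used here (historical mean plus two standard deviations) is only one simple illustrative choice and is not necessarily the method recommended in this manual. The weekly counts are hypothetical, and treating a value equal to the threshold as a signal is an assumption made so that a threshold of 1 triggers on a single case.

```python
from statistics import mean, stdev


def is_signal(value, threshold):
    """Return True when the indicator reaches the threshold (assumed inclusive)."""
    return value >= threshold


def baseline_threshold(history, k=2.0):
    """Illustrative threshold: historical mean plus k sample standard deviations."""
    return mean(history) + k * stdev(history)


# Immediate-notification disease: a single case is a signal.
single_case_signal = is_signal(1, threshold=1)

# Common disease: compare the current week with hypothetical baseline weeks.
history = [10, 12, 11, 9, 13, 11]
limit = baseline_threshold(history)
print(f"baseline threshold: {limit:.2f}")
print("13 cases:", is_signal(13, limit))
print("15 cases:", is_signal(15, limit))
```

With these figures the threshold falls between 13 and 15 cases, so only the week with 15 cases would be flagged for review.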

At this stage, it is also important to define indicators to better monitor the surveillance process itself (e.g. timeliness, completeness).
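As an illustration, completeness and timeliness of weekly reporting could be computed as below. The definitions used are assumptions: completeness is taken as the share of expected reporting units that reported, and timeliness as the share of expected units that reported within a deadline. Exact definitions vary between surveillance systems, and the unit names, delays, and 3-day deadline are hypothetical.

```python
# Hypothetical reporting units expected to send a weekly report.
expected_units = ["A", "B", "C", "D"]

# Days between the end of the week and receipt of the report; unit D did not report.
received_delay_days = {"A": 2, "B": 5, "C": 1}

DEADLINE_DAYS = 3  # assumed reporting deadline

received = [u for u in expected_units if u in received_delay_days]
on_time = [u for u in received if received_delay_days[u] <= DEADLINE_DAYS]

completeness = len(received) / len(expected_units)
timeliness = len(on_time) / len(expected_units)

print(f"completeness: {completeness:.0%}")
print(f"timeliness: {timeliness:.0%}")
```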

Credits

FEM Editor 2007

  • Denis Coulombier

Original FEM Authors

  • Christophe Paquet
  • Arnold Tarantola
  • Philippe Quenel
  • Nada Ghosn

FEM Contributors

  • Denis Coulombier
  • Arnold Bosman
  • Vladimir Prikazsky
