Which indicator to map?
Count of cases
Counts are used to display the burden of the disease in the population. This helps policy makers and control programme managers target programmes and allocate resources to the most affected areas. However, counts of cases do not identify areas with an increased risk of transmission, because population size varies across geographical areas.
Crude rates
Crude rates are a summary measure of the incidence of a disease in a population. They are calculated by dividing the number of cases (or deaths) from a disease occurring in a given period (often one year) by the average population in the area during that period. Rates are expressed per 1,000 or per 100,000 inhabitants, depending on the frequency of the disease. Rates allow comparison between geographical areas by accounting for varying population size.
In outbreak investigations, rates are usually expressed for the epidemic period and referred to as attack rates.
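The crude rate calculation described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `crude_rate`, its parameters and the input figures are hypothetical and chosen only for the example.

```python
def crude_rate(cases: int, population: float, per: int = 100_000) -> float:
    """Return the number of cases per `per` inhabitants.

    `population` is assumed to be the average population of the area
    over the same period in which the cases were counted.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Hypothetical figures, for illustration only.
rate = crude_rate(cases=150, population=250_000)
print(f"{rate:.1f} cases per 100,000 inhabitants")
```

For an outbreak, the same function gives an attack rate when `cases` and `population` refer to the epidemic period.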
Age and/or sex specific rates
Crude rates may be confounded by age and/or sex if the distribution of the disease is known to be associated with age and/or sex and if the population structure by age and/or sex varies across geographical areas. In some countries, for instance, tuberculosis occurs at a higher rate among elderly people, and elderly people make up a larger share of the population in rural areas than in urban areas. Summarizing the incidence of tuberculosis with a crude rate will then tend to overstate the risk in rural areas with large elderly populations, even though the risk of being infected at a given age is not necessarily higher there.
Mapping age and/or sex-specific rates controls for these potential confounders. However, maps cannot easily represent several age and/or sex specific rates in a single display and need to be repeated to reflect all age and/or sex groups.
Standardized rates
Visual inspection of age and/or sex specific rates across geographical areas is a prerequisite to mapping data. Whenever there are large variations of rates between age and/or sex categories, summarizing the incidence through standardized rates may not be indicated. However, there are instances where such a summary incidence is useful to assess risks of transmission across geographical areas, after controlling for potential confounding by age and/or sex. This is achieved by a method called standardization of rates.
The use of crude rates when age-specific incidence and population structure differ, as in Table 1, can result in the overall crude rate in district B being greater than that in district A (5.0 vs. 4.8) even though both age-specific rates in district B are smaller than in district A (6.9 vs. 7.0 and 2.5 vs. 3.1). This paradox, known as Simpson's paradox, results from confounding by age.
Table 1: Distribution of cases, population and rates of a disease by age group, in 2 hypothetical districts
District A | Cases | Population | Rate* | District B | Cases | Population | Rate*
---|---|---|---|---|---|---|---
0-39 years | 42 | 600,000 | 7.0 | 0-39 years | 55 | 800,000 | 6.9
40 years & + | 25 | 800,000 | 3.1 | 40 years & + | 15 | 600,000 | 2.5
Total | 67 | 1,400,000 | 4.8 | Total | 70 | 1,400,000 | 5.0

\* cases/100,000
In these instances, standardization of rates is the technique required to control for this confounder if a single summary incidence value is desired.
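The paradox can be checked numerically. The sketch below recomputes the age-specific and crude rates from the case counts of Table 1, taking the age-group populations as 600,000 and 800,000 so that they add up to the district totals of 1,400,000. The variable and function names are illustrative assumptions.

```python
PER = 100_000  # rates are expressed per 100,000 inhabitants

# (cases, population) by age group, taken from Table 1.
districts = {
    "A": {"0-39 years": (42, 600_000), "40 years & +": (25, 800_000)},
    "B": {"0-39 years": (55, 800_000), "40 years & +": (15, 600_000)},
}


def rate(cases, population, per=PER):
    """Cases per `per` inhabitants."""
    return cases * per / population


# Age-specific rates for each district.
age_specific = {
    name: {age: rate(cases, pop) for age, (cases, pop) in groups.items()}
    for name, groups in districts.items()
}

# Crude rates: total cases divided by total population.
crude = {
    name: rate(
        sum(cases for cases, _ in groups.values()),
        sum(pop for _, pop in groups.values()),
    )
    for name, groups in districts.items()
}

# District B is lower in every age group, yet higher overall.
b_lower_in_every_group = all(
    age_specific["B"][age] < age_specific["A"][age] for age in age_specific["A"]
)
b_higher_crude = crude["B"] > crude["A"]

for name in districts:
    specific = ", ".join(f"{age}: {r:.1f}" for age, r in age_specific[name].items())
    print(f"District {name}: {specific}; crude: {crude[name]:.1f}")
print("Simpson's paradox present:", b_lower_in_every_group and b_higher_crude)
```

The reversal arises because district B has more of its population in the 0-39 age group, where the rate is highest in both districts.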
Direct standardization
Direct standardization consists of weighting age-specific rates by applying them to a reference population. Age-specific rates from districts A and B are applied to the same reference population to calculate age and/or sex-standardized rates. Controlling for confounding by age through direct standardization, as presented in Table 2, shows that district B has a smaller age-standardized rate than district A, as expected from the age-specific rates of both districts. The reference population can be an external population used at country level, such as the country population, for standardizing several indicators, or an international reference population to allow for international comparisons. It can also be the combined population of the 2 districts, as in our example, if the objective is simply to compare the 2 areas.
Table 2: Calculation of age-standardized rates in 2 hypothetical districts by direct standardization
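Age group | Reference population | District A: observed rate* | District A: expected cases | District B: observed rate* | District B: expected cases
---|---|---|---|---|---
0-39 years | 1,400,000 | 7.0 | 98 | 6.9 | 96
40 years & + | 1,400,000 | 3.1 | 44 | 2.5 | 35
Total | 2,800,000 | | 142 | | 131
Age-standardized rate* | | 5.1 | | 4.7 |

\* cases/100,000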
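The calculation in Table 2 can be reproduced with a short sketch. It assumes the case counts of Table 1, age-group populations of 600,000 and 800,000, and a reference population of 1,400,000 per age group (the two districts combined). The function name `direct_standardized_rate` is a hypothetical helper, not an established API. Because it uses unrounded age-specific rates, the intermediate expected cases can differ slightly from the rounded figures in the table.

```python
def direct_standardized_rate(groups, reference, per=100_000):
    """Directly standardized rate per `per` inhabitants.

    groups:    {age group: (observed cases, population)} for one district
    reference: {age group: reference population}
    """
    # Expected cases if the district's age-specific rates applied
    # to the reference population.
    expected = sum(
        cases / population * reference[age]
        for age, (cases, population) in groups.items()
    )
    return expected * per / sum(reference.values())


district_a = {"0-39 years": (42, 600_000), "40 years & +": (25, 800_000)}
district_b = {"0-39 years": (55, 800_000), "40 years & +": (15, 600_000)}

# Reference population: both districts combined, by age group.
reference = {
    age: district_a[age][1] + district_b[age][1] for age in district_a
}

rate_a = direct_standardized_rate(district_a, reference)
rate_b = direct_standardized_rate(district_b, reference)
print(f"District A: {rate_a:.1f} per 100,000")
print(f"District B: {rate_b:.1f} per 100,000")
```

Once age is controlled for, district A has the higher rate, in line with the age-specific comparison.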
Indirect standardization
When the age distribution of the cases is not available in districts A and B, or if age-specific rates are unstable because they are based on small numbers, indirect standardization is indicated. It consists of applying reference age-specific rates to the study populations. This yields the number of cases expected in each district if incidence had followed the reference rates. The age-standardized incidence ratio is calculated by dividing the number of observed cases by the number of expected cases. It is sometimes multiplied by 100 and expressed as a percentage. Table 3 shows, in our theoretical example, that the incidence in district A is 1.02 times the reference incidence and the incidence in district B is 0.95 times the reference incidence. After standardizing on age, the incidence is therefore lower in district B than in district A.
Table 3: Calculation of age-standardized rate ratios in 2 hypothetical districts by indirect standardization
Age group | Reference rate* | District A: population | District A: expected cases | District B: population | District B: expected cases
---|---|---|---|---|---
0-39 years | 7.0 | 600,000 | 42.0 | 800,000 | 56.0
40 years & + | 3.0 | 800,000 | 24.0 | 600,000 | 18.0
Total | | 1,400,000 | 66.0 | 1,400,000 | 74.0
Observed cases | | | 67 | | 70
Age-standardized rate ratio (SRR) | | | 1.02 | | 0.95

\* cases/100,000
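The steps of Table 3 can be sketched as follows, using its reference rates (7.0 and 3.0 per 100,000), district populations and observed case totals. The function name `indirect_standardization` and its return shape are illustrative assumptions.

```python
def indirect_standardization(observed, population, reference_rates, per=100_000):
    """Return (expected cases, observed/expected ratio) for one district.

    observed:        total observed cases in the district
    population:      {age group: population of the district}
    reference_rates: {age group: reference rate per `per` inhabitants}
    """
    # Cases expected if the reference rates applied to the district.
    expected = sum(
        reference_rates[age] * pop / per for age, pop in population.items()
    )
    return expected, observed / expected


reference_rates = {"0-39 years": 7.0, "40 years & +": 3.0}
population_a = {"0-39 years": 600_000, "40 years & +": 800_000}
population_b = {"0-39 years": 800_000, "40 years & +": 600_000}

expected_a, ratio_a = indirect_standardization(67, population_a, reference_rates)
expected_b, ratio_b = indirect_standardization(70, population_b, reference_rates)

print(f"District A: expected {expected_a:.1f}, ratio {ratio_a:.2f}")
print(f"District B: expected {expected_b:.1f}, ratio {ratio_b:.2f}")
```

Note that only the total number of observed cases per district is needed, which is why this method suits situations where the age distribution of cases is unavailable.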
Strategy for standardization
When considering whether standardization is indicated, the first step is to consider whether the mapping of the data can be confounded by variables such as age and/or sex. If the disease is not associated with age or sex, standardization on these variables is not required. Similarly, if the age and/or sex structure of the population is identical across geographical areas, standardization on age and/or sex is not required. In other instances, standardization is required if a summary value of the incidence of the disease is desired, in order to control for the induced confounding effect.
When mapping the data can potentially be confounded by age and/or sex, using age and/or sex-specific rates allows accurate comparisons of the geographical distribution of the disease. However, whenever summary incidence information is preferred, age and/or sex standardized rates are indicated.
Direct standardization allows better comparability across geographical areas but may be unreliable if age-specific rates are based on small numbers. In addition, age-standardized rates are hypothetical values that have no basis in reality. Indirect standardization requires less detailed information on cases. Its result is expressed as a percentage of a reference situation, which is easily understood. However, indirect standardization of rates is less robust for comparing different geographical areas when the population structure is very heterogeneous.
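The strategy above can be summarized as a simple decision helper. This is only a rough sketch of the reasoning in this section: the function `choose_indicator`, its parameter names and the returned labels are hypothetical, and real decisions will also depend on the trade-offs just described.

```python
def choose_indicator(
    disease_associated_with_age_sex: bool,
    structure_varies_across_areas: bool,
    want_single_summary: bool,
    age_specific_counts_reliable: bool = True,
) -> str:
    """Suggest an indicator to map, following the strategy in the text."""
    # Confounding requires both an association with the disease and
    # a population structure that differs between areas.
    if not (disease_associated_with_age_sex and structure_varies_across_areas):
        return "crude rate"
    # Specific rates give accurate comparisons but need one map per group.
    if not want_single_summary:
        return "age and/or sex-specific rates"
    # Direct standardization needs stable age-specific rates in each area.
    if age_specific_counts_reliable:
        return "directly standardized rate"
    return "indirectly standardized ratio"


print(choose_indicator(True, True, want_single_summary=True))
print(choose_indicator(True, True, True, age_specific_counts_reliable=False))
print(choose_indicator(False, True, want_single_summary=True))
```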