
healthequal: Calculating summary measures of inequality
Daniel A. Antiporta, Patricia Menéndez, Katherine Kirkby, Ahmad Hosseinpoor
November, 2024
Source:vignettes/healthequal-package-demo.Rmd
      healthequal-package-demo.RmdMotivation
Measuring and monitoring inequalities in health is important for informing policies and programs that aim to tackle health inequities. Broadly defined, inequalities in health are measurable differences in health across population subgroups defined by dimensions of inequality (demographic, socioeconomic or geographic characteristics). Summary measures of inequality summarise the amount of inequality across subgroups in a single number – which facilitates the comparison of inequalities over time and across different settings and indicators.
Summary measures of health inequality use either disaggregated data or individual-level data as inputs.
- Disaggregated data refers to health indicator data recorded for different subgroups of a population that are defined according to dimensions of inequality, such as demographic, socioeconomic or geographic characteristics (such as wealth quintiles or subnational regions). Each record in the dataset contains data for a population subgroup. 
- Individual-level data refers to data where each record in the dataset contains data pertaining to an individual. 
About summary measures of health inequality
Simple summary measures (difference and ratio) compare two population subgroups. They can be calculated for all dimensions of inequality (with two subgroups or more). Complex measures are calculated for inequality dimensions with more than two population subgroups and consider the situation in all subgroups. They can only be calculated for dimensions with more than two subgroups.
Selecting appropriate measures for analysing and reporting inequality involves considering several methodological issues. There are considerations relating to the characteristics of the underlying data, which determines the types of measures that can be calculated and how they are calculated:
- Whether the dimension of inequality is ordered or non-ordered. Ordered dimensions have subgroups with an inherent ordering (such as education level or wealth quintiles), while non-ordered and binary dimensions have subgroups that cannot logically be ranked (such as subnational regions, ethnicity or sex).
- Whether the dimension of inequality has two or more than two subgroups. Simple measures can be calculated for all inequality dimensions, but complex measures can only be calculated for dimensions with more than two subgroups.
- The optimum level that is to be achieved for the health indicator. For favourable indicators, the goal is to attain a maximum level; for adverse indicators, the goal is to attain a minimum level. Note that for some indicators there is no optimum level. The optimum level impacts the calculation of certain summary measures.
There are also considerations relating to the properties of the different measures and the desired purpose of the analysis:
- Whether inequality is measured in absolute or relative terms. Absolute measures indicate the magnitude of inequality between population subgroups, while relative measures show proportional inequality between subgroups.
- Whether subgroups should be weighted according to their population size or not. This involves a value judgement depending on the purpose of the analysis.
- The selection of a reference point for non-ordered measures. For instance, this may be the mean, the best-performing subgroup or a user-selected reference group.
The paper Summary measures of health inequality: a review of existing measures and their application provides further information about these considerations.
Summary measures of health inequality
The following summary measures of health inequality are included in
the healthequal library:
- Simple measures
- Difference (d)
- Ratio (r)
 
- Difference (
- Disproportionality measures (ordered dimensions)
- Absolute concentration index (aci)
- Relative concentration index (rci)
 
- Absolute concentration index (
- Regression-based measures (ordered dimensions)
- Slope index of inequality (sii)
- Relative index of inequality (rii)
 
- Slope index of inequality (
- Variance measures (non-ordered dimensions)
- Between-group variance (bgv)
- Between-group standard deviation (bgsd)
- Coefficient of variation (covar)
 
- Between-group variance (
- Mean difference measures (non-ordered dimensions)
- Mean difference from mean - unweighted and weighted
(mdmuandmdmw)
- Mean difference from best-performing subgroup - unweighted and
weighted (mdbuandmdbw)
- Mean difference from reference subgroup - unweighted and weighted
(mdruandmdrw)
- Index of disparity - unweighted and weighted (idisuandidisw)
 
- Mean difference from mean - unweighted and weighted
(
- Disproportionality measures (non-ordered dimensions)
- Theil index (ti)
- Mean log deviation (mld)
 
- Theil index (
- Impact measures
- Population attributable risk (parisk)
- Population attributable fraction (paf)
 
- Population attributable risk (
Loading the healthequal library
First, load the healthequal library in your R session.
This requires the dplyr package.
Read and query the data included in the healthequal
package
The healthequal package comes with sample data for users
to be able to test the package functions. The OrderedSample
and NonorderedSample data contain data disaggregated by
economic status and subnational region, respectively, for a single
indicator.
                                        indicator
1 Births attended by skilled health personnel (%)
2 Births attended by skilled health personnel (%)
3 Births attended by skilled health personnel (%)
4 Births attended by skilled health personnel (%)
                          dimension             subgroup subgroup_order
1 Economic status (wealth quintile) Quintile 1 (poorest)              1
2 Economic status (wealth quintile)           Quintile 2              2
3 Economic status (wealth quintile)           Quintile 3              3
4 Economic status (wealth quintile)           Quintile 4              4
  estimate        se population setting_average favourable_indicator
1 75.60530 1.5996131   2072.436        91.59669                    1
2 91.01997 1.1351504   2112.204        91.59669                    1
3 96.03959 0.6461946   1983.059        91.59669                    1
4 97.04223 0.5676206   2052.124        91.59669                    1
  ordered_dimension indicator_scale
1                 1             100
2                 1             100
3                 1             100
4                 1             100                                        indicator          dimension
1 Births attended by skilled health personnel (%) Subnational region
2 Births attended by skilled health personnel (%) Subnational region
3 Births attended by skilled health personnel (%) Subnational region
4 Births attended by skilled health personnel (%) Subnational region
         subgroup  estimate       se population setting_average
1            aceh  95.11784 1.538443  230.20508        91.59669
2            bali 100.00000 0.000000  149.46272        91.59669
3 bangka balitung  97.41001 1.267644   55.66533        91.59669
4          banten  80.35694 3.544053  451.26550        91.59669
  favourable_indicator ordered_dimension indicator_scale reference_subgroup
1                    1                 0             100                  1
2                    1                 0             100                  0
3                    1                 0             100                  0
4                    1                 0             100                  0The OrderedSampleMultipleind and
OrderedSampleMultipleind data contain disaggregated data by
economic status and subnational region, respectively, for two
indicators.
                                        indicator
1 Births attended by skilled health personnel (%)
2 Births attended by skilled health personnel (%)
3 Births attended by skilled health personnel (%)
4 Births attended by skilled health personnel (%)
                          dimension             subgroup subgroup_order
1 Economic status (wealth quintile) Quintile 1 (poorest)              1
2 Economic status (wealth quintile)           Quintile 2              2
3 Economic status (wealth quintile)           Quintile 3              3
4 Economic status (wealth quintile)           Quintile 4              4
  estimate        se population setting_average favourable_indicator
1 75.60530 1.5996131   2072.436        91.59669                    1
2 91.01997 1.1351504   2112.204        91.59669                    1
3 96.03959 0.6461946   1983.059        91.59669                    1
4 97.04223 0.5676206   2052.124        91.59669                    1
  ordered_dimension indicator_scale
1                 1             100
2                 1             100
3                 1             100
4                 1             100                                        indicator          dimension
1 Births attended by skilled health personnel (%) Subnational region
2 Births attended by skilled health personnel (%) Subnational region
3 Births attended by skilled health personnel (%) Subnational region
4 Births attended by skilled health personnel (%) Subnational region
         subgroup  estimate       se population setting_average
1            aceh  95.11784 1.538443  230.20508        91.59669
2            bali 100.00000 0.000000  149.46272        91.59669
3 bangka balitung  97.41001 1.267644   55.66533        91.59669
4          banten  80.35694 3.544053  451.26550        91.59669
  favourable_indicator ordered_dimension indicator_scale reference_subgroup
1                    1                 0             100                  1
2                    1                 0             100                  0
3                    1                 0             100                  0
4                    1                 0             100                  0For information about the datasets, type the following commands, which will display the corresponding dataset help file:
?healthequal::OrderedSample
?healthequal::NonorderedSample
?healthequal::OrderedSampleMultipleind
?healthequal::NonorderedSampleMultipleind
?healthequal::IndividualSampleExamples
Calculate the absolute concentration index (ACI)
The Absolute Concentration Index (ACI) is a summary measure of health inequality that can be used with ordered dimensions. For information about the ACI function type the following command, which will display the corresponding help file:
?aciThe OrderedSample dataset can be used to calculate ACI.
Two arguments are required: est (the subgroup estimate,
recorded as estimate in the same dataset), and
subgroup_order (the order of subgroups in an increasing
sequence). Other arguments, such as pop (the number of
people within each subgroup, recorded as population in the
sample dataset) or weight (the sampling weight for survey
data), are optional. Lastly, the force argument can be used
to estimate ACI when some estimates are missing.
  measure estimate       se  lowerci  upperci
1     aci  4.30052 1.284031 1.783865 6.817174Calculate the slope index of inequality (SII)
The Slope Index of Inequality (SII) is a summary measure of health inequality that can be used with ordered dimensions. The slope index of inequality (SII) is an absolute measure of inequality that represents the difference in estimated indicator values between the most-advantaged and most-disadvantaged, while taking into consideration the situation in all other subgroups/individuals – using an appropriate regression model.
For information about the SII function type the following command, which will display the corresponding help file:
?siiSII can be calculated using disaggregated or individual data. In this
example, the IndividualSample dataset, a survey weighted
dataset, is used. For this type of data, five arguments are required:
est (the individual estimate, recorded as sba
in the same dataset), subgroup_order (the order of
subgroups in an increasing sequence), weight (the sampling
weight), psu (the primary sampling unit) and
strata (the variable identifying the strata).
with(IndividualSample,
     aci(est = sba,
         subgroup_order = subgroup_order,
         weight = weight,
         psu = psu,
         strata = strata))  measure   estimate          se    lowerci    upperci
1     aci 0.07801353 0.002950566 0.07223053 0.08379654Calculate between-group variance (BGV)
Between-group variance (BGV) is a summary measure of health
inequality that it can be used to measure inequality across
non-ordered dimensions of inequality. It is calculated
as the weighted average of squared differences between subgroup
estimates and the weighted mean. Type ?bgv to view the
corresponding help file.
The NonorderedSample dataset can be used to calculate
BGV, which requires two arguments: pop (the number of
people within each subgroup, recorded as population in the
sample dataset), and est (the subgroup estimate, recorded
as estimate in the same dataset). The argument
se (the standard error of the subgroup estimate) is
required only to compute the corresponding 95% confidence intervals.
  measure estimate       se  lowerci  upperci
1     bgv 50.42023 9.560196 31.68259 69.15787Calculate multiple measures of inequality for a dataset with multiple indicators
The previous examples showed the calculation of a single measure of
inequality for a single indicator-dimension combination. These next
examples use the dataset NonorderedSampleMultipleind, which
contains disaggregated data by subnational region for two
indicators:
- Births attended by skilled health personnel (%) and
- Under-five mortality rate (deaths per 1000 live births)
The data can be inspected as follows:
head(NonorderedSampleMultipleind, n = 4)                                        indicator          dimension
1 Births attended by skilled health personnel (%) Subnational region
2 Births attended by skilled health personnel (%) Subnational region
3 Births attended by skilled health personnel (%) Subnational region
4 Births attended by skilled health personnel (%) Subnational region
         subgroup  estimate       se population setting_average
1            aceh  95.11784 1.538443  230.20508        91.59669
2            bali 100.00000 0.000000  149.46272        91.59669
3 bangka balitung  97.41001 1.267644   55.66533        91.59669
4          banten  80.35694 3.544053  451.26550        91.59669
  favourable_indicator ordered_dimension indicator_scale reference_subgroup
1                    1                 0             100                  1
2                    1                 0             100                  0
3                    1                 0             100                  0
4                    1                 0             100                  0
unique(NonorderedSampleMultipleind$indicator)[1] "Births attended by skilled health personnel (%)"        
[2] "Under-five mortality rate (deaths per 1000 live births)"The Coefficient of Variation (COV) is a summary measure of health
inequality that it can be used to measure inequality across
non-ordered dimensions of inequality. COV is a relative
measure of inequality that considers all population subgroups. Type
?covar to view the corresponding help file. The
NonorderedSampleMultipleind dataset can be used to
calculate COV for two different dimensions.
library(dplyr)
measures <- NonorderedSampleMultipleind %>%
  dplyr::group_by(indicator) %>%
  dplyr::summarize(covar(est = estimate,
                         se = se, 
                         pop = population,
                         scaleval = indicator_scale))
as.data.frame(measures)                                                 indicator measure  estimate se
1         Births attended by skilled health personnel (%)     cov  7.752158 NA
2 Under-five mortality rate (deaths per 1000 live births)     cov 41.500365 NA
    lowerci   upperci
1  6.860121  9.349835
2 39.094980 46.881011The NonorderedSampleMultipleind dataset can also be used
to calculate two or more different summary measures (in the example
below COV and BGV) for multiple dimensions.
multiplemeasures <- NonorderedSampleMultipleind %>%
  dplyr::group_by(indicator,
                  dimension) %>%
  dplyr::summarize(
    covar = covar(est = estimate,
                  se = se, 
                  pop = population,
                  scaleval = indicator_scale),
    bgv = bgv(est = estimate,
              se = se, 
              pop = population)) 
# It is possible to extract the measures separately
multiplemeasures$covar  measure  estimate se   lowerci   upperci
1     cov  7.752158 NA  6.860121  9.349835
2     cov 41.500365 NA 39.094980 46.881011
multiplemeasures$bgv  measure   estimate         se    lowerci    upperci
1     bgv   50.42023   9.560196   31.68259   69.15787
2     bgv 2468.70951 273.155091 1933.33537 3004.08365Further References
Ahn J, Harper S, Yu M, Feuer EJ, Liu B, Luta G. Variance Estimation and Confidence Intervals for 11 Commonly Used Health Disparity Measures. JCO Clin Cancer Inform. 2018 Dec;2:1–19. https://doi.org/10.1200/CCI.18.00031
Erreygers G. Correcting the Concentration Index. Journal of Health Economics. 2009;28:504–515. https://doi.org/10.1016/j.jhealeco.2008.02.003
Harper S, Lynch J, Meersman SC, Breen N, Davis WW, Reichman ME. An overview of methods for monitoring social disparities in cancer with an example using trends in lung cancer incidence by area-socioeconomic position and race-ethnicity, 1992-2004. Am J Epidemiol. 2008;167(8):889-899. https://doi.org/10.1093/aje/kwn016
Keppel K, Pamuk E, Lynch J, Carter-Pokras O, Kim I, Mays V, Pearcy J, Schoenbach V, Weissman JS. Methodological issues in measuring health disparities. Vital Health Stat. Ser. 2. 2005;141:1–16. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3681823/
O’Donnell O, van Doorslaer E, Wagstaff A, Lindelow M. Analyzing Health Equity Using Household Survey Data: A Guide to Techniques and Their Implementation. World Bank Institute; Washington, D.C: 2008. https://hdl.handle.net/10986/6896
Schlotheuber A, Hosseinpoor AR. Summary Measures of Health Inequality: A Review of Existing Measures and Their Application. Int J Environ Res Public Health. 2022 Mar 20;19(6):3697. https://doi.org/10.3390/ijerph19063697
Wagstaff A. The bounds of the concentration index when the variable of interest is binary, with an application to immunization inequality. Health Economics. 2005;14:429–432. https://doi.org/10.1002/hec.953
World Health Organization. Handbook on health inequality monitoring: with a special focus on low- and middle-income countries. Geneva: World Health Organization; 2013. https://www.who.int/publications/i/item/9789241548632