- Review
- Open access
Making health inequality analysis accessible: WHO tools and resources using Microsoft Excel
International Journal for Equity in Health volume 23, Article number: 205 (2024)
Abstract
Addressing health inequity is a central component of the Sustainable Development Goals and a priority of the World Health Organization (WHO). WHO supports countries in strengthening their health information systems in order to better collect, analyze and report health inequality data. Improving information and research about health inequality is crucial to identify and address the inequalities that lead to poorer health outcomes. Building analytical capacities of individuals, particularly in low-resource areas, empowers them to build a stronger evidence-base, leading to more informed policy and programme decision-making. However, health inequality analysis requires a unique set of skills and knowledge. This paper describes three resources developed by WHO to support the analysis of inequality data by non-statistical users using Microsoft Excel, a widely used and accessible software programme. The resources include a practical eLearning course, which trains learners in the preparation and reporting of disaggregated data using Excel, an Excel workbook that takes users step-by-step through the calculation of 21 summary measures of health inequality, and a workbook that automatically calculates these measures with the user’s disaggregated dataset. The utility of the resources is demonstrated through an empirical example.
Introduction
“Leaving no one behind” is a key principle for both the UN Sustainable Development Goals (SDGs) and the World Health Organization (WHO). It emphasises the commitment to ensuring that all individuals, regardless of their socioeconomic status, geographical location, or other factors, benefit from development efforts and have access to essential services, including health care [1]. Monitoring inequalities in health is crucial to identifying population groups that are being left behind, due to unequal healthcare access, outcomes, and social determinants. Understanding where such inequalities lie allows for targeted interventions, ultimately working towards a more equitable and inclusive health care system.
Issues of equity are a dominant theme requested in health sector analysis in developing countries across all regions and income groups [2]. However, despite the importance and utility of monitoring inequalities in health, capacity gaps remain in many countries. A 2020 global report on health data systems and capacity found that only half of the 133 surveyed countries included disaggregated data in their national health statistics reports [3]. Some countries may lack comprehensive, up-to-date, and reliable disaggregated data on health indicators, especially in remote or marginalised areas. Others may have a shortage of professionals skilled in the analysis and reporting of inequality data. Health research capacity gaps in low- and middle-income countries have been emphasised in many studies [4,5,6,7], which extend to health inequality-related research. Analytical capacity gaps exist in high-income countries too; for instance, a lack of analytically robust studies examining the impact of specific public health policy interventions on health inequalities has been highlighted [8].
The mandate of WHO includes supporting countries in the effective collection, analysis, reporting and use of data, with a focus on health inequalities [9]. The WHO 2022-27 Inequality monitoring and analysis strategy has an overarching mission to strengthen country capacities in monitoring health inequalities, including through developing and refining health inequality monitoring methods, tools, resources, and best practices [10]. Starting points for inequality analysis are disaggregated health data (i.e., indicator estimates broken down by population subgroups to show underlying inequality patterns that are not evident from overall averages across a whole population) and summary measures of health inequality (which quantify the level of inequality in a single number).
This paper describes three WHO resources developed to support the analysis of health inequality data by non-statistical users using Microsoft Excel: an eLearning course about preparing disaggregated datasets, a step-by-step workbook for how to calculate 21 summary measures of health inequality, and an automated workbook that calculates these measures for a user-inputted dataset. Excel is widely available software that makes conducting data analysis accessible, particularly in low- and middle-income countries where resources and know-how for specialised statistical software may be limited. Excel-based tools are also of value to those working in lower administrative tiers of health services and local government in high-income countries, where expertise in more sophisticated analytical tools is at a premium. Excel’s interface and spreadsheet format are user-friendly, and, although it may have limitations for more complex statistical analysis, it is a versatile tool that can be used for data entry, cleaning, visualisation, and basic statistical computations. According to Microsoft, 1.2 billion people own Excel, of whom 800 million (two thirds) are estimated to currently use it [11]. While a wide variety of online resources, tutorials and forums can assist Excel users in learning and troubleshooting, none are targeted specifically to the topic of inequality monitoring and the handling of disaggregated data.
eLearning course: Disaggregated data
About the course
Disaggregated data are the backbone of reliable, relevant and regular health inequality monitoring practices. Disaggregated data are health indicator data broken down by population subgroups within a dimension of inequality such as age, gender, ethnicity, income or geographical location. Simply put, they are indicator means by population subgroups. A free and self-paced online course ‘Inequality analysis using Excel: Disaggregated data’ aims to help users learn how to use Excel to prepare disaggregated datasets for analysis [12]. It introduces learners to a set of Excel formulas and processes that are useful for disaggregated data preparation and reporting, demonstrated using sample datasets and exercises. The target audience includes monitoring and evaluation officers, data analysts, academics and researchers, public health professionals, medical and public health students, and others with an interest in health data, inequality monitoring, and data analysis. The course is hosted on the WHO Health Inequality Monitoring eLearning channel of the OpenWHO platform. The development of this channel has been described in detail elsewhere [13].
The course is presented in five modules (an introductory module followed by Modules 1 to 4):
-
An introductory module provides an overview of the use of disaggregated data in health inequality monitoring.
-
Module 1 covers basic steps for preparing disaggregated datasets in Excel, teaching learners how to calculate disaggregated estimates for population subgroups (such as proportions and rates), clean disaggregated datasets for analysis (such as automating the standardisation of population subgroup names), join information from different datasets (such as bringing data from a reference table into a dataset of disaggregated data), and manage common errors in formulas.
-
Module 2 covers more advanced steps for preparing disaggregated datasets in Excel including various methods to re-categorise population subgroups, automate formulas by using relative and absolute cell references, and combine disaggregated datasets based on multiple conditions.
-
Module 3 covers the presentation of disaggregated data using graphs, including the use of bar graphs to present the latest situation of inequality and Equiplots (or horizontal circle graphs) to present the change in inequality over time, applying good visualisation practices.
-
Module 4 covers the steps to format a disaggregated dataset for its use and exploration with the Health Equity Assessment Toolkit (HEAT) software. HEAT is a software application that facilitates the assessment of within-country health inequalities [14, 15]. The upload database edition of the software, HEAT Plus, enables users to upload and work with their own data, which must comply with a defined format. Once data are uploaded, patterns of inequality can be explored, including compared over time and across indicators and settings, using both disaggregated data and summary measures of inequality (which are automatically calculated within the tool) displayed in a variety of interactive visualisations.
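The data preparation steps described for Module 1 can be illustrated outside Excel as well. The following is a minimal Python sketch, not the course's own material: the rows, the reference table and the function names are hypothetical, and the Excel functions named in the comments are analogies only, not a statement of which formulas the course teaches.

```python
# Hypothetical raw rows: inconsistent subgroup labels, numerators and denominators only.
raw_rows = [
    {"dimension": "economic status", "subgroup": " quintile 1  (POOREST) ",
     "numerator": 120, "denominator": 400},
    {"dimension": "economic status", "subgroup": "Quintile 5 (richest)",
     "numerator": 300, "denominator": 400},
]

# Hypothetical reference table: cleaned label -> standard name and subgroup order.
reference = {
    "quintile 1 (poorest)": {"name": "Quintile 1 (poorest)", "order": 1},
    "quintile 5 (richest)": {"name": "Quintile 5 (richest)", "order": 5},
}


def clean_label(label):
    """Trim, collapse repeated spaces and lower-case (similar to TRIM and LOWER)."""
    return " ".join(label.split()).lower()


def prepare(rows, ref):
    """Standardise subgroup names and compute a percentage estimate per subgroup."""
    prepared = []
    for row in rows:
        match = ref.get(clean_label(row["subgroup"]))  # lookup, similar to VLOOKUP
        if match is None or row["denominator"] == 0:
            # Flag the row instead of guessing (similar to trapping #N/A or #DIV/0!).
            prepared.append({**row, "subgroup": None, "estimate": None})
            continue
        prepared.append({
            "dimension": row["dimension"].strip().capitalize(),
            "subgroup": match["name"],
            "order": match["order"],
            "estimate": round(100 * row["numerator"] / row["denominator"], 1),
        })
    return prepared


dataset = prepare(raw_rows, reference)
```

Keeping the raw rows untouched and deriving a cleaned copy mirrors the course's emphasis on preserving the original data.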
The course is around two hours in duration and utilises a range of features to support learning including videos, step-by-step guidance using examples, links to additional resources and readings, and ‘knowledge check’ quizzes. A unique aspect of the course is that rather than solely focusing on teaching useful Excel formulas and functions, it also seeks to embed good data management principles from the beginning – bearing in mind that, especially in countries with weaker health information systems, poor data management and data quality issues are often major challenges. Much of a data scientist’s time can be related to data cleaning and organising, so the course emphasises good practices such as using formulas rather than manual inputs, automating dataset cleaning, and utilising separate but linked worksheets to conduct cleaning and analysis steps, preserving the original data.
Anticipated impact and opportunities
This course is the first of its kind and is an innovative and necessary addition to existing tools and resources to support the advancement of health equity. It is an opportunity to reach diverse global audiences and the concepts and skills covered can be applied to diverse analytical projects, health topics and settings (including national and subnational levels of the health sector).
The course was launched on 25th August 2023 and within nine months had over 5700 enrolments and over 1300 certificates of achievement. The number of enrolments was comparatively higher than similar skill-building courses using the statistical programmes Stata and R that were released at the same time and had around 3500 and 2900 enrolments within the same period, respectively. The majority of users to date have been from low- and middle-income countries; only 15% are from high-income countries (further information in Supplementary Material 1). This highlights a discernible opportunity for leveraging Microsoft Excel for conducting inequality analysis, particularly in settings where capacity and/or access to statistical programmes such as Stata and R may be less common.
A wealth of expertise in working with different types of disaggregated data and collaborating with countries in the preparation and analysis of such data was harnessed in the development of this course. The course is hosted on the WHO Health Inequality Monitoring eLearning channel of the OpenWHO platform. OpenWHO is a global social learning network that offers free, interactive eLearning courses on a variety of health topics, supporting multiple languages. Key features of the platform help to attract learners and promote their success in health inequality monitoring courses: courses are free of charge; course materials are available in multiple formats and can be downloaded and used offline; learners can navigate selectively through the courses to access materials that meet their immediate needs; discussion forums provide spaces for learner interactions and networking; and verified certificates of achievement are issued to learners who score at least 80% on the graded assessment at the end of the course.
Limitations and considerations
Household surveys such as Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS) are some of the most common sources of disaggregated data, especially in low- and middle-income countries. However, household surveys often collect data for a limited number of indicators and dimensions of inequality, they are infrequent, and estimates are typically only representative at national levels, masking subnational disparities and not capturing all relevant aspects of inequality [16]. Moreover, calculating disaggregated estimates from surveys requires taking sampling design complexities (such as weighting, clustering and stratification) into consideration. Statistical software such as Stata, SPSS, SAS or R should be used for this process. Statistical codes are available to support the calculation of disaggregated estimates from household surveys using these software programmes [17]. Having said that, disaggregated estimates from household surveys are often already calculated and published – for instance, DHS estimates are published via the DHS STATcompiler [18], MICS estimates are published in the UNICEF Data Warehouse [19], and disaggregated data from other publicly-available sources are collated in the WHO Health Inequality Data Repository (HIDR) [20]. This course aims to build capacity in managing and presenting such data using Excel.
Routine and administrative information collected and recorded by health facilities is another potential source of inequality data. While such information can provide detailed insight into the uptake and outcomes of health services, data quality issues (availability, completeness, accuracy, relevance and timeliness) and the limited availability of data about inequality dimensions (such as socioeconomic status) may pose challenges for inequality monitoring [16]. Strengthening health information systems to improve the collection of inequality data is therefore an essential precursor to conducting comprehensive analysis of health inequalities [21].
Excel is not an ideal tool for the management and analysis of very large datasets (an Excel sheet is limited to 1,048,576 rows, and calculations at this scale slow computation considerably). When it comes to automating data management tasks for big data, other software programmes are more suitable. This course targets users of smaller datasets. For instance, a technical officer may extract a subset of data pertaining to certain populations of interest from a large administrative database, or they may download a set of disaggregated estimates from the DHS STATcompiler for a set of health indicators.
The course is not intended to provide comprehensive training in all of Excel’s formulas, functions and functionalities, nor all data management practices. It highlights those that are most pertinent for a beginner user of Excel to work with a disaggregated dataset, seeking to establish good data management practices from the start.
Workbooks: Summary measures of health inequality
About the workbooks
In addition to disaggregated data, summary measures of health inequality are also important for monitoring inequalities. Summary measures build on disaggregated data, quantifying the magnitude and direction of inequality in a single number. They are valuable for gaining an overarching understanding of the level of inequality and how it has changed over time or across settings and indicators. Many summary measures of inequality exist, each with different characteristics and calculation methods, which can lead to different conclusions about inequality.
The HEAT software automatically calculates up to 21 summary measures of inequality using the software R Shiny [14, 15]. Datasets of disaggregated data in Excel format can be uploaded to HEAT Plus for interactive exploration of inequality using both disaggregated data and summary measures (a process detailed in Module 4 of the Excel eLearning course described above). Users can also download summary measure results calculated by HEAT Plus in CSV (comma-separated values) and Excel formats.
For users who would like to calculate summary measures of interest without the requirement of preparing and uploading a dataset to HEAT Plus, two Excel workbooks ‘Summary measures of health inequality: step-by-step calculation workbook’ and ‘Summary measures of health inequality: automated calculation workbook’ have been published on the WHO Health Inequality Monitor website [17]. Figure 1, available in the ‘about’ tab of the workbooks, summarises the 21 summary measures that are calculated within the workbooks and provides a decision tree for selecting appropriate measures, based on the research purpose and characteristics of the underlying data. The step-by-step workbook provides formulas and instructions for the calculation of each summary measure, using sample data, along with information about how to interpret summary measure results. This workbook serves an educational purpose, showing in detail how measures are calculated. The automated workbook allows users to input a dataset of disaggregated data, and relevant summary measures are automatically calculated (and their 95% confidence intervals, where possible).
The calculation methods for these summary measures have been explained in detail elsewhere [22]. Selecting appropriate measures for analysing and reporting inequality involves the consideration of several methodological issues [23]. First, there are considerations relating to the characteristics of the underlying data, which determine the types of measures that can be calculated and how they are calculated:
-
Whether the dimension of inequality comprises two or more than two subgroups. While simple measures and impact measures can be calculated for all inequality dimensions, ordered and non-ordered complex measures can only be calculated for dimensions with more than two subgroups (such as wealth quintiles or subnational regions). Simple measures are more suited when there are only two subgroups (though they can be used if there are more than two).
-
Whether the dimension of inequality is ordered, non-ordered or binary. Ordered dimensions have subgroups with a natural or inherent ordering (such as education level or wealth quintiles), while non-ordered dimensions have subgroups that cannot logically be ranked (such as subnational regions or ethnicity). Certain summary measures (such as SII and ACI) can only be calculated for ordered dimensions, while other summary measures (such as MDMU and BGV) can be calculated for non-ordered dimensions. Binary dimensions have only two subgroups (such as males versus females); only simple measures can be calculated for binary dimensions.
-
Whether the health indicator in question is favourable or adverse. For favourable indicators, the aim is to obtain a maximum level (such as complete coverage of antenatal care). For adverse (or non-favourable) indicators, the aim is to realise a minimum level (such as zero under-five mortality). There are also indicators that do not fall into either of these two categories (such as fertility rates, caesarean section rates or hospitalisation rates); rather, the optimum depends on the setting and context. The type of health indicator (favourable versus adverse) impacts the calculation (directionality) of certain summary measures.
-
Whether adequate data are available for the calculation of summary measures. For ordered dimensions, estimates must be available for all subgroups. For non-ordered dimensions, estimates should ideally be available for all subgroups, or at a minimum for a large proportion of subgroups. Data on the population sizes of subgroups are required for the calculation of most weighted measures, and standard errors of subgroup estimates are required for the calculation of many 95% confidence intervals (CIs).
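To illustrate why standard errors of subgroup estimates are needed for confidence intervals, the following minimal Python sketch applies the standard normal-approximation formulas for the difference and ratio of two independent estimates. It is an illustrative assumption rather than the workbooks' own formulas, which are documented in their ‘technical notes’ tab; the function names and input values are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def difference_ci(y_a, se_a, y_b, se_b):
    """Difference y_a - y_b with a 95% CI, assuming independent estimates."""
    d = y_a - y_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return d, d - Z_95 * se, d + Z_95 * se


def ratio_ci(y_a, se_a, y_b, se_b):
    """Ratio y_a / y_b with a 95% CI computed on the log scale (delta method)."""
    r = y_a / y_b
    se_log = math.sqrt((se_a / y_a) ** 2 + (se_b / y_b) ** 2)
    lower = math.exp(math.log(r) - Z_95 * se_log)
    upper = math.exp(math.log(r) + Z_95 * se_log)
    return r, lower, upper


# Hypothetical subgroup estimates (%) and standard errors, for illustration only.
d, d_low, d_high = difference_ci(80.0, 3.0, 40.0, 4.0)
r, r_low, r_high = ratio_ci(80.0, 3.0, 40.0, 4.0)
```

Without the two standard errors, only the point estimates `d` and `r` could be reported.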
Second, there are considerations relating to the properties of the different measures and the desired purpose of the analysis [24]:
-
Whether absolute or relative inequality (or both) should be assessed. Absolute measures indicate the magnitude of inequality between population subgroups, while relative measures show proportional inequality between subgroups. Relative measures have no unit and are therefore useful for comparing the situation across indicators with different units. Inequality analysis should, ideally, consider both absolute and relative inequality as they measure different aspects of inequality that complement each other. For example, inequality between two income groups could be measured in absolute terms by calculating the difference between them (e.g., immunization coverage is 20 percentage points higher among the richest than the poorest) or in relative terms by calculating the ratio between them (e.g., immunization coverage is two times higher among the richest compared to the poorest).
-
Whether the summary measure should account for the population size of each subgroup. Population size refers to the indicator denominator for a specific subgroup. Simple measures (difference and ratio) are always unweighted. Ordered and non-ordered measures may weight population subgroups by their population size or may treat them equally. This involves a value judgement depending on the purpose of the analysis. For example, the use of a weighted measure to show inequality between ethnic groups may not capture inequality if the number of individuals in some ethnic groups was very small, while the use of an unweighted measure would treat all groups equally and be more likely to capture inequality.
-
Whether there is a reference point against which the other subgroups are compared, if the dimension is non-ordered. For instance, this may be the most-advantaged subgroup, the best-performing subgroup, a reference value such as the mean, or a user-selected reference subgroup.
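The distinctions above (absolute versus relative, weighted versus unweighted) can be made concrete with a minimal Python sketch. The function names and data are hypothetical, and the population-weighted mean is assumed as the reference point; the workbooks' exact definitions are given in their technical notes. The first two values are chosen to be consistent with the immunization example in the text.

```python
def difference(y_a, y_b):
    """Absolute inequality between two subgroups, in the indicator's own unit."""
    return y_a - y_b


def ratio(y_a, y_b):
    """Relative inequality between two subgroups (unitless)."""
    return y_a / y_b


def weighted_mean(estimates, populations):
    """Population-weighted mean of the subgroup estimates."""
    total = sum(populations)
    return sum(y * n for y, n in zip(estimates, populations)) / total


def mean_difference_from_mean(estimates, populations, weighted):
    """Mean absolute difference of subgroups from the weighted mean."""
    mu = weighted_mean(estimates, populations)
    if weighted:
        total = sum(populations)
        return sum(n / total * abs(y - mu) for y, n in zip(estimates, populations))
    return sum(abs(y - mu) for y in estimates) / len(estimates)


# Coverage of 40% among the richest and 20% among the poorest.
abs_gap = difference(40.0, 20.0)  # 20 percentage points
rel_gap = ratio(40.0, 20.0)       # two times higher

# Hypothetical non-ordered dimension: one very small group with much lower coverage.
coverage = [80.0, 80.0, 20.0]
population = [4900, 4900, 200]
weighted_value = mean_difference_from_mean(coverage, population, weighted=True)
unweighted_value = mean_difference_from_mean(coverage, population, weighted=False)
```

In this hypothetical case the weighted value is far smaller than the unweighted one, because the disadvantaged group makes up only 2% of the population.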
Instructions for using each workbook are provided in a ‘how to’ tab. Calculation methods and information about how to interpret summary measure results are provided in a ‘technical notes’ tab, and terminology and variable definitions are provided in a glossary (see copies in Supplementary Materials 2 and 3). Sample data are pasted into ‘Data input’ tab(s), following the data structure provided. Checks are automatically carried out to ensure that variables are defined correctly, with error messages appearing if the input data do not follow specified rules.
In the step-by-step workbook, formulas to calculate each summary measure of inequality are presented in separate tabs; blue tabs calculate measures for ordered dimensions, while green tabs calculate measures for non-ordered dimensions. Within each summary measure tab, there are three types of cells: data cells, calculation cells and result cells (Fig. 2). Data cells reflect the data from the ‘Data input’ tab that are required for the calculation. The formulas in the calculation cells demonstrate the calculation of the relevant summary measure and its 95% CIs (if applicable), which are then presented in the result cells. The formulas calculate a summary measure for one unique set of subgroup estimates at a time (i.e., disaggregated data for subgroups within a single indicator, date, data source and inequality dimension combination).
In the automated workbook, users can enter data for multiple subgroup estimates at a time (i.e., disaggregated data for multiple indicators, dates and/or inequality dimensions). Clicking a “Refresh” button triggers the calculation process and relevant summary measure results are presented in the ‘Result’ tab (Fig. 3).
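The notion of a "unique set of subgroup estimates" can be sketched in Python as a grouping step. This is an illustration only: the field names and rows are hypothetical, and the difference is simplified here to the highest minus the lowest estimate, which is not necessarily how the workbooks define it for every dimension.

```python
from collections import defaultdict

# Hypothetical disaggregated rows covering two survey years.
rows = [
    {"indicator": "Indicator A", "year": 2010, "source": "Survey X",
     "dimension": "Place of residence", "subgroup": "Rural", "estimate": 50.0},
    {"indicator": "Indicator A", "year": 2010, "source": "Survey X",
     "dimension": "Place of residence", "subgroup": "Urban", "estimate": 70.0},
    {"indicator": "Indicator A", "year": 2015, "source": "Survey X",
     "dimension": "Place of residence", "subgroup": "Rural", "estimate": 65.0},
    {"indicator": "Indicator A", "year": 2015, "source": "Survey X",
     "dimension": "Place of residence", "subgroup": "Urban", "estimate": 75.0},
]


def group_into_sets(data):
    """Group rows by indicator, year, source and dimension."""
    sets = defaultdict(list)
    for row in data:
        key = (row["indicator"], row["year"], row["source"], row["dimension"])
        sets[key].append(row)
    return dict(sets)


def difference_per_set(data):
    """Compute one simplified difference (max minus min) per set of estimates."""
    results = {}
    for key, members in group_into_sets(data).items():
        estimates = [m["estimate"] for m in members]
        results[key] = max(estimates) - min(estimates)
    return results


results = difference_per_set(rows)
```

A step-by-step calculation corresponds to handling one key at a time, whereas an automated calculation loops over all keys in the input dataset.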
The selection and interpretation of suitable summary measures for health inequality analysis requires a good understanding of their relative advantages, disadvantages and applicability. Therefore, these Excel tools are complemented by other WHO resources for inequality monitoring that can support users to understand and correctly apply summary measures of inequality within their analysis. These include an eLearning course ‘Summary measures of health inequality’ [25] (part of the Health Inequality Monitoring Foundations course series on the OpenWHO platform) and a handbook on health inequality monitoring (with second edition expected in 2024) [16].
Anticipated impact and opportunities
While other workbooks for the calculation of some summary measures of inequality have been published, this is the first compilation of formulas in Excel for a wide range of summary measures that encompasses both ordered and non-ordered dimensions of inequality, as well as an automated calculation tool for a disaggregated dataset. For example, Public Health Information for Scotland (ScotPHO) has published an Excel tool for calculating difference, ratio, SII, RII and PAR [26] and the UK Office for Health Improvement and Disparities (OHID) Inequalities Analysis Tool calculates difference, ratio, SII, RII, ACI and RCI [27], each for a single set of subgroup estimates at a time. A step-by-step guide for measuring social inequalities in health has been published by Every Woman Every Child Latin America and Caribbean (EWEC LAC), which demonstrates the use of Excel to calculate difference, ratio, SII and ACI [28]. None of these existing resources include a comprehensive selection of summary measures of inequality, including for non-ordered dimensions, nor do they automate calculation for a dataset containing multiple sets of disaggregated estimates. Moreover, methodologies for calculating SII and RII in these resources differ from those used in the WHO tools; they use simple linear regression models rather than logistic regression (discussed below) and the ScotPHO and OHID tools do not take into account population shares when ranking subgroups. Confidence intervals are also not systematically calculated (they are only calculated for SII and RII in the OHID tool).
By clearly presenting formulas and steps for the calculation of summary measures of inequality, the objective of the step-by-step workbook is to make these methods user-friendly, accessible and reproducible, fostering increased utilisation for inequality analysis. The process of calculating these measures serves an educational function by enhancing users’ comprehension of the underlying concepts, thereby facilitating a deeper understanding of what is being measured. This, in turn, enables users to interpret and present the data more effectively. On the other hand, the automated workbook saves users time in setting up formulas by enabling the rapid calculation of multiple summary measures for a small set of disaggregated data. Together, these Excel workbooks can empower public health professionals to independently analyse health inequality data without relying on more specialised software, fostering local capacity building and decision-making. To increase access to disaggregated data, all of the datasets within the HIDR, which contains over 2000 health and health-related indicators, are available for download in the same format required within the Excel tools [20].
Summary measures can simplify complex information about the state of inequality, facilitating effective communication of inequality patterns to policymakers, researchers, and broader audiences. They allow for the monitoring of inequality trends over time, aiding in the evaluation of the effectiveness of interventions and policy measures in mitigating or exacerbating health inequalities. These measures can also enable straightforward comparisons of health inequalities across different regions, countries or indicators, facilitating a deeper understanding of variations and patterns that may inform targeted interventions.
Limitations and considerations
Some basic knowledge of data preparation as well as Excel functionality and formulas is required to use the workbooks. The disaggregated data must be prepared before they are inserted into the workbooks (i.e., each row in the dataset inserted into the workbook must pertain to a population subgroup, rather than to an individual), and the workbooks cannot check the quality or accuracy of the underlying disaggregated data. Limitations discussed previously in this paper with regards to the availability of disaggregated data apply.
In the automated workbook, the size of the input dataset is limited to 200 rows, due to the computational power of Excel. Therefore, this tool is targeted towards users of small datasets; users with larger datasets are recommended to use HEAT Plus or other statistical software packages for the calculation of summary measures (such as Stata or R, for which summary measure codes have been published by WHO [17]).
The workbooks do not calculate 95% CIs for some summary measures of inequality (including BGSD, COV, MDMU, MDMW, MDBU, MDBW, IDISU and IDISW). For these measures a simulation methodology is recommended to estimate CIs. While such simulations are possible in Excel with macro programming (using Visual Basic for Applications, VBA), doing so increases complexity and decreases computation speed. Therefore, users who require CIs for these measures are recommended to use HEAT Plus or another software programme.
Moreover, the methodology for calculating the slope index of inequality (SII) and the relative index of inequality (RII) in the workbook involves some caveats. SII and RII are regression-based measures: a regression line of the health indicator is fitted against the ranked population subgroups, and this line is used to estimate the health indicator value at the top and bottom of the ranked population (e.g., the richest and the poorest). Weighted logistic regression (weighted to consider subgroup population sizes) is the method applied in the workbook. The SII and RII results obtained from the formulas in the workbook may differ somewhat from results produced by other statistical software and/or regression models; minor differences due to model selection and calculation should not be concerning. It is also recommended to use HEAT Plus or another software programme for the calculation of 95% CIs of SII and RII (WHO has published specific codes for SII and RII for the programmes Stata and R [17]).
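The regression-based logic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the workbook's formulas: subgroups are ranked by the midpoint of their cumulative population share, a two-parameter logistic model is fitted by Newton-Raphson with subgroup population sizes as weights, and SII and RII are taken as the difference and ratio of the predicted values at ranks 1 and 0. All function names are hypothetical, and estimates are expressed as proportions between 0 and 1.

```python
import math


def midpoint_ranks(populations):
    """Relative rank of each ordered subgroup: midpoint of its cumulative share."""
    total = float(sum(populations))
    ranks, cumulative = [], 0.0
    for n in populations:
        ranks.append((cumulative + n / 2) / total)
        cumulative += n
    return ranks


def expit(z):
    """Inverse of the logit function."""
    return 1 / (1 + math.exp(-z))


def fit_weighted_logit(x, y, w, max_iter=100, tol=1e-12):
    """Fit logit(y) = b0 + b1 * x by Newton-Raphson with weights w."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            mu = expit(b0 + b1 * xi)
            resid = wi * (yi - mu)     # contribution to the score vector
            info = wi * mu * (1 - mu)  # contribution to the information matrix
            g0 += resid
            g1 += resid * xi
            h00 += info
            h01 += info * xi
            h11 += info * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def sii_rii(estimates, populations):
    """SII and RII for subgroups ordered from most to least disadvantaged."""
    x = midpoint_ranks(populations)
    b0, b1 = fit_weighted_logit(x, estimates, populations)
    bottom, top = expit(b0), expit(b0 + b1)
    return top - bottom, top / bottom
```

Because the fitted line uses every subgroup and its population share, results from a sketch like this will generally differ from those of a simple linear regression on the same data.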
Empirical example
To illustrate the use of these resources, we use sample data monitoring the coverage of three doses of diphtheria-tetanus-pertussis (DTP3) immunization among children under the age of one year in Congo. The first dataset contains data from three household surveys (from 2005, 2011 and 2014) disaggregated by population subgroups formed on the basis of four dimensions of inequality: economic status (wealth quintiles), education (highest achieved level of education by the mother), place of residence (urban and rural), and child’s sex. Each row in the dataset contains immunization coverage data for a population subgroup. The dataset has unformatted dimension and subgroup names and contains only numerators and denominators (Fig. 4a). Following the lessons in Module 1 of the course, we can calculate the DTP3 immunization coverage within each subgroup and automate the standardisation of dimension and subgroup names across the dataset (Fig. 4b).
The second dataset contains administrative data at district (second administrative level, or admin2) level (Fig. 4c). Following the lessons in Module 2 of the course, we can automate the aggregation of district-level estimates to province (admin1) level (Fig. 4d) and combine these data with the disaggregated data (i.e., subgroup means) from the previous household surveys dataset, also integrating information about the setting average (i.e., the indicator average for the country) (Fig. 4e). Finally, the dataset is prepared in a specific format for further exploration of the data using HEAT Plus, including specific fields used for the calculation of summary measures of inequality (Fig. 4f).
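The aggregation from district to province level can be sketched in Python as follows. This is a minimal illustration with hypothetical names and numbers, not the Congo data: numerators and denominators are summed within each province before the estimate is calculated, which is comparable to a conditional sum such as SUMIFS in Excel.

```python
from collections import defaultdict

# Hypothetical district-level (admin2) rows; not the data used in this paper.
district_rows = [
    {"province": "Province A", "district": "District A1",
     "numerator": 450, "denominator": 500},
    {"province": "Province A", "district": "District A2",
     "numerator": 150, "denominator": 1000},
    {"province": "Province B", "district": "District B1",
     "numerator": 240, "denominator": 300},
]


def aggregate_to_province(rows):
    """Sum numerators and denominators by province, then compute coverage (%)."""
    totals = defaultdict(lambda: {"numerator": 0, "denominator": 0})
    for row in rows:
        totals[row["province"]]["numerator"] += row["numerator"]
        totals[row["province"]]["denominator"] += row["denominator"]
    return {
        province: {**t, "estimate": 100 * t["numerator"] / t["denominator"]}
        for province, t in totals.items()
    }


province_estimates = aggregate_to_province(district_rows)
```

Summing before dividing matters: in this hypothetical case, Province A has coverage of 40%, whereas a simple average of its two district percentages (90% and 15%) would give 52.5%.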
Following the lessons in Module 3 of the course, we can use these disaggregated data to visualise the latest status of inequality in DTP3 immunization coverage using bar charts (Fig. 5a and b) and visualise the change in education-related inequality over time using a bar chart or Equiplot (Fig. 5c and d). We can see that DTP3 immunization coverage in Congo was lower among the poorest quintile, among people with no or primary education, those living in rural areas, and certain subnational regions such as Likouala and Sangha. While coverage increased between 2005 and 2014 among children with mothers who had no education, it remained unchanged among children with mothers who had primary education. Coverage remained highest among the secondary and higher education subgroup.
After preparing and exploring the patterns shown in the disaggregated data, summary measures of inequality can be calculated to summarise the magnitude of inequality. The formatted data from the previous steps are pasted into the automated calculation Excel workbook. Relevant summary measure results are then presented for each set of disaggregated data; for instance, since education is an ordered dimension, simple measures of difference and ratio are calculated as well as complex measures of SII, RII, ACI and RCI and the impact measures of PAR and PAF.
We start by analysing education-related inequality using the simple summary measures of difference and ratio. In 2005, the difference in coverage between children of the most and least educated mothers was 43.3% points (95%CI 29.9–56.6), while data from the 2014 MICS survey show a difference of 21.5% points (95%CI 9.2–33.8) (Table 1). Therefore, in absolute terms, education-related inequality halved over the 2005–2014 period (see note 2). The ratio in coverage between the most and least educated subgroups decreased from 2.2 (95%CI 1.5–3.1) in 2005 to 1.4 (95%CI 1.1–1.8) in 2014.
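The two simple measures can be sketched as below. The input values are hypothetical and do not reproduce Table 1. The confidence interval for the difference uses a normal approximation that treats the two subgroup estimates as independent, which is an assumption of this sketch rather than a description of the workbook's calculation.

```python
import math


def difference_and_ratio(high, low, se_high, se_low, z=1.96):
    """Difference and ratio between two subgroup estimates (in %).

    The 95% CI of the difference assumes independent subgroup estimates
    and a normal approximation (assumption of this sketch).
    """
    diff = high - low
    ratio = high / low
    se_diff = math.sqrt(se_high ** 2 + se_low ** 2)
    ci = (round(diff - z * se_diff, 1), round(diff + z * se_diff, 1))
    return round(diff, 1), round(ratio, 2), ci


# Hypothetical coverage (%) and standard errors for the most and least
# educated subgroups.
DIFF, RATIO, DIFF_CI = difference_and_ratio(
    high=80.0, low=40.0, se_high=3.0, se_low=4.0
)
print(DIFF, RATIO, DIFF_CI)
```

A difference of zero and a ratio of one both indicate no inequality between the two subgroups being compared.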
A limitation of difference and ratio is that they consider only the two extreme subgroups when measuring the level of inequality and ignore shifts over time in the population’s composition across subgroups. The situation in the middle subgroup (primary education) is ignored: for instance, these measures do not capture the fact that coverage in the primary education subgroup declined in 2014 compared to 2011. Complex measures of inequality, such as SII and RII, take into account the situation in all subgroups and shifts in the population’s composition over time. Using SII, we find that the difference in coverage reduced from 45.9 to 34.8% points between 2005 and 2014 (Table 1). Looking at population composition, in 2005 the ‘no education’ subgroup contained 7% of the population while the ‘secondary and higher education’ subgroup contained 54%. In 2014, these proportions were 8% and 66%, respectively, indicating that an increased share of the population had secondary or higher education in 2014. Therefore, when SII is used to account for the situation in all subgroups and the population composition, the magnitude of inequality is larger than when difference is used to compare only the least and most educated subgroups without population weights. (This comparison does not assess whether SII and difference are statistically significantly different from one another.)
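To show how a regression-based measure uses all subgroups and their population shares, the sketch below computes a slope index with plain weighted least squares. Subgroups are ordered from least to most advantaged, each is placed at the midpoint of its cumulative population range, and the slope of the fitted line is taken as the SII. This linear version is a simplifying assumption: the workbook's own calculation may differ, for example in the regression model used. The inputs are hypothetical.

```python
def slope_index(estimates, pop_shares):
    """Weighted least-squares slope of coverage on population rank midpoints.

    `estimates` and `pop_shares` must be ordered from the least to the most
    advantaged subgroup. The slope equals the predicted gap between rank 1
    and rank 0 of the fitted line.
    """
    total = sum(pop_shares)
    weights = [p / total for p in pop_shares]

    # Midpoint of each subgroup's range in the cumulative population.
    midpoints = []
    cumulative = 0.0
    for w in weights:
        midpoints.append(cumulative + w / 2)
        cumulative += w

    mean_x = sum(w * x for w, x in zip(weights, midpoints))
    mean_y = sum(w * y for w, y in zip(weights, estimates))
    cov_xy = sum(
        w * (x - mean_x) * (y - mean_y)
        for w, x, y in zip(weights, midpoints, estimates)
    )
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, midpoints))
    return cov_xy / var_x


# Hypothetical coverage (%) and population shares for three education levels.
COVERAGE = [40.0, 60.0, 80.0]
POP_SHARES = [0.2, 0.3, 0.5]

SII = slope_index(COVERAGE, POP_SHARES)
DIFFERENCE = COVERAGE[-1] - COVERAGE[0]
print(round(SII, 1), DIFFERENCE)
```

In this made-up example the slope index is larger than the simple difference, the same direction as the pattern described in the text, because the fitted line extends across the whole population distribution rather than comparing only the two extreme subgroups.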
Using PAR and/or its relative version, PAF, we can estimate how much the national average DTP3 coverage would have improved in Congo if education-related inequality were eliminated. The overall coverage could have improved by 6.4% points (95%CI 4.6–8.1) in 2014 if there were no education-related inequality (i.e., if the national average coverage was the same as the coverage in the most educated subgroup) (Table 1).
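The impact measures follow directly from the definition given above: PAR is the gap between the reference subgroup and the setting average, and PAF expresses that gap relative to the setting average. In the sketch below the setting average is taken as the population-weighted mean of the subgroup estimates, which is an assumption of this example. The inputs are hypothetical.

```python
def par_and_paf(estimates, pop_shares, reference_index):
    """Population attributable risk (% points) and fraction (%).

    The setting average is computed here as the population-weighted mean
    of the subgroup estimates (assumption of this sketch).
    """
    total = sum(pop_shares)
    setting_average = sum(y * p for y, p in zip(estimates, pop_shares)) / total
    par = estimates[reference_index] - setting_average
    paf = 100 * par / setting_average
    return setting_average, par, paf


# Hypothetical coverage (%) by education level, least to most educated,
# with the most educated subgroup as the reference.
AVERAGE, PAR, PAF = par_and_paf(
    [40.0, 60.0, 80.0], [0.2, 0.3, 0.5], reference_index=2
)
print(round(AVERAGE, 1), round(PAR, 1), round(PAF, 1))
```

Here a PAR of 14 points means the hypothetical national coverage of 66% would rise to 80% if every subgroup reached the level of the reference subgroup.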
Summary measures of inequality are also useful for comparing inequality across subnational regions within a country. To exemplify this, we look at how subnational inequalities in Congo changed between 2016 and 2021. Subnational region is a non-ordered dimension of inequality (i.e., there is no inherent ordering to subnational regions, unless a separate indicator is used to order regions). There is a range of summary measures that can be calculated with non-ordered dimensions; for illustrative purposes, we will use the unweighted and weighted mean difference from mean (MDMU and MDMW). These show the mean difference between each subnational region’s coverage and the national average. The unweighted version of the measure treats all subnational regions equally, while in the weighted version subnational regions are weighted according to their share of the population. There is an unweighted MDM of 9.7% points and weighted MDM of 6.9% points in 2016 (Table 2). This compares to 13.2 and 9.8% points, respectively, in 2021. Therefore, absolute inequality in DTP3 coverage between subnational regions in Congo increased in this period. PAR can also be used to quantify the impact of eliminating inequality: in this case, DTP3 coverage would have been 18.6% points (95%CI 17.8–19.3) higher in 2021 if there was no subnational inequality (Table 2).
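The unweighted and weighted versions of the mean difference from mean can be sketched as follows, using absolute gaps between each region's coverage and the national average. The regional values are hypothetical, and the national average is assumed here to be the population-weighted mean of the regional estimates.

```python
def mean_difference_from_mean(estimates, populations):
    """Return (MDMU, MDMW) in % points for a non-ordered dimension.

    The setting average is taken as the population-weighted mean of the
    regional estimates (assumption of this sketch).
    """
    total = sum(populations)
    shares = [p / total for p in populations]
    setting_average = sum(s * y for s, y in zip(shares, estimates))
    gaps = [abs(y - setting_average) for y in estimates]

    mdmu = sum(gaps) / len(gaps)                       # every region counts equally
    mdmw = sum(s * g for s, g in zip(shares, gaps))    # regions weighted by population
    return mdmu, mdmw


# Hypothetical coverage (%) and populations for three subnational regions.
MDMU, MDMW = mean_difference_from_mean([50.0, 70.0, 90.0], [200, 300, 500])
print(round(MDMU, 1), round(MDMW, 1))
```

The two versions diverge when the regions furthest from the average are also small: such regions pull the unweighted measure up more than the weighted one.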
To better understand the computations behind these results, the step-by-step workbook can be consulted. The same data (for a single set of disaggregated estimates) can be pasted into the relevant ‘Data input’ tab and the calculation steps reviewed. For example, Fig. 6a and b show the calculation steps for the difference and SII, respectively, in DTP3 immunization coverage between the most and least educated subgroups in 2014.
Conclusion
Microsoft Excel is one of the most widely used spreadsheet software programmes globally. Its ubiquity indicates a high level of familiarity and comfort among users and highlights an opportunity to increase capacity among diverse global audiences in its use for analysing data related to health inequalities. The Excel resources described in this paper seek to support users to use disaggregated data and summary measures of inequality, two pillars of inequality analysis. Importantly, the tools are free, self-paced and user-friendly. They are complemented by a range of further resources to support their use – including inequality monitoring training courses and the HEAT Plus software. The tools are new and, while there have been many enrolments to the eLearning course, use cases are not available yet. Further development of these resources may include translation into other languages in order to increase accessibility, and the development of tools that facilitate the visualisation of inequality data using Excel.
Building capacity in the analysis and reporting of inequality data is an integral component of strengthening health inequality monitoring. This process involves enhancing skills and knowledge to proficiently work with disaggregated data, calculate summary measures of inequality, and interpret and report findings. However, analysis is just one step of the health inequality monitoring cycle [29]. To further strengthen inequality monitoring, it is imperative to also address the availability of disaggregated data by building robust data collection mechanisms. Moreover, the effectiveness of monitoring efforts hinges on the effective communication of results and their strategic use in decision-making processes. Therefore, the synergy between data collection, data analysis, reporting and the translation of this knowledge is essential for fostering a comprehensive and impactful approach to tackling health inequalities and achieving health equity.
Data availability
The data used in the empirical example in this paper are published in the WHO Health Inequality Data Repository at https://www.who.int/data/inequality-monitor/data.
Change history
15 November 2024
A Correction to this paper has been published: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12939-024-02330-0
Notes
Among several other ‘skill-building’ courses, which provide practical guidance for analysis methods and the use of selected software programmes, including Stata and R.
The difference between the difference value in 2005 and the difference value in 2014 is statistically significant (i.e., the CIs do not cross zero).
Abbreviations
- ACI: Absolute Concentration Index
- BGSD: Between-Group Standard Deviation
- BGV: Between-Group Variance
- CI: Confidence Interval
- COV: Coefficient of Variation
- CSV: Comma-Separated Values
- DHS: Demographic and Health Survey
- DTP3: Three Doses of Diphtheria-Tetanus-Pertussis Immunization
- EWEC LAC: Every Woman Every Child Latin America and Caribbean
- HEAT: Health Equity Assessment Toolkit
- HIDR: Health Inequality Data Repository
- IDISU: Index of Disparity (unweighted)
- IDISW: Index of Disparity (weighted)
- MDBU: Mean Difference from the Best-Performing Subgroup (unweighted)
- MDBW: Mean Difference from the Best-Performing Subgroup (weighted)
- MDMU: Mean Difference from Mean (unweighted)
- MDMW: Mean Difference from Mean (weighted)
- MDRU: Mean Difference from a Reference Subgroup (unweighted)
- MDRW: Mean Difference from a Reference Subgroup (weighted)
- MICS: Multiple Indicator Cluster Survey
- MLD: Mean Log Deviation
- OHID: UK Office for Health Improvement and Disparities
- PAF: Population Attributable Fraction
- PAR: Population Attributable Risk
- RCI: Relative Concentration Index
- RII: Relative Index of Inequality
- ScotPHO: Public Health Information for Scotland
- SDGs: Sustainable Development Goals
- SII: Slope Index of Inequality
- TI: Theil Index
- VBA: Visual Basic for Applications
- WHO: World Health Organization
References
United Nations General Assembly. Transforming our world: the 2030 Agenda for Sustainable Development. 2015.
Gaudin S, Yazbeck A. Identifying Major Health-System challenges in developing Countries using PERs: equity is the Elephant in the room. Heal Syst Reform. 2021;7(2).
World Health Organization. Global report on health data systems and capacity. 2020.
Franzen SRP, Chandler C, Lang T, Samuel D, Franzen RP. Health research capacity development in low and middle income countries: reality or rhetoric? A systematic meta-narrative review of the qualitative literature. Open. 2017;7:12332.
Bates I, Boyd A, Smith H, Cole DC. A practical and systematic approach to organisational capacity strengthening for research in the health sector in Africa. Heal Res Policy Syst. 2014;12(1):1–10.
Schleiff MJ, Kuan A, Ghaffar A. Comparative analysis of country-level enablers, barriers and recommendations to strengthen institutional capacity for evidence uptake in decision-making. Heal Res Policy Syst. 2020;18(1):1–12.
Hoxha K, Hung YW, Irwin BR, Grépin KA. Understanding the challenges associated with the use of data from routine health information systems in low- and middle-income countries: a systematic review. Heal Inf Manag J. 2022;51(3):135–48.
Thomson K, Hillier-Brown F, Todd A, McNamara C, Huijts T, Bambra C. The effects of public health policies on health inequalities in high-income countries: an umbrella review. BMC Public Health. 2018;18(1):1–21.
World Health Organization. Thirteenth General Programme of Work, 2019–2023. 2019.
World Health Organization. Inequality monitoring and analysis strategy, 2022-27. 2022.
Microsoft. Satya Nadella and Terry Myerson: Build 2016 [Speech Transcript]. 2016.
World Health Organization. Inequality analysis using Excel: Disaggregated data [Internet]. OpenWHO. 2023 [cited 2024 Jan 18]. https://openwho.org/courses/inequality-monitoring-disaggregation-excel.
Bergen N, Kirkby K, Baptista A, Nambiar D, Schlotheuber A, Vidal Fuertes C et al. Health Inequality Monitoring channel on OpenWHO: capacity strengthening through eLearning. Int J Equity Health. 2022;21(1):1–8.
Hosseinpoor AR, Schlotheuber A, Nambiar D, Ross Z. Health Equity Assessment Toolkit Plus (HEAT plus): software for exploring and comparing health inequalities using uploaded datasets. Glob Health Action. 2018;11.
Kirkby K, Schlotheuber A, Vidal Fuertes C, Ross Z, Hosseinpoor AR. Health Equity Assessment Toolkit (HEAT and HEAT plus): exploring inequalities in the COVID-19 pandemic era. Int J Equity Health. 2022;21.
World Health Organization. Handbook on health inequality monitoring: with a special focus on low- and middle-income countries. Geneva: World Health Organization; 2013.
World Health Organization. Statistical codes for health inequality analysis [Internet]. [cited 2024 Jan 18]. https://www.who.int/data/inequality-monitor/tools-resources/statistical_codes.
DHS Program. STATcompiler [Internet]. [cited 2024 Jan 18]. https://www.statcompiler.com/en/.
UNICEF. UNICEF Data Warehouse [Internet]. 2024 [cited 2024 Jan 25]. https://data.unicef.org/dv_index/.
World Health Organization. Health Inequality Data Repository [Internet]. 2023 [cited 2023 Sep 22]. https://www.who.int/data/inequality-monitor/data.
Moorthie S, Peacey V, Evans S, Phillips V, Roman-Urrestarazu A, Brayne C et al. A scoping review of approaches to improving quality of data relating to Health inequalities. Int J Environ Res Public Health. 2022;19(23).
Schlotheuber A, Hosseinpoor AR. Summary measures of Health Inequality: a review of existing measures and their application. Int J Environ Res Public Health. 2022;19(6).
Keppel K, Pamuk E, Lynch J, Carter-Pokras O, Kim Insun, Mays V et al. Methodological issues in Measuring Health disparities. Vital Health Stat 2. 2005;(141):1.
Harper S, King NB, Meersman SC, Reichman ME, Breen N, Lynch J. Implicit value judgments in the measurement of Health inequalities. Milbank Q. 2010;88(1):4.
World Health Organization. Health inequality monitoring foundations: Summary measures of health inequality [Internet]. 2023 [cited 2024 Jan 18]. https://openwho.org/courses/inequality-monitoring-summary-measures.
Public Health Information for Scotland. Measuring Health Inequalities [Internet]. [cited 2024 Jan 18]. https://www.scotpho.org.uk/methods-and-data/measuring-health-inequalities/.
Office for Health Improvement and Disparities. Fingertips guidance - Public Health methods [Internet]. [cited 2024 Jan 18]. https://fingertips.phe.org.uk/profile/guidance/supporting-information/PH-methods.
Every Woman Every Child Latin America and Caribbean (EWEC LAC). Step by step guide for measuring social inequalities in health. 2018.
World Health Organization. National health inequality monitoring: a step-by-step manual. Geneva: World Health Organization; 2017. pp. 1–54.
Acknowledgements
The authors would like to acknowledge Sam Harper (McGill University), George Luta (Georgetown University) and Zev Ross (ZevRoss Spatial Analysis) for their support in developing the summary measure calculations used in the Health Equity Assessment Toolkit (HEAT and HEAT Plus), and Patricia Menéndez (Monash University) for her review of the calculations. The eLearning course was produced by members of the WHO Health Inequality Monitoring Team (Ahmad Reza Hosseinpoor, Katherine Kirkby, Andreia Baptista and Nicole Bergen) and benefited from review by Andrea Bertola (European Office for Investment for Health and Development), Jessico Ho and Annet Mahanani (WHO Department of Data and Analytics). Andreia Baptista is affiliated with NHS England at the time of publication of the article. The Excel workbooks were produced by the WHO Health Inequality Monitoring Team (Katherine Kirkby with review and feedback from Ahmad Reza Hosseinpoor and Anne Schlotheuber) and benefited from review by Oscar Mujica (WHO/PAHO).
Funding
Funding for the eLearning course ‘Inequality analysis using Excel: Disaggregated data’ was provided by Global Affairs Canada (GAC).
Author information
Contributions
ARH and KK led the conceptualisation of the paper. KK drafted the paper, which was reviewed and edited by ARH, AS and DA. All authors have read and agreed the manuscript. DA is affiliated with the Pan American Health Organization, Quito, Ecuador, at the time of publication of the article.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Disclaimer
The opinions expressed in this article are those of the authors and do not necessarily reflect the views of the WHO, its representatives, or the countries they represent.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests. The authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the World Health Organization.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original version of this article was revised: In Supplementary Material 3, the text in the ‘Interpretation’ column was updated for the following summary measures: aci, rci, sii, and rii.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kirkby, K., Antiporta, D.A., Schlotheuber, A. et al. Making health inequality analysis accessible: WHO tools and resources using Microsoft Excel. Int J Equity Health 23, 205 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12939-024-02229-w
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12939-024-02229-w