Chapter 13. Institution-based data sources
Overview
Institution-based data sources encompass an array of sources that collect data during administrative and routine activities. This broad class of data sources is diverse in scope and the type of information collected. Institution-based data sources may collect data within the health sector (e.g. at clinics, hospitals or other health service points) or outside of it (e.g. as part of food and agricultural, occupational, police, tax or welfare records). Institution-based data include information collected by commissioners and funders of health care, such as large-scale purchasing organizations and medical insurance agencies. Institution-based data sources may contain information about health indicators and/or dimensions of inequality. This information may be available at an individual level or for small areas such as districts or municipalities.
This chapter describes the general characteristics of various institution-based data sources within the health sector (individual, service and resource records) and provides an overview of sources outside the health sector. It covers considerations for assessing the quality of these data sources and outlines their potential applications for health inequality monitoring, including how they may be used in conjunction with other sources. It offers insights into how institution-based sources are part of the data landscape for health inequality monitoring, and how inequality monitoring can benefit from strengthened and expanded institution-based data sources.
Institution-based data sources within the health sector
Institution-based data sources within the health sector contain routine and administrative information collected and recorded by and at health facilities (i.e. clinics, hospitals, and other public, private or community-based health service points). Data are collected through individual, service and resource records (1) and function as data inputs into overarching routine health information systems (RHIS; see Box 13.1). By design, data are collected from all health facilities and clinical services (ideally capturing the public and private sectors), thereby yielding a rich source of information about disease and health status; preventive measures such as vaccination and reproductive health services and screening; and the broader operations of the health sector.
BOX 13.1. What are routine health information systems?
RHIS are systems to regularly record, analyse, report and present routinely collected data from health facilities and by health facility staff. The data within RHIS provide information about the services delivered at health facilities, and information about individuals accessing those facilities, including their health status. RHIS data are primarily used for monitoring health service performance and for operational management of health facilities, including planning. Regular collection and analysis of these data allow frequent and current assessments of population health at the district level. RHIS data are also part of health-sector reviews at the national and subnational levels and may be useful for inequality monitoring activities, especially if district-level data can be linked to other sources of data about relevant dimensions of inequality.
The WHO Toolkit for Routine Health Information Systems Data supports the introduction of standards for health data collection at facilities, capacity-building to optimize analysis, and use of routine facility data, promoting an integrated, standards-based approach using a set of internationally recommended standardized core indicators with standard analyses, visualizations with dashboards, and guidance for data use (2).
Definitions of institution-based data sources in the health sector and approaches to classifying health data sources vary across contexts and applications. The United States Agency for International Development definition, in its resource Health information system strengthening: standards and best practices for data sources, includes “routine, administrative data sources as well as cross-sectional data collected through health facility assessments” (3). The WHO Health Metrics Network definition, in the second edition of the Framework and standards for country health information systems, states that “institution-based sources generate data as a result of administrative and operational activities [within and outside the health sector]” (1). For the purposes of this book, routine data generated across all health facilities (i.e. institution-based data sources within the health sector, discussed in this chapter) are considered separately from health facility assessments and other sources that collect data about health facilities, which are covered in Chapter 14.
Institution-based data sources include sources that collect information in the course of administrative and operational activities at institutions. Institution-based sources contain data only about people who have interacted with the institution.
Individual health records capture information to manage health services provided to individual clients in health institutions or through outreach in the community (1). They include primary care consultation records; case reports and disease records routinely produced by health workers; records on individual clients to monitor growth or antenatal and delivery care; and information in special disease registries, such as for cancer. The type of information encompasses individual demographics, health status, risk factors and medical history data. Individual records are retrievable by health workers and can provide a longitudinal assessment of an individual’s progress and outcomes.
The increasing digitalization of health and medical records allows data to be standardized, managed, aggregated, shared and analysed more easily. Electronic health records are nearly ubiquitous in high-income country contexts, but this is not (yet) the case in low- and middle-income countries. Increasingly, low- and middle-income countries are introducing simpler individual medical record systems that capture client profiles and essential information for case management, providing longitudinal data for disease prevention and control programmes (see also surveillance systems, covered in Chapter 14).
Health service records at the health facility level contain service-generated data such as testing and diagnosis, financial costs, activity statistics, quality of care, care offered and treatments administered (1). Ideally, they should collect data using a standardized reporting form and in a systematic manner that permits comparisons across facilities, regions and time (noting that facilities may have different protocols due to health information system fragmentation or multiple managing authorities, which limits data comparability). A primary use of service records is to yield locally relevant data for managing local health services and generating national statistics on health service use, health service coverage and health system performance. To estimate health service coverage, the data derived from health service records may need to be combined with denominator estimates (the number of people targeted or the population in the catchment area) derived from other sources. In low- and middle-income countries, data from health service records may be more readily available than data from electronic health records.
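As an illustration of combining service records with an external denominator, the following minimal Python sketch computes coverage per district. The district names, counts and the function name coverage_percent are hypothetical and chosen only for this example; they are not drawn from any real data set or tool.

```python
# Hypothetical counts: service records supply the numerators, and a separate
# source (e.g. census-based projections) supplies the target populations.
services_delivered = {"District A": 820, "District B": 450, "District C": 1310}
target_population = {"District A": 1000, "District B": 900, "District C": 1250}


def coverage_percent(numerators, denominators):
    """Return coverage (%) per area; areas with no usable denominator map to None."""
    result = {}
    for area, count in numerators.items():
        denom = denominators.get(area)
        result[area] = round(100 * count / denom, 1) if denom else None
    return result


coverage = coverage_percent(services_delivered, target_population)
for area, value in coverage.items():
    # Values above 100% suggest the numerator and denominator do not refer
    # to the same population and should be reviewed before reporting.
    flag = " (check denominator)" if value is not None and value > 100 else ""
    print(f"{area}: {value}%{flag}")
```

In this made-up example, District C has a coverage estimate above 100%, which usually points to a mismatch between the numerator and the denominator rather than to a true value.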
Resource records contain information about the quality, availability, readiness and logistics of health service inputs (1). This includes data about the density and distribution of health facilities, and data related to human resources for health (according to qualifications and training), budgets and expenditures, medicines, and other core commodities and services (sometimes referred to as the logistics management information system). The use of geographic information system software may aid in assessing the location of service delivery sites and administrative boundaries and catchment areas (see Chapter 16).
Institution-based data sources outside the health sector
Outside the health sector, institution-based sources include records kept by other institutions such as national statistical offices, the police, veterinary services, insurance companies, environmental health authorities, tax and welfare agencies, and occupational health agencies. These records are numerous and diverse. They may contain information about health and/or determinants of health (Box 13.2). They may also be sources for information about dimensions of inequality – in which case, they may need to be linked to sources of health data for use in health inequality monitoring.
BOX 13.2. Examples of health-related institution-based data sources outside the health sector
The WHO/UNICEF Joint Monitoring Programme for Water Supply, Sanitation and Hygiene sources data about water, sanitation and hygiene indicators from ministries of water and sanitation, education and health, and regulatory agencies (4, 5).
Data on the health of people in prisons and other closed settings are collected by administrative systems in such settings and collated by the responsible government institution. These sources may also contain information about relevant dimensions of inequality (e.g. age, ethnicity, sex) and other details about the criminal history of individuals.
Data about migrants, including information on the migration of health workers, can be sourced from administrative records. For example, information about new entries of migrants can be traced through administrative registration for residence or working permits from interior affairs or immigration services, foreign employment departments and other administrative services or border registration (6). Cross-border data, including routine screening of passengers, may be collected by the ministry of aviation or transportation. Information such as age, country of origin, legal status and sex may be available. For more on inequality monitoring among refugee and migrant populations, see Chapter 5.
Strengths and limitations of institution-based data sources
General strengths of institution-based data sources are that the data tend to cover large numbers of people or large population areas and are generated on a recurring basis. The data are widely available across sectors, because almost every government ministry (e.g. education, finance, health, justice, social welfare) has administrative records that could be used to source data for health inequality monitoring. When functioning well, institution-based data sources contain consistent and accurate data. Data quality, however, can vary and is contingent on the standards and practices surrounding data collection, processing and access (see Chapter 11). In addition to quality issues, data comparability may be a limitation if, for example, definitions are not harmonized across ministries; international definitions are not used; or administrative records are discontinued, changed or altered over time to align with shifting policy, legislation, regulation or political environments. The Data Quality Assurance (DQA) toolkit provides a harmonized approach for assessing and improving the quality of health facility data (Box 13.3).
BOX 13.3. Data Quality Assurance toolkit
Recognizing the importance of ensuring high-quality data from health facilities, the DQA toolkit (previously known as the Data Quality Review) was developed to provide a harmonized methodology and common language applicable across diverse contexts (7). The objectives of the DQA toolkit are to institutionalize a system for assessing data quality; identify weaknesses in data management systems and interventions for system strengthening; and monitor the performance of data quality over time. The methodology is a multipronged approach that includes desk reviews and site assessments, carried out as part of routine and regular data-quality checks and discrete or cross-sectional assessments.
Institution-based data sources within the health sector can provide detailed information about uptake and outcomes of health services in a particular setting. This is especially the case for disease programmes with dedicated financing and support for monitoring and reporting. The breadth of data and how information is recorded, however, reflect the underlying administrative purposes and may limit the usefulness of the data for inequality monitoring. For example, administrative records often contain information about geographical location, but data about multiple other inequality dimensions, especially socioeconomic factors, tend to be limited.
Institution-based data sources may be limited in their coverage because they generally collect information only from people or populations that interact with a given institution and for whom records are kept. Medical or employment records, for example, do not provide information on people who do not interact with the health sector or people who are not formally employed. Missing data may be an issue if records are incomplete or contain errors, or if an individual does not have a required form of identification such as a health card or national identification. The possession of official or appropriate identification can be problematic for people in vulnerable groups and settings, such as Indigenous Peoples, refugees and transgender people. In such cases, data from institution-based sources are unlikely to be representative of the whole population in a particular area, and household health surveys may be an important complementary data source (see Chapter 12).
There may be differences in accessibility and scope of data from public, private and community institutions, with implications for health inequality monitoring. Data from public sources are more likely to be linked across other public sources and to be nationally reported, with more systematic data collection and reporting protocols. Data from private institutions may not have standardized records and, depending on the context, may be difficult to access by people outside the institution. Data from private institutions may be more likely to be excluded from national collation and reporting. Likewise, data from community or nongovernmental facilities may be overlooked in national reports, unless facilities are part of national networks.
Across countries, there is variability in the extent to which institutional records can be internally integrated and aggregated, which has implications for the ability to make comparisons within and across countries. In the health sector, some countries are able to integrate data across different levels of care (e.g. primary, secondary and tertiary) and can track individuals through the continuum of care. In other countries, separate data collection and analysis systems may be in place due to lack of linking mechanisms, lack of information technology systems or infrastructure interoperability, or lack of information-sharing mechanisms between facilities. Health sectors of some countries may be more integrated than others, but this brings other legal, governance and information technology challenges.
The use of standardized electronic medical records and protocols across institutions helps enhance the reliability and comparability of data, especially when there are protocols for checking data completeness, consistency and accuracy (and measures to adjust statistics based on such findings). The use of electronic records creates possibilities for aggregating data, such as across facilities within an area. Electronic forms are, however, subject to limitations that can arise if data entry and coding are inaccurate or if there is a systematic error (e.g. a default code that is applied when information is not available or not entered can lead to misleading inferences that would not be encountered with paper forms). Additionally, there is no defined standardized mechanism for data-sharing. In contexts where paper records are still used, errors may occur during data registry or data entry or due to the physical degradation of records.
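The default-code problem described above can be screened for with a simple completeness check. The Python sketch below is illustrative only: the field name, the default code "U", the facility labels and the 50% threshold are all assumptions made for this example, and a real check would use the codes and thresholds defined by the system in question.

```python
from collections import Counter

# Hypothetical extract of electronic records. "U" is an assumed default code
# that the system writes when the sex field is not filled in.
records = [
    {"facility": "F1", "sex": "F"},
    {"facility": "F1", "sex": "M"},
    {"facility": "F1", "sex": "U"},
    {"facility": "F1", "sex": "F"},
    {"facility": "F2", "sex": "U"},
    {"facility": "F2", "sex": "U"},
    {"facility": "F2", "sex": ""},
    {"facility": "F2", "sex": "M"},
]
DEFAULT_CODE = "U"
THRESHOLD = 0.5  # assumed cut-off for flagging a facility for follow-up


def share_default_by_facility(rows, field, default_code):
    """Share of records per facility where the field is missing or holds the default code."""
    totals = Counter()
    defaults = Counter()
    for row in rows:
        totals[row["facility"]] += 1
        value = row.get(field)
        if value is None or value == "" or value == default_code:
            defaults[row["facility"]] += 1
    return {facility: defaults[facility] / totals[facility] for facility in totals}


shares = share_default_by_facility(records, "sex", DEFAULT_CODE)
flagged = sorted(f for f, share in shares.items() if share > THRESHOLD)
print(shares, flagged)
```

A facility with a high share of default or empty values may be entering the default by habit, in which case disaggregated statistics from that facility could be misleading.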
The accessibility of institution-based sources may prove a substantial barrier to the use of these data in some applications or contexts. Accessing individual health records, for example, requires that records are sufficiently anonymized to protect confidentiality and/or data security measures are in place to regulate who can access the data and when, where and for what purposes the data can be used. In some cases, data held by private institutions may have tight restrictions on external access, allowing the data to be used only in legally mandated reporting. For data that are to be linked across data sources, personal or small-area identifiers such as postal codes need to be retained – which may make anonymization particularly challenging, leading to complex and time-consuming procedures to gain access to the data (8).
Use of institution-based data sources for inequality monitoring
Institution-based data sources have some advantages over other data sources for use in health inequality monitoring. Compared with household health surveys and censuses (see Chapter 12), data from institution-based sources are collected closer to real time, which allows for more timely reporting and tracking of trends. Also, administrative data are collected as a part of routine activities, providing a more feasible and affordable alternative to many population-based sources. Compared with civil registration and vital statistics systems and census data, institution-based data sources in the health sector represent a greater, albeit different, range of indicators about health services, outcomes and health sector activities, along with basic information derived from individual profiles such as age, location, occupation or sex.
Data from institution-based sources in the health sector are often collected at local levels of the health-care system, thus presenting opportunities for monitoring inequalities between lower administrative levels such as districts or municipalities. The inclusion of small-area identifiers enables the practice of data linking, whereby data are combined across multiple sources through common overlapping information (see Chapter 15). For example, health data collected at the district level can be combined with district-level socioeconomic data (generated through other institutional sources, censuses or surveys), yielding district-level disaggregated data for inequality monitoring. Data linking at the individual level may also be possible, provided individual-level identifiers are present in both data sources and appropriate ethical protocols and data security protocols (see Chapter 4) are in place.
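District-level linking of this kind can be sketched in a few lines of Python. The district codes, coverage values and income groupings below are hypothetical, and the unweighted group mean is used only to keep the example short; it is not a recommended summary measure.

```python
# Hypothetical district-level inputs that share a common district code.
health = {"D01": 62.0, "D02": 88.0, "D03": 71.0, "D04": 93.0}  # service coverage (%)
socio = {"D01": "low", "D02": "high", "D03": "low", "D05": "high"}  # income group

# Link on the shared identifier; keep only districts present in both sources.
linked = {
    district: {"coverage": health[district], "group": socio[district]}
    for district in health.keys() & socio.keys()
}

# Districts found in only one source cannot be linked and should be reported.
unmatched = sorted(health.keys() ^ socio.keys())


def mean_by_group(linked_rows):
    """Unweighted mean coverage per socioeconomic group (illustrative only)."""
    sums, counts = {}, {}
    for row in linked_rows.values():
        sums[row["group"]] = sums.get(row["group"], 0.0) + row["coverage"]
        counts[row["group"]] = counts.get(row["group"], 0) + 1
    return {group: sums[group] / counts[group] for group in sums}


group_means = mean_by_group(linked)
print(group_means, unmatched)
```

Reporting the unmatched districts alongside the linked result makes it visible when a linkage covers only part of the country.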
When using data from institution-based data sources to derive rates or coverage, the challenge of estimating denominator values may arise. For example, an institution-based data source may contain information about the number of people who received a health service but not the number of people eligible for or in need of receiving that service. As another example, an institution-based data source may contain information about the number of people diagnosed with a particular health condition but not the total size of the population. In some cases, the denominator data can be estimated based on information from another source, such as a census, a different administrative data source or a routine population estimate, although there may be limitations. For example, the timing of data collection efforts may be misaligned between data sources, leading to unreliable denominator estimates. Furthermore, people may access services outside the area where they live and therefore may not be represented in the appropriate denominator group if it is derived from the catchment area (although if the usual residence is recorded on the service record, indicators may be constructed using the area of residence rather than the area of service usage). Where available, geospatial data such as satellite images may be an alternative source to estimate denominator values (see Chapter 16).
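The effect of people using services outside their area of residence can be shown with a small Python sketch. The records, district labels and population figures are invented for this example, and the field names facility_district and residence_district are assumptions rather than fields of any particular system.

```python
from collections import Counter

# Hypothetical service records: where the service was given and where the client lives.
service_records = (
    [{"facility_district": "Urban", "residence_district": "Urban"}] * 4
    + [{"facility_district": "Urban", "residence_district": "Rural"}] * 2
    + [{"facility_district": "Rural", "residence_district": "Rural"}] * 2
)

# Hypothetical eligible population per district, taken from another source.
eligible_population = {"Urban": 5, "Rural": 5}


def coverage_by(records, key, population):
    """Coverage (%) per district, counting records by the given district field."""
    counts = Counter(record[key] for record in records)
    return {area: 100 * counts[area] / pop for area, pop in population.items()}


by_facility = coverage_by(service_records, "facility_district", eligible_population)
by_residence = coverage_by(service_records, "residence_district", eligible_population)
print(by_facility)   # counts assigned to the district where the service was delivered
print(by_residence)  # counts assigned to the district where the client lives
```

In this made-up case, counting by facility location gives 120% coverage for the urban district and 40% for the rural district, whereas counting by residence gives 80% for both, so the apparent inequality comes from where services were accessed rather than from who received them.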
To enhance the usability of institution-based data sources for health inequality monitoring, sources could expand the collection of data about key inequality dimensions such as age, geographical location, sex and socioeconomic factors. This would ensure different dimensions of inequality could be linked to relevant indicators of health and/or health determinants. In many contexts, greater efforts are needed to facilitate sharing and exchange of data across sectors, which may entail strengthening collaborations across various sectors and levels of government.
References
1. Health Metrics Network. Framework and standards for country health information systems, 2nd edition. Geneva: World Health Organization; 2008 (https://iris.who.int/handle/10665/43872, accessed 15 May 2024).
2. WHO toolkit for routine health information systems data. Geneva: World Health Organization (https://www.who.int/data/data-collection-tools/health-service-data/toolkit-for-routine-health-information-system-data/modules, accessed 15 May 2024).
3. Greenwell F, Salentine S. Health information system strengthening: standards and best practices for data sources. Chapel Hill, NC: United States Agency for International Development; 2018 (https://www.measureevaluation.org/resources/publications/tr-17-225/at_download/document, accessed 5 June 2024).
4. World Health Organization/United Nations Children’s Fund Joint Monitoring Programme for Water Supply, Sanitation and Hygiene. JMP methodology for WASH in schools. Geneva: World Health Organization; 2021 (https://washdata.org/reports/jmp-2021-methodology-wash-in-schools, accessed 15 May 2024).
5. World Health Organization/United Nations Children’s Fund Joint Monitoring Programme for Water Supply, Sanitation and Hygiene. Data sources. Geneva: World Health Organization (https://washdata.org/monitoring/methods/data-sources, accessed 15 May 2024).
6. International labour migration statistics (ILMS database). Geneva: International Labour Organization (https://ilostat.ilo.org/methods/concepts-and-definitions/description-international-labour-migration-statistics/, accessed 15 May 2024).
7. Data quality assurance (DQA). Geneva: World Health Organization (https://www.who.int/data/data-collection-tools/health-service-data/data-quality-assurance-dqa, accessed 15 May 2024).
8. Harron K, Dibben C, Boyd J, Hjern A, Azimaee M, Barreto ML, et al. Challenges in administrative data linkage for research. Big Data Soc. 2017;4(2):2053951717745678. doi:10.1177/2053951717745678.