Vital statistics linked birth/infant death and hospital discharge record linkage for epidemiological studies

Beate Herrchen, Jeffrey B. Gould, Thomas S. Nesbitt

Research output: Contribution to journal › Article

109 Citations (Scopus)

Abstract

A methodology for linking vital statistics linked birth/death data with hospital discharge data is described. The resulting data set combines information on a neonate's sociodemographic characteristics, prenatal care, and mortality with detailed health outcome and resource utilization data, thus establishing an extensive database for epidemiological studies. Because no universal identifier is common to both databases, our linkage strategy relied on a virtual identifier built from variables common to both data sets. When the same virtual identifier occurred more than once, we used secondary health status information to maximize the likelihood of linking low birth weight or premature infants in one database to infants of similar health status in the other, and assigned links at random in cases where no secondary information was present. Applying our method to the 1992 California birth cohort, we linked 563,114 of 571,189 eligible births (98.59%). Of these links, 91.2% were established on the basis of unique virtual identifiers. The linkage was internally consistent, and no bias was evident when comparing variable distributions for all single live births in the vital statistics linked birth/death file with those for linked births in the combined birth/death and hospital discharge file. Multiple imputation showed that the prediction error incurred by randomization was negligible. Although computationally intensive, our method for linking the vital statistics linked birth/death file and the hospital discharge file appeared to be effective. However, it is important to be aware of the limitations of the resulting data set, in particular that it cannot be used to track individual cases. 
The method provides a database suitable for a variety of perinatal epidemiological analyses, such as descriptive studies of disease distribution in neonates, studies of the geographic distribution of disease, and studies of the relationship between risk and outcome.
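The linkage strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the variables that make up the virtual identifier, so the field names in KEY_FIELDS (birth date, sex, hospital, maternal ZIP code) are hypothetical, as is the boolean high_risk flag standing in for "low birth weight or premature" on both sides. Keys that occur exactly once in each file are linked directly; for repeated keys, high-risk records are paired with each other first and everything else is paired at random.

```python
import random
from collections import defaultdict

# Hypothetical linkage variables; the abstract does not say which fields were used.
KEY_FIELDS = ("birth_date", "sex", "hospital", "maternal_zip")


def virtual_id(record):
    """Combine the variables shared by both files into one hashable key."""
    return tuple(record[field] for field in KEY_FIELDS)


def group_by_key(records):
    groups = defaultdict(list)
    for rec in records:
        groups[virtual_id(rec)].append(rec)
    return groups


def link(births, discharges, seed=0):
    """Return (birth_id, discharge_id, method) triples; unmatched records are left out."""
    rng = random.Random(seed)  # fixed seed makes the random tie-breaking reproducible
    discharge_groups = group_by_key(discharges)
    links = []
    for key, b_recs in group_by_key(births).items():
        d_recs = discharge_groups.get(key, [])
        if not d_recs:
            continue  # no candidate with this virtual identifier
        if len(b_recs) == 1 and len(d_recs) == 1:
            links.append((b_recs[0]["id"], d_recs[0]["id"], "unique"))
            continue
        # Repeated key: pair high-risk infants (hypothetical flag) with each other first.
        hr_b = [r for r in b_recs if r.get("high_risk")]
        hr_d = [r for r in d_recs if r.get("high_risk")]
        rest_b = [r for r in b_recs if not r.get("high_risk")]
        rest_d = [r for r in d_recs if not r.get("high_risk")]
        rng.shuffle(hr_b)
        rng.shuffle(hr_d)
        n = min(len(hr_b), len(hr_d))
        links += [(b["id"], d["id"], "secondary") for b, d in zip(hr_b[:n], hr_d[:n])]
        # Everything left over (no or unmatched secondary information) is paired at random.
        rest_b += hr_b[n:]
        rest_d += hr_d[n:]
        rng.shuffle(rest_b)
        rng.shuffle(rest_d)
        links += [(b["id"], d["id"], "random") for b, d in zip(rest_b, rest_d)]
    return links


def toy(rid, sex, high_risk, hospital="H1"):
    # Made-up record; every toy record shares the same birth date and ZIP code.
    return {"id": rid, "birth_date": "1992-03-01", "sex": sex,
            "hospital": hospital, "maternal_zip": "95616", "high_risk": high_risk}


births = [toy("b1", "F", False), toy("b2", "M", True), toy("b3", "M", None),
          toy("b4", "F", False, hospital="H2")]
discharges = [toy("d1", "F", None), toy("d2", "M", False), toy("d3", "M", True)]
links = link(births, discharges, seed=1)
print(sorted(links))
# [('b1', 'd1', 'unique'), ('b2', 'd3', 'secondary'), ('b3', 'd2', 'random')]
```

Keeping a method label on every link makes it straightforward to report a breakdown like the abstract's share of links established through unique virtual identifiers; record b4 has no candidate in the discharge file and is simply left unlinked.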

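The abstract states that multiple imputation showed the prediction error from randomization to be negligible, but it does not describe the procedure. One generic way to frame such a check, assumed here rather than taken from the paper, is to treat each random tie-break as one imputation of the unknown true pairing: repeat the random pairing M times, compute an estimate from each linked data set, and pool the results with Rubin's rules. The toy blocks, the cross-file proportion being estimated, and M = 20 are all made up for illustration.

```python
import random
import statistics


def rubin_combine(estimates, variances):
    """Pool M point estimates and their within-imputation variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    w = statistics.fmean(variances)         # mean within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (needs m >= 2)
    t = w + (1 + 1 / m) * b                 # total variance
    return q_bar, w, b, t


def one_imputation(blocks, rng):
    """Randomly pair records inside each block; return a proportion and its variance."""
    pairs = []
    for birth_flags, discharge_flags in blocks:
        shuffled = discharge_flags[:]
        rng.shuffle(shuffled)
        pairs.extend(zip(birth_flags, shuffled))
    n = len(pairs)
    # Proportion of linked pairs in which both the birth-side and discharge-side flags are set.
    p = sum(1 for a, c in pairs if a and c) / n
    return p, p * (1 - p) / n


# Toy data: 900 uniquely linked pairs plus 50 ambiguous blocks of two records each.
blocks = [([i % 10 == 0], [i % 10 == 0]) for i in range(900)]
blocks += [([True, False], [True, False]) for _ in range(50)]

M = 20
rng = random.Random(42)
results = [one_imputation(blocks, rng) for _ in range(M)]
q_bar, w, b, t = rubin_combine([r[0] for r in results], [r[1] for r in results])
lam = (1 + 1 / M) * b / t  # share of total variance attributable to random pairing
print(f"estimate={q_bar:.4f} within={w:.2e} between={b:.2e} lambda={lam:.2f}")
```

A between-imputation variance that is small relative to the within-imputation variance (a small lambda) would suggest that random pairing adds little uncertainty to the estimate; how small counts as negligible is a judgment call that depends on the quantity being estimated.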
Original language: English (US)
Pages (from-to): 290-305
Number of pages: 16
Journal: Computers and Biomedical Research
Volume: 30
Issue number: 4
DOIs: 10.1006/cbmr.1997.1448
State: Published - Aug 1997

Fingerprint

Vital Statistics
Hospital Records
Epidemiologic Studies
Statistics
Parturition
Health
Databases
Health Status
Newborn Infant
Prenatal Care
Health Resources
Live Birth
Low Birth Weight Infant
Random Allocation
Infant Death
Premature Infants
Mortality
Incidence

ASJC Scopus subject areas

  • Medicine (miscellaneous)

Cite this

Vital statistics linked birth/infant death and hospital discharge record linkage for epidemiological studies. / Herrchen, Beate; Gould, Jeffrey B.; Nesbitt, Thomas S.

In: Computers and Biomedical Research, Vol. 30, No. 4, Aug 1997, pp. 290-305.


@article{120aecdc9cb84a48a8dad7cbee064f7f,
title = "Vital statistics linked birth/infant death and hospital discharge record linkage for epidemiological studies",
abstract = "A methodology for linking vital statistics linked birth/death data and hospital discharge data is described. The resulting data set combines information on a neonate's sociodemographic characteristics, prenatal care, and mortality aspects and connects it to detailed health outcome and resource utilization data, thus establishing an extensive database for epidemiological studies. In the absence of a universal identifier common to both databases, our linkage strategy relied on using a virtual identifier based on variables common to both data sets. In the case of multiple incidences of the same virtual identifier we used secondary health status information to optimize the likelihood of linking low birth weight or premature infants in one database to infants of similar health status in the other while randomizing cases in which no secondary information was present. Applying our method to the 1992 California birth cohort, we could link 563,114 out of 571,189 eligible births (98.59{\%}). Of these links, 91.2{\%} were established on the basis of unique virtual identifiers. The link was internally consistent and no bias was evident when comparing variable distributions for all single live births in the vital statistics linked birth/death file and linked births in the linked vital statistics linked birth/death and hospital discharge file. Multiple imputation techniques showed that the prediction error incurred by randomization was negligible. Even though computationally intensive, our method for linking the vital statistics linked birth/death file and the hospital discharge file appeared to be effective. However, it is important to be aware of the limitations of the resulting data set, in particular the fact that it cannot be used for tracking individual cases. 
The method provides a database suitable for a variety of perinatal epidemiological analyses, such as descriptive studies of disease distribution in neonates, studies of the geographic distribution of disease, and studies of the relationship between risk and outcome.",
author = "Herrchen, Beate and Gould, {Jeffrey B.} and Nesbitt, {Thomas S.}",
year = "1997",
month = aug,
doi = "10.1006/cbmr.1997.1448",
language = "English (US)",
volume = "30",
pages = "290--305",
journal = "Computers and Biomedical Research",
issn = "0010-4809",
publisher = "Academic Press Inc.",
number = "4",

}

TY - JOUR

T1 - Vital statistics linked birth/infant death and hospital discharge record linkage for epidemiological studies

AU - Herrchen, Beate

AU - Gould, Jeffrey B.

AU - Nesbitt, Thomas S.

PY - 1997/8

Y1 - 1997/8

AB - A methodology for linking vital statistics linked birth/death data and hospital discharge data is described. The resulting data set combines information on a neonate's sociodemographic characteristics, prenatal care, and mortality aspects and connects it to detailed health outcome and resource utilization data, thus establishing an extensive database for epidemiological studies. In the absence of a universal identifier common to both databases, our linkage strategy relied on using a virtual identifier based on variables common to both data sets. In the case of multiple incidences of the same virtual identifier we used secondary health status information to optimize the likelihood of linking low birth weight or premature infants in one database to infants of similar health status in the other while randomizing cases in which no secondary information was present. Applying our method to the 1992 California birth cohort, we could link 563,114 out of 571,189 eligible births (98.59%). Of these links, 91.2% were established on the basis of unique virtual identifiers. The link was internally consistent and no bias was evident when comparing variable distributions for all single live births in the vital statistics linked birth/death file and linked births in the linked vital statistics linked birth/death and hospital discharge file. Multiple imputation techniques showed that the prediction error incurred by randomization was negligible. Even though computationally intensive, our method for linking the vital statistics linked birth/death file and the hospital discharge file appeared to be effective. However, it is important to be aware of the limitations of the resulting data set, in particular the fact that it cannot be used for tracking individual cases. 
The method provides a database suitable for a variety of perinatal epidemiological analyses, such as descriptive studies of disease distribution in neonates, studies of the geographic distribution of disease, and studies of the relationship between risk and outcome.

UR - http://www.scopus.com/inward/record.url?scp=0031213517&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031213517&partnerID=8YFLogxK

U2 - 10.1006/cbmr.1997.1448

DO - 10.1006/cbmr.1997.1448

M3 - Article

VL - 30

SP - 290

EP - 305

JO - Comput Biomed Res

JF - Computers and Biomedical Research

SN - 0010-4809

IS - 4

ER -