Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort

Mark N. Kvale, Stephanie Hesselson, Thomas J. Hoffmann, Yang Cao, David Chan, Sheryl Connell, Lisa A. Croen, Brad P. Dispensa, Jasmin Eshragh, Andrea Finn, Jeremy Gollub, Carlos Iribarren, Eric Jorgenson, Lawrence H. Kush, Richard Lao, Yontao Lu, Dana Ludwig, Gurpreet K. Mathaud, William B. McGuire, Gangwu Mei, Sunita Miles, Michael Mittman, Mohini Patil, Charles P. Quesenberry, Dilrini Ranatunga, Sarah Rowell, Marianne Sadler, Lori C. Sakoda, Michael Shapero, Ling Shen, Tanu Shenoy, David Smethurst, Carol P. Somkin, Stephen K. Van Den Eeden, Lawrence Walter, Eunice Wan, Teresa Webster, Rachel Whitmer, Simon Wong, Chia Zau, Yiping Zhan, Catherine Schaefer, Pui Yan Kwok, Neil Risch

Research output: Contribution to journal, Article

54 Citations (Scopus)

Abstract

The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California—San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1-95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.
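The sample success rate quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the two counts given above (103,067 passing samples out of 109,837 assayed); the function name `success_rate` and the constant names are illustrative assumptions and do not come from the paper or its pipeline.

```python
# Counts quoted in the abstract (the only values taken from the source).
SAMPLES_ASSAYED = 109_837
SAMPLES_PASSED = 103_067


def success_rate(passed: int, assayed: int) -> float:
    """Return the percentage of assayed samples that passed QC.

    Hypothetical helper for illustration; not part of the GERA pipeline.
    """
    if assayed <= 0:
        raise ValueError("assayed must be positive")
    return 100.0 * passed / assayed


overall = success_rate(SAMPLES_PASSED, SAMPLES_ASSAYED)
failed = SAMPLES_ASSAYED - SAMPLES_PASSED

print(f"overall sample success rate: {overall:.1f}%")
print(f"samples failing QC: {failed}")
```

Rounded to one decimal place, the result agrees with the 93.8% reported in the abstract. The per-array rates (92.1-95.4%) cannot be reproduced this way because the abstract does not give per-array counts.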

Original language: English (US)
Pages (from-to): 1051-1060
Number of pages: 10
Journal: Genetics
Volume: 200
Issue number: 4
DOI: https://doi.org/10.1534/genetics.115.178905
State: Published - Aug 1 2015
Externally published: Yes

Fingerprint

Genetic Research
Informatics
Molecular Epidemiology
Quality Control
Genotype
Single Nucleotide Polymorphism
Health
Genome
Electronic Health Records
Genome-Wide Association Study
Saliva
DNA
Research
Genes

Keywords

  • Affymetrix axiom
  • Genome-wide genotyping
  • GERA cohort
  • Quality control
  • Saliva DNA

ASJC Scopus subject areas

  • Genetics

Cite this

Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. / Kvale, Mark N.; Hesselson, Stephanie; Hoffmann, Thomas J.; Cao, Yang; Chan, David; Connell, Sheryl; Croen, Lisa A.; Dispensa, Brad P.; Eshragh, Jasmin; Finn, Andrea; Gollub, Jeremy; Iribarren, Carlos; Jorgenson, Eric; Kush, Lawrence H.; Lao, Richard; Lu, Yontao; Ludwig, Dana; Mathaud, Gurpreet K.; McGuire, William B.; Mei, Gangwu; Miles, Sunita; Mittman, Michael; Patil, Mohini; Quesenberry, Charles P.; Ranatunga, Dilrini; Rowell, Sarah; Sadler, Marianne; Sakoda, Lori C.; Shapero, Michael; Shen, Ling; Shenoy, Tanu; Smethurst, David; Somkin, Carol P.; Van Den Eeden, Stephen K.; Walter, Lawrence; Wan, Eunice; Webster, Teresa; Whitmer, Rachel; Wong, Simon; Zau, Chia; Zhan, Yiping; Schaefer, Catherine; Kwok, Pui Yan; Risch, Neil.

In: Genetics, Vol. 200, No. 4, 01.08.2015, p. 1051-1060.

Kvale, MN, Hesselson, S, Hoffmann, TJ, Cao, Y, Chan, D, Connell, S, Croen, LA, Dispensa, BP, Eshragh, J, Finn, A, Gollub, J, Iribarren, C, Jorgenson, E, Kush, LH, Lao, R, Lu, Y, Ludwig, D, Mathaud, GK, McGuire, WB, Mei, G, Miles, S, Mittman, M, Patil, M, Quesenberry, CP, Ranatunga, D, Rowell, S, Sadler, M, Sakoda, LC, Shapero, M, Shen, L, Shenoy, T, Smethurst, D, Somkin, CP, Van Den Eeden, SK, Walter, L, Wan, E, Webster, T, Whitmer, R, Wong, S, Zau, C, Zhan, Y, Schaefer, C, Kwok, PY & Risch, N 2015, 'Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort', Genetics, vol. 200, no. 4, pp. 1051-1060. https://doi.org/10.1534/genetics.115.178905
@article{04b22a0e70ab4682adc079e17262a093,
title = "Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort",
abstract = "The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California—San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8{\%}), with a range of 92.1-95.4{\%} for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4{\%} across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.",
keywords = "Affymetrix axiom, Genome-wide genotyping, GERA cohort, Quality control, Saliva DNA",
author = "Kvale, {Mark N.} and Stephanie Hesselson and Hoffmann, {Thomas J.} and Yang Cao and David Chan and Sheryl Connell and Croen, {Lisa A.} and Dispensa, {Brad P.} and Jasmin Eshragh and Andrea Finn and Jeremy Gollub and Carlos Iribarren and Eric Jorgenson and Kush, {Lawrence H.} and Richard Lao and Yontao Lu and Dana Ludwig and Mathaud, {Gurpreet K.} and McGuire, {William B.} and Gangwu Mei and Sunita Miles and Michael Mittman and Mohini Patil and Quesenberry, {Charles P.} and Dilrini Ranatunga and Sarah Rowell and Marianne Sadler and Sakoda, {Lori C.} and Michael Shapero and Ling Shen and Tanu Shenoy and David Smethurst and Somkin, {Carol P.} and {Van Den Eeden}, {Stephen K.} and Lawrence Walter and Eunice Wan and Teresa Webster and Rachel Whitmer and Simon Wong and Chia Zau and Yiping Zhan and Catherine Schaefer and Kwok, {Pui Yan} and Neil Risch",
year = "2015",
month = "8",
day = "1",
doi = "10.1534/genetics.115.178905",
language = "English (US)",
volume = "200",
pages = "1051--1060",
journal = "Genetics",
issn = "0016-6731",
publisher = "Genetics Society of America",
number = "4"
}

TY - JOUR

T1 - Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort

AU - Kvale, Mark N.

AU - Hesselson, Stephanie

AU - Hoffmann, Thomas J.

AU - Cao, Yang

AU - Chan, David

AU - Connell, Sheryl

AU - Croen, Lisa A.

AU - Dispensa, Brad P.

AU - Eshragh, Jasmin

AU - Finn, Andrea

AU - Gollub, Jeremy

AU - Iribarren, Carlos

AU - Jorgenson, Eric

AU - Kush, Lawrence H.

AU - Lao, Richard

AU - Lu, Yontao

AU - Ludwig, Dana

AU - Mathaud, Gurpreet K.

AU - McGuire, William B.

AU - Mei, Gangwu

AU - Miles, Sunita

AU - Mittman, Michael

AU - Patil, Mohini

AU - Quesenberry, Charles P.

AU - Ranatunga, Dilrini

AU - Rowell, Sarah

AU - Sadler, Marianne

AU - Sakoda, Lori C.

AU - Shapero, Michael

AU - Shen, Ling

AU - Shenoy, Tanu

AU - Smethurst, David

AU - Somkin, Carol P.

AU - Van Den Eeden, Stephen K.

AU - Walter, Lawrence

AU - Wan, Eunice

AU - Webster, Teresa

AU - Whitmer, Rachel

AU - Wong, Simon

AU - Zau, Chia

AU - Zhan, Yiping

AU - Schaefer, Catherine

AU - Kwok, Pui Yan

AU - Risch, Neil

PY - 2015/8/1

Y1 - 2015/8/1

AB - The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California—San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1-95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.

KW - Affymetrix axiom

KW - Genome-wide genotyping

KW - GERA cohort

KW - Quality control

KW - Saliva DNA

UR - http://www.scopus.com/inward/record.url?scp=84939426058&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84939426058&partnerID=8YFLogxK

U2 - 10.1534/genetics.115.178905

DO - 10.1534/genetics.115.178905

M3 - Article

C2 - 26092718

AN - SCOPUS:84939426058

VL - 200

SP - 1051

EP - 1060

JO - Genetics

JF - Genetics

SN - 0016-6731

IS - 4

ER -