Repetitive element signature-based visualization, distance computation, and classification of 1766 microbial genomes

Kang Hoon Lee, Kyung Seop Shin, Debora Lim, Woo Chan Kim, Byung Chang Chung, Gyu Bum Han, Jeongkyu Roh, Dong Ho Cho, Kiho Cho

Research output: Contribution to journal › Article

Abstract

The genomes of living organisms are populated with pleomorphic repetitive elements (REs) of varying densities. Our hypothesis that genomic RE landscapes are species/strain/individual-specific was implemented into the Genome Signature Imaging system to visualize and compute the RE-based signatures of any genome. Following the occurrence profiling of 5-nucleotide REs/words, the information from top-50 frequency words was transformed into a genome-specific signature and visualized as Genome Signature Images (GSIs), using a CMYK scheme. An algorithm for computing distances among GSIs was formulated using the GSIs' variables (word identity, frequency, and frequency order). The utility of the GSI-distance computation system was demonstrated with control genomes. GSI-based computation of genome-relatedness among 1766 microbes (117 archaea and 1649 bacteria) identified their clustering patterns; although the majority paralleled the established classification, some did not. The Genome Signature Imaging system, with its visualization and distance computation functions, enables genome-scale evolutionary studies involving numerous genomes with varying sizes.
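The profiling step described in the abstract (counting 5-nucleotide words and keeping the 50 most frequent) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `word_profile`, `signature`, and `toy_distance` are hypothetical, the use of overlapping windows on a single strand is an assumption, and the distance shown is only a stand-in that combines the three variables the abstract names (word identity, frequency, and frequency order). The abstract does not give the published GSI distance formula or the CMYK mapping, so neither is reproduced here.

```python
from collections import Counter


def word_profile(sequence, k=5):
    """Count overlapping k-nucleotide words, skipping windows with non-ACGT symbols."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def signature(sequence, k=5, top=50):
    """Return the top-N words as (word, relative frequency) pairs, most frequent first."""
    counts = word_profile(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return [(word, count / total) for word, count in ranked]


def toy_distance(sig_a, sig_b):
    """Illustrative distance only, NOT the published GSI formula.

    Shared words contribute their frequency difference plus a normalized
    rank difference; a word found in only one signature contributes a
    fixed penalty of 1.0 (a hypothetical choice).
    """
    a = {word: (rank, freq) for rank, (word, freq) in enumerate(sig_a)}
    b = {word: (rank, freq) for rank, (word, freq) in enumerate(sig_b)}
    n = max(len(sig_a), len(sig_b), 1)
    dist = 0.0
    for word in set(a) | set(b):
        if word in a and word in b:
            dist += abs(a[word][1] - b[word][1]) + abs(a[word][0] - b[word][0]) / n
        else:
            dist += 1.0
    return dist / n
```

Under these assumptions, `signature(genome_sequence)` yields the ranked word list for one genome, and `toy_distance` applied to two such lists returns 0.0 for identical signatures and grows as the word sets, frequencies, or rank orders diverge.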

Original language: English (US)
Pages (from-to): 30-42
Number of pages: 13
Journal: Genomics
Volume: 106
Issue number: 1
DOIs: 10.1016/j.ygeno.2015.04.004
State: Published - Jul 1 2015

Fingerprint

Microbial Genome
Genome
Archaea

Keywords

  • Genome distance
  • Genome signature
  • Genome visualization
  • Genome-scale classification
  • Microbial genomes
  • Repetitive element

ASJC Scopus subject areas

  • Genetics

Cite this

Repetitive element signature-based visualization, distance computation, and classification of 1766 microbial genomes. / Lee, Kang Hoon; Shin, Kyung Seop; Lim, Debora; Kim, Woo Chan; Chung, Byung Chang; Han, Gyu Bum; Roh, Jeongkyu; Cho, Dong Ho; Cho, Kiho.

In: Genomics, Vol. 106, No. 1, 01.07.2015, p. 30-42.


@article{2707ebe4766b4ae39065df925681f2c6,
title = "Repetitive element signature-based visualization, distance computation, and classification of 1766 microbial genomes",
keywords = "Genome distance, Genome signature, Genome visualization, Genome-scale classification, Microbial genomes, Repetitive element",
author = "Lee, {Kang Hoon} and Shin, {Kyung Seop} and Debora Lim and Kim, {Woo Chan} and Chung, {Byung Chang} and Han, {Gyu Bum} and Jeongkyu Roh and Cho, {Dong Ho} and Kiho Cho",
year = "2015",
month = "7",
day = "1",
doi = "10.1016/j.ygeno.2015.04.004",
language = "English (US)",
volume = "106",
pages = "30--42",
journal = "Genomics",
issn = "0888-7543",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR

T1 - Repetitive element signature-based visualization, distance computation, and classification of 1766 microbial genomes

AU - Lee, Kang Hoon

AU - Shin, Kyung Seop

AU - Lim, Debora

AU - Kim, Woo Chan

AU - Chung, Byung Chang

AU - Han, Gyu Bum

AU - Roh, Jeongkyu

AU - Cho, Dong Ho

AU - Cho, Kiho

PY - 2015/7/1

Y1 - 2015/7/1

KW - Genome distance

KW - Genome signature

KW - Genome visualization

KW - Genome-scale classification

KW - Microbial genomes

KW - Repetitive element

UR - http://www.scopus.com/inward/record.url?scp=84930540255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84930540255&partnerID=8YFLogxK

U2 - 10.1016/j.ygeno.2015.04.004

DO - 10.1016/j.ygeno.2015.04.004

M3 - Article

VL - 106

SP - 30

EP - 42

JO - Genomics

JF - Genomics

SN - 0888-7543

IS - 1

ER -