Corresponding author email address: vinods@igib.in
The tubercle complex consists of closely
related Mycobacterium species that appear to be variants of a single species.
Comparative genome analysis of different strains could provide useful clues and
insights into the genetic diversity of the species. We integrated genome
assemblies of 96 strains from the Mycobacterium tuberculosis complex (MTBC),
including 8 Indian clinical isolates sequenced and assembled in this study, to
understand its pangenome architecture. We predicted genes for all 96
strains and clustered their respective CDSs into homologous gene clusters
(HGCs) to reveal the hard-core, soft-core and accessory genome components of MTBC.
The hard core (HGCs shared amongst 100% of the strains) comprised 2,066
gene clusters, whereas the soft core (HGCs shared amongst at least 95% of the
strains) comprised 3,374 gene clusters. The change in the core and
dispensable genome components when observed as a function of their size
revealed that MTBC has an open pangenome. We identified 74 HGCs that were absent
from the reference strains H37Rv and H37Ra but present in the majority of clinical
isolates. PCR validation of 10 candidate genes showed that 8 genes were
completely absent from H37Rv and H37Ra, whereas 2 genes shared partial homology
with them, probably owing to insertion and deletion events. The pangenome
approach is a promising tool for studying strain-specific genetic differences
occurring within a species. We also suggest that, because selecting appropriate
target genes for typing requires that the target gene be present
in all isolates being typed, estimating the core component of the
species is of prime importance.
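The hard-core, soft-core and accessory definitions above, and the search for HGCs missing from the reference strains, can be illustrated with a minimal sketch. It assumes the clusters are available as a mapping from cluster ID to the set of strains carrying that cluster, which is a hypothetical input format and not one described in the study. The function names, the toy strain labels and the reading of "majority" as more than 50% of clinical isolates are likewise assumptions made for illustration. By the definitions given, the soft core (at least 95% of strains) includes the hard core (100% of strains).

```python
from typing import Dict, Set, Tuple


def classify_clusters(
    presence: Dict[str, Set[str]],
    n_strains: int,
    soft_threshold: float = 0.95,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split clusters into hard core, soft core and accessory sets.

    presence maps a cluster ID to the set of strains that carry it.
    The soft core is a superset of the hard core.
    """
    hard_core, soft_core, accessory = set(), set(), set()
    for cluster_id, strains in presence.items():
        if len(strains) == n_strains:
            hard_core.add(cluster_id)
        if len(strains) / n_strains >= soft_threshold:
            soft_core.add(cluster_id)
        else:
            accessory.add(cluster_id)
    return hard_core, soft_core, accessory


def absent_from_references(
    presence: Dict[str, Set[str]],
    references: Set[str],
    clinical: Set[str],
    min_fraction: float = 0.5,  # assumption: "majority" means more than 50%
) -> Set[str]:
    """Return clusters found in no reference strain but in most clinical isolates."""
    hits = set()
    for cluster_id, strains in presence.items():
        if strains & references:
            continue  # present in at least one reference strain
        if len(strains & clinical) / len(clinical) > min_fraction:
            hits.add(cluster_id)
    return hits


# Toy data: 2 reference strains plus 38 made-up clinical isolate labels.
strains = ["H37Rv", "H37Ra"] + [f"clin{i}" for i in range(1, 39)]
references = {"H37Rv", "H37Ra"}
clinical = set(strains) - references

presence = {
    "HGC_a": set(strains),              # in all 40 strains
    "HGC_b": set(strains) - {"clin1"},  # in 39 of 40 strains
    "HGC_c": set(strains[2:30]),        # in 28 clinical isolates only
    "HGC_d": {"H37Rv", "clin1"},        # rare, and present in a reference
}

hard, soft, accessory = classify_clusters(presence, len(strains))
candidates = absent_from_references(presence, references, clinical)
print(sorted(hard), sorted(soft), sorted(accessory), sorted(candidates))
```

Keeping the soft-core threshold and the majority fraction as parameters makes it straightforward to check how sensitive the cluster counts are to those cut-offs.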
Links to the Data:
Strain Id/SRA Accession | Contigs | SRA |
OSDD487/SRR786669 | | |
OSDD472/SRR786667 | | |
OSDD071/SRR786373 | | |
OSDD326/SRR786668 | | |
OSDD518/SRR786670 | | |
OSDD630/SRR786188 | | |
OSDD386/SRR784917 | | |
OSDD504/SRR786397 | | |
Predicted genes for all 96 MTBC genomes, generated with the Prodigal gene prediction software (files are named by genome accession): http://genome.igib.res.in/Mtb_Pangenome/Pred_genes/