Mycobacterium tuberculosis (Mtb) is the causative organism of tuberculosis, a disease with high morbidity and mortality, especially in the developing world. The genetic variability among clinical isolates of this pathogen remains poorly understood. Recent years have seen the re-sequencing of a large number of clinical isolates of Mtb from around the world. Genomic data from many isolates are now available in the public domain, which offers a unique opportunity to understand the variome of the organism and the functional consequences of its variations. Using these data requires systematic curation and analysis of the publicly available datasets.
In this report, we re-analyzed publicly available datasets corresponding to over 400 Mtb isolates to reveal a comprehensive variome of Mtb comprising over 29,000 single nucleotide variations (SNVs), which we have deposited in a database, tbVar. Using a systematic computational pipeline, we annotated potential functional variants and drug-resistance-associated variants. In addition to a user-friendly interface, the database offers a novel option to annotate variants from clinical re-sequencing of Mtb. To the best of our knowledge, tbVar is the largest and most comprehensive genome variation resource for Mycobacterium tuberculosis.