Bioinformatics Core
Department of Molecular Biology
Simches Research Center
Boston, MA 02114

Director: Ruslan Sadreyev

Fax: (617) 726-6893


Bioinformatics Group Downloads

Software and data produced by the Molecular Biology Bioinformatics Group are available free of charge and without restriction to academic and non-profit users. Commercial users may be asked to license these tools under some circumstances. Please cite the references provided on the downloads page when publishing results generated with these tools.


MolBioLib is a compact, efficient C++11 programming framework with an accompanying set of tools that enables rapid development and deployment of programs for analyzing next-generation sequencing data. The library includes objects that aid in the storage of relational data, the traversal of delimited files, and the manipulation of sequence files. Its methods range from alignment writers to statistical functions and string manipulations. The included tools manipulate TSV files, count aligned hits to features such as genes, and support coverage and ChIP-Seq analyses.


A method to align short reads and identify reads that have partial alignments to two distant locations on the reference sequence. This method was developed to detect reads that span genomic rearrangement breakpoints and is described in doi: 10.1016/j.ajhg.2011.03.013. These tools were compiled for Linux and support the .qltout alignment format; support for other alignment formats is in development.
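The core idea of flagging reads with partial alignments to two distant reference locations can be illustrated with a minimal C++11 sketch. This is not the code distributed here and does not parse .qltout files. The PartialAlignment record, the function names, and the minDistance and maxQueryOverlap thresholds are all hypothetical, chosen only to show the logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// One partial alignment of a read to the reference.
// Hypothetical record layout, not the .qltout format.
struct PartialAlignment {
    std::string readId;
    std::string chrom;
    long refStart;   // 0-based start on the reference
    int queryStart;  // first read base covered by this alignment
    int queryEnd;    // one past the last read base covered
};

// True when two partial alignments of the same read cover (mostly)
// different parts of the read and map to distant reference locations.
bool isSplitPair(const PartialAlignment& a, const PartialAlignment& b,
                 long minDistance, int maxQueryOverlap) {
    assert(minDistance >= 0 && maxQueryOverlap >= 0);
    int overlap = std::min(a.queryEnd, b.queryEnd) -
                  std::max(a.queryStart, b.queryStart);
    if (overlap > maxQueryOverlap) return false;  // same part of the read
    if (a.chrom != b.chrom) return true;          // different chromosomes
    return std::labs(a.refStart - b.refStart) >= minDistance;
}

// Group alignments by read and return the IDs (in sorted order) of reads
// that have at least one pair of partial alignments passing isSplitPair.
std::vector<std::string> findSplitReads(
        const std::vector<PartialAlignment>& alignments,
        long minDistance, int maxQueryOverlap) {
    std::map<std::string, std::vector<PartialAlignment> > byRead;
    for (const auto& aln : alignments) byRead[aln.readId].push_back(aln);

    std::vector<std::string> splitReads;
    for (const auto& entry : byRead) {
        const std::vector<PartialAlignment>& alns = entry.second;
        bool found = false;
        for (std::size_t i = 0; i < alns.size() && !found; ++i)
            for (std::size_t j = i + 1; j < alns.size() && !found; ++j)
                found = isSplitPair(alns[i], alns[j],
                                    minDistance, maxQueryOverlap);
        if (found) splitReads.push_back(entry.first);
    }
    return splitReads;
}
```

Reads reported by a filter like this are only candidates for spanning a breakpoint. A real analysis would also need to consider strand, alignment quality, and support from multiple independent reads.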

BSA is a simple pipeline tool for making sense of data from bulk segregation analysis, a strategy described in the following papers:

Schneeberger K, Ossowski S, Lanz C, Juul T, Petersen AH, Nielsen KL, Jørgensen JE, Weigel D, Andersen SU. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat Methods. 2009 Aug;6(8):550-1. doi: 10.1038/nmeth0809-550. PubMed PMID: 19644454.

Cuperus JT, Montgomery TA, Fahlgren N, Burke RT, Townsend T, Sullivan CM, Carrington JC. Identification of MIR390a precursor processing-defective mutants in Arabidopsis by direct genome sequencing. Proc Natl Acad Sci U S A. 2010 Jan 5;107(1):466-71. doi: 10.1073/pnas.0913203107. Epub 2009 Dec 14. PubMed PMID: 20018656; PubMed Central PMCID: PMC2806713.

BSA applies the same logic as previous implementations but is intended to simplify integration with other short read aligners and variant callers.

This tool was written and tested using data from Arabidopsis thaliana, but in principle should work with data from other organisms.
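
The general logic behind this kind of mapping can be sketched in a few lines of C++11. In a pool of sequenced mutant individuals, markers linked to the causal mutation are expected to show an allele frequency approaching 1 for the allele inherited from the mutant parent. The sketch below scans per-SNP read counts with a sliding window and reports the window with the highest pooled frequency. It is an illustration under stated assumptions, not the BSA pipeline's implementation: the SnpCount and Window types, the bestWindow function, and the fixed SNP-count window are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Read counts at one marker SNP in the pooled sample (hypothetical layout).
struct SnpCount {
    long position;    // coordinate on the chromosome
    int mutantReads;  // reads carrying the mutant-parent allele
    int totalReads;   // all reads covering the site
};

// A run of consecutive SNPs and its pooled mutant-allele frequency.
struct Window {
    long start;
    long end;
    double frequency;
};

// Slide a window of `windowSize` consecutive SNPs (sorted by position)
// along one chromosome and return the window with the highest pooled
// mutant-allele frequency. Read counts are summed before dividing, so
// deeply covered sites carry more weight than shallow ones.
Window bestWindow(const std::vector<SnpCount>& snps, std::size_t windowSize) {
    assert(windowSize > 0 && snps.size() >= windowSize);
    Window best;
    best.start = 0;
    best.end = 0;
    best.frequency = -1.0;  // sentinel: no window evaluated yet

    long mutant = 0;
    long total = 0;
    for (std::size_t i = 0; i < snps.size(); ++i) {
        // Add the SNP entering the window.
        mutant += snps[i].mutantReads;
        total += snps[i].totalReads;
        // Drop the SNP leaving the window.
        if (i >= windowSize) {
            mutant -= snps[i - windowSize].mutantReads;
            total -= snps[i - windowSize].totalReads;
        }
        // Evaluate once the window is full and has coverage.
        if (i + 1 >= windowSize && total > 0) {
            double freq = static_cast<double>(mutant) / total;
            if (freq > best.frequency) {
                best.start = snps[i + 1 - windowSize].position;
                best.end = snps[i].position;
                best.frequency = freq;
            }
        }
    }
    return best;
}
```

The window with the highest frequency marks a candidate interval. Variants called inside that interval would then be examined as possible causal mutations.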

For more on our bulk segregation analysis approach, see:

Zhang XC, Millet Y, Ausubel FM, Borowsky M. (2014) Next-Gen Sequencing-Based Mapping and Identification of Ethyl Methanesulfonate-Induced Mutations in Arabidopsis thaliana. Curr Protoc Mol Biol. 108, pp. 7.18.1-7.18.16. PMID: 25271717