GitHub: github.com/bjpop
Varlap is primarily a quality control tool for genetic variants arising from high-throughput DNA sequencing, where the variants have been called by aligning DNA sequencing reads to a reference genome. It takes as input a set of DNA variants and one or more BAM files. Varlap considers the genomic locus of each variant in each of the supplied BAM files and records information about the corresponding alignment context at that locus. For example, one of the metrics it calculates is the average edit distance of reads overlapping the variant locus. This can be a useful metric because regions with significantly higher average edit distance are more likely to contain erroneous variant calls. Varlap outputs a CSV file containing one row per input variant, with columns recording the various metrics computed for that variant. Subsequent analysis of this output (such as outlier detection) can be used to identify potentially problematic variants and samples.
Common use cases are to consider somatic variants in the context of tumour and normal alignments, or germline variants against normal alignments. However, Varlap is quite flexible and allows the use of any number of BAM files as input.
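The average edit distance metric mentioned above can be illustrated with a minimal sketch. It is not Varlap's actual implementation: the Read class and the average_edit_distance function are hypothetical names, reads are simplified in-memory records rather than BAM entries, and the edit distance is assumed to be available per read (for example from the NM tag defined in the SAM specification).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Read:
    """Simplified aligned read (hypothetical stand-in for a BAM record)."""
    start: int          # 0-based inclusive alignment start
    end: int            # 0-based exclusive alignment end
    edit_distance: int  # assumed to come from something like the NM tag


def average_edit_distance(reads: Iterable[Read], pos: int) -> Optional[float]:
    """Mean edit distance of the reads overlapping pos, or None if no read overlaps."""
    overlapping = [r.edit_distance for r in reads if r.start <= pos < r.end]
    if not overlapping:
        return None
    return sum(overlapping) / len(overlapping)


# Two reads overlap position 160; the third lies elsewhere and is ignored.
reads = [Read(100, 200, 1), Read(150, 250, 3), Read(300, 400, 10)]
result = average_edit_distance(reads, 160)
```

A locus whose value is much higher than that of most other loci would be a candidate for closer inspection in the downstream outlier analysis.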
At its core Gurita provides a suite of commands, each of which carries out a common data analytics or plotting task. Additionally, Gurita allows commands to be chained together into flexible analysis pipelines.
It is designed to be fast and convenient, and is particularly suited to data exploration tasks. Input files with millions of rows are readily supported.
Gurita commands are highly customisable, but sensible defaults are applied, so simple tasks are easy to express and complex tasks are possible.
The purpose of Bionitio is to provide an easy-to-understand working example that is built on best-practice software engineering principles. It can be used as a basis for learning and as a solid foundation for starting new projects. We provide a script called bionitio-boot.sh for starting new projects from bionitio, which saves time and ensures good programming practices are adopted from the beginning.
UNDR ROVER is an improved version of our ROVER variant calling tool for targeted DNA sequencing. It enables users to quickly and accurately identify genetic variants from PCR-targeted, overlapping paired-end MPS datasets. It calls the same variants as the ROVER tool but at a significantly reduced runtime. It achieves its higher performance by avoiding read alignment before variant calling, and can be applied directly to input FASTQ files.
HiTIME is a software tool for detecting twin ion signals in high resolution liquid chromatography mass spectrometry (LCMS) data.
This is a collaboration with Andrew Isaac, Michael Leeming, Richard O'Hair and William Alexander Donald.
This takes Illumina sequence data, an MLST (Multi-Locus Sequence Type) database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc.) and reports the presence of STs and/or reference genes.
This is a collaboration with Kat Holt, Mike Inouye and others.
Methpat summarises the DNA methylation pattern data output by the Bismark methylation extractor. The DNA methylation positions for each amplicon, the DNA methylation patterns observed within each amplicon, and their abundance counts are summarised in a tab-delimited text file suitable for downstream statistical analysis and visualization.
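The idea of counting methylation patterns per amplicon can be sketched as follows. This is an illustration only, not Methpat's code and not the Bismark file format: the input tuples, the "1"/"0" pattern encoding, and the count_patterns function are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical per-read calls: (amplicon, read_id, position, is_methylated).
calls = [
    ("amp1", "r1", 10, True), ("amp1", "r1", 25, False),
    ("amp1", "r2", 10, True), ("amp1", "r2", 25, False),
    ("amp1", "r3", 10, False), ("amp1", "r3", 25, False),
]


def count_patterns(calls):
    """Return a mapping from amplicon to a Counter of pattern strings."""
    # Collect the methylation state at each position, grouped by read.
    per_read = defaultdict(dict)
    for amplicon, read_id, pos, methylated in calls:
        per_read[(amplicon, read_id)][pos] = methylated

    # Encode each read as a string ordered by position, then tally per amplicon.
    counts = defaultdict(Counter)
    for (amplicon, _read_id), sites in per_read.items():
        pattern = "".join("1" if sites[p] else "0" for p in sorted(sites))
        counts[amplicon][pattern] += 1
    return counts


counts = count_patterns(calls)
```

Here reads r1 and r2 share the pattern "10" and r3 has the pattern "00", so the summary for amp1 records one pattern with abundance 2 and another with abundance 1.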
Annokey is a command line tool for annotating gene lists with the results of a key-term search of the NCBI Gene database and linked PubMed article abstracts. Its purpose is to help users prioritise genes by relevance to a domain of interest, such as "breast cancer" or "DNA repair". The user steers the search by specifying a ranked list of keywords and terms that are likely to be highly correlated with their domain of interest.
ROVER-PCR Variant Caller enables users to quickly and accurately identify genetic variants from PCR-targeted, overlapping paired-end MPS datasets. The open-source availability of the software and its tailorable thresholds enable broad access for a range of PCR-MPS users.
Blip compiles Python 3 source files to bytecode. The output bytecode is compatible with the CPython interpreter.
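To show what compiling Python source to bytecode means in practice, the sketch below uses CPython's own built-in compile function and the standard dis module. It does not use Blip or reproduce its interface; the source string and file name are made up for the example.

```python
import dis

# A tiny Python 3 program held as a string (made up for this example).
source = "x = 1 + 2\n"

# CPython's built-in compiler turns the source into a code object.
code = compile(source, "<example>", "exec")

# The raw bytecode lives in the code object's co_code attribute.
bytecode = code.co_code

# Running the code object in a fresh namespace binds x there.
namespace = {}
exec(code, namespace)

# Print a human-readable listing of the bytecode instructions.
dis.dis(code)
```

The exact instructions listed by dis differ between CPython versions, which is why a bytecode compiler has to target a particular version of the interpreter.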
Characterizing genetic diversity through the analysis of massively parallel sequence (MPS) data offers enormous potential in terms of our understanding of predisposition to complex human disease. Great challenges remain, however, regarding our ability to resolve those genetic variants that are genuinely associated with disease from the millions of "bystanders" and artefactual signals. FAVR is designed to assist in the resolution of some of these issues in the context of rare germline variants by facilitating "platform-steered" artefact filtering.
Berp is an implementation of Python 3. At its heart is a translator, which takes Python code as input and generates Haskell code as output. The Haskell code is fed into a Haskell compiler (GHC) for compilation to machine code or interpretation as byte code.
Berp provides both a compiler and an interactive interpreter. For the most part it can be used in the same way as CPython (the main Python implementation).
MPI is defined by the Message-Passing Interface Standard, as specified by the Message Passing Interface Forum. At the time of writing, the latest release of the standard is known as MPI-2. These Haskell bindings are designed to work with any standards-compliant implementation of MPI-2.
This package provides a parser (and lexer) for Python written in Haskell. It supports versions 2 and 3 of Python. The parser is implemented using the happy parser generator and the alex lexer generator. The package also provides a pretty printer, which also makes it suitable for generating Python code.
Ministg is an interpreter for a high-level, small-step, operational semantics for the STG machine. The STG machine is the abstract machine at the core of GHC. The operational semantics used in Ministg is taken from the paper "Making a fast curry: push/enter versus eval/apply for higher-order languages" by Simon Marlow and Simon Peyton Jones. Ministg implements both sets of evaluation rules from the paper.