Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).
A typical Minia command line looks like:
./minia reads.fa 31 3 4500000 output_prefix
Type
./minia
for a quick explanation of the parameters.
There is also a short manual in
manual/manual.pdf
Please use our BioStars site (dynamic FAQ system) to report bugs or ask any question.
The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (> 30 GB).
We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. Minia, an assembler implementing this structure, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
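The idea of a Bloom filter combined with a set of critical false positives can be illustrated with a minimal Python sketch. This is not Minia's implementation: the class and function names, the SHA-256 based hash, and the in-memory construction are all assumptions made for illustration, and reverse complements are ignored. The sketch stores the k-mers in a Bloom filter, then records every neighbor of a true node that the filter wrongly accepts. Membership queries are then exact for any k-mer reached by traversing the graph from a true node.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over strings (hash scheme is an illustrative assumption)."""

    def __init__(self, n_bits, n_hashes=1):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def neighbors(kmer):
    """Yield the k-mers overlapping `kmer` by k-1 characters (successors and predecessors)."""
    for base in "ACGT":
        yield kmer[1:] + base
        yield base + kmer[:-1]


class ExactDeBruijnNodes:
    """Bloom filter plus critical false positives (hypothetical helper class)."""

    def __init__(self, kmers, n_bits, n_hashes=1):
        true_nodes = set(kmers)  # kept in memory only while building this sketch
        self.bloom = BloomFilter(n_bits, n_hashes)
        for kmer in true_nodes:
            self.bloom.add(kmer)
        # Critical false positives: neighbors of true nodes that the Bloom
        # filter accepts although they are not true nodes.
        self.critical_fp = {
            n
            for kmer in true_nodes
            for n in neighbors(kmer)
            if n in self.bloom and n not in true_nodes
        }

    def __contains__(self, kmer):
        return kmer in self.bloom and kmer not in self.critical_fp
```

For example, `ExactDeBruijnNodes(["ACG", "CGT"], n_bits=10)` answers correctly for "ACG", "CGT" and all of their neighbors, even with a very small filter. K-mers that are not adjacent to a true node may still be false positives, which is harmless when the graph is only explored by traversal from true nodes.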

This figure from the article shows the space usage of our encoding on a small de Bruijn graph.

(a) shows S, an example de Bruijn graph (the 7 black nodes), and B, its probabilistic representation from a Bloom filter (the union of the black, red and green nodes). Red nodes are immediate neighbors of S in B; they are the critical false positives. Green nodes are all the other nodes of B; (b) shows a sample of the hash values associated with the nodes of S (a toy hash function is used); (c) shows the complete Bloom filter associated with S; incidentally, the nodes of B are exactly those to which the Bloom filter answers positively; (d) describes the lower bound for exactly encoding the nodes of S (self-information) and the space required to encode our structure (Bloom filter, 10 bits, and 3 critical false positives, 6 bits per 3-mer).
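The arithmetic behind panel (d) can be reproduced with a short Python sketch. It assumes that the self-information lower bound is the base-2 logarithm of the number of ways to choose 7 nodes among all 4^3 possible 3-mers, and that each critical false positive is stored explicitly at 2 bits per nucleotide. The variable names are illustrative.

```python
import math

k = 3                # k-mer length in the toy example
n_nodes = 7          # black nodes of S
bloom_bits = 10      # size of the toy Bloom filter
n_critical_fp = 3    # red nodes

# Lower bound: log2 of the number of 7-node subsets of the 4^k possible k-mers.
universe = 4 ** k
n_subsets = math.comb(universe, n_nodes)
self_information = math.log2(n_subsets)

# Our structure: Bloom filter plus explicitly stored critical false positives.
bits_per_kmer = 2 * k  # 2 bits per nucleotide, i.e. 6 bits per 3-mer
structure_bits = bloom_bits + n_critical_fp * bits_per_kmer

print(f"self-information: {self_information:.1f} bits")
print(f"structure size:   {structure_bits} bits")
```

Under these assumptions the lower bound is about 29.2 bits, while the structure takes 10 + 3 × 6 = 28 bits. The structure can fall below the bound because it is only exact for nodes reached by traversing the graph, not for arbitrary membership queries.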
R. Chikhi, G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI 2012
@inproceedings{minia,
author = {Chikhi, Rayan and Rizk, Guillaume},
title = {Space-Efficient and Exact de Bruijn Graph Representation Based on a Bloom Filter},
booktitle = {WABI},
pages = {236--248},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = 7534,
year = 2012
}
Download the raw human genome reads (accession SRX016231), convert them to FASTA, and concatenate them into a single file (HG_reads_all_untrimed.fa).
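The conversion and concatenation step could be done along the following lines, assuming the reads have already been downloaded as plain-text FASTQ files with exactly four lines per record. The function names and the input file names in the usage note are hypothetical; any standard FASTQ-to-FASTA converter would work equally well.

```python
from itertools import islice


def fastq_to_fasta_lines(fastq_lines):
    """Yield FASTA lines (header, sequence) for each 4-line FASTQ record."""
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        # Stop at end of input, tolerating trailing blank lines.
        if not any(line.strip() for line in record):
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        header = record[0].rstrip("\r\n")
        sequence = record[1].rstrip("\r\n")
        yield ">" + header[1:]  # replace the leading '@' with '>'
        yield sequence          # quality lines are dropped


def convert_and_concatenate(fastq_paths, fasta_path):
    """Convert several FASTQ files and write them into one FASTA file."""
    with open(fasta_path, "w") as out:
        for path in fastq_paths:
            with open(path) as handle:
                for line in fastq_to_fasta_lines(handle):
                    out.write(line + "\n")
```

A call such as `convert_and_concatenate(["run1.fastq", "run2.fastq"], "HG_reads_all_untrimed.fa")` would then produce the single input file used in the next step. The reads are streamed record by record, so memory use stays small even for a full human data set.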
Launch Minia as follows:
./minia HG_reads_all_untrimed.fa 31 3 2700000000 results_prefix