Software: DSK

DSK is a k-mer counting tool, similar to Jellyfish. It runs on virtually any machine (only 1 GB of memory is required for a mammalian dataset), uses a reasonably small amount of temporary disk space, and supports any value of k. DSK can count the k-mers of large Illumina datasets on laptops and desktop computers.
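DSK keeps memory low by splitting k-mers into partitions stored in temporary disk files and counting one partition at a time in memory. The Python sketch below illustrates only that partitioning idea, in a simplified in-memory form. It is not DSK's code: the function names, the use of zlib.crc32 as the partition hash, and counting canonical k-mers (a k-mer and its reverse complement counted together) are assumptions of this sketch, and instead of writing partitions to disk it rescans the reads once per partition.

```python
# Simplified, in-memory sketch of partitioned k-mer counting.
# Assumptions: names, zlib.crc32 as the partition hash, and canonical counting
# are illustrative choices. DSK itself writes partitions to temporary disk
# files instead of rescanning the reads once per partition.
import zlib
from collections import Counter
from typing import Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def iter_kmers(read: str, k: int) -> Iterator[str]:
    """Yield the canonical k-mer at every position of a read (ACGT-only input)."""
    for i in range(len(read) - k + 1):
        yield canonical(read[i:i + k])


def count_kmers_partitioned(
    reads: List[str], k: int, num_partitions: int
) -> Iterator[Tuple[str, int]]:
    """Yield (k-mer, count) pairs, one partition at a time.

    Only the count table of the current partition is held in memory, so the
    peak size of the table is roughly divided by num_partitions.
    """
    for part in range(num_partitions):
        table: Counter = Counter()
        for read in reads:
            for kmer in iter_kmers(read, k):
                # Each k-mer belongs to exactly one partition, chosen by hash.
                if zlib.crc32(kmer.encode()) % num_partitions == part:
                    table[kmer] += 1
        yield from table.items()  # table is discarded before the next partition
```

Because the partitions are disjoint, writing each partition's pairs to an output file as they are yielded produces the full count table without ever holding all of it in memory. More partitions mean a smaller table in memory but more work per k-mer, which is the memory/disk trade-off the paragraph above describes.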

Download

DSK 2.1.0: Linux 64-bit binaries (source code)

The Cosmo assembler requires DSK version 1.6906 (old codebase).

Latest release notes (February 18, 2016): version 2.1.0 is a bugfix release over 2.0.7.

Release notes (February 5, 2015): DSK transitioned to a new codebase based on the GATB library (version 2.0.1).

Release notes (July 8, 2014): Fixed a critical bug affecting all earlier DSK versions when k was set to a multiple of 32, say X, and DSK was compiled with "make k=X". For example, k=32 with "make k=32" is affected, but k=32 with "make k=64" is not. K-mers were counted incorrectly in all affected cases. Versions affected: < 1.6706.

Release notes (December 24, 2013): Fixed a critical bug: when every k-mer in several consecutive reads contained an N, DSK ignored the rest of the file. Versions affected: 1.5280 through 1.5925.

Release notes (May 31, 2013): k-mers containing N are now ignored. Fixed a critical bug that corrupted the solid_kmers_binary file when DSK was compiled with omp=1.

Release notes (March 26, 2013): The file format of .solid_kmers_binary files changed in version 1.5030. Do not use a version of parse_results.py older than 1.5030 on new data, or a newer one on old data.
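Two of the release notes above concern k-mers containing N: since May 2013 such k-mers are ignored, and the December 2013 fix ensures that reads in which every k-mer contains an N no longer stop processing of the rest of the file. The Python sketch below illustrates that expected behavior only; it is not DSK's implementation, and the function names are hypothetical. It tracks the start of the current run of A/C/G/T bases so that any window covering a non-ACGT character is skipped, and a read with no valid k-mer simply yields nothing before the next read is processed.

```python
# Sketch: extract k-mers while skipping any k-mer that contains N (or any
# other non-ACGT character). Hypothetical names; not DSK's implementation.
from typing import Iterable, Iterator

VALID = set("ACGT")


def valid_kmers(read: str, k: int) -> Iterator[str]:
    """Yield the k-mers of `read` that consist only of A, C, G and T."""
    read = read.upper()
    run_start = 0  # index where the current run of valid bases begins
    for i, base in enumerate(read):
        if base not in VALID:
            run_start = i + 1  # every window covering position i is skipped
            continue
        if i - run_start + 1 >= k:
            yield read[i - k + 1:i + 1]


def kmers_from_reads(reads: Iterable[str], k: int) -> Iterator[str]:
    """Process every read; an all-N read yields nothing but does not stop the loop."""
    for read in reads:
        yield from valid_kmers(read, k)
```

For example, with k = 3 the read ACGNTTGA yields ACG, TTG and TGA, and a run of all-N reads followed by ACGT still yields ACG and CGT.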

Support

Please use Biostars (a dynamic FAQ system; click "New Post" in the top right corner) to report bugs or ask questions. (An archive of posts from 2012 to 2014 is also available.)

PDF and Citation

Rizk, G., Lavenier, D. and Chikhi, R. DSK: k-mer counting with very low memory usage, Bioinformatics, 2013.


@article{dsk,
    author = {Rizk, Guillaume and Lavenier, Dominique and Chikhi, Rayan}, 
    title = {DSK: k-mer counting with very low memory usage},
    year = {2013}, 
    doi = {10.1093/bioinformatics/btt020}, 
    journal = {Bioinformatics} 
}


Steps to reproduce the human genome k-mer counting experiment from the article

Download the raw human genome reads from SRA experiment SRX016231, convert them to FASTA, and concatenate them into a single file (HG_reads_all_untrimed.fa).
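The conversion and concatenation step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: every FASTQ record is exactly four unwrapped lines (@header, sequence, +, quality), the input files are uncompressed, and the function names and input file names are hypothetical. The quality lines are dropped, since FASTA carries only headers and sequences.

```python
# Sketch: convert FASTQ to FASTA and concatenate several files into one.
# Assumptions: every FASTQ record is exactly four unwrapped lines
# (@header, sequence, +, quality) and input files are uncompressed.
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record (header line + sequence line) per FASTQ record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it, "")
        plus = next(it, "")
        next(it, "")  # quality line: not needed in FASTA
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header.strip())
        yield ">" + header[1:].rstrip("\n") + "\n" + seq.rstrip("\n") + "\n"


def concatenate_as_fasta(fastq_paths: Iterable[str], out_path: str) -> int:
    """Write every record of every input FASTQ file to one FASTA file.

    Returns the total number of records written.
    """
    total = 0
    with open(out_path, "w") as out:
        for path in fastq_paths:
            with open(path) as src:
                for record in fastq_to_fasta(src):
                    out.write(record)
                    total += 1
    return total
```

Usage, with hypothetical input names: concatenate_as_fasta(["run1.fastq", "run2.fastq"], "HG_reads_all_untrimed.fa"). Reading records in fixed groups of four lines, rather than looking for lines that start with "@", avoids misreading quality strings that happen to begin with "@".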

Launch DSK as follows:

./dsk -file HG_reads_all_untrimed.fa -kmer-size 27 -abundance-min 3
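Here -kmer-size 27 sets k = 27, and -abundance-min 3 keeps only k-mers seen at least three times (the "solid" k-mers). In high-coverage data, k-mers that occur only once or twice typically come from sequencing errors, so this threshold removes many distinct k-mers while keeping genuine ones. The Python sketch below shows what such a threshold does to a table of counts; it assumes the threshold is inclusive (count >= abundance_min), the function names are hypothetical, and it is not DSK's code.

```python
# Sketch of an abundance threshold applied to k-mer counts.
# Assumption: the threshold is inclusive (count >= abundance_min), which is
# how this sketch reads DSK's -abundance-min option. Not DSK's code.
from typing import Dict, Mapping


def solid_kmers(counts: Mapping[str, int], abundance_min: int = 3) -> Dict[str, int]:
    """Keep only the k-mers seen at least abundance_min times."""
    return {kmer: n for kmer, n in counts.items() if n >= abundance_min}


def discarded_fraction(counts: Mapping[str, int], abundance_min: int = 3) -> float:
    """Fraction of distinct k-mers removed by the threshold."""
    if not counts:
        return 0.0
    kept = sum(1 for n in counts.values() if n >= abundance_min)
    return 1.0 - kept / len(counts)
```

With the hypothetical counts {"AAA": 5, "ACG": 1, "CGT": 3, "GGG": 2}, a threshold of 3 keeps AAA and CGT and discards half of the distinct k-mers.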