We present Carnelian, a pipeline for alignment-free functional binning and abundance estimation that leverages low-density, even-coverage locality-sensitive hashing to represent metagenomic reads in a low-dimensional manifold. When coupled with one-against-all classifiers, our tool bins whole metagenomic sequencing reads by the molecular function encoded in their gene content at significantly higher accuracy than existing methods, especially for novel proteins.
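To make the hashing idea concrete, here is a minimal Python sketch of low-density, even-coverage hashing over k-mers. It is an illustration under stated assumptions, not Carnelian's actual code: the function names (`even_coverage_masks`, `hashed_features`), the parameters (`k`, `width`, `num_masks`), and the greedy mask selection are all hypothetical. Each mask keeps only a few positions of a k-mer (low density), and masks are chosen so that every position is used about equally often (even coverage). The resulting sparse counts are the kind of feature vector that could be passed to a one-against-all classifier.

```python
import random
from collections import Counter


def even_coverage_masks(k, width, num_masks, seed=0):
    """Pick num_masks subsets of `width` positions from a k-mer.

    Hypothetical greedy scheme: always take the least-used positions,
    breaking ties at random, so position usage stays balanced.
    """
    if width > k:
        raise ValueError("width must not exceed k")
    rng = random.Random(seed)
    usage = [0] * k
    masks = []
    for _ in range(num_masks):
        order = list(range(k))
        rng.shuffle(order)                   # random tie-breaking
        order.sort(key=lambda p: usage[p])   # stable sort: least-used first
        mask = tuple(sorted(order[:width]))
        for p in mask:
            usage[p] += 1
        masks.append(mask)
    return masks


def hashed_features(seq, k, masks):
    """Count the masked k-mers of a sequence, one feature per (mask, subsequence)."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for m, mask in enumerate(masks):
            key = (m, "".join(kmer[p] for p in mask))
            feats[key] += 1
    return feats


if __name__ == "__main__":
    masks = even_coverage_masks(k=8, width=4, num_masks=4, seed=1)
    print(masks)
    print(hashed_features("ACGTACGTAC", 8, masks))
```

Because each mask ignores most positions, two sequences that differ only at ignored positions map to the same feature, which is what gives this kind of representation some tolerance to mutations.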
Metagenomic binning using low-density hashing and a support vector machine
MICA (Metagenomic Inquiry Compressive Acceleration) is a family of programs for performing compressively accelerated metagenomic sequence searches based on BLASTX and DIAMOND. MICA also includes compressively accelerated versions of the BLASTP family of tools (including PSI-BLAST and DELTA-BLAST), as well as a compression tool (mica-compress) for creating searchable, compressed databases from an input FASTA file.