cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.

Installation
If setuptools is installed, then cutadapt can be installed using this command:
    easy_install cutadapt

Documentation
See the README file and the documentation for further installation instructions and the FAQ.

News
November 4, 2011: cutadapt v1.0 has been released. This special version number does not mark fundamental changes; it merely indicates that cutadapt is now a mature tool. Thanks also to all external contributors for their work on improving the tool! The changes in this release are:
- ASCII-encoded quality values were previously always assumed to be encoded as ascii(quality+33). With the new parameter --quality-base, FASTQ files with qualities encoded as ascii(quality+64), as used in some versions of the Illumina pipeline, can also be read (fixes issue 7).
- Allow specifying that adapters were ligated to the 5' end of reads. This change is based on a patch contributed by James Casbon.
- Add Galaxy support, contributed by Lance Parsons.
- Patch by James Casbon: Allow N wildcards in the read, the adapter, or both. Wildcard matching of 'N's in the adapter is always done. If 'N's within reads should also match without counting as errors, this needs to be explicitly requested via --match-read-wildcards.
- Not part of this release, but please have a look at the documentation page, which was updated.
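
The two quality encodings mentioned in the release notes differ only in the offset added to each score before it is written as an ASCII character. The following minimal Python sketch illustrates that idea; it is not cutadapt's own code, and the function name decode_qualities and its quality_base argument are hypothetical names chosen to mirror the --quality-base parameter.

```python
def decode_qualities(quality_string, quality_base=33):
    """Convert an ASCII-encoded FASTQ quality string to integer scores.

    quality_base is the offset that was added when the file was written:
    33 for ascii(quality+33), 64 for ascii(quality+64).
    (Hypothetical helper for illustration only.)
    """
    return [ord(ch) - quality_base for ch in quality_string]


# The same four scores, written with each of the two offsets.
sanger_style = "II5!"    # encoded as ascii(quality+33)
illumina_style = "hhT@"  # encoded as ascii(quality+64)

print(decode_qualities(sanger_style, quality_base=33))
print(decode_qualities(illumina_style, quality_base=64))
```

Both calls yield the same list of scores. Decoding a file with the wrong offset shifts every score by 31, so values far below zero or implausibly high are a hint that the other base applies.
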
August 2, 2011: An application note about cutadapt has been published in EMBnet.journal. Please be aware that cutadapt has gained some more features since that text was written. If you use cutadapt, I would be glad if you cite it.

Please see the changelog for older release announcements.

Features
- Trims reads from current high-throughput sequencing machines (esp. Illumina, 454 and SOLiD)
- Gapped alignment with mismatches and indels, that is, errors in the adapter are tolerated
- Finds adapters both in the 5' and 3' ends of reads
- Accepts FASTQ, FASTA or .csfasta and .qual files (for AB SOLiD data)
- Any input or output file can be gzip-compressed (this is automatically detected)
- Outputs FASTA or FASTQ
- Trims color space reads correctly
- Optionally removes primer base in color space data
- Can produce MAQ- or BWA-compatible output
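
To make the idea of error-tolerant 3' adapter trimming with N wildcards more concrete, here is a deliberately simplified Python sketch. It is not cutadapt's algorithm: cutadapt performs gapped alignment (mismatches and indels), whereas this sketch only counts mismatches in an ungapped comparison. The names count_mismatches, trim_3prime, max_error_rate and min_overlap are hypothetical, and match_read_wildcards merely mirrors the --match-read-wildcards option.

```python
def count_mismatches(adapter, segment, match_read_wildcards=False):
    """Count mismatching positions between adapter and an equally long segment.

    An 'N' in the adapter always matches. An 'N' in the read segment matches
    only if match_read_wildcards is True.
    """
    errors = 0
    for a, r in zip(adapter, segment):
        if a == "N":
            continue
        if match_read_wildcards and r == "N":
            continue
        if a != r:
            errors += 1
    return errors


def trim_3prime(read, adapter, max_error_rate=0.1, min_overlap=3,
                match_read_wildcards=False):
    """Remove the leftmost acceptable (ungapped) 3' adapter match from read.

    The adapter may be cut off by the end of the read, so only the part
    that overlaps the read is compared.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        errors = count_mismatches(adapter[:overlap],
                                  read[start:start + overlap],
                                  match_read_wildcards)
        # Allowed errors scale with the length of the compared region.
        if errors <= int(max_error_rate * overlap):
            return read[:start]
    return read


adapter = "AGATCGGAAGAGC"
print(trim_3prime("ACGTACGTTTGG" + "AGATCGGAAGAGC", adapter))  # full adapter
print(trim_3prime("ACGTACGTTTGG" + "AGATCGGTAGAGC", adapter))  # one mismatch
print(trim_3prime("ACGTACGTTTGG" + "AGATC", adapter))          # partial adapter
```

In all three calls the adapter portion is removed and the 12-base insert remains. A real implementation additionally has to handle insertions and deletions, 5' adapters, and the accompanying quality values.
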