FastqJoin  
fastq-join : merge overlapping paired-end reads
Updated Jan 17, 2012 by earone...@gmail.com

Usage

Usage: fastq-join [options] <read1.fq> <read2.fq> [mate.fq] -o <read.%.fq>

Joins two paired-end reads on the overlapping ends.

Options:

-o FIL          See 'Output' below
-v C            Verifies that the 2 files probe id's match up to char C
                  use '/' for Illumina reads
-p N            N-percent maximum difference (.20)
-m N            N-minimum overlap (6)
-r FIL          Verbose stitch length report

Output:

  You can supply 3 -o arguments, for un1, un2, join files, or one
argument as a file name template.  The suffix 'un1, un2, or join' is
appended to the file, or they replace a %-character if present.

  If a 'mate' input file is present (barcode read), then the files
'un3' and 'join2' are also created.

  Files named ".gz" are assumed to be compressed, and can be 
read/written as long as "gzip" is in the path.
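
The naming rule described under 'Output' can be sketched in a few lines of Python. This is only an illustration of the rule as worded above, not code taken from fastq-join: the function name `output_names` is hypothetical, and it assumes that only the first '%' in the template is replaced. Cases the text does not spell out (for example three -o arguments together with a mate file) are deliberately rejected.

```python
def output_names(o_args, has_mate=False):
    """Map each output suffix to a file name (illustrative sketch only)."""
    suffixes = ["un1", "un2", "join"]
    if has_mate:
        # A mate (barcode) input adds two more outputs.
        suffixes += ["un3", "join2"]

    if len(o_args) == 1:
        template = o_args[0]
        if "%" in template:
            # Assumption: only the first '%' is substituted.
            return {s: template.replace("%", s, 1) for s in suffixes}
        # No '%': the suffix is appended to the given name.
        return {s: template + s for s in suffixes}

    if len(o_args) == 3 and not has_mate:
        # Three explicit names, in the order un1, un2, join.
        return dict(zip(suffixes, o_args))

    raise ValueError("case not covered by this sketch")


if __name__ == "__main__":
    for suffix, name in output_names(["read.%.fq"], has_mate=True).items():
        print(suffix, name)
```

With the template `read.%.fq` and a mate file, the sketch yields `read.un1.fq`, `read.un2.fq`, `read.join.fq`, `read.un3.fq` and `read.join2.fq`.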

Etc

This uses our sqr(distance)/len algorithm for anchored alignment quality. It's a good measure of that quality, akin (in my mind) to squared deviation for means.
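
To make the sqr(distance)/len idea concrete, here is a minimal Python sketch, not the actual fastq-join source. It assumes that "distance" is the number of mismatches (Hamming distance) in the candidate overlap, that read 2 is reverse-complemented before comparison, and that ties go to the longer overlap. The names `hamming`, `revcomp` and `best_overlap` are made up for the illustration; the defaults mirror the -m (6) and -p (.20) values from the usage text.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def best_overlap(read1, read2, min_overlap=6, max_diff=0.20):
    """Return (overlap_len, mismatches, score) for the best overlap, or None.

    Each candidate anchors the end of read1 against the start of the
    reverse-complemented read2 and is scored as mismatches**2 / overlap_len.
    """
    mate = revcomp(read2)
    best = None
    for ol in range(min_overlap, min(len(read1), len(mate)) + 1):
        d = hamming(read1[-ol:], mate[:ol])
        if d / ol > max_diff:
            continue  # too many mismatches for this overlap length
        score = d * d / ol
        # Lower score wins; '<=' prefers the longer overlap on ties
        # (an assumption of this sketch).
        if best is None or score <= best[2]:
            best = (ol, d, score)
    return best


if __name__ == "__main__":
    r1 = "ACGTTGCAAGGCTT"
    r2 = revcomp("AGGCTTAACCGGAT")
    print(best_overlap(r1, r2))
```

Squaring the distance penalises a few extra mismatches much more than dividing by the length rewards a slightly longer overlap, which is the sense in which the score resembles a squared deviation.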

Comment by martijnt...@gmail.com, Feb 7, 2012

Thank you for writing this program. Although this field is still in development, competition is always present, so here are a few things to think about (my English might be poor, my apologies). Existing software:

Stitch: https://github.com/audy/stitch
fastq-join
FLASH: http://www.cbcb.umd.edu/software/flash/
mergePairs.py: http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py

Python-based (relatively slow): mergePairs.py, Stitch

C-based (fast): FLASH, fastq-join

There is no real comparison yet of the different programs or of how they handle adapter sequences, so it might be interesting to compare them, for example by running this tool on the FLASH simulated reads with simulated adapter sequences added.

Improvements:

the possibility to get the joined reads in a way that makes the software partially pipeable

the use of a single file containing both the forward and the reverse reads, for streaming purposes

use of identical headers in the join file (sample1_s_8_join.fq):

@HWI-ST163_0392:8:1:3251:2152#CGATGT
+HWI-ST163_0392:8:1:3251:2152#CGATGT/1
@HWI-ST163_0392:8:1:3920:2157#CGATGT
+HWI-ST163_0392:8:1:3920:2157#CGATGT/1
@HWI-ST163_0392:8:1:3827:2186#CGATGT
+HWI-ST163_0392:8:1:3827:2186#CGATGT/1
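
In the headers above, the '@' and '+' lines differ only in the trailing /1. A small Python sketch, in the spirit of the -v '/' option but not taken from fastq-join, shows how such headers compare once they are cut at a stop character. The helper name `id_up_to` is hypothetical.

```python
def id_up_to(header, stop_char="/"):
    """Return the read id of a FASTQ '@' or '+' line, cut at stop_char."""
    # Drop the leading '@' or '+' marker if there is one.
    body = header[1:] if header[:1] in ("@", "+") else header
    # Keep everything before the first stop character.
    return body.split(stop_char, 1)[0]


if __name__ == "__main__":
    at_line = "@HWI-ST163_0392:8:1:3251:2152#CGATGT"
    plus_line = "+HWI-ST163_0392:8:1:3251:2152#CGATGT/1"
    print(id_up_to(at_line) == id_up_to(plus_line))
```

Compared this way the two lines carry the same id, so the request amounts to writing that same id on both lines of the joined record.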

improvement to the usage message (FIL => FILE, C => Char(acter), -p N => -p float, -m N => -m int(eger)), and:

Usage: fastq-join [options] <read1.fq> <read2.fq> [mate.fq] -o <read.%.fq>    ### maybe remove the .fq everywhere and add a version switch

Joins two paired-end reads on the overlapping ends.    ### accepts the fastq and the gzipped fastq (.gz) format as long as "gzip" is in the path

Options:

-o FILE         See 'Output' below    ### preferably no dereferencing
-v Char         Verifies that the 2 files probe id's match up to char C
                  use '/' for Illumina reads
-p N            N-percent maximum difference (.20)
-m N            N-minimum overlap (6)
-r FIL          Verbose stitch length report    ### use /tmp/ and clean up afterwards, or make a viewer for this file

Output:

You can supply 3 -o arguments, for un1, un2, join files, or one
argument as a file name template. The suffix 'un1, un2, or join' is
appended to the file, or they replace a %-character if present.

If a 'mate' input file is present (barcode read), then the files
'un3' and 'join2' are also created.

Files named ".gz" are assumed to be compressed, and can be    ### no indentation, and this is not present in my version
read/written as long as "gzip" is in the path.

