Usage
Usage: fastq-join [options] <read1.fq> <read2.fq> [mate.fq] -o <read.%.fq>
Joins two paired-end reads on the overlapping ends.
Options:
-o FIL See 'Output' below
-v C Verifies that the 2 files probe id's match up to char C
use '/' for Illumina reads
-p N N-percent maximum difference (.20)
-m N N-minimum overlap (6)
-r FIL Verbose stitch length report
Output:
You can supply 3 -o arguments, for un1, un2, join files, or one
argument as a file name template. The suffix 'un1, un2, or join' is
appended to the file, or they replace a %-character if present.
If a 'mate' input file is present (barcode read), then the files
'un3' and 'join2' are also created.
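As an illustration of the template rule above, here is a minimal Python sketch of how a single -o argument could expand into output file names. The function name output_names is hypothetical, and replacing only the first %-character is an assumption; this is not the program's actual code.

```python
def output_names(template, has_mate=False):
    """Expand a file name template into the output names described above.

    Hypothetical helper for illustration only, not part of fastq-join.
    """
    suffixes = ["un1", "un2", "join"]
    if has_mate:
        # A 'mate' (barcode read) input adds two more outputs.
        suffixes += ["un3", "join2"]
    names = {}
    for suffix in suffixes:
        if "%" in template:
            # Assumption: only the first %-character is replaced.
            names[suffix] = template.replace("%", suffix, 1)
        else:
            # No %-character: the suffix is appended to the name.
            names[suffix] = template + suffix
    return names


if __name__ == "__main__":
    for key, name in output_names("read.%.fq").items():
        print(key, name)
```

With the template read.%.fq from the usage line, this gives read.un1.fq, read.un2.fq and read.join.fq.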
Files named ".gz" are assumed to be compressed, and can be
read/written as long as "gzip" is in the path.
Etc
This uses our sqr(distance)/len algorithm to score anchored alignments. It's a good measure of anchored alignment quality, akin (in my mind) to squared-deviation for means.
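One possible reading of the sqr(distance)/len score, sketched in Python. It assumes that "distance" is the number of mismatches in the overlap, that read 2 is reverse-complemented before comparison, that -p and -m act as filters on candidate overlaps, and that ties go to the longer overlap. The names reverse_complement and best_overlap are hypothetical; the real implementation may differ in all of these details.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def best_overlap(read1, read2, min_overlap=6, max_diff=0.20):
    """Find the overlap between the end of read1 and the start of rc(read2).

    Returns (score, overlap_length, mismatches), or None if no candidate
    passes the filters. The defaults mirror -m (6) and -p (.20).
    """
    rc2 = reverse_complement(read2)
    best = None
    for n in range(min_overlap, min(len(read1), len(rc2)) + 1):
        tail, head = read1[-n:], rc2[:n]
        d = sum(1 for a, b in zip(tail, head) if a != b)
        if d / n > max_diff:
            continue  # too many mismatches for this overlap length
        score = d * d / n  # sqr(distance)/len: lower is better
        # Assumption: on equal scores, prefer the longer overlap.
        if best is None or score <= best[0]:
            best = (score, n, d)
    return best


if __name__ == "__main__":
    fragment = "AAACCCGGGTTTACGTAGCTAGCT"
    r1 = fragment[:16]
    r2 = reverse_complement(fragment[8:])
    print(best_overlap(r1, r2))
```

Squaring the mismatch count penalizes a few clustered errors in a short overlap more heavily than the same error rate spread over a long one, which is the analogy with squared deviation.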
Thank you for writing this program. Although this field is still in development, the competition is always present. Here are a few things to think about (my English might be poor, my apologies):
Existing software:
Stitch: https://github.com/audy/stitch
fastq-join
FLASH: http://www.cbcb.umd.edu/software/flash/
mergePairs.py: http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py
Python based (relatively slow): mergePairs.py, Stitch
C based (fast): FLASH, fastq-join
There is no real comparison of the different programs and how they handle adapter sequences, so it might be interesting to compare them, for example by running this tool on the FLASH simulated reads with simulated adapter sequences added.
Improvements:
an option to emit the joined reads as a stream, so that the software becomes at least partially pipeable
support for a single input file containing both the forward and the reverse reads, for on-stream use
use of identical headers in the join file, e.g. in sample1_s_8_join.fq:
@HWI-ST163_0392:8:1:3251:2152#CGATGT
+HWI-ST163_0392:8:1:3251:2152#CGATGT/1
@HWI-ST163_0392:8:1:3920:2157#CGATGT
+HWI-ST163_0392:8:1:3920:2157#CGATGT/1
@HWI-ST163_0392:8:1:3827:2186#CGATGT
+HWI-ST163_0392:8:1:3827:2186#CGATGT/1
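Until the headers agree, a small post-processing step could make them identical. The Python sketch below is a workaround, not a feature of fastq-join: it assumes plain four-line FASTQ records with no wrapped lines, and the function name normalize_plus_lines is hypothetical.

```python
def normalize_plus_lines(lines):
    """Rewrite each '+' line so it repeats the '@' header of its record.

    Assumes four lines per record (header, sequence, plus, quality).
    """
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four lines")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Drop the '@' and reuse the identifier after '+'.
        out.extend([header, seq, "+" + header[1:], qual])
    return out


if __name__ == "__main__":
    record = [
        "@HWI-ST163_0392:8:1:3251:2152#CGATGT",
        "ACGT",
        "+HWI-ST163_0392:8:1:3251:2152#CGATGT/1",
        "IIII",
    ]
    print("\n".join(normalize_plus_lines(record)))
```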
improvement to the usage message (FIL => FILE, C => Char(acter), -p N => -p float, -m N => -m int(eger)), and:
argument as a file name template. The suffix 'un1, un2, or join' is appended to the file, or they replace a %-character if present.
'un3' and 'join2' are also created.
read/written as long as "gzip" is in the path.