Introduction
The most up-to-date version of the demultiplexer is written in Python, although an older, more specialized C++ version is also included in the multiplexer folder. The information below refers to the Python version.
Input
The input to the demultiplexer consists of two files:
- The file with the true barcodes ("IDs").
- The file with the read sequences, either as a FASTQ file or as a SAM file.
The true barcodes file should be tab-delimited, with the true barcodes in column 1 and arbitrary information in columns 2 onward:
AACCGGTT 11 44 Some attribute
CCAATTGG 33 22 Another attribute
TTCCGGAA 22 99 Yet another attribute
...
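As an illustration of the layout above, here is a minimal sketch of how such a file could be read in Python. It is not the demultiplexer's own parser; the function name `read_true_barcodes` and the choice of a dictionary keyed by barcode are assumptions made for this example.

```python
import csv
import io


def read_true_barcodes(handle):
    """Map each barcode (column 1) to the list of its remaining columns.

    Hypothetical helper for illustration; not part of the demultiplexer.
    """
    barcodes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        barcodes[row[0].strip()] = row[1:]
    return barcodes


# In-memory stand-in for a tab-delimited barcodes file.
example = io.StringIO(
    "AACCGGTT\t11\t44\tSome attribute\n"
    "CCAATTGG\t33\t22\tAnother attribute\n"
)
ids = read_true_barcodes(example)
print(ids["AACCGGTT"])
```

Because the split is on tabs only, an attribute such as "Some attribute" stays in a single column even though it contains a space.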
Output
The output from the demultiplexer consists of (at most) 4 files:
- A tab-delimited file, output.result.txt, containing the annotation of each read and its demultiplexing result.
- A FASTQ/SAM file (same format as the input), output.mapped.fastq/sam, containing the perfectly or unambiguously mapped reads, augmented with the barcode and its attributes. In FASTQ, this information appears in the description field (on the same line as the annotation); in SAM, it appears in its own fields.
- A FASTQ/SAM file, as above, but for the ambiguously mapped reads, output.ambiguous.fastq/sam. Each read appears multiple times, each time with a different barcode and its attributes added. The order of the ambiguous hits is randomized.
- A FASTQ/SAM file, as above, but with unmapped reads, output.unmapped.fastq/sam.
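The split into mapped, ambiguous, and unmapped files can be sketched as follows. This is a simplified illustration, not the demultiplexer's implementation: how candidate barcodes are found for a read is not described here, so the sketch takes them as an input, and the name `route_read` and its return shape are assumptions.

```python
import random


def route_read(read_name, candidate_barcodes, rng=random):
    """Decide which output file a read belongs to.

    Returns (kind, records), where kind is "mapped", "ambiguous" or
    "unmapped" and records lists one (read_name, barcode) pair per
    line to be written. Hypothetical helper for illustration only.
    """
    hits = list(candidate_barcodes)
    if not hits:
        return "unmapped", [(read_name, None)]
    if len(hits) == 1:
        return "mapped", [(read_name, hits[0])]
    # Ambiguous: the read is repeated once per barcode, in random order.
    rng.shuffle(hits)
    return "ambiguous", [(read_name, barcode) for barcode in hits]


kind, records = route_read("read_3", ["AACCGGTT", "CCAATTGG"], random.Random(0))
print(kind, len(records))
```

Passing a seeded `random.Random` instance makes the shuffled order reproducible, which can help when comparing runs.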