public class ParseBarcodeRead
Takes a key file and sets up the methods to decode a read from the sequencer. The key file describes how barcodes are related to their taxon. Generally, a key file covering all flowcells is supplied, and the flowcell and lane to be processed are indicated in the constructor.
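As a rough illustration of selecting one flowcell and lane from a key file that covers several, the sketch below filters key file rows held in memory. It is not the class's actual implementation. The tab-delimited column order (Flowcell, Lane, Barcode, Sample), the header row, and the names KeyFileFilterSketch and selectBarcodes are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyFileFilterSketch {

    /**
     * Returns {barcode, sample} pairs for the rows matching the given flowcell and lane.
     * ASSUMPTION: rows are tab-delimited as Flowcell, Lane, Barcode, Sample, and the
     * first line is a header. The real key file layout may differ.
     */
    public static List<String[]> selectBarcodes(List<String> keyLines, String flowcell, String lane) {
        List<String[]> selected = new ArrayList<>();
        for (int i = 1; i < keyLines.size(); i++) {      // start at 1 to skip the header
            String[] fields = keyLines.get(i).split("\t");
            if (fields.length < 4) {
                continue;                                 // ignore short or blank rows
            }
            if (fields[0].equals(flowcell) && fields[1].equals(lane)) {
                selected.add(new String[] {fields[2], fields[3]});
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up rows, for illustration only.
        List<String> keyLines = List.of(
                "Flowcell\tLane\tBarcode\tSample",
                "FC1\t1\tCTCC\tB73",
                "FC1\t2\tTGCA\tMo17",
                "FC2\t1\tACTA\tW22",
                "FC1\t1\tTGCA\tOh43");
        for (String[] pair : selectBarcodes(keyLines, "FC1", "1")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Filtering once when the object is built means the per-read decoding only has to consider the barcodes that belong to the lane being processed.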
protected int maximumMismatchInBarcodeAndOverhang
protected static java.lang.String[] initialCutSiteRemnant
protected static int readEndCutSiteRemnantLength
protected static java.lang.String[] likelyReadEnd
protected static java.lang.String theEnzyme
public ParseBarcodeRead(java.lang.String keyFile,
java.lang.String enzyme,
java.lang.String flowcell,
java.lang.String lane)
Create the barcode parsing object
keyFile - file location for the key file
enzyme - name of the enzyme
flowcell - name of the flowcell to be processed
lane - name of the lane to be processed
public static void chooseEnzyme(java.lang.String enzyme)
Determines which cut sites to look for, and sets them, based on the enzyme used to generate the GBS library. For two-enzyme GBS, both enzymes MUST be specified and separated by a dash ("-"), e.g. PstI-MspI or SbfI-MspI.
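The naming convention described above (case-insensitive names, with a dash between the two enzymes) can be illustrated with a small sketch. This is not the code inside chooseEnzyme. The class EnzymeNameSketch and the method splitEnzymes are hypothetical, and normalizing to upper case is just one possible way to get case-insensitive matching.

```java
import java.util.Arrays;
import java.util.Locale;

public class EnzymeNameSketch {

    /**
     * Splits an enzyme specification such as "PstI-MspI" into its component names
     * and upper-cases them so later comparisons are case insensitive.
     * Hypothetical helper, not part of ParseBarcodeRead.
     */
    public static String[] splitEnzymes(String enzyme) {
        String[] parts = enzyme.trim().split("-");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim().toUpperCase(Locale.ROOT);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEnzymes("PstI-MspI"))); // [PSTI, MSPI]
        System.out.println(Arrays.toString(splitEnzymes("apeki")));     // [APEKI]
    }
}
```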
enzyme - The name of the enzyme (case insensitive)
public static ReadBarcodeResult removeSeqAfterSecondCutSite(java.lang.String seq, byte maxLength)
The barcode libraries used for this study can include two types of extraneous sequence at the end of reads. The first are chimeras created with the free ends; these recreate the restriction site. The second are short regions (less than 64 bp), which therefore contain a portion of the site and the universal adapter. This method finds the first occurrence of a site in likelyReadEnd, keeps the restriction site overhang, and then sets everything afterwards to polyA.
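The trimming step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the body of removeSeqAfterSecondCutSite. It returns a plain String rather than a ReadBarcodeResult, takes the read-end patterns and remnant length as arguments instead of reading the static fields, and truncates to maxLength when no site is found. The patterns and sequences in main are made up for the example.

```java
public class ReadEndTrimSketch {

    /**
     * Finds the earliest occurrence of any pattern in likelyReadEnd, keeps the first
     * remnantLength bases of that site (the overhang), and pads with 'A' up to maxLength.
     * If no pattern is found, the read is only truncated or padded to maxLength.
     * Hypothetical helper; the parameter names mirror the documented fields.
     */
    public static String trimAtReadEnd(String seq, String[] likelyReadEnd,
                                       int remnantLength, int maxLength) {
        int cut = -1;
        for (String end : likelyReadEnd) {
            int idx = seq.indexOf(end);
            if (idx >= 0 && (cut < 0 || idx < cut)) {
                cut = idx;                                // remember the earliest site
            }
        }
        int keep = (cut < 0) ? seq.length() : cut + remnantLength;
        keep = Math.min(keep, Math.min(seq.length(), maxLength));
        StringBuilder sb = new StringBuilder(seq.substring(0, keep));
        while (sb.length() < maxLength) {
            sb.append('A');                               // polyA padding
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] readEnds = {"GCAGC", "GCTGC"};           // illustrative patterns only
        // The site GCAGC starts at index 9; four bases of it are kept, the rest becomes A.
        System.out.println(trimAtReadEnd("TTGACCGTAGCAGCTTAA", readEnds, 4, 20));
        // prints TTGACCGTAGCAGAAAAAAA
    }
}
```

Padding to a fixed length keeps every processed tag the same size, which is convenient if tags are later compared or packed into fixed-width storage.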
seq - An unprocessed tag sequence.
maxLength - The maximum number of bp in the processed sequence.
public ReadBarcodeResult parseReadIntoTagAndTaxa(java.lang.String seqS, java.lang.String qualS, boolean fastq, int minQual)
Returns a ReadBarcodeResult that captures the processed read and the taxa inferred from the barcode
seqS - DNA sequence from the sequencer
qualS - quality score string from the sequencer
fastq - input format flag (fastq = true?; qseq = false?)
minQual - minimum quality score
public int getBarCodeCount()
Returns the number of barcodes for the flowcell and lane
public Barcode getTheBarcodes(int index)
Returns the Barcode at the given index for the flowcell and lane
public java.lang.String[] getTaxaNames()
Returns the taxaNames for the flowcell and lane