| Class | Description |
|---|---|
| BarcodeTrie | An implementation of a trie (prefix tree) in Java. Supports operations such as searching for a string, searching for a prefix, and searching by prefix. |
| DiscoverySNPCallerPluginV2 | This class aligns tags at the same physical location against one another, calls SNPs, and then outputs the SNPs to a HapMap file. It is multi-threaded because this gives a substantial speed increase. |
| GBSEnzyme | Determines and sets the cut sites to look for, based on the enzyme used to generate the GBS library. For two-enzyme GBS, both enzymes MUST be specified, separated by a dash ("-"), e.g. PstI-MspI or SbfI-MspI. The enzyme pair "PstI-EcoT22I" uses the Elshire common adapter, while PstI-MspI, PstI-TaqI, and SbfI-MspI use a Y adapter (Poland et al. 2012). |
| GBSSeqToTagDBPlugin | Develops a discovery TBT (tags-by-taxa) file from a set of GBS sequence files. Keeps only good reads that have a barcode, a cut site, and no N's in the useful part of the sequence. Trims off the barcodes and truncates sequences that (1) have a second cut site or (2) read into the common adapter. The references throughout were originally to "tag"; these are being changed to "kmer" because the pipeline is a kmer alignment process. |
| GBSUtils | This class contains methods and constants used by various classes in the GBSv2 pipeline. |
| GetTagSequenceFromDBPlugin | This plugin queries a GBSv2 SQL database for existing tags. If the user specifies a tag and it is found, a tab-delimited file containing that tag sequence is written. If the tag is NOT found, the tab-delimited file shows the tag and "not found". If no tag sequence is specified, the plugin prints out all tag sequences stored in the specified db. The plugin throws an exception if the input db or the output directory does not exist. |
| GetTagTaxaDistFromDBPlugin | This plugin takes a GBSv2 database as input, queries for the tags and their taxa distribution, and creates a tab-delimited file of tags and their taxa distribution. It can be used to verify data in the db. The user may supply a tab-delimited file specifying the tags for which to present data. This "tag" file must have at least one column named "TAGS" (case-insensitive: Tags, TAGS, and tags are all accepted); other data in the file is ignored. Output: a tab-delimited file where the first column is the tag and each subsequent column is a taxon found in the db. The values in the taxa columns are the number of occurrences of that tag in that taxon. |
| ProductionSNPCallerPluginV2 | This plugin converts all of the fastq (and/or qseq) files in the input folder and key file to genotypes and adds these to a genotype file. We refer to this step as the "Production Pipeline". The output is either HDF5 or VCF genotypes with allelic depth stored; the output file type is determined by the presence of the ".h5" suffix. SNP calling is quantitative, with the option of using either the Glaubitz/Buckler binomial method (pHet/pErr > 1 = het; the default) or the Stacks method. Merging of samples with the same LibraryPrepID is handled by GenotypeTableBuilder.addTaxon(), with the genotypes re-called based on the new depths. Therefore, if you want to keep adding genotypes to the same target HDF5 file in subsequent runs, use the -ko (keep open) option so that the output GenotypeTableBuilder remains mutable, using closeUnfinished() rather than build(). If the target output is HDF5 and that GenotypeTable file doesn't exist, it will be created. Each taxon in the output file is named "ShortName:LibraryPrepID" and is annotated with "Flowcell_Lanes" (the source sequence data for the current genotype). Requires a database with variants added from a previous "Discovery Pipeline" run. References to "tag" are being replaced by references to "kmer", as the pipeline is really a kmer alignment process. TODO: add the Stacks likelihood method to BasicGenotypeMergeRule. |
| SAMToGBSdbPlugin | Reads SAM-format files to determine the potential positions of tags against the reference genome. |
| SNPCutPosTagVerificationPlugin | This class lets a user specify a cut or SNP position for which data should be printed. For a cut position, the tags associated with that position are printed along with the number of times each tag appears in each taxon. For a SNP position, each allele and the tags associated with that allele are printed along with the number of times each tag appears in each taxon. Each tag is shown both as stored in the db and as its forward-strand sequence. The SNP alignments are based on the forward strand. |
| SNPQualityProfilerPlugin | Scores all discovered SNPs for various coverage, depth, and genotypic statistics for a given set of taxa (samples). For each subset of taxa, there are expectations for segregation that can be used to determine whether the SNP is behaving appropriately. |
| TagExportToFastqPlugin | Converts a TagCounts binary (`*.cnt`) file (presumably a master tag list) to a fastq file that can be used as input for BWA or bowtie2 (and possibly additional aligners). The same function can be performed with MergeMultipleTagCountPlugin using the -t option and a single Master Tag List file in the input directory, but having a separate plugin to do this reduces confusion and eliminates the risk of merging the master tag list back on itself. |
| UpdateSNPPositionQualityPlugin | This plugin takes as input: (1) a tab-delimited txt file with columns CHROM, POS, and QUALITYSCORE; and (2) dbFile, a GBSv2 database with SNP positions recorded. A PositionList of positions with quality scores is sent to the database, where the snpposition table is updated with a qualityScore value for each specified chromosome and position. |
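For BarcodeTrie, the sketch below is a minimal prefix tree in Java covering the three operations the table lists (searching for a string, searching for a prefix, and searching by prefix), plus a longest-prefix lookup of the kind a barcode matcher could use to find which barcode begins a read. The class `SimpleTrie` and its method names are hypothetical illustrations, not the `BarcodeTrie` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical prefix-tree sketch; not the TASSEL BarcodeTrie API. */
class SimpleTrie {
    /** One node per character; isWord marks the end of an inserted string. */
    private static final class Node {
        // TreeMap keeps children sorted, so prefix searches return results in lexicographic order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Adds a string, creating nodes along its path as needed. */
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Follows s from the root; returns the last node, or null if the path breaks. */
    private Node walk(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    /** Searching a string: true only if exactly this string was inserted. */
    boolean contains(String word) {
        Node node = walk(word);
        return node != null && node.isWord;
    }

    /** Searching a prefix: true if at least one inserted string starts with prefix. */
    boolean hasPrefix(String prefix) {
        Node node = walk(prefix);
        // A reachable node either ends a string or has children leading to one.
        return node != null && (node.isWord || !node.children.isEmpty());
    }

    /** Searching by prefix: every inserted string that starts with prefix. */
    List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = walk(prefix);
        if (node != null) {
            collect(node, new StringBuilder(prefix), out);
        }
        return out;
    }

    /** Depth-first walk that appends every complete string below node. */
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }

    /** Longest inserted string that is a prefix of read (e.g. its barcode), or null. */
    String longestPrefixOf(String read) {
        Node node = root;
        String best = null;
        for (int i = 0; i < read.length(); i++) {
            node = node.children.get(read.charAt(i));
            if (node == null) {
                break;
            }
            if (node.isWord) {
                best = read.substring(0, i + 1);
            }
        }
        return best;
    }
}
```

Each lookup walks at most one node per character, so matching a read costs time proportional to the longest barcode rather than to the number of barcodes. A `TreeMap` is used for children only so that prefix searches return strings in sorted order; a `HashMap` or a fixed array over A/C/G/T would also work.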
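For GBSEnzyme, the snippet below sketches how a one- or two-enzyme specification such as `PstI-MspI` could be validated, and it encodes only the adapter facts stated in the table: PstI-MspI, PstI-TaqI, and SbfI-MspI use a Y adapter, while PstI-EcoT22I does not. `EnzymeSpec` and its methods are hypothetical, and the sketch does not attempt the cut-site lookup that the real class performs.

```java
import java.util.Set;

/** Hypothetical parser for an enzyme spec such as "PstI" or "PstI-MspI". */
final class EnzymeSpec {
    // Pairs listed in the GBSEnzyme description as using a Y adapter (Poland et al. 2012).
    private static final Set<String> Y_ADAPTER_PAIRS = Set.of("PstI-MspI", "PstI-TaqI", "SbfI-MspI");

    final String first;
    final String second; // null for a single-enzyme library

    private EnzymeSpec(String first, String second) {
        this.first = first;
        this.second = second;
    }

    /** Accepts one enzyme name, or exactly two names separated by a single dash. */
    static EnzymeSpec parse(String spec) {
        String[] parts = spec.split("-", -1); // limit -1 keeps empty trailing parts, so "PstI-" is rejected
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        if (parts.length == 1 && !parts[0].isEmpty()) {
            return new EnzymeSpec(parts[0], null);
        }
        if (parts.length == 2 && !parts[0].isEmpty() && !parts[1].isEmpty()) {
            return new EnzymeSpec(parts[0], parts[1]);
        }
        throw new IllegalArgumentException("Expected ENZYME or ENZYME1-ENZYME2, got: \"" + spec + "\"");
    }

    boolean isTwoEnzyme() {
        return second != null;
    }

    /** True only for the pairs listed above; exact, case-sensitive match. */
    boolean usesYAdapter() {
        return isTwoEnzyme() && Y_ADAPTER_PAIRS.contains(first + "-" + second);
    }
}
```

Rejecting a dangling dash up front mirrors the table's rule that both enzymes MUST be specified for two-enzyme GBS, so a typo such as `PstI-` fails loudly instead of silently falling back to a single-enzyme setup.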
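For GBSSeqToTagDBPlugin, the sketch below illustrates the read filtering the table describes: require the barcode and cut-site remnant, trim the barcode, truncate at a second cut site or at the start of the common adapter, and reject reads with an N in the retained part. `ReadFilter` is hypothetical; the motifs are passed in by the caller, and cutting just before the earliest motif hit is a simplifying assumption, since the real plugin may keep part of the second cut site.

```java
/**
 * Hypothetical sketch of the read filtering described for GBSSeqToTagDBPlugin.
 * Barcode, remnant, and stop motifs are caller-supplied; the real truncation points may differ.
 */
final class ReadFilter {
    /**
     * Returns the barcode-trimmed, truncated sequence, or null if the read is rejected.
     *
     * @param read       raw read sequence
     * @param barcode    sample barcode expected at the start of the read
     * @param remnant    cut-site remnant expected immediately after the barcode
     * @param stopMotifs a full second cut site and/or the start of the common adapter;
     *                   the sequence is cut just before the earliest hit
     */
    static String filter(String read, String barcode, String remnant, String... stopMotifs) {
        // Keep only reads that carry both the barcode and the cut-site remnant.
        if (!read.startsWith(barcode + remnant)) {
            return null;
        }
        String seq = read.substring(barcode.length()); // trim off the barcode
        int end = seq.length();
        for (String motif : stopMotifs) {
            // Search past the leading remnant so the initial cut site is not counted as a second one.
            int hit = seq.indexOf(motif, remnant.length());
            if (hit >= 0 && hit < end) {
                end = hit;
            }
        }
        seq = seq.substring(0, end);
        // Reject reads with an N in the part that is kept; N's beyond the cut are already gone.
        return seq.indexOf('N') >= 0 ? null : seq;
    }
}
```

For example, with barcode `ACGT`, remnant `TGCAG`, and stop motif `CTGCAG` (illustrative values, not taken from the pipeline's configuration), the read `ACGTTGCAGAAAACTGCAGTTTT` is reduced to `TGCAGAAAA`.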
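For GetTagTaxaDistFromDBPlugin, the following sketch reads the optional tag file as described: it locates the first column whose header is `TAGS` in any letter case and ignores every other column. `TagFileReader` is a hypothetical helper, and skipping blank cells and short rows is an assumption about how such input should be handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the optional "tag" file described for GetTagTaxaDistFromDBPlugin. */
final class TagFileReader {
    /** Returns the values of the first column headed "TAGS" in any case; other columns are ignored. */
    static List<String> readTags(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) {
            throw new IOException("Tag file is empty");
        }
        String[] columns = header.split("\t", -1);
        int tagCol = -1;
        for (int i = 0; i < columns.length && tagCol < 0; i++) {
            if (columns[i].trim().equalsIgnoreCase("TAGS")) {
                tagCol = i;
            }
        }
        if (tagCol < 0) {
            throw new IOException("No column named TAGS in header: " + header);
        }
        List<String> tags = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split("\t", -1);
            // Skip short rows and blank cells rather than failing.
            if (tagCol < fields.length && !fields[tagCol].trim().isEmpty()) {
                tags.add(fields[tagCol].trim());
            }
        }
        return tags;
    }
}
```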
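For ProductionSNPCallerPluginV2, the sketch below shows one way to read the binomial rule "pHet/pErr > 1 = het": compare the binomial probability of the observed minor-allele read count under a heterozygote (allele fraction 0.5) with its probability under sequencing error alone. It is a hypothetical illustration, not the exact computation in TASSEL's BasicGenotypeMergeRule, and the error rate is an assumed input.

```java
/**
 * Hypothetical binomial likelihood-ratio heterozygote call: one reading of "pHet/pErr > 1 = het".
 * Not the exact TASSEL implementation; errorRate is an assumed input.
 */
final class BinomialHetCaller {
    /**
     * log(pHet / pErr), where for n = majorDepth + minorDepth reads and k = minorDepth,
     * pHet = C(n,k) * 0.5^k * 0.5^(n-k) and pErr = C(n,k) * e^k * (1-e)^(n-k).
     * The binomial coefficient C(n,k) is identical in both, so it cancels in the ratio.
     */
    static double logLikelihoodRatio(int majorDepth, int minorDepth, double errorRate) {
        return minorDepth * Math.log(0.5 / errorRate)
                + majorDepth * Math.log(0.5 / (1.0 - errorRate));
    }

    /** Calls het when pHet / pErr > 1, i.e. when the log ratio is positive. */
    static boolean isHet(int majorDepth, int minorDepth, double errorRate) {
        if (errorRate <= 0.0 || errorRate >= 0.5) {
            throw new IllegalArgumentException("errorRate must be in (0, 0.5)");
        }
        return minorDepth > 0 && logLikelihoodRatio(majorDepth, minorDepth, errorRate) > 0.0;
    }
}
```

With `errorRate = 0.01`, this rule calls a heterozygote once the minor allele exceeds roughly 15% of reads: 2 minor reads out of 10 are called het, but 2 out of 20 are not. Working in log space avoids underflow when depths are large.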
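For SNPCutPosTagVerificationPlugin, here is a minimal sketch of producing the forward-strand version of a stored tag, assuming that a tag aligned to the reverse strand is displayed as its reverse complement. `StrandUtil` and the `alignsToForwardStrand` flag are hypothetical; the sketch assumes the caller already knows the tag's alignment strand.

```java
/** Hypothetical helpers for showing a stored tag on the forward strand. */
final class StrandUtil {
    /** Reverse complement of an A/C/G/T/N sequence; any other character is rejected. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            char c = seq.charAt(i);
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                case 'N': sb.append('N'); break;
                default: throw new IllegalArgumentException("Unexpected base '" + c + "' at index " + i);
            }
        }
        return sb.toString();
    }

    /** Reverse-strand tags are reverse-complemented; forward-strand tags are returned unchanged. */
    static String toForwardStrand(String storedTag, boolean alignsToForwardStrand) {
        return alignsToForwardStrand ? storedTag : reverseComplement(storedTag);
    }
}
```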
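For TagExportToFastqPlugin, each exported tag can become a standard four-line FASTQ record. Because tags carry no per-base qualities, the sketch fills the quality line with one constant character; the class `FastqExport`, the read names (`tag0`, `tag1`, ...), and the choice of quality character are assumptions, not the plugin's actual output.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

/** Hypothetical tag-to-FASTQ writer; read names and the constant quality character are assumptions. */
final class FastqExport {
    /** One FASTQ record: @name, sequence, '+', and a quality string of the same length. */
    static String toRecord(String name, String seq, char qualityChar) {
        char[] quals = new char[seq.length()];
        Arrays.fill(quals, qualityChar); // tags carry no base qualities, so use a constant
        return "@" + name + "\n" + seq + "\n+\n" + new String(quals) + "\n";
    }

    /** Writes each tag as a record named tag0, tag1, ... in input order. */
    static void write(List<String> tags, Writer out, char qualityChar) throws IOException {
        for (int i = 0; i < tags.size(); i++) {
            out.write(toRecord("tag" + i, tags.get(i), qualityChar));
        }
        out.flush();
    }
}
```

Passing `'I'` as the quality character gives every base Phred 40 in the standard Phred+33 (Sanger) encoding, so an aligner treats each tag base as high confidence.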
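For UpdateSNPPositionQualityPlugin, the sketch below parses the tab-delimited input into (chromosome, position, quality score) rows. The table says the real plugin sends these to the database as a PositionList; `QualityScoreFile` and `Entry` are hypothetical stand-ins that stop before that step, and matching the header names exactly (case-sensitive, in any column order) is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the CHROM / POS / QUALITYSCORE input file. */
final class QualityScoreFile {
    /** One parsed row. */
    static final class Entry {
        final String chrom;
        final int pos;
        final double qualityScore;

        Entry(String chrom, int pos, double qualityScore) {
            this.chrom = chrom;
            this.pos = pos;
            this.qualityScore = qualityScore;
        }
    }

    /** Finds the three required columns by header name, then parses every non-blank data row. */
    static List<Entry> parse(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) {
            throw new IOException("Quality score file is empty");
        }
        List<String> cols = List.of(header.trim().split("\t"));
        int chromCol = cols.indexOf("CHROM");
        int posCol = cols.indexOf("POS");
        int qualCol = cols.indexOf("QUALITYSCORE");
        if (chromCol < 0 || posCol < 0 || qualCol < 0) {
            throw new IOException("Header must contain CHROM, POS and QUALITYSCORE: " + header);
        }
        List<Entry> entries = new ArrayList<>();
        String line;
        int lineNo = 1;
        while ((line = in.readLine()) != null) {
            lineNo++;
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.split("\t");
            try {
                entries.add(new Entry(f[chromCol].trim(),
                        Integer.parseInt(f[posCol].trim()),
                        Double.parseDouble(f[qualCol].trim())));
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                // Report the offending line instead of silently dropping it.
                throw new IOException("Bad row at line " + lineNo + ": " + line, e);
            }
        }
        return entries;
    }
}
```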