| Class | Description |
|---|---|
| AlignmentInfo | Used by RepGenAlignerPlugin to store alignment information in the tagAlignments database table, and also when pulling alignments from the DB. The tag2chrom and tag2pos fields indicate whether tag2 of the alignment is a reference tag: if tag2chrom is null and tag2pos = -1, tag2 is a non-reference tag; if both fields hold valid values, tag2 is a reference tag. The "alignmentPos" field gives the position within tag2 where the tag1 alignment starts, adjusted for any clipping of tag1 that occurred during alignment. |
| RGBSProductionSNPCallerPlugin | This plugin converts all of the fastq (and/or qseq) files in the input folder and keyfile to genotypes and adds these to a genotype file in HDF5 format. We refer to this step as the "Production Pipeline". The output format is either HDF5 or VCF genotypes with allelic depth stored; the output file type is determined by the presence of the ".h5" suffix. SNP calling is quantitative, with the option of using either the Glaubitz/Buckler binomial method (pHet/pErr > 1 = het), which is the default, or the Stacks method. Merging of samples with the same LibraryPrepID is handled by GenotypeTableBuilder.addTaxon(), with the genotypes re-called based on the new depths. Therefore, to keep adding genotypes to the same target HDF5 file in subsequent runs, use the -ko (keep open) option so that the output GenotypeTableBuilder remains mutable, using closeUnfinished() rather than build(). If the target output is HDF5 and that GenotypeTable file doesn't exist, it will be created. Each taxon in the output file is named "ShortName:LibraryPrepID" and is annotated with "Flowcell_Lanes" (the source sequence data for the current genotype). Requires a database with variants added from a previous "Discovery Pipeline" run. References to "tag" are being replaced by references to "kmer", as the pipeline is really a kmer alignment process. TODO: add the Stacks likelihood method to BasicGenotypeMergeRule. |
| RampSeqAlignFromBlastTags | This class takes a rAmpSeq database populated with tags, a reference genome, a filtered BLAST output file, and primers, then: (1) creates reference tags using the BLAST alignment output; (2) runs tag/tag alignment for each tag in the DB; (3) runs tag/refTag alignment for each tag against each ref tag; (4) runs refTag/refTag alignment for each refTag in the DB; (5) stores all alignments in the db tagAlignments table. Alignments are performed and stored in groups to avoid overwhelming the DB with massive load commands. BLAST was run on CBSU using these parameters. Make the reference files: `makeblastdb -dbtype nucl -in -parse_seqids -out maizeAGPV4.db`. Run BLAST using maizeAGPV4.db from the command above: `blastn -num_threads 24 -db maizeAGPV4.db -query anp68R1Tags.fasta -evalue 1e-60 -max_target_seqs 5 -max_hsps 1 -outfmt 6 -out anp68TagsR1Result/blastANP68_R1.txt`. The BLAST output file was filtered using the 3 commands below. The first keeps hits with identity of at least 98%, the second keeps alignment lengths of at least 148, and the third reduces the output to just the chrom, start, and end positions: `awk '$3 >= 98.000 {print $0}' blastANP68_R1.txt > blastANP68_R1_98per.txt`, then `awk '$4 >= 148 {print $0}' blastANP68_R1_98per.txt > blastANP68_R1_98per_148align.txt`, then `awk {'printf ("%s\t%s\t%s\n", $2, $9, $10)'} blastANP68_R1_98per_148align.txt > blastANP68_R1_98per_148align_3cols.txt`. The last file from awk, blastANP68_R1_98per_148align_3cols.txt, is the one given as a parameter here. |
| RefTagData | Class needed for storing reference tags into RepGen SQLite tables. |
| RepGenAlignerPlugin | This plugin takes an existing repGen db, grabs the tags whose depth meets that specified in the minCount parameter, and makes kmer seeds from these tags. The default window for kmer seeds is 20. The ref genome is walked with a sliding window of 1. Reference tags are created based on peaks within clusters where kmer seeds align. The kmerLen field should match the length of the kmers stored as tags during the RepGenLoadSeqToDBPlugin step; the default is 150. The refKmerLen() should be at least the length of the db kmer tags, but can be longer. Our defaults are 150 for kmer tags and twice this length (300) for the refKmerLen. There are 2 count parameters: minTagCount specifies the minimum depth of a tag for it to be used when creating seed kmers; minHitCount specifies the number of "hits" that must occur within a sliding window (window size = refKmerLen) for the window to remain part of a cluster. When sliding, if the hit count drops below the minHitCount threshold, the cluster ends. A new cluster does not begin until the hit count is back up to the threshold. See createRefTagsForAlignment() and storeRefTagPositions() for specifics. This plugin creates and stores the reference tags in the refTag table in the database. Both the tagMapping and the physicalMapPosition tables will be populated with the reference tag information. Once the tables have been populated with the reference information, Smith-Waterman is run to align all the non-reference tags in the db against each other, then each non-reference tag against the reference tags, and finally each refTag against all other refTags. Alignment data is stored in the tagAlignments table. Smith-Waterman from the SourceForge neobio project is used to determine the alignment score. Settings for match reward, mismatch penalty, and gap penalty may be changed by the user via plugin parameters. |
| RepGenLDAnalysisPlugin | This class takes a rAmpSeq (formerly RepGen) database and, for each tag in the tag table, performs the following tag-tag correlations based on the taxa distribution for each tag: (1) tag-tag Pearson's correlation; (2) tag-tag Spearman's correlation; (3) tag-tag presence/absence correlations; (4) r-squared. The vectors presented to the analysis methods represent a list of taxa and the number of times the tag was seen in each taxon. The presence/absence vectors have a 1 or 0 as the value in each slot. |
| RepGenLoadSeqToDBPlugin | Develops a discovery rGBS database based on a folder of sequencing files. Keeps only good reads having no N's in the useful part of the sequence. Trims off the barcodes and truncates sequences that (1) have a second cut site, or (2) read into the common adapter. Originally the reference throughout was to "tag". This is being changed to "kmer", as the pipeline is a kmer alignment process. |
| RepGenPhase2AlignerPlugin | This plugin takes an existing repGen db, grabs the tags whose depth meets that specified in the minCount parameter, and makes kmer seeds from these tags. Forward and reverse primer sequences are supplied as input parameters. When a kmer seed is found on a reference chromosome, a ref sequence is created from 300 bp before the hit to 300 bp after the hit. This value is half the refKmerLen parameter passed by the user. Default refKmerLen is 600. The ref sequence created is then searched for the primer pairs. If either both the forward primer and the reverse complement of the reverse primer, or the reverse primer and the reverse complement of the forward primer, are found, a reference tag is created starting at the start of the first occurring primer from the primer pair found in the sequence. If both forward and reverse pairs are found, the ref tag is created based on the best match, defaulting to the forward primer if both are found. The search for additional kmer matches on the chromosome begins at the position on the ref chrom following the end of the second primer in the matched pair. The kmerLen field should match the length of the kmers stored as tags during the RepGenLoadSeqToDBPlugin step; the default is 150. The refKmerLen() should be at least the length of the db kmer tags, but can be longer. Our defaults are 150 for kmer tags and twice this length (300) for the refKmerLen. There are 2 count parameters: minTagCount specifies the minimum depth of a tag for it to be used when creating seed kmers. This plugin creates and stores the reference tags in the refTag table in the database. Both the tagMapping and the physicalMapPosition tables will be populated with the reference tag information. Once the tables have been populated with the reference information, Smith-Waterman is run to align all the non-reference tags in the db against each other, then each non-reference tag against the reference tags, and finally each refTag against all other refTags. Alignment data is stored in the tagAlignments table. Smith-Waterman from the SourceForge neobio project is used to determine the alignment score. Settings for match reward, mismatch penalty, and gap penalty may be changed by the user via plugin parameters. |
| TagCorrelationInfo |