public class TagsByTaxaUtils
This class contains methods which process TagsByTaxa files one line at a time, as opposed to the methods in AbstractTagsByTaxa, which hold an entire file in memory.
See Also: AbstractTagsByTaxa
public static void main(java.lang.String[] args)
public static java.lang.String bitsetToString(OpenBitSet bitset)
Prints bitsets in human-readable format for debugging purposes.
bitset - An OpenBitSet object.
public static java.lang.String[] bitsetToStringArray(OpenBitSet bitset)
Stores bitsets in human-readable format.
bitset - An OpenBitSet object.
public static void streamTextToBinary(java.lang.String inputFileName,
java.lang.String outputFileName)
Converts a text .tbt file to binary, using only the memory required for the file buffer.
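The line-at-a-time conversion described for streamTextToBinary can be illustrated with a minimal sketch. This is not the TASSEL implementation: the class StreamingConversionSketch, the method textToBinary, and both record layouts are assumptions made up for this example. The assumed input is tab-separated text (a tag followed by its counts). The assumed output is a UTF string, an int column count, and one byte per count. The real .tbt layouts are not described here.

```java
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical illustration; not part of TASSEL. */
final class StreamingConversionSketch {

    /**
     * Converts tab-separated records to a made-up binary layout, holding only
     * one line in memory at a time. Returns the number of records written.
     */
    static int textToBinary(Reader in, OutputStream out) {
        int records = 0;
        try {
            BufferedReader reader = new BufferedReader(in);
            DataOutputStream writer = new DataOutputStream(out);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\t");
                writer.writeUTF(fields[0]);          // tag sequence (assumed first column)
                writer.writeInt(fields.length - 1);  // number of count columns
                for (int i = 1; i < fields.length; i++) {
                    writer.writeByte(Integer.parseInt(fields[i])); // one byte per count
                }
                records++;
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Because each record is written as soon as it is parsed, memory use in this sketch is bounded by the reader's buffer and the longest line, regardless of file size.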
public static void slice(java.lang.String inputFileName,
java.lang.String[] sequences)
public static void filterByTOPM(java.lang.String inputFileName,
TagsOnPhysicalMap topm)
Outputs taxon bitDistribution of all tags in the supplied TOPM file.
inputFileName -
topm -
public static void replaceInNames(java.lang.String regex,
java.lang.String replacement,
java.lang.String inputFileName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format,
java.lang.String outputFileName)
Replaces the supplied regex with the supplied string in taxon names.
regex -
replacement -
inputFileName -
format - Format of the input .tbt file (a TagsByTaxa.FilePacking enumerated value).
outputFileName -
public static void positions(java.lang.String inputFileName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format,
TagsOnPhysicalMap topm,
java.lang.String[] taxa)
Prints the physical coordinates of the supplied taxa, as stored in the supplied TOPM file.
inputFileName - A reference to a TagsByTaxa file.
format - A TagsByTaxa.FilePacking enumerated value.
topm - A TagsOnPhysicalMap object containing coordinate info for the supplied taxa.
taxa - A list of taxon names.
public static java.util.HashMap<java.lang.String,java.lang.Integer> sumCounts(java.lang.String inputFileName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format,
boolean progressIndication)
Returns a map of the taxa in the specified TagsByTaxa file to the total count of all tags found in each taxon.
inputFileName -
format - A TagsByTaxa.FilePacking enumerated value.
progressIndication - Whether or not to provide feedback on number of tags read.
public static void printTotalTagsAndTaxa(java.lang.String directoryName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format)
Prints a count of the total number of tags and taxa in the files contained in the specified directory.
public static void printSumCounts(java.lang.String inputFileName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format,
boolean progressIndication)
Prints each taxon in the specified TagsByTaxa file, along with the total count of all tags found in that taxon.
inputFileName -
format - A TagsByTaxa.FilePacking enumerated value.
progressIndication - Whether or not to provide feedback on number of tags read.
public static void printSumCountsOfAll(java.lang.String directoryName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format)
Calls printSumCounts once for every file in the specified directory.
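The per-taxon totals that sumCounts returns and printSumCounts prints can be accumulated in a single pass over the file. The following is a rough sketch under stated assumptions, not the library's code: the class SumCountsSketch and the text layout are hypothetical. The assumed layout is a header line whose first column labels the tags and whose remaining columns are taxon names, followed by one tab-separated line per tag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;

/** Hypothetical illustration; not part of TASSEL. */
final class SumCountsSketch {

    /** Sums each taxon's column of counts while reading one line at a time. */
    static HashMap<String, Integer> sumCounts(Reader in) {
        HashMap<String, Integer> totals = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String header = reader.readLine();
            if (header == null) {
                return totals; // empty input
            }
            String[] names = header.split("\t"); // names[0] labels the tag column
            for (int i = 1; i < names.length; i++) {
                totals.put(names[i], 0);
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] fields = line.split("\t");
                for (int i = 1; i < fields.length && i < names.length; i++) {
                    // Add this tag's count to the running total for its taxon.
                    totals.merge(names[i], Integer.parseInt(fields[i]), Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }
}
```

Only the running totals and the current line are held in memory, so the footprint grows with the number of taxa, not with the number of tags.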
public static void sparsity(java.lang.String inputFileName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format)
Writes out only the taxa that match a list of names.
public static void printCoverage(java.lang.String inputFileName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format,
boolean itemized)
Prints tag coverage of each taxon in the file, and taxon coverage of each tag in the file. If itemized=true, prints one record for each tag and for each taxon. Otherwise prints a summary of how many tags/taxa are in each coverage "bin" (0-10% coverage, 10-20% coverage, etc.)
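The two coverage measures and the ten-bin summary can be sketched as follows. This is a simplified illustration, not the printCoverage implementation: the class CoverageSketch and the in-memory boolean matrix (rows are tags, columns are taxa) are assumptions, and the way the real method treats values that fall exactly on a bin boundary is not documented here. The sketch puts 100% into the last bin.

```java
/** Hypothetical illustration; not part of TASSEL. */
final class CoverageSketch {

    /** Fraction of tags (rows) that are present in each taxon (column). */
    static double[] tagCoveragePerTaxon(boolean[][] present) {
        if (present.length == 0) {
            return new double[0];
        }
        double[] coverage = new double[present[0].length];
        for (boolean[] row : present) {
            for (int taxon = 0; taxon < coverage.length; taxon++) {
                if (row[taxon]) {
                    coverage[taxon] += 1.0;
                }
            }
        }
        for (int taxon = 0; taxon < coverage.length; taxon++) {
            coverage[taxon] /= present.length;
        }
        return coverage;
    }

    /** Fraction of taxa (columns) in which each tag (row) is present. */
    static double[] taxonCoveragePerTag(boolean[][] present) {
        double[] coverage = new double[present.length];
        for (int tag = 0; tag < present.length; tag++) {
            int hits = 0;
            for (boolean seen : present[tag]) {
                if (seen) {
                    hits++;
                }
            }
            coverage[tag] = present[tag].length == 0 ? 0.0 : (double) hits / present[tag].length;
        }
        return coverage;
    }

    /** Counts fractions per 10% bin: bins[0] is 0-10%, bins[9] is 90-100%. */
    static int[] bin(double[] fractions) {
        int[] bins = new int[10];
        for (double fraction : fractions) {
            bins[Math.min(9, (int) (fraction * 10))]++;
        }
        return bins;
    }
}
```

In this sketch, the itemized output corresponds to printing the two coverage arrays directly, and the summary output corresponds to printing the result of bin for each array.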
public static net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format(java.lang.String filename)
public static void mergeTaxaByName(java.lang.String inputFileName,
java.lang.String outputFileName,
net.maizegenetics.dna.tag.TagsByTaxa.FilePacking format,
boolean caseSensitive)
Merges taxa with identical names by combining their (binary) tag counts into a single column. The input must be a TagsByTaxaBit file in either binary or text format. The output is written in binary TagsByTaxaBit format. Other TagsByTaxa formats (Byte, Short, etc.) are not currently supported.
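To make the merge concrete, here is a minimal sketch of collapsing same-named columns in one tag's row of presence bits. It is not the mergeTaxaByName implementation. It assumes that merging binary counts means a logical OR, it uses java.util.BitSet as a stand-in for OpenBitSet, and the class MergeTaxaSketch and its method names are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration; not part of TASSEL. */
final class MergeTaxaSketch {

    /** Normalizes a name for comparison according to the case-sensitivity flag. */
    static String key(String name, boolean caseSensitive) {
        return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
    }

    /** Maps each distinct taxon name, in first-seen order, to its merged column index. */
    static Map<String, Integer> mergedIndex(String[] names, boolean caseSensitive) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String name : names) {
            index.putIfAbsent(key(name, caseSensitive), index.size());
        }
        return index;
    }

    /** ORs the bits of all columns that share a name into a single merged column. */
    static BitSet mergeRow(BitSet row, String[] names,
                           Map<String, Integer> index, boolean caseSensitive) {
        BitSet merged = new BitSet(index.size());
        for (int col = row.nextSetBit(0);
             col >= 0 && col < names.length;
             col = row.nextSetBit(col + 1)) {
            merged.set(index.get(key(names[col], caseSensitive)));
        }
        return merged;
    }
}
```

The index is built once from the taxon names, after which each row can be merged independently, which fits the line-at-a-time processing this class is built around.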
public static void streamBinaryToText(java.lang.String inputFileName,
int maxRecords)
Converts a binary .tbt file to text, using only the memory required for the file buffer.
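A record-limited, buffer-only conversion in this direction might look like the sketch below. It is an illustration only. The class BinaryToTextSketch is hypothetical, and so is the binary layout it reads (a UTF string, an int column count, then one byte per count). It assumes maxRecords caps the number of records converted, which the description above does not confirm.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical illustration; not part of TASSEL. */
final class BinaryToTextSketch {

    /**
     * Reads at most maxRecords records from a made-up binary layout and writes
     * each one as a tab-separated text line. Returns the number converted.
     */
    static int binaryToText(InputStream in, Writer out, int maxRecords) {
        int converted = 0;
        try (DataInputStream reader = new DataInputStream(in)) {
            while (converted < maxRecords) {
                String tag;
                try {
                    tag = reader.readUTF();
                } catch (EOFException endOfInput) {
                    break; // no more records
                }
                int columns = reader.readInt();
                out.write(tag);
                for (int i = 0; i < columns; i++) {
                    out.write('\t');
                    out.write(Integer.toString(reader.readByte()));
                }
                out.write('\n');
                converted++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return converted;
    }
}
```

Each record is written out before the next is read, so a small maxRecords gives a quick preview of a large file without loading it.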