public class GenotypeTableBuilder
Builder for GenotypeTables. New GenotypeTables are built from a minimum of a TaxaList, a PositionList, and a GenotypeCallTable. Depth and Scores are optional features of GenotypeTables.
If the taxa, positions, and genotypes are known from the beginning, use:
GenotypeTable a=GenotypeTableBuilder.getInstance(genotype, positionList, taxaList);
In many situations, GenotypeTables are built incrementally, either by taxa or by site.
For taxa building:
GenotypeTableBuilder gtb=GenotypeTableBuilder.getTaxaIncremental(gbs.positions(),outFile);
for (int i=0; i
In many cases, you want to add taxa to an existing GenotypeTable. Direct addition is not possible, as GenotypeTables are immutable, but GenotypeTableBuilder.getTaxaIncremental provides a strategy for creating and merging taxa together. Key to the process is that the GenotypeMergeRule defines how taxa with identical names will be merged.
GenotypeTable existingGenotypeTable1, existingGenotypeTable2;
GenotypeTableBuilder gtb=GenotypeTableBuilder.getTaxaIncremental(existingGenotypeTable1,
new BasicGenotypeMergeRule(0.01));
for (int i=0; i
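The merge-on-identical-name idea described above can be sketched without the library. The following is a minimal, self-contained illustration, not the TASSEL implementation: the class TaxaMergeSketch, the UNKNOWN code 0xFF, and the "keep the known call, mark conflicts as unknown" rule are all assumptions made for this example, and the rule is deliberately simpler than whatever BasicGenotypeMergeRule does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in (not the TASSEL API): add taxa one at a time, merging identical names. */
class TaxaMergeSketch {
    // Assumed missing-call code for this sketch only.
    static final byte UNKNOWN = (byte) 0xFF;

    private final int numSites;
    private final Map<String, byte[]> taxa = new LinkedHashMap<>();

    TaxaMergeSketch(int numSites) {
        this.numSites = numSites;
    }

    /** Hypothetical merge rule: a known call beats a missing one; two different known calls become UNKNOWN. */
    static byte mergeCall(byte a, byte b) {
        if (a == UNKNOWN) return b;
        if (b == UNKNOWN) return a;
        return a == b ? a : UNKNOWN;
    }

    /** Adds a taxon, or merges it site by site into an already added taxon with the same name. */
    TaxaMergeSketch addTaxon(String name, byte[] genos) {
        if (genos.length != numSites) {
            throw new IllegalArgumentException("expected " + numSites + " sites, got " + genos.length);
        }
        taxa.merge(name, genos.clone(), (oldG, newG) -> {
            byte[] merged = new byte[numSites];
            for (int s = 0; s < numSites; s++) {
                merged[s] = mergeCall(oldG[s], newG[s]);
            }
            return merged;
        });
        return this;
    }

    /** Returns a read-only snapshot, mirroring the idea that the built table is immutable. */
    Map<String, byte[]> build() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(taxa));
    }

    public static void main(String[] args) {
        TaxaMergeSketch builder = new TaxaMergeSketch(3);
        builder.addTaxon("B73", new byte[] {0x00, UNKNOWN, 0x11});
        builder.addTaxon("B73", new byte[] {0x00, 0x33, 0x22});
        // prints [0, 51, -1]: agreement kept, missing call filled, conflict set to UNKNOWN
        System.out.println(Arrays.toString(builder.build().get("B73")));
    }
}
```

The design point is that the existing data are never edited in place: each addition either creates a new entry or replaces it with a freshly merged array, and the result is only exposed through build().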
public static GenotypeTableBuilder getBuilder(java.lang.String existingHDF5File)
Returns a builder for an existing, unfinished HDF5 genotypes file. Can be used to add or modify annotations, etc., and/or to call build() to finalize the file.
public static GenotypeTableBuilder getTaxaIncremental(PositionList positionList)
Creates an in-memory builder for addition by taxon. Each taxon can only be added once, i.e., merging is not possible.
positionList - The positions used for the builder
public static GenotypeTableBuilder getTaxaIncremental(PositionList positionList, GenotypeMergeRule mergeRule)
Creates an in-memory builder for addition by taxon, which permits the merging of taxa.
positionList - The positions used for the builder
mergeRule - rules for merging identically named taxa
public static GenotypeTableBuilder getTaxaIncremental(GenotypeTable genotypeTable, GenotypeMergeRule mergeRule)
Creates a builder initialized with the genotypes in an existing GenotypeTable. The position list and initial taxa list are derived from the positions, taxa, and genotypes already in the GenotypeTable. The initial GenotypeTable is not changed, as it is immutable.
genotypeTable - input genotype table
mergeRule - rules for merging identically named taxa
public static GenotypeTableBuilder getTaxaIncremental(PositionList positionList, java.lang.String newHDF5File)
Creates a new taxa-incremental HDF5 GenotypeTableBuilder.
positionList - the defined list of positions
newHDF5File - HDF5 file to be created
public static GenotypeTableBuilder mergeTaxaIncremental(java.lang.String existingHDF5File, GenotypeMergeRule mergeRule)
Merges taxa into an existing HDF5 file. The position list is derived from the positions already in the existing HDF5 file.
existingHDF5File -
mergeRule -
public static GenotypeTableBuilder getTaxaIncrementalWithMerging(java.lang.String newHDF5File, PositionList positionList, GenotypeMergeRule mergeRule)
Creates a new taxa-incremental HDF5 GenotypeTableBuilder to which replicate taxa can be added.
newHDF5File -
positionList -
mergeRule -
public static GenotypeTableBuilder getSiteIncremental(TaxaList taxaList)
Builds an alignment site by site in memory.
taxaList -
public static GenotypeTableBuilder getSiteIncremental(TaxaList taxaList, int numberOfPositions, java.lang.String newHDF5File)
Builds a GenotypeTable by site block (1<<16 sites). The number of positions (sites) must be known from the beginning. Positions and genotypes must be added by block.
taxaList -
numberOfPositions -
newHDF5File -
public static GenotypeTable getInstance(GenotypeTable original, GenotypeCallTable newGenotypes)
public static GenotypeTable getInstance(GenotypeCallTable genotype, PositionList positionList, TaxaList taxaList, AlleleDepth alleleDepth, AlleleProbability alleleProbability, ReferenceProbability referenceProbability, Dosage dosage, GeneralAnnotationStorage annotations)
Standard approach for creating a new Alignment.
genotype -
positionList -
taxaList -
alleleDepth -
alleleProbability -
referenceProbability -
dosage -
annotations -
public static GenotypeTable getInstance(GenotypeCallTable genotype, PositionList positionList, TaxaList taxaList, AlleleDepth alleleDepth)
public static GenotypeTable getInstance(GenotypeCallTable genotype, PositionList positionList, TaxaList taxaList)
public static GenotypeTable getInstance(GenotypeCallTable genotype, PositionList positionList, TaxaList taxaList, java.lang.String hdf5File)
Creates a new HDF5 file alignment based on an existing Genotype, PositionList, and TaxaList.
genotype -
positionList -
taxaList -
hdf5File - name of the file
public static GenotypeTable getInstance(GenotypeTable a, java.lang.String hdf5File)
Creates a new HDF5 file alignment based on an existing alignment.
a - existing alignment
hdf5File - name of the file
public static GenotypeTable getInstance(java.lang.String hdf5File)
public static GenotypeTable getInstance(GenotypeTable base, MaskMatrix mask)
public static GenotypeTable getInstanceOnlyMajorMinor(GenotypeTable genotype)
public static GenotypeTable getHomozygousInstance(GenotypeTable genotype)
public static GenotypeTable getInstanceMaskIndels(GenotypeTable genotype)
public static GenotypeTable getGenotypeCopyInstance(GenotypeTable genotypeTable)
Creates a GenotypeTable with an in-memory instance of GenotypeCallTable. Primarily needed for performance-critical situations like imputation.
genotypeTable - genotype table
public GenotypeTableBuilder addAlleleProbability(AlleleProbabilityBuilder alleleProbabilityBuilder)
public GenotypeTableBuilder addReferenceProbability(ReferenceProbabilityBuilder referenceProbabilityBuilder)
public GenotypeTableBuilder addDosage(DosageBuilder dosageBuilder)
public GenotypeTableBuilder addSite(Position pos, kotlin.Array[] genos)
public void addSiteBlock(int startSite,
PositionList blkPositionList,
kotlin.Array[] blockGenotypes,
kotlin.Array[] blockDepths)
Adds a TasselHDF5 block of positions (generally 1<<16 positions). Note: this method is synchronized, which certainly slows things down, but it is needed to prevent the same taxa dataset from being accessed by more than one caller at once. This could probably be rethought, with parallelization at this stage across datasets.
startSite - start site for positioning blocks correctly
blkPositionList -
blockGenotypes - array of genotypes[taxonIndex][siteIndex]; true site=startSite+siteIndex
blockDepths -
public GenotypeTableBuilder addTaxon(Taxon taxon, kotlin.Array[] genos)
public GenotypeTableBuilder addTaxon(Taxon taxon, kotlin.Array[] genos, kotlin.Array[] depth)
public GenotypeTableBuilder addTaxon(Taxon taxon, kotlin.Array[] depths, kotlin.Array[] genos)
public boolean isHDF5()
public GenotypeTableBuilder sortTaxa()
Sets the builder so that the taxa are sorted when the GenotypeTable is built.
public GenotypeTable build()
Finishes building the GenotypeTable. For HDF5 files, it locks the taxa and genotype modules so that they cannot be modified again.
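The "build, then lock" contract described for build() can be illustrated with a small, library-free sketch. Everything here is hypothetical: the class LockingBuilderSketch and the choice of IllegalStateException are assumptions for illustration only, and the source does not state how the real builder reacts to a modification attempt after build().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in (not the TASSEL API): a builder that refuses changes once build() has run. */
class LockingBuilderSketch {
    private final List<String> taxa = new ArrayList<>();
    private boolean built = false;

    /** Adds a taxon name; rejected after build(). The exception type is an assumption of this sketch. */
    LockingBuilderSketch addTaxon(String name) {
        if (built) {
            throw new IllegalStateException("builder is locked after build()");
        }
        taxa.add(name);
        return this;
    }

    /** Finishes building: marks the builder as locked and returns a read-only copy of the data. */
    List<String> build() {
        built = true;
        return Collections.unmodifiableList(new ArrayList<>(taxa));
    }

    public static void main(String[] args) {
        LockingBuilderSketch builder = new LockingBuilderSketch();
        List<String> table = builder.addTaxon("A").addTaxon("B").build();
        System.out.println(table.size()); // prints 2
        try {
            builder.addTaxon("C");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```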
public void closeUnfinished()
Closes an HDF5 GenotypeTableBuilder that will be reopened later and appended to. The file cannot be used for other purposes while it is in this unfinished state.
public static void annotateHDF5File(ch.systemsx.cisd.hdf5.IHDF5Writer writer)
Annotates the HDF5 genotype file with allele frequency information. Can only be called on unlocked HDF5 files. Currently placed in GenotypeTableBuilder, as it still sits above genotypes, taxa, and sites.
writer -
public static void annotateHDF5FileWithRefAllele(ch.systemsx.cisd.hdf5.IHDF5Writer writer,
kotlin.Array[] refAlleles)
Annotates the HDF5 genotype file with the reference allele and reference genome version. Can only be called on unlocked HDF5 files.
writer -
refAlleles -
public GenotypeTableBuilder addAnnotation(java.lang.String key, java.lang.String value)
public GenotypeTableBuilder addAnnotation(java.lang.String key, java.lang.Number value)
public GenotypeTableBuilder dataSetName(java.lang.String dataSetName)
public GenotypeTableBuilder dataSetDescription(java.lang.String dataSetDescription)