public class GenomeFeatureMapBuilder
Created by jgw87 on 7/2/14. A Builder class to create a GenomeFeatureMap to identify genomic features. Can build piecemeal or read in from a file. For now, making a new builder from an existing map is not implemented.
public GenomeFeatureMap build()
public static void addFeatureToRangemap(com.google.common.collect.RangeMap<java.lang.Integer,java.util.HashSet> masterMap,
GenomeFeature feature)
Add a GenomeFeature to a RangeMap, stacking it on top of any existing features instead of overwriting them (overwriting is how RangeMap behaves natively).
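The difference between stacking and overwriting can be sketched without Guava. The example below is an illustrative stand-in, not this class's implementation: it uses a plain `TreeMap` whose keys mark segment starts, plain `String` IDs in place of `GenomeFeature` objects, and hypothetical names (`StackedRangeSketch`, `addStacked`, `featuresAt`) that do not exist in the library. It assumes inclusive start and stop positions.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative stand-in only: not the library's code and not Guava's RangeMap.
final class StackedRangeSketch {

    // Ensure a segment boundary exists at pos, inheriting whatever already covers it.
    static void split(TreeMap<Integer, Set<String>> segments, int pos) {
        if (segments.containsKey(pos)) {
            return;
        }
        Map.Entry<Integer, Set<String>> floor = segments.floorEntry(pos);
        Set<String> inherited = (floor == null)
                ? new HashSet<>()
                : new HashSet<>(floor.getValue());
        segments.put(pos, inherited);
    }

    // Add id to every segment inside [start, stop], keeping IDs that were already there.
    static void addStacked(TreeMap<Integer, Set<String>> segments, int start, int stop, String id) {
        split(segments, start);
        split(segments, stop + 1); // the segment after the feature must not contain it
        for (Set<String> ids : segments.subMap(start, true, stop, true).values()) {
            ids.add(id);
        }
    }

    // Look up all feature IDs covering a single position.
    static Set<String> featuresAt(TreeMap<Integer, Set<String>> segments, int pos) {
        Map.Entry<Integer, Set<String>> entry = segments.floorEntry(pos);
        return (entry == null) ? Collections.<String>emptySet() : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, Set<String>> segments = new TreeMap<>();
        addStacked(segments, 100, 500, "gene1"); // hypothetical sample features
        addStacked(segments, 200, 300, "exon1");
        System.out.println(featuresAt(segments, 250)); // contains gene1 and exon1
        System.out.println(featuresAt(segments, 400)); // contains only gene1
    }
}
```

With a native overwrite, position 250 would report only the most recently added feature; with stacking, it reports both.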
masterMap - RangeMap object to be modified
feature - GenomeFeature to be added. Only start and stop position are checked
public GenomeFeatureMapBuilder addFeature(GenomeFeature feature)
Adds a GenomeFeature to the map. If the feature's unique ID has already been loaded, this method throws an UnsupportedOperationException.
feature - The GenomeFeature to be added
public GenomeFeatureMapBuilder replaceFeature(GenomeFeature feature)
Replaces the existing GenomeFeature that has the specified ID with a new one. If the unique ID isn't already in the map, this method throws an UnsupportedOperationException.
feature - The GenomeFeature to be added
public GenomeFeatureMapBuilder addOrReplaceFeature(GenomeFeature feature)
Adds a GenomeFeature to the map, regardless of whether or not it has already been added. This method gives no warning when it overwrites existing data, so use it with caution.
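The ID rules of addFeature, replaceFeature, and addOrReplaceFeature can be compared side by side in a small sketch. This is a hypothetical stand-in (the class `FeatureIdRulesSketch` and its `HashMap` storage are assumptions made for illustration, with `String` values in place of `GenomeFeature` objects); only the exception behavior is taken from the descriptions of these methods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mimics only the documented ID rules, not the real builder.
final class FeatureIdRulesSketch {
    private final Map<String, String> byId = new HashMap<>();

    // Strict add: fails if the ID is already present.
    FeatureIdRulesSketch addFeature(String id, String value) {
        if (byId.containsKey(id)) {
            throw new UnsupportedOperationException("ID already loaded: " + id);
        }
        byId.put(id, value);
        return this;
    }

    // Strict replace: fails if the ID is not present yet.
    FeatureIdRulesSketch replaceFeature(String id, String value) {
        if (!byId.containsKey(id)) {
            throw new UnsupportedOperationException("ID not in map: " + id);
        }
        byId.put(id, value);
        return this;
    }

    // Lenient: silently inserts or overwrites.
    FeatureIdRulesSketch addOrReplaceFeature(String id, String value) {
        byId.put(id, value);
        return this;
    }

    String get(String id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        FeatureIdRulesSketch sketch = new FeatureIdRulesSketch();
        sketch.addFeature("gene1", "first version")
              .replaceFeature("gene1", "second version")
              .addOrReplaceFeature("gene1", "third version");
        System.out.println(sketch.get("gene1"));
        try {
            sketch.addFeature("gene1", "duplicate");
        } catch (UnsupportedOperationException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The strict variants are the safer default when loading several files, since an accidental ID collision surfaces as an exception rather than as silently replaced data.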
feature - The GenomeFeature to be added
public GenomeFeatureMapBuilder addFromGffFile(java.lang.String filename)
Load in data from a GFF (General Feature Format) file. Since GFF files have only a loose standard for the 9th column, this involves several ad-hoc heuristics about what things to look for. As such, it is not the preferred way to read in annotations. (The preferred formats are JSON and tab-delimited.) This method does not build the map, so you can string multiple calls together (if, for example, you have different annotations in different files).
filename -
public GenomeFeatureMapBuilder addFromJsonFile(java.lang.String filename)
Load in data from a JSON-formatted file. JSON format is defined at http://www.json.org/, and consists of structured key-value pairs. For genome features, the key is the name of an attribute and the value is that attribute's value (for example: "chromosome":1). Note that if you have more than one feature per file (the normal case), every closing brace ('}') except the last should be followed by a comma, and the whole group should be enclosed in square brackets ('[...]'). That is, the first character of the file should be '[' and the last should be ']'. This makes it a properly formatted JSON array.
Common attributes (keys) are listed below. Although only the "id" attribute is required, a feature is of little use without some sort of positional information (chromosome, start/stop, etc.).
"id": Unique identifier for this feature. Repeated identifiers throw an error. (Also accepts "name".) Required.
"chrom": Which chromosome it occurs on. (Also accepts "chr" or "chromosome".)
"start": Start position on the chromosome.
"stop": Stop position on the chromosome. (Also accepts "end".)
"position": Position on the chromosome (in place of "start" and "stop" for features that are a single nucleotide).
"parent_id": What other named feature this descends from (e.g., Gene -> Transcript -> Exon). If none is given, this defaults to the chromosome (or the genome if the chromosome isn't supplied).
This method does not build the map, so you can string multiple calls together (if, for example, you have different annotations in different files).
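As a rough illustration of the layout described above, the following sketch assembles a small JSON array of three features and writes it to a temporary file. The class name `JsonFeatureFileSketch` and all feature IDs and coordinates are made-up sample values; only the attribute keys and the array structure come from the description.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that produces a sample input file; it does not call the builder itself.
final class JsonFeatureFileSketch {

    // One JSON object per feature, comma after every object except the last,
    // and the whole group wrapped in square brackets.
    static String sampleJson() {
        return String.join("\n",
                "[",
                "  {\"id\": \"gene1\", \"chrom\": 1, \"start\": 100, \"stop\": 500},",
                "  {\"id\": \"exon1\", \"chrom\": 1, \"start\": 200, \"stop\": 300, \"parent_id\": \"gene1\"},",
                "  {\"id\": \"snp1\", \"chrom\": 1, \"position\": 250, \"parent_id\": \"exon1\"}",
                "]");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("features", ".json");
        Files.write(file, sampleJson().getBytes(StandardCharsets.UTF_8));
        // The resulting path string is what would be passed as the filename argument.
        System.out.println("Wrote sample features to " + file);
    }
}
```

The single-nucleotide feature uses "position" in place of "start" and "stop", and the two child features name their parents through "parent_id".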
filename -
public GenomeFeatureMapBuilder addFromFlatFile(java.lang.String filename)
Load in data from a flat, tab-delimited text file. The first row should be a header identifying what attribute is in each column, and each subsequent row should correspond to a single feature. Columns that don't apply to a given feature should contain "NA" or be left empty.
Common attributes (columns) are listed below. Although only the "id" attribute is required, a feature is of little use without some sort of positional information (chromosome, start/stop, etc.).
"id": Unique identifier for this feature. Repeated identifiers throw an error. (Also accepts "name".) Required.
"chrom": Which chromosome it occurs on. (Also accepts "chr" or "chromosome".)
"start": Start position on the chromosome.
"stop": Stop position on the chromosome. (Also accepts "end".)
"position": Position on the chromosome (in place of "start" and "stop" for features that are a single nucleotide).
"parent_id": What other named feature this descends from (e.g., Gene -> Transcript -> Exon). If none is given, this defaults to the chromosome (or the genome if the chromosome isn't supplied).
This method does not build the map, so you can string multiple calls together (if, for example, you have different annotations in different files).
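A minimal sketch of such a file, assuming made-up feature IDs and coordinates, is shown below. The class name `FlatFeatureFileSketch` is hypothetical, and the column order is an arbitrary choice, since the header row identifies each column.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that produces a sample input file; it does not call the builder itself.
final class FlatFeatureFileSketch {

    // Header row first, then one tab-delimited row per feature.
    // "NA" marks columns that do not apply to a given feature.
    static String sampleTable() {
        return String.join("\n",
                String.join("\t", "id", "chrom", "start", "stop", "position", "parent_id"),
                String.join("\t", "gene1", "1", "100", "500", "NA", "NA"),
                String.join("\t", "exon1", "1", "200", "300", "NA", "gene1"),
                String.join("\t", "snp1", "1", "NA", "NA", "250", "exon1"));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("features", ".txt");
        Files.write(file, sampleTable().getBytes(StandardCharsets.UTF_8));
        // The resulting path string is what would be passed as the filename argument.
        System.out.println("Wrote sample features to " + file);
    }
}
```

Every row carries the same number of columns as the header, which keeps the "NA" placeholders aligned with the attributes they stand in for.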
filename -