Class Sequences

java.lang.Object
org.dishevelled.bio.sequence.Sequences

public final class Sequences extends Object
Utility methods on sequences.
Since:
1.1
Author:
Michael Heuer
  • Constructor Details

    • Sequences

      public Sequences()
  • Method Details

    • decode

      public static String decode(ByteBuffer bytes, int length) throws IOException
      Decode the specified byte buffer as an unambiguous DNA sequence of the specified length as a string.
      Parameters:
      bytes - byte buffer, must not be null
      length - length, must be at least 0
      Returns:
      the specified byte buffer decoded as an unambiguous DNA sequence of the specified length as a string
      Throws:
      IOException - if an I/O error occurs
    • decode

      public static <T extends Appendable> T decode(ByteBuffer bytes, int length, T appendable) throws IOException
      Decode the specified byte buffer as an unambiguous DNA sequence of the specified length to the specified appendable.
      Type Parameters:
      T - appendable type
      Parameters:
      bytes - byte buffer, must not be null
      length - length, must be at least 0
      appendable - appendable to decode to, must not be null
      Returns:
      the specified byte buffer decoded as an unambiguous DNA sequence of the specified length to the specified appendable
      Throws:
      IOException - if an I/O error occurs
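      The decoding step can be sketched as follows, assuming the 2-bit layout documented for encode (T - 00, C - 01, A - 10, G - 11, first base in the most significant 2 bits). The class TwoBitDecodeSketch and its unpack methods are hypothetical names used for illustration only; this is not the implementation in Sequences.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration class, not part of org.dishevelled.bio.sequence.
final class TwoBitDecodeSketch {
    // Index is the 2-bit code: 00 -> T, 01 -> C, 10 -> A, 11 -> G.
    private static final char[] BASES = { 'T', 'C', 'A', 'G' };

    // Reads length bases starting at the buffer's current position,
    // without moving the position, and appends them to the appendable.
    static <T extends Appendable> T unpack(ByteBuffer bytes, int length, T appendable)
            throws IOException {
        for (int i = 0; i < length; i++) {
            int b = bytes.get(bytes.position() + i / 4) & 0xFF;
            // The first base of each byte sits in bits 7-6, the fourth in bits 1-0.
            int shift = 6 - 2 * (i % 4);
            appendable.append(BASES[(b >> shift) & 0b11]);
        }
        return appendable;
    }

    // Convenience variant that decodes to a string.
    static String unpack(ByteBuffer bytes, int length) {
        try {
            return unpack(bytes, length, new StringBuilder(length)).toString();
        } catch (IOException e) {
            // StringBuilder never throws here; rethrow unchecked to keep the signature simple.
            throw new UncheckedIOException(e);
        }
    }
}
```

      The length parameter is needed because a packed byte always holds four 2-bit slots, so the buffer alone cannot say how many of the slots in the final byte carry real bases.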
    • encode

      public static ByteBuffer encode(String sequence)
      Encode the specified unambiguous DNA sequence to a new byte buffer. Valid unambiguous DNA sequence symbols are { A, C, G, T, a, c, g, t }. Similar to the twoBit format, the DNA symbols are packed at two bits per base, represented as follows: T - 00, C - 01, A - 10, G - 11. The first base is in the most significant 2 bits of the byte; the last base is in the least significant 2 bits. For example, the sequence TCAG is represented as 00011011.
      Parameters:
      sequence - unambiguous DNA sequence to encode, must not be null
      Returns:
      the specified unambiguous DNA sequence encoded to a new byte buffer
      Throws:
      IllegalArgumentException - if the specified sequence contains any ambiguity symbols
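      The packing described above can be sketched as follows. The class TwoBitSketch and its methods code, pack, and bits are hypothetical names for illustration; padding a partially filled last byte with 00 is an assumption, since the documentation does not state how trailing positions are filled.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class, not part of org.dishevelled.bio.sequence.
final class TwoBitSketch {
    // Maps a base to its 2-bit code: T - 00, C - 01, A - 10, G - 11.
    static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'T': return 0b00;
            case 'C': return 0b01;
            case 'A': return 0b10;
            case 'G': return 0b11;
            default:
                throw new IllegalArgumentException("not an unambiguous DNA symbol: " + base);
        }
    }

    // Packs four bases per byte, first base in the most significant 2 bits.
    static ByteBuffer pack(String sequence) {
        ByteBuffer bytes = ByteBuffer.allocate((sequence.length() + 3) / 4);
        for (int i = 0; i < sequence.length(); i += 4) {
            int b = 0;
            for (int j = 0; j < 4; j++) {
                int index = i + j;
                // Assumption: unused trailing positions are padded with 00.
                int c = index < sequence.length() ? code(sequence.charAt(index)) : 0;
                b = (b << 2) | c;
            }
            bytes.put((byte) b);
        }
        bytes.flip();
        return bytes;
    }

    // Renders a byte as eight binary digits, most significant bit first.
    static String bits(byte b) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((b >> i) & 1);
        }
        return sb.toString();
    }
}
```

      With this sketch, TwoBitSketch.bits(TwoBitSketch.pack("TCAG").get(0)) yields 00011011, matching the example in the description.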
    • encode

      public static ByteBuffer encode(String sequence, ByteBuffer bytes)
      Encode the specified unambiguous DNA sequence to the specified byte buffer. Valid unambiguous DNA sequence symbols are { A, C, G, T, a, c, g, t }. Similar to the twoBit format, the DNA symbols are packed at two bits per base, represented as follows: T - 00, C - 01, A - 10, G - 11. The first base is in the most significant 2 bits of the byte; the last base is in the least significant 2 bits. For example, the sequence TCAG is represented as 00011011.
      Parameters:
      sequence - unambiguous DNA sequence to encode, must not be null
      bytes - byte buffer, must not be null
      Returns:
      the specified unambiguous DNA sequence encoded to the specified byte buffer
      Throws:
      IllegalArgumentException - if the specified sequence contains any ambiguity symbols
    • decodeWithNs

      public static String decodeWithNs(ByteBuffer bytes, int length) throws IOException
      Decode the specified byte buffer as a DNA sequence with N ambiguity symbols of the specified length as a string.
      Parameters:
      bytes - byte buffer, must not be null
      length - length, must be at least 0
      Returns:
      the specified byte buffer decoded as a DNA sequence with N ambiguity symbols of the specified length as a string
      Throws:
      IOException - if an I/O error occurs
    • decodeWithNs

      public static <T extends Appendable> T decodeWithNs(ByteBuffer bytes, int length, T appendable) throws IOException
      Decode the specified byte buffer as a DNA sequence with N ambiguity symbols of the specified length to the specified appendable.
      Type Parameters:
      T - appendable type
      Parameters:
      bytes - byte buffer, must not be null
      length - length, must be at least 0
      appendable - appendable to decode to, must not be null
      Returns:
      the specified byte buffer decoded as a DNA sequence with N ambiguity symbols of the specified length to the specified appendable
      Throws:
      IOException - if an I/O error occurs
    • encodeWithNs

      public static ByteBuffer encodeWithNs(String sequence)
      Encode the specified DNA sequence with N ambiguity symbols to a new byte buffer. Valid symbols for a DNA sequence with N ambiguity symbols are { A, C, G, T, N, a, c, g, t, n }. Similar to the .nib format, the DNA symbols are packed two bases to the byte. The first base is packed in the high-order 4 bits (nibble); the second base is packed in the low-order 4 bits: byte = (base0<<4) + base1. The numerical representations for the bases are T - 0, C - 1, A - 2, G - 3, N - 4.
      Parameters:
      sequence - DNA sequence with N ambiguity symbols to encode, must not be null
      Returns:
      the specified DNA sequence with N ambiguity symbols encoded to a new byte buffer
      Throws:
      IllegalArgumentException - if the specified sequence contains any ambiguity symbols other than { N, n }
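      The nibble packing byte = (base0<<4) + base1 can be sketched as follows. The class NibbleSketch and its methods are hypothetical names for illustration; filling the low nibble with 0 when the sequence has an odd number of bases is an assumption not stated in the documentation.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class, not part of org.dishevelled.bio.sequence.
final class NibbleSketch {
    // Numerical representations: T - 0, C - 1, A - 2, G - 3, N - 4.
    static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'T': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            case 'N': return 4;
            default:
                throw new IllegalArgumentException("not one of { A, C, G, T, N }: " + base);
        }
    }

    // Packs two bases per byte, first base in the high-order nibble.
    static ByteBuffer pack(String sequence) {
        ByteBuffer bytes = ByteBuffer.allocate((sequence.length() + 1) / 2);
        for (int i = 0; i < sequence.length(); i += 2) {
            int base0 = code(sequence.charAt(i));
            // Assumption: an odd-length sequence leaves the last low nibble as 0.
            int base1 = i + 1 < sequence.length() ? code(sequence.charAt(i + 1)) : 0;
            bytes.put((byte) ((base0 << 4) + base1));
        }
        bytes.flip();
        return bytes;
    }
}
```

      For example, under these assumptions the two bases GN pack into the single byte 0x34: G (3) in the high nibble and N (4) in the low nibble.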
    • encodeWithNs

      public static ByteBuffer encodeWithNs(String sequence, ByteBuffer bytes)
      Encode the specified DNA sequence with N ambiguity symbols to the specified byte buffer. Valid symbols for a DNA sequence with N ambiguity symbols are { A, C, G, T, N, a, c, g, t, n }. Similar to the .nib format, the DNA symbols are packed two bases to the byte. The first base is packed in the high-order 4 bits (nibble); the second base is packed in the low-order 4 bits: byte = (base0<<4) + base1. The numerical representations for the bases are T - 0, C - 1, A - 2, G - 3, N - 4.
      Parameters:
      sequence - DNA sequence with N ambiguity symbols to encode, must not be null
      bytes - byte buffer, must not be null
      Returns:
      the specified DNA sequence with N ambiguity symbols encoded to the specified byte buffer
      Throws:
      IllegalArgumentException - if the specified sequence contains any ambiguity symbols other than { N, n }
    • decodeWithAmbiguity

      public static String decodeWithAmbiguity(ByteBuffer bytes, int length) throws IOException
      Decode the specified byte buffer as a DNA sequence with ambiguity symbols of the specified length as a string.
      Parameters:
      bytes - byte buffer, must not be null
      length - length, must be at least 0
      Returns:
      the specified byte buffer decoded as a DNA sequence with ambiguity symbols of the specified length as a string
      Throws:
      IOException - if an I/O error occurs
      Since:
      1.2
    • decodeWithAmbiguity

      public static <T extends Appendable> T decodeWithAmbiguity(ByteBuffer bytes, int length, T appendable) throws IOException
      Decode the specified byte buffer as a DNA sequence with ambiguity symbols of the specified length to the specified appendable.
      Type Parameters:
      T - appendable type
      Parameters:
      bytes - byte buffer, must not be null
      length - length, must be at least 0
      appendable - appendable to decode to, must not be null
      Returns:
      the specified byte buffer decoded as a DNA sequence with ambiguity symbols of the specified length to the specified appendable
      Throws:
      IOException - if an I/O error occurs
      Since:
      1.2
    • encodeWithAmbiguity

      public static ByteBuffer encodeWithAmbiguity(String sequence)
      Encode the specified DNA sequence with ambiguity symbols to a new byte buffer. Per the BAM specification, ambiguity symbols { =, A, a, C, c, M, m, G, g, R, r, S, s, V, v, T, t, W, w, Y, y, H, h, K, k, D, d, B, b, N, n } are mapped to values in the range [0, 15], with other characters mapped to N. Symbols are packed high nibble first (the first symbol is in the highest 4 bits of the first byte).
      Parameters:
      sequence - DNA sequence with ambiguity symbols to encode, must not be null
      Returns:
      the specified DNA sequence with ambiguity symbols encoded to a new byte buffer
      Since:
      1.2
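      The BAM-style mapping can be sketched as follows, using the symbol order listed above ("=ACMGRSVTWYHKDBN" maps to 0 through 15, case-insensitively). The class BamNibbleSketch and its methods are hypothetical names for illustration; leaving the final low nibble as 0 for an odd-length sequence is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class, not part of org.dishevelled.bio.sequence.
final class BamNibbleSketch {
    // Position in this string is the 4-bit code: '=' is 0, 'A' is 1, ..., 'N' is 15.
    private static final String SYMBOLS = "=ACMGRSVTWYHKDBN";

    // Maps a symbol to its code; anything unrecognized maps to N (15).
    static int code(char symbol) {
        int index = SYMBOLS.indexOf(Character.toUpperCase(symbol));
        return index < 0 ? 15 : index;
    }

    // Packs two symbols per byte, first symbol in the highest 4 bits.
    static ByteBuffer pack(String sequence) {
        ByteBuffer bytes = ByteBuffer.allocate((sequence.length() + 1) / 2);
        for (int i = 0; i < sequence.length(); i += 2) {
            int high = code(sequence.charAt(i));
            // Assumption: an odd-length sequence leaves the last low nibble as 0.
            int low = i + 1 < sequence.length() ? code(sequence.charAt(i + 1)) : 0;
            bytes.put((byte) ((high << 4) | low));
        }
        bytes.flip();
        return bytes;
    }
}
```

      Unlike the unambiguous and N-only encoders, this mapping has no invalid input: every character outside the listed set falls back to N, which is consistent with the method declaring no IllegalArgumentException.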
    • encodeWithAmbiguity

      public static ByteBuffer encodeWithAmbiguity(String sequence, ByteBuffer bytes)
      Encode the specified DNA sequence with ambiguity symbols to the specified byte buffer. Per the BAM specification, ambiguity symbols { =, A, a, C, c, M, m, G, g, R, r, S, s, V, v, T, t, W, w, Y, y, H, h, K, k, D, d, B, b, N, n } are mapped to values in the range [0, 15], with other characters mapped to N. Symbols are packed high nibble first (the first symbol is in the highest 4 bits of the first byte).
      Parameters:
      sequence - DNA sequence with ambiguity symbols to encode, must not be null
      bytes - byte buffer, must not be null
      Returns:
      the specified DNA sequence with ambiguity symbols encoded to the specified byte buffer
      Since:
      1.2
    • formatBits

      static String formatBits(byte b)