public class NeedlemanWunsch
extends PairwiseAlignmentAlgorithm
This class implements the classic global alignment algorithm (with a linear gap penalty function) due to S. B. Needleman and C. D. Wunsch (1970).
It is based on a dynamic programming approach. Given two sequences A and B of sizes n and m, respectively, the idea is to build an (n+1) x (m+1) matrix M that contains the similarity of prefixes of A and B. Every position M[i,j] in the matrix holds the score of the best alignment between the prefixes A[1..i] and B[1..j]. The first row and column represent alignments of a prefix with spaces only.
Starting from row 0, column 0, the algorithm computes each position M[i,j] with the following recurrence:
M[0,0] = 0
M[i,j] = max { M[i,j-1]   + scoreInsertion(B[j]),
               M[i-1,j-1] + scoreSubstitution(A[i], B[j]),
               M[i-1,j]   + scoreDeletion(A[i]) }
In the end, the value at the last position (last row, last column) contains the similarity between the two sequences. This part of the algorithm is performed by the computeMatrix() method. It has quadratic space complexity, since it needs to keep an (n+1) x (m+1) matrix in memory, and because the work of computing each cell is constant, it also has quadratic time complexity.
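The recurrence above can be sketched in a few lines of Java. This is a minimal illustration, not the body of computeMatrix(): the class name NwMatrixSketch, the fill method, and the fixed scores (+1 match, -1 mismatch, -1 gap) are assumptions made for the example and stand in for whatever scoring scheme is actually set. It also assumes that the first row and column are initialised by accumulating gap scores, the usual convention for global alignment.

```java
// Hypothetical sketch of the matrix-filling step; names and scores are illustrative.
final class NwMatrixSketch {
    // Assumed linear scoring scheme (not taken from the library).
    static final int MATCH = 1, MISMATCH = -1, GAP = -1;

    static int substitution(char a, char b) {
        return a == b ? MATCH : MISMATCH;
    }

    // Returns the (n+1) x (m+1) matrix; M[i][j] scores a[0..i) against b[0..j).
    static int[][] fill(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] M = new int[n + 1][m + 1];          // M[0][0] = 0 by default
        for (int i = 1; i <= n; i++) {
            M[i][0] = M[i - 1][0] + GAP;            // prefix of a against spaces only
        }
        for (int j = 1; j <= m; j++) {
            M[0][j] = M[0][j - 1] + GAP;            // prefix of b against spaces only
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int ins = M[i][j - 1] + GAP;        // insertion of b[j-1]
                int sub = M[i - 1][j - 1]
                        + substitution(a.charAt(i - 1), b.charAt(j - 1));
                int del = M[i - 1][j] + GAP;        // deletion of a[i-1]
                M[i][j] = Math.max(ins, Math.max(sub, del));
            }
        }
        return M;
    }
}
```

With these assumed scores, NwMatrixSketch.fill("ACGT", "AGT")[4][3] is 2: three matches and one gap.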
After the matrix has been computed, the alignment can be retrieved by tracing a path back through the matrix from the last position to the first. This step is performed by the buildOptimalAlignment() method, and since the path can be roughly as long as (m + n), this method has O(n + m) time complexity.
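The traceback can be illustrated with the following sketch. It is not the implementation of buildOptimalAlignment(): the class NwTracebackSketch, its methods, the '-' gap character, and the scores (+1 match, -1 mismatch, -1 gap) are all assumptions made for the example. At each cell it checks which of the three predecessors produced the stored value and steps to that predecessor, preferring the diagonal when several qualify.

```java
// Hypothetical sketch of the traceback step; names and scores are illustrative.
final class NwTracebackSketch {
    static final int MATCH = 1, MISMATCH = -1, GAP = -1; // assumed scoring

    static int sub(char a, char b) {
        return a == b ? MATCH : MISMATCH;
    }

    // Fills the full matrix, as in the recurrence described above.
    static int[][] fill(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] M = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) M[i][0] = M[i - 1][0] + GAP;
        for (int j = 1; j <= m; j++) M[0][j] = M[0][j - 1] + GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = M[i - 1][j - 1] + sub(a.charAt(i - 1), b.charAt(j - 1));
                M[i][j] = Math.max(diag, Math.max(M[i - 1][j] + GAP, M[i][j - 1] + GAP));
            }
        }
        return M;
    }

    // Walks from M[n][m] back to M[0][0]; returns {gapped a, gapped b}.
    static String[] traceback(String a, String b, int[][] M) {
        StringBuilder ra = new StringBuilder(), rb = new StringBuilder();
        int i = a.length(), j = b.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && M[i][j] == M[i - 1][j - 1] + sub(a.charAt(i - 1), b.charAt(j - 1))) {
                ra.append(a.charAt(--i));           // substitution or match
                rb.append(b.charAt(--j));
            } else if (i > 0 && M[i][j] == M[i - 1][j] + GAP) {
                ra.append(a.charAt(--i));           // deletion: a[i-1] against a gap
                rb.append('-');
            } else {
                ra.append('-');                     // insertion: gap against b[j-1]
                rb.append(b.charAt(--j));
            }
        }
        // Columns were collected from last to first, so reverse them.
        return new String[] { ra.reverse().toString(), rb.reverse().toString() };
    }

    static String[] align(String a, String b) {
        return traceback(a, b, fill(a, b));
    }
}
```

With these assumed scores, NwTracebackSketch.align("ACGT", "AGT") returns "ACGT" and "A-GT". The loop runs once per alignment column, so it performs at most n + m steps.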
If only the similarity value is needed (and not the alignment itself), it is easy to reduce the space requirement to O(n) by keeping just the last row or column in memory. This is precisely what the computeScore() method does. Note that it still requires O(n^2) time.
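The row-only idea can be sketched as follows. This is an illustration under stated assumptions, not the code of computeScore(): the class NwLinearSpaceScoreSketch, the score method, and the scores (+1 match, -1 mismatch, -1 gap) are hypothetical. The sketch keeps a single row of length m+1 and one extra variable holding the diagonal value that would otherwise be overwritten.

```java
// Hypothetical sketch of score-only computation in linear space.
final class NwLinearSpaceScoreSketch {
    static final int MATCH = 1, MISMATCH = -1, GAP = -1; // assumed scoring

    static int score(String a, String b) {
        int n = a.length(), m = b.length();
        int[] row = new int[m + 1];                 // holds row i-1, updated in place to row i
        for (int j = 1; j <= m; j++) {
            row[j] = row[j - 1] + GAP;              // row 0: prefix of b against spaces
        }
        for (int i = 1; i <= n; i++) {
            int diag = row[0];                      // M[i-1][0]
            row[0] = diag + GAP;                    // M[i][0]
            for (int j = 1; j <= m; j++) {
                int up = row[j];                    // M[i-1][j], saved before overwrite
                int sub = diag + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                // row[j-1] already holds M[i][j-1] at this point.
                row[j] = Math.max(sub, Math.max(up + GAP, row[j - 1] + GAP));
                diag = up;                          // becomes M[i-1][j] for the next column
            }
        }
        return row[m];                              // M[n][m]
    }
}
```

The sketch still visits every cell, so the running time stays quadratic; only the memory drops to one row. Because the row has the length of the second sequence, passing the shorter sequence as b keeps the memory use smallest.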
For a more efficient approach to the global alignment problem, see the CrochemoreLandauZivUkelson algorithm. For local alignment, see the SmithWaterman algorithm.
See also:
net.maizegenetics.analysis.gbs.neobio.NeedlemanWunsch$computeMatrix(),
net.maizegenetics.analysis.gbs.neobio.NeedlemanWunsch$buildOptimalAlignment(),
net.maizegenetics.analysis.gbs.neobio.NeedlemanWunsch$computeScore(),
class CrochemoreLandauZivUkelson,
class SmithWaterman,
class CrochemoreLandauZivUkelsonLocalAlignment,
class CrochemoreLandauZivUkelsonGlobalAlignment
protected CharSequence seq1
The first sequence of an alignment.
protected CharSequence seq2
The second sequence of an alignment.
protected kotlin.Array[] matrix
The dynamic programming matrix. Each position (i, j) holds the best score between the first i characters of seq1 and the first j characters of seq2.
protected void loadSequencesInternal(java.io.Reader input1,
java.io.Reader input2)
Loads sequences into CharSequence instances. If an error occurs, an exception is raised by the constructor of CharSequence (please check the specification of that class for specific requirements).
input1 - Input for first sequence
input2 - Input for second sequence
IOException - If an I/O error occurs when reading the sequences
InvalidSequenceException - If the sequences are not valid
See also: class CharSequence
protected void unloadSequencesInternal()
Frees pointers to loaded sequences and the dynamic programming matrix so that their data can be garbage collected.
protected PairwiseAlignment computePairwiseAlignment()
Builds an optimal global alignment between the loaded sequences. It first calls the computeMatrix method to compute the dynamic programming matrix and then calls the buildOptimalAlignment method to retrieve the alignment from it.
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.
See also: #computeMatrix, #buildOptimalAlignment
protected void computeMatrix()
Computes the dynamic programming matrix.
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.
protected PairwiseAlignment buildOptimalAlignment()
Builds an optimal global alignment between the loaded sequences. Before it is executed, the dynamic programming matrix must already have been computed by the computeMatrix method.
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.
See also: #computeMatrix
protected int computeScore()
Computes the score of the best global alignment between the two sequences using the scoring scheme previously set. This method calculates the similarity value only: it does not build the whole matrix, so the alignment cannot be recovered, but it has the advantage of requiring only O(n) space.
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.