public class SmithWaterman
This class implements the classic local alignment algorithm (with a linear gap penalty function) due to T.F. Smith and M.S. Waterman (1981).
This algorithm is very similar to the NeedlemanWunsch algorithm for global alignment. The idea here also consists of building an (n+1) x (m+1) matrix M given two sequences A and B of sizes n and m, respectively. However, unlike in the global alignment case, every position M[i,j] in the matrix contains the best similarity score between a suffix of A[1..i] and a suffix of B[1..j].
Starting from row 0, column 0, the computeMatrix method computes each position M[i,j] with the following recurrence:
M[0,0] = M[0,j] = M[i,0] = 0
M[i,j] = max { M[i,j-1]   + scoreInsertion(B[j]),
               M[i-1,j-1] + scoreSubstitution(A[i], B[j]),
               M[i-1,j]   + scoreDeletion(A[i]),
               0 }
Note that, here, all cells in the first row and column are set to zero. The best local alignment score is the highest value found anywhere in the matrix.
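The recurrence can be illustrated with a minimal Java sketch. It is not the source of this class: the class name LocalMatrixSketch, the findMax helper, and the scoring constants (+2 match, -1 mismatch, -1 per gap character) are assumptions chosen for illustration. The sketch also floors each cell at zero, as is standard for Smith-Waterman, so that no local alignment carries a negative prefix.

```java
// Illustrative sketch only; names and scoring values are hypothetical.
final class LocalMatrixSketch {
    // Assumed linear scoring scheme, not taken from the documented class.
    static final int MATCH = 2, MISMATCH = -1, GAP = -1;

    static int scoreSubstitution(char a, char b) { return a == b ? MATCH : MISMATCH; }
    static int scoreInsertion(char b) { return GAP; }
    static int scoreDeletion(char a) { return GAP; }

    /** Fills the (n+1) x (m+1) matrix; row 0 and column 0 stay at zero. */
    static int[][] computeMatrix(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] M = new int[n + 1][m + 1]; // Java zero-initialises int arrays
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int ins = M[i][j - 1] + scoreInsertion(b.charAt(j - 1));
                int sub = M[i - 1][j - 1] + scoreSubstitution(a.charAt(i - 1), b.charAt(j - 1));
                int del = M[i - 1][j] + scoreDeletion(a.charAt(i - 1));
                // Floor at zero: a local alignment may start anywhere.
                M[i][j] = Math.max(Math.max(ins, sub), Math.max(del, 0));
            }
        }
        return M;
    }

    /** Returns {bestScore, row, col} of the first highest cell in row-major order. */
    static int[] findMax(int[][] M) {
        int best = 0, row = 0, col = 0;
        for (int i = 0; i < M.length; i++) {
            for (int j = 0; j < M[i].length; j++) {
                if (M[i][j] > best) { best = M[i][j]; row = i; col = j; }
            }
        }
        return new int[] { best, row, col };
    }

    public static void main(String[] args) {
        int[] best = findMax(computeMatrix("TTACGTT", "GGACGGG"));
        System.out.println("score=" + best[0] + " at row " + best[1] + ", col " + best[2]);
    }
}
```

The row and column returned by findMax play the role that the max_row and max_col fields appear to play in this class: they mark where a traceback would begin.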
Just as in the global alignment case, this algorithm has quadratic space complexity because it needs to keep an (n+1) x (m+1) matrix in memory. And since the work of computing each cell is constant, it also has quadratic time complexity.
After the matrix has been computed, the alignment can be retrieved by tracing a path back in the matrix from the position of the highest score until a cell of value zero is reached. This step is performed by the buildOptimalAlignment method, and its time complexity is linear in the size of the alignment.
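The traceback step can be sketched as follows. This is an assumption-laden illustration, not the buildOptimalAlignment implementation: the class TracebackSketch, the align method, the '-' gap character, and the scoring constants are all hypothetical. At each cell the sketch checks which of the three moves produced the stored value and steps in that direction until it reaches a zero cell.

```java
// Illustrative sketch only; names and scoring values are hypothetical.
final class TracebackSketch {
    static final int MATCH = 2, MISMATCH = -1, GAP = -1;

    static int sub(char a, char b) { return a == b ? MATCH : MISMATCH; }

    /** Returns the two gapped strings of one optimal local alignment. */
    static String[] align(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] M = new int[n + 1][m + 1];
        int maxRow = 0, maxCol = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int best = Math.max(0, M[i - 1][j - 1] + sub(a.charAt(i - 1), b.charAt(j - 1)));
                best = Math.max(best, M[i - 1][j] + GAP); // deletion
                best = Math.max(best, M[i][j - 1] + GAP); // insertion
                M[i][j] = best;
                if (best > M[maxRow][maxCol]) { maxRow = i; maxCol = j; }
            }
        }
        // Walk back from the highest cell until a zero cell is reached.
        StringBuilder top = new StringBuilder(), bottom = new StringBuilder();
        int i = maxRow, j = maxCol;
        while (i > 0 && j > 0 && M[i][j] > 0) {
            if (M[i][j] == M[i - 1][j - 1] + sub(a.charAt(i - 1), b.charAt(j - 1))) {
                top.append(a.charAt(i - 1)); bottom.append(b.charAt(j - 1)); i--; j--;
            } else if (M[i][j] == M[i - 1][j] + GAP) {
                top.append(a.charAt(i - 1)); bottom.append('-'); i--;
            } else { // only the insertion move remains
                top.append('-'); bottom.append(b.charAt(j - 1)); j--;
            }
        }
        // Characters were collected end-to-start, so reverse them.
        return new String[] { top.reverse().toString(), bottom.reverse().toString() };
    }

    public static void main(String[] args) {
        String[] r = align("AAACCC", "AAAGCCC");
        System.out.println(r[0]);
        System.out.println(r[1]);
    }
}
```

The loop performs one step per alignment column, which is why the traceback cost is linear in the length of the alignment rather than in the size of the matrix.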
If only the similarity value is needed (and not the alignment itself), it is easy to reduce the space requirement to O(n) by keeping just the last row or column in memory. This is precisely what is done by the computeScore method. Note that it still requires O(n^2) time.
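The space reduction can be sketched as below. This is a hedged illustration rather than the computeScore source: the class LinearSpaceScoreSketch and the scoring constants are assumptions. The sketch keeps two rows (previous and current) and tracks the running maximum, since the cells of earlier rows are never read again.

```java
// Illustrative sketch only; names and scoring values are hypothetical.
final class LinearSpaceScoreSketch {
    static final int MATCH = 2, MISMATCH = -1, GAP = -1;

    /** Best local alignment score using O(m) memory and O(n*m) time. */
    static int computeScore(String a, String b) {
        int m = b.length();
        int[] prev = new int[m + 1]; // row i-1; row 0 is all zeros
        int[] curr = new int[m + 1]; // row i
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0; // first column is always zero
            for (int j = 1; j <= m; j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int v = Math.max(0, prev[j - 1] + s); // substitution
                v = Math.max(v, prev[j] + GAP);       // deletion
                v = Math.max(v, curr[j - 1] + GAP);   // insertion
                curr[j] = v;
                if (v > best) best = v;
            }
            // Swap the buffers instead of allocating a new row.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(computeScore("AAACCC", "AAAGCCC"));
    }
}
```

Because earlier rows are discarded, the path through the matrix is lost, which is why only the score (and not the alignment) can be reported from this variant.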
For a more efficient approach to the local alignment problem, see the CrochemoreLandauZivUkelson algorithm. For global alignment, see the NeedlemanWunsch algorithm.
protected java.lang.String seq1
The first sequence of an alignment.
protected java.lang.String seq2
The second sequence of an alignment.
protected kotlin.Array[] matrix
The dynamic programming matrix. Each position (i, j) represents the best score between a suffix of the first i characters of seq1 and a suffix of the first j characters of seq2.
protected int max_row
Indicates the row where an optimal local alignment can be found in the matrix.
protected int max_col
Indicates the column where an optimal local alignment can be found in the matrix.
public SmithWaterman()
public SmithWaterman(int rows,
int cols)
public SmithWaterman(kotlin.Array[] b1,
kotlin.Array[] b2)
protected int computeMatrix()
Computes the dynamic programming matrix. Fails if the scoring scheme is not compatible with the loaded sequences.
public int computeScore()
Computes the score of the best local alignment between the two sequences using the scoring scheme previously set. This method calculates the similarity value only: it does not build the whole matrix, so the alignment cannot be recovered, but it has the advantage of requiring only O(n) space.
protected int scoreInsertion(char a)
protected int scoreSubstitution(char a,
char b)
protected int scoreDeletion(char a)
protected int max(int v1,
int v2)
Helper method to compute the greater of two values.
v1 - first value
v2 - second value
Returns the greater of v1 and v2.
protected int max(int v1,
int v2,
int v3)
Helper method to compute the greatest of three values.
v1 - first value
v2 - second value
v3 - third value
Returns the greatest of v1, v2 and v3.
protected int max(int v1,
int v2,
int v3,
int v4)
Helper method to compute the greatest of four values.
v1 - first value
v2 - second value
v3 - third value
v4 - fourth value
Returns the greatest of v1, v2, v3 and v4.
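One plausible way to write the three overloads is to chain the two-argument form. This is a sketch under assumptions (the class name MaxHelpersSketch is hypothetical, and the actual method bodies are not shown in this documentation).

```java
// Illustrative sketch only; not the documented class's source.
final class MaxHelpersSketch {
    static int max(int v1, int v2) {
        return (v1 >= v2) ? v1 : v2;
    }

    static int max(int v1, int v2, int v3) {
        return max(max(v1, v2), v3);
    }

    static int max(int v1, int v2, int v3, int v4) {
        return max(max(v1, v2), max(v3, v4));
    }

    public static void main(String[] args) {
        // Three candidate moves plus a zero floor, as in a local alignment cell.
        System.out.println(max(-1, 4, -2, 0));
    }
}
```

The four-argument form fits a local alignment cell naturally: three candidate moves (insertion, substitution, deletion) compared against a fourth value of zero.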