Reference
Ambio
A lightweight bioinformatics library written in Python. It contains a variety of algorithms and functions that deal with strings.
Submodule: align
Features well-known algorithms for matching and analyzing DNA sequences represented as strings.
-
ambio.align.LCS(string1, string2)
Calculates the Longest Common Subsequence between two strings.
Learn more: Longest Common Subsequence problem [wikipedia]
- Parameters
string1 (str) – The first string.
string2 (str) – The second string.
- Returns
The Longest Common Subsequence between string1 and string2.
- Return type
str
Example code
>> LCS("sunday", "saturday")
"suday"
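To illustrate what a call like the one above computes, here is a minimal dynamic-programming sketch. The name lcs_sketch is hypothetical and the code is not taken from ambio; it only shows one common way to recover a longest common subsequence.

```python
def lcs_sketch(string1, string2):
    """Return one longest common subsequence of two strings (illustrative only)."""
    rows, cols = len(string1) + 1, len(string2) + 1
    # lengths[i][j] holds the LCS length of string1[:i] and string2[:j].
    lengths = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if string1[i - 1] == string2[j - 1]:
                lengths[i][j] = lengths[i - 1][j - 1] + 1
            else:
                lengths[i][j] = max(lengths[i - 1][j], lengths[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched characters.
    chars = []
    i, j = len(string1), len(string2)
    while i > 0 and j > 0:
        if string1[i - 1] == string2[j - 1]:
            chars.append(string1[i - 1])
            i -= 1
            j -= 1
        elif lengths[i - 1][j] >= lengths[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(chars))


print(lcs_sketch("sunday", "saturday"))
```

When several subsequences share the maximum length, which one is returned depends on the tie-breaking in the walk back; the library may choose differently.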
-
ambio.align.alignmentScore(string1, string2, insertionDeletionWeight=-2, substitutionWeight=-1, matchWeight=1)
Calculates the alignment score of two given strings using the Needleman–Wunsch algorithm.
The algorithm uses dynamic programming: it divides a large problem (e.g. aligning the full sequences) into a series of smaller problems (aligning their prefixes) and uses the solutions to the smaller problems to build an optimal solution to the larger one.
Learn more: Needleman-Wunsch Algorithm [wikipedia]
- Parameters
string1 (str) – The first string to align.
string2 (str) – The second string to align.
insertionDeletionWeight (int) – The cost of an insertion or deletion, default is -2.
substitutionWeight (int) – The cost of a substitution, default is -1.
matchWeight (int) – The profit of a matching character, default is 1.
- Returns
The alignment score of string1 and string2.
- Return type
int
Example code
>> alignmentScore("sunday", "sunray")
4
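The "smaller problems" idea in the description is usually written as a recurrence. The block below is the textbook Needleman-Wunsch recurrence, where F(i, j) is the best score for aligning the first i characters of string1 with the first j characters of string2, and d, p, m stand for insertionDeletionWeight, substitutionWeight and matchWeight. That ambio follows exactly this form is an assumption.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d,\\
F(i,j) &= \max\bigl\{\, F(i-1,j-1) + s(a_i,b_j),\;\; F(i-1,j) + d,\;\; F(i,j-1) + d \,\bigr\},\\
s(a_i,b_j) &= \begin{cases} m & \text{if } a_i = b_j,\\ p & \text{otherwise.} \end{cases}
\end{aligned}
```

For "sunday" and "sunray" with the default weights, the best alignment has five matches and one substitution, so the score is 5m + p = 5 - 1 = 4, in line with the example above.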
-
ambio.align.alignmentScoreTable(string1, string2, paths=False, insertionDeletionWeight=-2, substitutionWeight=-1, matchWeight=1)
Generates the table of scores needed for finding the alignment score with the Needleman–Wunsch algorithm.
Learn more: Needleman-Wunsch Algorithm [wikipedia]
- Parameters
string1 (str) – The first string to align.
string2 (str) – The second string to align.
paths (bool) – When True the table used for backtracking is also returned.
insertionDeletionWeight (int) – The cost of an insertion or deletion, default is -2.
substitutionWeight (int) – The cost of a substitution, default is -1.
matchWeight (int) – The profit of a matching character, default is 1.
- Returns
A matrix containing the alignment scores, and optionally another matrix with the origin cells.
- Return type
list
Example code
>> scores, paths = alignmentScoreTable("home", "house", paths=True)
>> scores
[[  0,  -2,  -4,  -6,  -8],
 [ -2,   1,  -1,  -3,  -5],
 [ -4,  -1,   2,   0,  -2],
 [ -6,  -3,   0,   1,  -1],
 [ -8,  -5,  -2,  -1,   0],
 [-10,  -7,  -4,  -3,   0]]
>> paths
[[(-1,-1), (0, 0), (0, 1), (0, 2), (0, 3)],
 [ (0, 0), (0, 0), (1, 1), (1, 2), (1, 3)],
 [ (1, 0), (1, 1), (1, 1), (2, 2), (2, 3)],
 [ (2, 0), (2, 1), (2, 2), (2, 2), (3, 3)],
 [ (3, 0), (3, 1), (3, 2), (3, 3), (3, 3)],
 [ (4, 0), (4, 1), (4, 2), (4, 3), (4, 3)]]
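The following sketch shows how a score table and a table of origin cells like the ones above could be filled in. The function name and short parameter names are hypothetical, the row/column layout (rows for string2, columns for string1) is inferred from the example output, and the tie-breaking order is an arbitrary choice, so origin cells may differ from the library's wherever two moves score equally.

```python
def alignment_score_table_sketch(string1, string2, indel=-2, sub=-1, match=1):
    """Fill a Needleman-Wunsch score table and record each cell's origin."""
    rows, cols = len(string2) + 1, len(string1) + 1
    scores = [[0] * cols for _ in range(rows)]
    # (-1, -1) marks the top-left corner, which has no origin cell.
    paths = [[(-1, -1)] * cols for _ in range(rows)]

    # First row and column: only insertions or deletions are possible.
    for j in range(1, cols):
        scores[0][j] = j * indel
        paths[0][j] = (0, j - 1)
    for i in range(1, rows):
        scores[i][0] = i * indel
        paths[i][0] = (i - 1, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if string2[i - 1] == string1[j - 1] else sub
            candidates = [
                (scores[i - 1][j - 1] + step, (i - 1, j - 1)),  # diagonal move
                (scores[i - 1][j] + indel, (i - 1, j)),         # move from above
                (scores[i][j - 1] + indel, (i, j - 1)),         # move from the left
            ]
            # max() keeps the first best candidate, so ties favour the diagonal.
            scores[i][j], paths[i][j] = max(candidates, key=lambda c: c[0])
    return scores, paths


scores, paths = alignment_score_table_sketch("home", "house")
for row in scores:
    print(row)
```

The bottom-right cell of the score table is the overall alignment score.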
-
ambio.align.editDistance(string1, string2)
Calculates the edit distance between two strings.
The edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
Learn more: Edit distance [wikipedia]
- Parameters
string1 (str) – The first string.
string2 (str) – The second string.
- Returns
The edit distance between string1 and string2.
- Return type
int
Example code
>> editDistance("sunday", "saturday")
3
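As a rough illustration of counting the minimum number of operations, the sketch below computes the Levenshtein distance, where insertions, deletions and substitutions each cost 1. The name edit_distance_sketch is hypothetical, and whether ambio uses exactly this set of operations is an assumption; it does agree with the example above.

```python
def edit_distance_sketch(string1, string2):
    """Levenshtein distance using two rows of the dynamic-programming table."""
    # previous[j] is the distance between the processed prefix of string1
    # and string2[:j]; it starts as the distance from the empty string.
    previous = list(range(len(string2) + 1))
    for i, char1 in enumerate(string1, start=1):
        current = [i]  # turning string1[:i] into "" takes i deletions
        for j, char2 in enumerate(string2, start=1):
            cost = 0 if char1 == char2 else 1
            current.append(min(
                previous[j] + 1,         # delete char1
                current[j - 1] + 1,      # insert char2
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


print(edit_distance_sketch("sunday", "saturday"))
```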
-
ambio.align.hammingDistance(string1, string2)
Calculates the Hamming distance between two strings.
The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.
Learn more: Hamming Distance [wikipedia]
- Parameters
string1 (str) – The first string.
string2 (str) – The second string.
- Raises
ValueError – The two strings are not the same length.
- Returns
The Hamming distance between string1 and string2.
- Return type
int
Example code
>> hammingDistance("sunday", "sunray")
1
>> hammingDistance("hello", "world!")
ValueError: "The two given strings are not the same length."
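The definition above translates almost directly into code. This is a minimal sketch with a hypothetical name, not the library's source; the error message simply mirrors the one shown in the example.

```python
def hamming_distance_sketch(string1, string2):
    """Count the positions at which two equal-length strings differ."""
    if len(string1) != len(string2):
        raise ValueError("The two given strings are not the same length.")
    # zip pairs up characters position by position; True counts as 1 in sum().
    return sum(char1 != char2 for char1, char2 in zip(string1, string2))


print(hamming_distance_sketch("sunday", "sunray"))
```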
-
ambio.align.showAlignment(string1, string2, insertionDeletionWeight=-2, substitutionWeight=-1, matchWeight=1)
Returns a visual representation of the alignment of two given strings using the Needleman–Wunsch algorithm.
Learn more: Needleman-Wunsch Algorithm [wikipedia]
- Parameters
string1 (str) – The first string to align.
string2 (str) – The second string to align.
insertionDeletionWeight (int) – The cost of an insertion or deletion, default is -2.
substitutionWeight (int) – The cost of a substitution, default is -1.
matchWeight (int) – The profit of a matching character, default is 1.
- Returns
A list containing the two strings modified to show which edits have been made for them to be aligned.
- Return type
list
Example code
>> al = showAlignment("sunday", "saturday")
>> print(al[0] + "\n" + al[1])
s--unday
saturday
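One way to produce such a pair of gapped strings is to fill the Needleman-Wunsch score table and then trace back from its bottom-right cell. The sketch below does this under stated assumptions: the function name is hypothetical, "-" is used as the gap character as in the example, and the tie-breaking (diagonal first) is an arbitrary choice, so for inputs with several equally good alignments it may print a different one than the library.

```python
def show_alignment_sketch(string1, string2, indel=-2, sub=-1, match=1):
    """Return [aligned1, aligned2] with '-' marking insertions and deletions."""
    rows, cols = len(string1) + 1, len(string2) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * indel
    for j in range(1, cols):
        score[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if string1[i - 1] == string2[j - 1] else sub
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + indel,
                              score[i][j - 1] + indel)

    # Trace back from the bottom-right corner, re-deriving which move was taken.
    out1, out2 = [], []
    i, j = len(string1), len(string2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if string1[i - 1] == string2[j - 1] else sub
            if score[i][j] == score[i - 1][j - 1] + step:
                out1.append(string1[i - 1])
                out2.append(string2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + indel:
            out1.append(string1[i - 1])  # character of string1 facing a gap
            out2.append("-")
            i -= 1
        else:
            out1.append("-")             # gap facing a character of string2
            out2.append(string2[j - 1])
            j -= 1
    return ["".join(reversed(out1)), "".join(reversed(out2))]


aligned = show_alignment_sketch("sunday", "saturday")
print(aligned[0] + "\n" + aligned[1])
```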
Submodule: match
-
ambio.match.naiveExactMatch(pattern, string)
Performs a naive exact pattern search of pattern in string.
- Parameters
pattern (str) – The pattern to be found in string.
string (str) – The main string in which we look for the pattern.
- Returns
The index of the first occurrence of pattern in string, or -1 if the pattern is not found.
- Return type
int
Example code
>> naiveExactMatch("abc", "ufkabcodp")
3
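A naive exact search tries every possible starting position and compares the pattern against the text at that position. The sketch below illustrates the idea; the function name is hypothetical and the handling of an empty pattern (returning 0) is a choice made here, not documented behaviour of ambio.

```python
def naive_exact_match_sketch(pattern, string):
    """Return the index of the first occurrence of pattern in string, or -1."""
    pattern_length = len(pattern)
    # Only start positions that leave room for the whole pattern are tried.
    for start in range(len(string) - pattern_length + 1):
        if string[start:start + pattern_length] == pattern:
            return start
    return -1


print(naive_exact_match_sketch("abc", "ufkabcodp"))
```

In the worst case this compares the pattern at every position, which takes time proportional to the product of the two lengths.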