Reference

Ambio

A lightweight bioinformatics library written in Python. It contains a variety of algorithms and functions for working with strings.

Submodule: align

Provides well-known algorithms for matching and analyzing DNA sequences represented as strings.

ambio.align.LCS(string1, string2)

Calculates the Longest Common Subsequence between two strings.

Learn more: Longest Common Subsequence problem [wikipedia]

Parameters
  • string1 (str) – The first string.

  • string2 (str) – The second string.

Returns

The Longest Common Subsequence between string1 and string2.

Return type

str

Example code

>> LCS("sunday", "saturday")
"suday"

ambio.align.alignmentScore(string1, string2, insertionDeletionWeight=-2, substitutionWeight=-1, matchWeight=1)

Calculates the alignment score of two given strings using the Needleman–Wunsch algorithm.

The algorithm divides a large problem (e.g. aligning the full sequences) into a series of smaller problems (aligning their prefixes), and uses the solutions to the smaller problems to find an optimal solution to the larger one.

Learn more: Needleman-Wunsch Algorithm [wikipedia]

Parameters
  • string1 (str) – The first string to align.

  • string2 (str) – The second string to align.

  • insertionDeletionWeight (int) – The cost of an insertion or deletion, default is -2.

  • substitutionWeight (int) – The cost of a substitution, default is -1.

  • matchWeight (int) – The profit of a matching character, default is 1.

Returns

The alignment score of string1 and string2.

Return type

int

Example code

>> alignmentScore("sunday", "sunray")
4
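
The divide-into-smaller-problems idea described above can be sketched as follows. This is a minimal illustration, not Ambio's code: the function name alignment_score_sketch and the short parameter names are hypothetical, and the sketch keeps only two rows of the score table because only the final score is needed.

```python
def alignment_score_sketch(s1, s2, indel=-2, sub=-1, match=1):
    """Needleman-Wunsch score of s1 and s2 (illustrative only)."""
    # prev[j] is the best score for aligning the processed prefix of s1 with s2[:j].
    prev = [j * indel for j in range(len(s2) + 1)]
    for i in range(1, len(s1) + 1):
        curr = [i * indel]  # aligning s1[:i] with the empty string costs i gaps
        for j in range(1, len(s2) + 1):
            diag = prev[j - 1] + (match if s1[i - 1] == s2[j - 1] else sub)
            up = prev[j] + indel        # gap in s2
            left = curr[j - 1] + indel  # gap in s1
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

With the default weights, the sketch gives 4 for "sunday" and "sunray": five matches and one substitution.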

ambio.align.alignmentScoreTable(string1, string2, paths=False, insertionDeletionWeight=-2, substitutionWeight=-1, matchWeight=1)

Generates the table of scores needed for finding the alignment score with the Needleman-Wunsch algorithm.

Learn more: Needleman-Wunsch Algorithm [wikipedia]

Parameters
  • string1 (str) – The first string to align.

  • string2 (str) – The second string to align.

  • paths (bool) – When True the table used for backtracking is also returned.

  • insertionDeletionWeight (int) – The cost of an insertion or deletion, default is -2.

  • substitutionWeight (int) – The cost of a substitution, default is -1.

  • matchWeight (int) – The profit of a matching character, default is 1.

Returns

A matrix (a list of lists) containing the alignment scores. When paths is True, a second matrix holding the origin cell of each entry is also returned.

Return type

list

Example code

>> scores, paths = alignmentScoreTable("home", "house", paths=True)
>> scores
[[ 0,  -2, -4, -6, -8],
 [-2,   1, -1, -3, -5],
 [-4,  -1,  2,  0, -2],
 [-6,  -3,  0,  1, -1],
 [-8,  -5, -2, -1,  0],
 [-10, -7, -4, -3,  0]]
>> paths
[[(-1,-1), (0, 0), (0, 1), (0, 2), (0, 3)],
 [ (0, 0), (0, 0), (1, 1), (1, 2), (1, 3)],
 [ (1, 0), (1, 1), (1, 1), (2, 2), (2, 3)],
 [ (2, 0), (2, 1), (2, 2), (2, 2), (3, 3)],
 [ (3, 0), (3, 1), (3, 2), (3, 3), (3, 3)],
 [ (4, 0), (4, 1), (4, 2), (4, 3), (4, 3)]]
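
To make the two tables above easier to read, here is a hedged sketch of how such tables could be filled. The name score_table_sketch is hypothetical. The layout (rows follow string2, columns follow string1) and the tie-breaking order (up, then left, then diagonal) are assumptions inferred from the example output, not confirmed details of Ambio.

```python
def score_table_sketch(s1, s2, indel=-2, sub=-1, match=1):
    """Return (scores, origins) for a Needleman-Wunsch alignment (illustrative only)."""
    # Assumed layout: one row per character of s2, one column per character of s1.
    rows, cols = len(s2) + 1, len(s1) + 1
    scores = [[0] * cols for _ in range(rows)]
    origins = [[(-1, -1)] * cols for _ in range(rows)]  # (-1, -1) marks the start cell

    # First row and first column can only be reached through gaps.
    for j in range(1, cols):
        scores[0][j] = j * indel
        origins[0][j] = (0, j - 1)
    for i in range(1, rows):
        scores[i][0] = i * indel
        origins[i][0] = (i - 1, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if s2[i - 1] == s1[j - 1] else sub
            candidates = [
                (scores[i - 1][j] + indel, (i - 1, j)),         # from above
                (scores[i][j - 1] + indel, (i, j - 1)),         # from the left
                (scores[i - 1][j - 1] + step, (i - 1, j - 1)),  # diagonal
            ]
            # max() keeps the first best candidate, so ties prefer up, then left.
            scores[i][j], origins[i][j] = max(candidates, key=lambda c: c[0])
    return scores, origins
```

Each origin entry is the (row, column) of the cell the score was derived from, so following origins from the bottom-right cell back to (0, 0) traces one optimal alignment.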

ambio.align.editDistance(string1, string2)

Calculates the edit distance between two strings.

The edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.

Learn more: Edit distance [wikipedia]

Parameters
  • string1 (str) – The first string.

  • string2 (str) – The second string.

Returns

The edit distance between string1 and string2.

Return type

int

Example code

>> editDistance("sunday", "saturday")
3
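
The counting of operations can be sketched as below. This assumes the Levenshtein variant of edit distance (insertions, deletions and substitutions, each costing 1), which is consistent with the example above but is not stated explicitly in this reference. The name edit_distance_sketch is hypothetical.

```python
def edit_distance_sketch(a, b):
    """Levenshtein distance between a and b (illustrative only)."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]
```

For "sunday" and "saturday" the three operations are two insertions ("a", "t") and one substitution ("n" to "r").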

ambio.align.hammingDistance(string1, string2)

Calculates the Hamming distance between two strings.

The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

Learn more: Hamming Distance [wikipedia]

Parameters
  • string1 (str) – The first string.

  • string2 (str) – The second string.

Raises

ValueError – The two strings are not the same length.

Returns

The Hamming distance between string1 and string2.

Return type

int

Example code

>> hammingDistance("sunday", "sunray")
1

>> hammingDistance("hello", "world!")
ValueError: "The two given strings are not the same length."
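
The definition above translates almost directly into code. The following is a minimal sketch with a hypothetical name, hamming_distance_sketch; it mirrors the documented behavior, including the ValueError for strings of different lengths, but is not taken from Ambio's source.

```python
def hamming_distance_sketch(a, b):
    """Number of positions at which a and b differ (illustrative only)."""
    if len(a) != len(b):
        raise ValueError("The two given strings are not the same length.")
    # Each mismatching pair counts as 1 because True sums as 1.
    return sum(ca != cb for ca, cb in zip(a, b))
```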

ambio.align.showAlignment(string1, string2, insertionDeletionWeight=-2, substitutionWeight=-1, matchWeight=1)

Returns a visual representation of the alignment of two given strings using the Needleman–Wunsch algorithm.

Learn more: Needleman-Wunsch Algorithm [wikipedia]

Parameters
  • string1 (str) – The first string to align.

  • string2 (str) – The second string to align.

  • insertionDeletionWeight (int) – The cost of an insertion or deletion, default is -2.

  • substitutionWeight (int) – The cost of a substitution, default is -1.

  • matchWeight (int) – The profit of a matching character, default is 1.

Returns

A list containing the two strings modified to show which edits have been made for them to be aligned.

Return type

list

Example code

>> al = showAlignment("sunday", "saturday")
>> print(al[0] + "\n" + al[1])
s--unday
saturday
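
One way to produce such a pair of aligned strings is to fill the Needleman-Wunsch score table and then trace back from its bottom-right cell. The sketch below is an assumption about the general technique, not Ambio's implementation; show_alignment_sketch is a hypothetical name, and the use of "-" for gaps follows the example output.

```python
def show_alignment_sketch(s1, s2, indel=-2, sub=-1, match=1):
    """Return [aligned_s1, aligned_s2] with '-' marking gaps (illustrative only)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * indel
    for j in range(1, cols):
        score[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if s1[i - 1] == s2[j - 1] else sub
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + indel,
                              score[i][j - 1] + indel)

    # Trace back: at each cell, take a move that explains its score.
    top, bottom = [], []
    i, j = len(s1), len(s2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if s1[i - 1] == s2[j - 1] else sub
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            top.append(s1[i - 1])   # match or substitution
            bottom.append(s2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            top.append(s1[i - 1])   # gap in s2
            bottom.append("-")
            i -= 1
        else:
            top.append("-")         # gap in s1
            bottom.append(s2[j - 1])
            j -= 1
    return ["".join(reversed(top)), "".join(reversed(bottom))]
```

When several alignments share the best score, the order of the checks in the traceback decides which one is returned, so results may differ from the library's in such cases.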

Submodule: match

ambio.match.naiveExactMatch(pattern, string)

Performs a naive exact pattern search of pattern in string.

Parameters
  • pattern (str) – The pattern to be found in string.

  • string (str) – The main string in which we look for the pattern.

Returns

The index of the first occurrence of pattern in string, or -1 if the pattern is not found.

Return type

int

Example code

>> naiveExactMatch("abc", "ufkabcodp")
3
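
A naive exact search tries every starting position in turn and compares the pattern character by character. The following minimal sketch uses a hypothetical name, naive_exact_match_sketch, and is not taken from Ambio; its handling of an empty pattern (returning 0) is an assumption.

```python
def naive_exact_match_sketch(pattern, string):
    """Index of the first occurrence of pattern in string, or -1 (illustrative only)."""
    n, m = len(string), len(pattern)
    # Only starting positions that leave room for the whole pattern are tried.
    for start in range(n - m + 1):
        if all(string[start + k] == pattern[k] for k in range(m)):
            return start
    return -1
```

The worst case takes O(len(string) * len(pattern)) comparisons, which is what makes the approach "naive" compared with algorithms that skip ahead after a mismatch.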