Chapter 6: Q26E (page 197)

Sequence alignment. When a new gene is discovered, a standard approach to understanding its function is to look through a database of known genes and find close matches. The closeness of two genes is measured by the extent to which they are aligned. To formalize this, think of a gene as being a long string over an alphabet $\Sigma = \{A, C, G, T\}$. Consider two genes (strings) $x = ATGCC$ and $y = TACGCA$. An alignment of x and y is a way of matching up these two strings by writing them in columns, for instance:
$\begin{matrix} - & A & T & - & G & C & C \\ T & A & - & C & G & C & A \end{matrix}$
Here the “-” indicates a “gap.” The characters of each string must appear in order, and each column must contain a character from at least one of the strings. The score of an alignment is specified by a scoring matrix $δ$ of size $(|\Sigma| + 1) \times (|\Sigma| + 1)$, where the extra row and column are to accommodate gaps. For instance, the preceding alignment has the following score:
$δ(-, T) + δ(A, A) + δ(T, -) + δ(-, C) + δ(G, G) + δ(C, C) + δ(C, A)$
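To make the scoring rule concrete, here is a minimal Python sketch that sums $δ$ over the columns of an alignment written as two equal-length rows. The scoring function `toy_delta` (+1 for a match, -1 for a mismatch or a gap) is a hypothetical choice for illustration only; the problem leaves $δ$ as an arbitrary input, and the function names are not from the source.

```python
def toy_delta(a: str, b: str) -> int:
    """Hypothetical scoring: +1 for a match, -1 for a mismatch or a gap."""
    return 1 if a == b else -1


def alignment_score(top: str, bottom: str, delta) -> int:
    """Sum delta over the columns of an alignment given as two rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            # Every column must hold a character from at least one string.
            raise ValueError("a column may not contain two gaps")
        total += delta(a, b)
    return total


# The example alignment of x = ATGCC and y = TACGCA, one row per string.
print(alignment_score("-AT-GCC", "TA-CGCA", toy_delta))  # prints -1
```

Under this toy scoring the example alignment has three matching columns and four columns with a gap or mismatch, which gives 3 - 4 = -1.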
Give a dynamic programming algorithm that takes as input two strings $x[1 \ldots n]$ and $y[1 \ldots m]$ and a scoring matrix $δ$ and returns the highest-scoring alignment. The running time should be O(mn).

Short Answer

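Fill an $(n + 1) \times (m + 1)$ table S, where $S(i, j)$ is the highest score of any alignment of the prefixes $x[1 \ldots i]$ and $y[1 \ldots j]$. Each entry depends on three neighbouring entries, so the resulting dynamic programming algorithm runs in O(nm) time.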

Step by step solution

Explain the given problem

Consider the two strings x and y with lengths n and m respectively. Define a two-dimensional array (matrix) S in which entry $S(i, j)$ stores the highest score of aligning the prefix $x[1 \ldots i]$ with the prefix $y[1 \ldots j]$. The answer to the problem is then $S(n, m)$.

Defining the Recursive Relation

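Based on the given conditions, the matrix is filled by considering the last column of an alignment of $x[1 \ldots i]$ and $y[1 \ldots j]$. That column must be one of three kinds: $x[i]$ against a gap, a gap against $y[j]$, or $x[i]$ against $y[j]$. In each case, omit the last column and align the remaining prefixes optimally.

The base cases are $S(0, 0) = 0$, $S(i, 0) = S(i - 1, 0) + δ(x[i], -)$ and $S(0, j) = S(0, j - 1) + δ(-, y[j])$.
The recurrence will be: $S(i, j) = \max \left( \begin{matrix} S(i - 1, j) + δ(x[i], -) \\ S(i, j - 1) + δ(-, y[j]) \\ S(i - 1, j - 1) + δ(x[i], y[j]) \end{matrix} \right)$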
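The recurrence can be sketched in Python as a bottom-up table fill. This is a minimal illustration, not a definitive implementation: it assumes $δ$ is passed as a function `delta(a, b)` with `"-"` standing for a gap, and the scoring `toy_delta` (+1 match, -1 mismatch or gap) is a hypothetical example rather than anything given in the problem. Python strings are 0-indexed, so `x[i - 1]` plays the role of $x[i]$.

```python
def toy_delta(a: str, b: str) -> int:
    """Hypothetical scoring: +1 for a match, -1 for a mismatch or a gap."""
    return 1 if a == b else -1


def best_alignment_score(x: str, y: str, delta) -> int:
    """Return S(n, m), the highest alignment score of x and y."""
    n, m = len(x), len(y)
    # S[i][j] = best score of aligning x[1..i] with y[1..j].
    S = [[0] * (m + 1) for _ in range(n + 1)]

    # Base cases: a prefix aligned against the empty string is all gaps.
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], "-")
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + delta("-", y[j - 1])

    # General case: choose the best of the three possible last columns.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j] + delta(x[i - 1], "-"),           # x[i] against a gap
                S[i][j - 1] + delta("-", y[j - 1]),           # gap against y[j]
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),  # x[i] against y[j]
            )
    return S[n][m]


print(best_alignment_score("ATGCC", "TACGCA", toy_delta))
```

Filling the table row by row guarantees that the three entries each cell depends on are already computed when the cell is reached.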

Analysis of the Recurrence Relation

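Since string $x_{1} x_{2} x_{3} \ldots x_{n}$ has n characters and string $y_{1} y_{2} y_{3} \ldots y_{m}$ has m characters, the table has $(n + 1)(m + 1)$ entries, and each entry is computed in constant time from three previously computed entries. Hence the time complexity is O(nm). The alignment itself is recovered by tracing back from $S(n, m)$ to $S(0, 0)$, which takes a further O(n + m) steps.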
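The problem asks for the alignment itself, not only its score. One way to recover it, sketched below under the same assumptions as the rest of this solution, is to fill the table and then walk back from $S(n, m)$, at each cell picking a last column whose score explains the entry. The function name `align` and the scoring `toy_delta` are hypothetical choices for illustration; when several alignments tie for the best score, this sketch returns just one of them.

```python
def toy_delta(a: str, b: str) -> int:
    """Hypothetical scoring: +1 for a match, -1 for a mismatch or a gap."""
    return 1 if a == b else -1


def align(x: str, y: str, delta):
    """Return (score, top_row, bottom_row) for one highest-scoring alignment."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], "-")
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + delta("-", y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j] + delta(x[i - 1], "-"),
                S[i][j - 1] + delta("-", y[j - 1]),
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
            )

    # Trace back from (n, m) to (0, 0), collecting columns in reverse order.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + delta(x[i - 1], "-"):
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Only remaining possibility: a gap in x against y[j].
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top_row, bottom_row = align("ATGCC", "TACGCA", toy_delta)
print(score)
print(top_row)
print(bottom_row)
```

Each step of the traceback decreases i, j, or both, so it visits at most n + m cells and does not change the O(nm) bound.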

Therefore, a dynamic programming algorithm that runs in O(nm) time has been obtained.
