Chapter 6: Q29E (page 198)

Exon chaining. Each gene corresponds to a subregion of the overall genome (the DNA sequence); however, part of this region might be “junk DNA.” Frequently, a gene consists of several pieces called exons, which are separated by junk fragments called introns. This complicates the process of identifying genes in a newly sequenced genome.
Suppose we have a new DNA sequence and we want to check whether a certain gene (a string) is present in it. Because we cannot hope that the gene will be a contiguous subsequence, we look for partial matches—fragments of the DNA that are also present in the gene (actually, even these partial matches will be approximate, not perfect). We then attempt to assemble these fragments.
Let $x[1 \ldots n]$ denote the DNA sequence. Each partial match can be represented by a triple $(l_{i}, r_{i}, w_{i})$, where $x[l_{i} \ldots r_{i}]$ is the fragment and $w_{i}$ is a weight representing the strength of the match (it might be a local alignment score or some other statistical quantity). Many of these potential matches could be false, so the goal is to find a subset of the triples that are consistent (nonoverlapping) and have a maximum total weight.
Show how to do this efficiently.

Short Answer

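Dynamic programming solves the problem in $O(n)$ time once the interval endpoints are sorted; if they are not already sorted, a comparison sort adds $O(n \log n)$. Here $n$ counts the candidate intervals (triples), not the length of the DNA sequence.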

Step by step solution

Step 1: Explain Exon Chaining

The DNA sequence, or genome, is written as $x[1 \ldots n]$.

Each partial match is given as a triple that records the start position, the end position, and the weight of the match.

From this set of weighted intervals, a chain of non-overlapping intervals with maximum total weight is required.

A greedy strategy would repeatedly take the highest-scoring interval that still fits in the remaining range until no more intervals can be taken, and then sum the weights of the taken intervals. Greedy selection is not guaranteed to find the maximum total weight, however.
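To illustrate why a greedy rule can fall short, here is a minimal sketch that assumes "greedy" means taking intervals in decreasing weight order and skipping any that overlap an earlier pick. The function name `greedy_by_weight` and the sample triples are hypothetical and chosen only for this example; intervals are treated as closed, so two intervals sharing a position count as overlapping.

```python
def greedy_by_weight(triples):
    """Greedy heuristic (an assumed reading of the rule described above):
    scan triples (l, r, w) by decreasing weight and keep each one that
    does not overlap an interval already kept."""
    chosen = []
    for l, r, w in sorted(triples, key=lambda t: t[2], reverse=True):
        # Closed intervals [l, r] and [cl, cr] are disjoint iff one ends
        # strictly before the other starts.
        if all(r < cl or cr < l for cl, cr, _ in chosen):
            chosen.append((l, r, w))
    return sum(w for _, _, w in chosen)


# Hypothetical input: one heavy interval covering two lighter, disjoint ones.
triples = [(1, 10, 5), (1, 4, 3), (5, 10, 3)]
greedy_total = greedy_by_weight(triples)
print(greedy_total)
```

On this input the heuristic keeps only `(1, 10, 5)`, while the two disjoint intervals `(1, 4, 3)` and `(5, 10, 3)` together weigh 6, so the greedy total is not the maximum.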

By constructing a graph instead, the problem can be solved exactly in $O(n)$ time, given sorted interval endpoints.

Step 2: Give Algorithm

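With a graph $G(V, E)$ whose vertices $v_{1}, \ldots, v_{2n}$ are the interval endpoints in sorted order, the algorithm is given as follows:

Input: (G, n), where n is the number of intervals

for i = 0 to 2n

  $res_{i} = 0$

for i = 1 to 2n

  if $v_{i}$ in G corresponds to the right end of an interval I

    j = index of the vertex for the left end of I

    w = weight of I

    $res_{i} = \max \{ res_{j} + w, \; res_{i-1} \}$

  else

    $res_{i} = res_{i-1}$

return $res_{2n}$

The initialization starts at $i = 0$ so that $res_{i-1}$ is defined when $i = 1$.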
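The pseudocode above can be sketched in Python as follows. This is one possible rendering, not a definitive implementation: the function name `exon_chaining`, the event tuples, and the tie-breaking rule are assumptions made for illustration. It sorts the endpoints itself, so it runs in $O(n \log n)$ overall for $n$ triples; the single pass after sorting is the $O(n)$ part. Intervals are treated as closed, so when positions coincide a left end is ordered before a right end, and two intervals that share a position are treated as overlapping.

```python
def exon_chaining(triples):
    """Maximum total weight of pairwise non-overlapping intervals.

    triples: iterable of (l, r, w) with l <= r, read as closed intervals.
    """
    triples = list(triples)

    # Build the ordered array of 2n endpoint vertices.
    # kind 0 = left end, kind 1 = right end; at equal positions left ends
    # sort first (assumed tie-break, see the lead-in).
    events = []
    for idx, (l, r, _) in enumerate(triples):
        events.append((l, 0, idx))
        events.append((r, 1, idx))
    events.sort()

    res = [0] * (len(events) + 1)  # res[0] = 0 stands for "no vertices yet"
    left_vertex = {}               # interval index -> vertex number of its left end

    for i, (_, kind, idx) in enumerate(events, start=1):
        if kind == 0:
            # Left end: nothing new can be completed here.
            left_vertex[idx] = i
            res[i] = res[i - 1]
        else:
            # Right end: either take this interval on top of the best
            # chain ending before its left end, or skip it.
            j = left_vertex[idx]
            w = triples[idx][2]
            res[i] = max(res[j] + w, res[i - 1])

    return res[-1]


# Hypothetical input used only to exercise the sketch.
best = exon_chaining([(1, 5, 5), (2, 3, 3), (4, 8, 6), (9, 10, 1)])
print(best)
```

For the sample input, the chain `(2, 3, 3)`, `(4, 8, 6)`, `(9, 10, 1)` is non-overlapping and has total weight 10, which is what the function returns.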

Explain the algorithm

Explanation:

The exon chaining problem can be solved using dynamic programming on a graph, which is the approach given above.

The problem for $n$ intervals can be modeled with $2n$ vertices in a graph, one vertex per interval endpoint.

Assume that the set of left and right interval ends is sorted into ascending order and that all positions are distinct. This forms an ordered array of vertices $v_{1}, \ldots, v_{2n}$.

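There are $3n - 1$ edges in the graph: one edge of weight $w$ from the left end to the right end of each of the $n$ intervals, and $2n - 1$ edges of weight 0 joining adjacent vertices. In the algorithm, $res_{i}$ denotes the length of the longest path ending at vertex $v_{i}$ in the graph.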

Therefore the final solution is $res_{2n}$.

Thus, dynamic programming solves the problem in $O(n)$ time once the endpoints are sorted.
