
Exon chaining. Each gene corresponds to a subregion of the overall genome (the DNA sequence); however, part of this region might be “junk DNA.” Frequently, a gene consists of several pieces called exons, which are separated by junk fragments called introns. This complicates the process of identifying genes in a newly sequenced genome.

Suppose we have a new DNA sequence and we want to check whether a certain gene (a string) is present in it. Because we cannot hope that the gene will be a contiguous subsequence, we look for partial matches—fragments of the DNA that are also present in the gene (actually, even these partial matches will be approximate, not perfect). We then attempt to assemble these fragments.

Let $x[1 \ldots n]$ denote the DNA sequence. Each partial match can be represented by a triple $(l_i, r_i, w_i)$, where $x[l_i \ldots r_i]$ is the fragment and $w_i$ is a weight representing the strength of the match (it might be a local alignment score or some other statistical quantity). Many of these potential matches could be false, so the goal is to find a subset of the triples that are consistent (nonoverlapping) and have a maximum total weight.

Show how to do this efficiently.
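To make the input concrete, here is a minimal Python sketch of one way to represent the triples and the consistency condition. It assumes 1-indexed, inclusive positions, so two fragments conflict if they share even one position; the names `Match`, `consistent` and `total_weight` are hypothetical helpers, not part of the original problem.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """A partial match (l_i, r_i, w_i): fragment x[l..r] with weight w."""
    l: int    # left end, 1-indexed and inclusive
    r: int    # right end, inclusive
    w: float  # strength of the match, e.g. a local alignment score


def consistent(chain: list[Match]) -> bool:
    """True if no two fragments in `chain` share a position of x."""
    ordered = sorted(chain, key=lambda m: m.l)
    # After sorting by left end, checking neighbouring pairs is enough.
    return all(a.r < b.l for a, b in zip(ordered, ordered[1:]))


def total_weight(chain: list[Match]) -> float:
    """Objective being maximized: the sum of the chosen weights."""
    return sum(m.w for m in chain)


print(consistent([Match(1, 4, 2.0), Match(5, 9, 1.5)]))  # True
print(consistent([Match(1, 4, 2.0), Match(4, 9, 1.5)]))  # False: both use x[4]
```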

Short Answer


Dynamic programming solves the problem in O(n) time once the 2n interval endpoints are sorted (O(n log n) including the sort).

Step by step solution


Step 1: Explain exon chaining

The DNA sequence (genome) is written as $x[1 \ldots n]$.

Each given partial match $(l_i, r_i, w_i)$ records the start position, the end position, and the weight of the match.

From this set of intervals, we need a chain of pairwise nonoverlapping intervals with maximum total weight.

A tempting greedy strategy repeatedly takes the highest-weight interval that does not overlap any interval already chosen, until no more intervals can be added, and then sums the chosen weights. This is not optimal in general: a single heavy interval can block two lighter, disjoint intervals whose combined weight is larger.
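The failure of the greedy rule can be checked on a three-fragment example. This sketch interprets the greedy rule as "highest weight first" and compares it against an exhaustive search; `greedy_by_weight` and `brute_force` are illustrative names, and the brute force is exponential, so it only serves as a checker for tiny inputs.

```python
from itertools import combinations


def overlaps(a, b):
    """Inclusive fragments (l, r, w) overlap if they share any position."""
    return not (a[1] < b[0] or b[1] < a[0])


def greedy_by_weight(matches):
    """Take the heaviest remaining match that overlaps nothing chosen so far."""
    chosen = []
    for m in sorted(matches, key=lambda m: m[2], reverse=True):
        if all(not overlaps(m, c) for c in chosen):
            chosen.append(m)
    return sum(m[2] for m in chosen)


def brute_force(matches):
    """Exact optimum by trying every subset (exponential; for checking only)."""
    best = 0
    for k in range(len(matches) + 1):
        for subset in combinations(matches, k):
            if all(not overlaps(a, b) for a, b in combinations(subset, 2)):
                best = max(best, sum(m[2] for m in subset))
    return best


# One heavy fragment spanning 1..10 blocks two lighter, disjoint ones.
matches = [(1, 10, 3), (1, 4, 2), (6, 10, 2)]
print(greedy_by_weight(matches))  # 3
print(brute_force(matches))       # 4
```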

Instead, by modeling the intervals as a graph on their endpoints, the problem can be solved exactly in O(n) time once the endpoints are sorted.


Step 2: Give the algorithm

With a graph $G = (V, E)$ whose vertices $v_1, \ldots, v_{2n}$ are the interval endpoints in sorted order, the algorithm is as follows:

ExonChain(G, n)
    for i = 0 to 2n
        res[i] = 0
    for i = 1 to 2n
        if v_i is the right end of some interval l
            j = index of the vertex that is the left end of l
            w = weight of l
            res[i] = max(res[j-1] + w, res[i-1])
        else
            res[i] = res[i-1]
    return res[2n]
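Below is a sketch translating the algorithm into Python, assuming matches are given as `(l, r, w)` tuples and, as the explanation assumes, that all 2n endpoint positions are distinct. The function name `exon_chain` is hypothetical; the recurrence takes match l (adding its weight to the best value just before its left end) or skips it, with `res[0] = 0` standing for the empty chain.

```python
def exon_chain(matches):
    """Maximum total weight of pairwise nonoverlapping matches (l, r, w).

    res[i] is the best weight achievable using only fragments whose right
    end is at or before vertex v_i. Assumes distinct endpoint positions.
    """
    # Vertex = (position, kind, match index); kind 0 = left end, 1 = right end.
    vertices = []
    for idx, (l, r, _) in enumerate(matches):
        vertices.append((l, 0, idx))
        vertices.append((r, 1, idx))
    vertices.sort()  # v_1 .. v_2n in ascending position

    # left_vertex[idx] = 1-based index j of the vertex holding match idx's left end.
    left_vertex = {idx: i for i, (_, kind, idx) in enumerate(vertices, start=1)
                   if kind == 0}

    res = [0] * (len(vertices) + 1)  # res[0] = 0: the empty chain
    for i, (_, kind, idx) in enumerate(vertices, start=1):
        if kind == 1:  # v_i is the right end of match idx
            j = left_vertex[idx]
            w = matches[idx][2]
            res[i] = max(res[j - 1] + w, res[i - 1])  # take match idx, or skip it
        else:
            res[i] = res[i - 1]
    return res[-1]  # res[2n]


print(exon_chain([(1, 9, 3), (2, 4, 2), (6, 10, 2)]))  # 4
```

If two fragments do share an endpoint position, the tuple sort places a left end before a right end at that position, so touching fragments are treated as overlapping, which is consistent with inclusive intervals.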

Step 3: Explain the algorithm

Explanation:

The exon chaining problem can be solved with dynamic programming on a graph, which is the approach used in the algorithm above.

For n intervals, the graph has 2n vertices, one per interval endpoint.

Assume that the left and right interval endpoints are sorted in ascending order and that all positions are distinct. The sorted endpoints then form an ordered array of vertices $v_1, \ldots, v_{2n}$.

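There are 3n-1 edges in the graph: 2n-1 edges of weight 0 joining each vertex $v_i$ to $v_{i+1}$, plus n edges, one per interval, running from its left-end vertex to its right-end vertex with the interval's weight. In the algorithm, res[i] denotes the weight of the heaviest ("longest") path ending at vertex $v_i$, which equals the best total weight of consistent matches lying entirely at or before $v_i$.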

Therefore the final answer is res[2n].
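The graph formulation can also be built explicitly. In this sketch (the function names `build_graph` and `heaviest_path` are hypothetical, and distinct endpoint positions are assumed), the edge list has exactly 3n-1 entries. Every edge points to a later endpoint, so scanning vertices in sorted order is a topological order, and the heaviest path ending at the last vertex gives the same value as res[2n].

```python
def build_graph(matches):
    """Vertices 0..2n-1 are the sorted endpoints; returns (vertex count, edges).

    Each edge is (u, v, weight) with u < v. Assumes distinct endpoint positions.
    """
    positions = sorted(p for l, r, _ in matches for p in (l, r))
    vertex_of = {p: i for i, p in enumerate(positions)}
    # 2n - 1 zero-weight edges between consecutive endpoints ("skip ahead")
    edges = [(i, i + 1, 0) for i in range(len(positions) - 1)]
    # n interval edges, each jumping from a fragment's left end to its right end
    edges += [(vertex_of[l], vertex_of[r], w) for l, r, w in matches]
    return len(positions), edges


def heaviest_path(num_vertices, edges):
    """Max-weight path in a DAG whose edges all point to higher indices."""
    incoming = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        incoming[v].append((u, w))
    best = [0] * num_vertices  # best[v]: heaviest path ending at v (empty path = 0)
    for v in range(num_vertices):  # increasing index is a topological order
        for u, w in incoming[v]:
            best[v] = max(best[v], best[u] + w)
    return best[-1] if num_vertices else 0


n_vertices, edges = build_graph([(1, 9, 3), (2, 4, 2), (6, 10, 2)])
print(n_vertices, len(edges))            # 6 8, i.e. 2n and 3n - 1 for n = 3
print(heaviest_path(n_vertices, edges))  # 4
```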

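Thus, dynamic programming solves the problem in O(n) time after the endpoints are sorted, since each of the 2n vertices is processed once in constant time; including the sort, the total is O(n log n).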
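The recurrence yields only the optimal weight, while the problem asks for the subset of triples itself. A standard remedy is to record, for each vertex, whether its value came from taking a fragment, and then walk backwards from $v_{2n}$. The sketch below does this with matches as `(l, r, w)` tuples under the text's assumption of distinct endpoints; `exon_chain_with_traceback` is a hypothetical name, and the extra bookkeeping keeps the sweep linear.

```python
def exon_chain_with_traceback(matches):
    """Return (best weight, chosen matches) for nonoverlapping (l, r, w) triples."""
    # (position, kind, idx): kind 0 = left end, 1 = right end.
    vertices = sorted((p, kind, idx)
                      for idx, (l, r, _) in enumerate(matches)
                      for p, kind in ((l, 0), (r, 1)))
    left_vertex = {idx: i for i, (_, kind, idx) in enumerate(vertices, 1) if kind == 0}

    m = len(vertices)
    res = [0] * (m + 1)
    took = [None] * (m + 1)  # took[i] = index of the match taken at v_i, else None
    for i, (_, kind, idx) in enumerate(vertices, 1):
        res[i] = res[i - 1]  # default: skip
        if kind == 1:
            j = left_vertex[idx]
            if res[j - 1] + matches[idx][2] > res[i]:
                res[i] = res[j - 1] + matches[idx][2]
                took[i] = idx

    # Walk back: either step to v_{i-1} or jump to just before the taken match.
    chosen, i = [], m
    while i > 0:
        if took[i] is None:
            i -= 1
        else:
            chosen.append(matches[took[i]])
            i = left_vertex[took[i]] - 1
    return res[m], chosen[::-1]


print(exon_chain_with_traceback([(1, 3, 5), (2, 8, 1), (9, 12, 4)]))
# (9, [(1, 3, 5), (9, 12, 4)])
```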


Most popular questions from this chapter

Reconstructing evolutionary trees by maximum parsimony. Suppose we manage to sequence a particular gene across a whole bunch of different species. For concreteness, say there are n species, and the sequences are strings of length k over the alphabet $\Sigma = \{A, C, G, T\}$. How can we use this information to reconstruct the evolutionary history of these species?

Evolutionary history is commonly represented by a tree whose leaves are the different species, whose root is their common ancestor, and whose internal branches represent speciation events (that is, moments when a new species broke off from an existing one). Thus we need to find the following:

• An evolutionary tree with the given species at the leaves.

• For each internal node, a string of length k: the gene sequence for that particular ancestor.

For each possible tree T annotated with sequences $s(u) \in \{A, C, G, T\}^k$ at each of its nodes u, we can assign a score based on the principle of parsimony: fewer mutations are more likely.

$$\mathrm{score}(T) = \sum_{(u,v) \in E(T)} \bigl(\text{number of positions on which } s(u) \text{ and } s(v) \text{ disagree}\bigr)$$

Finding the lowest-score tree is a difficult problem. Here we will consider just a small part of it: suppose we know the structure of the tree, and we want to fill in the sequences s(u) of the internal nodes u. Here’s an example with k=4 and n=5:


(a) In this particular example, there are several maximum parsimony reconstructions of the internal node sequences. Find one of them.

(b) Give an efficient (in terms of n and k) algorithm for this task. (Hint: Even though the sequences might be long, you can do just one position at a time.)
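For part (b), one way to follow the hint is a per-position dynamic program over the four letters. This is Sankoff's algorithm with unit mutation cost, named here as a swapped-in technique rather than the textbook's stated solution. Because the figure for part (a) is not reproduced here, the sketch uses a small hypothetical rooted tree; the tree encoding (`children` and `leaf_seq` dicts) and all function names are assumptions. Each position takes time linear in the number of nodes for a fixed alphabet, so the total is O(nk).

```python
import math

ALPHABET = "ACGT"


def subtree_costs(u, children, leaf_seq, pos, cost):
    """Post-order: cost[u][a] = fewest mutations below u at `pos` if u has letter a."""
    if u not in children:  # leaf: its letter is fixed by the data
        cost[u] = {a: 0 if leaf_seq[u][pos] == a else math.inf for a in ALPHABET}
        return
    for c in children[u]:
        subtree_costs(c, children, leaf_seq, pos, cost)
    # Each child independently picks its best letter b, paying 1 if b != a.
    cost[u] = {a: sum(min(cost[c][b] + (a != b) for b in ALPHABET)
                      for c in children[u])
               for a in ALPHABET}


def assign(u, letter, children, cost, out):
    """Pre-order: fix u's letter, then give each child its best letter."""
    out.setdefault(u, []).append(letter)
    for c in children.get(u, []):
        best = min(ALPHABET, key=lambda b: cost[c][b] + (letter != b))
        assign(c, best, children, cost, out)


def small_parsimony(children, root, leaf_seq, k):
    """Return (parsimony score, sequence for every node) for a fixed rooted tree."""
    out, total = {}, 0
    for pos in range(k):  # positions are independent, as the hint suggests
        cost = {}
        subtree_costs(root, children, leaf_seq, pos, cost)
        root_letter = min(ALPHABET, key=lambda a: cost[root][a])
        total += cost[root][root_letter]
        assign(root, root_letter, children, cost, out)
    return total, {u: "".join(letters) for u, letters in out.items()}


# Hypothetical tree: root r with internal children u and v, four leaves.
children = {"r": ["u", "v"], "u": ["s1", "s2"], "v": ["s3", "s4"]}
leaf_seq = {"s1": "AC", "s2": "AG", "s3": "TC", "s4": "TC"}
score, seqs = small_parsimony(children, "r", leaf_seq, 2)
print(score)  # 2
```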

Given two strings $x = x_1 x_2 \cdots x_n$ and $y = y_1 y_2 \cdots y_m$, we wish to find the length of their longest common subsequence, that is, the largest k for which there are indices $i_1 < i_2 < \cdots < i_k$ and $j_1 < j_2 < \cdots < j_k$ with $x_{i_1} x_{i_2} \cdots x_{i_k} = y_{j_1} y_{j_2} \cdots y_{j_k}$. Show how to do this in time O(mn).
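A minimal sketch of the standard O(mn) table, assuming Python strings as input (`lcs_length` is a hypothetical name): `L[i][j]` holds the LCS length of the prefixes `x[:i]` and `y[:j]`, and each entry depends only on its neighbours above, to the left, and diagonally up-left.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y in O(mn) time."""
    n, m = len(x), len(y)
    # L[i][j] = LCS length of prefixes x[:i] and y[:j]; row/column 0 are empty prefixes.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # drop x_i or y_j
    return L[n][m]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```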

Give an O(nt) algorithm for the following task. Input: A list of n positive integers a1,a2,...,an; a positive integer t. Question: Does some subset of the ai’s add up to t? (You can use each ai at most once.) (Hint: Look at subproblems of the form “does a subset of{a1,a2,...,ai} add up to ?”)
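Following the hint, the sketch below keeps one row of the table of subproblems, indexed by every candidate sum from 0 to t, and updates it as each $a_i$ is considered, for O(nt) time. Scanning sums in decreasing order is what enforces that each $a_i$ is used at most once. The function name `subset_sum` is hypothetical.

```python
def subset_sum(a: list[int], t: int) -> bool:
    """Does some subset of a (each element used at most once) sum to t?"""
    # reach[s] is True iff some subset of the items processed so far sums to s.
    reach = [False] * (t + 1)
    reach[0] = True  # the empty subset
    for ai in a:
        # Descending order: reach[s - ai] still reflects the items before ai.
        for s in range(t, ai - 1, -1):
            if reach[s - ai]:
                reach[s] = True
    return reach[t]


print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```

The change-making variant that follows, where each denomination may be used at most once, is the same check with the denominations as the items and t = v.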

Consider the following variation on the change-making problem (Exercise 6.17): you are given denominations $x_1, x_2, \ldots, x_n$, and you want to make change for a value v, but you are allowed to use each denomination at most once. For instance, if the denominations are 1, 5, 10, 20, then you can make change for 16 = 1 + 5 + 10 and for 31 = 1 + 10 + 20 but not for 40 (because you can’t use 20 twice).

Input: Positive integers $x_1, x_2, \ldots, x_n$; another integer v.

Output: Can you make change for v, using each denomination $x_i$ at most once? Show how to solve this problem in time O(nv).

Alignment with gap penalties. The alignment algorithm of Exercise 6.26 helps to identify DNA sequences that are close to one another. The discrepancies between these closely matched sequences are often caused by errors in DNA replication. However, a closer look at the biological replication process reveals that the scoring function we considered earlier has a qualitative problem: nature often inserts or removes entire substrings of nucleotides (creating long gaps), rather than editing just one position at a time. Therefore, the penalty for a gap of length 10 should not be 10 times the penalty for a gap of length 1, but something significantly smaller.

Repeat Exercise 6.26, but this time use a modified scoring function in which the penalty for a gap of length k is c0 + c1k, where c0 and c1 are given constants (and c0 is larger than c1).
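A sketch of the usual three-table approach (Gotoh's algorithm for affine gap costs), offered as one possible technique rather than the textbook's answer. Exercise 6.26's scoring function is not reproduced here, so the similarity function `delta` and its sample scores are assumptions; the sketch maximizes total score and charges $c_0 + c_1 k$ for each gap of length k. Each table entry takes constant time, so the running time stays O(mn).

```python
import math


def affine_alignment_score(x, y, delta, c0, c1):
    """Best global alignment score when a gap of length k costs c0 + c1 * k.

    delta(a, b) scores aligning letter a with letter b (higher is better).
    M[i][j]: best score of x[:i] vs y[:j] ending with x[i-1] aligned to y[j-1]
    X[i][j]: same, but ending with x[i-1] opposite a gap
    Y[i][j]: same, but ending with y[j-1] opposite a gap
    """
    n, m = len(x), len(y)
    NEG = -math.inf
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(c0 + c1 * i)  # x[:i] against one gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(c0 + c1 * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i-1][j-1], X[i-1][j-1], Y[i-1][j-1]) + delta(x[i-1], y[j-1])
            # Either open a new gap (pay c0 + c1) or extend the current one (pay c1).
            X[i][j] = max(max(M[i-1][j], Y[i-1][j]) - (c0 + c1), X[i-1][j] - c1)
            Y[i][j] = max(max(M[i][j-1], X[i][j-1]) - (c0 + c1), Y[i][j-1] - c1)
    return max(M[n][m], X[n][m], Y[n][m])


def delta(a, b):
    """Hypothetical similarity scores: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


print(affine_alignment_score("ACGT", "AT", delta, c0=2, c1=1))  # -2
```

Here the best alignment matches A and T and pays once for the length-2 gap opposite CG (1 + 1 - 4 = -2), whereas two separate length-1 gaps would cost 3 each.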
