Warning: foreach() argument must be of type array|object, bool given in /var/www/html/web/app/themes/studypress-core-theme/template-parts/header/mobile-offcanvas.php on line 20

Reconstructing evolutionary trees by maximum parsimony. Suppose we manage to sequence a particular gene across a whole bunch of different species. For concreteness, say there are n species, and the sequences are strings of length k over alphabet={A,C,G,T}. How can we use this information to reconstruct the evolutionary history of these species?

Evolutionary history is commonly represented by a tree whose leaves are the different species, whose root is their common ancestor, and whose internal branches represent speciation events (that is, moments when a new species broke off from an existing one). Thus we need to find the following:

• An evolutionary tree with the given species at the leaves.

• For each internal node, a string of length K: the gene sequence for that particular ancestor.

For each possible tree T annotated with sequencess(u)kat each of its nodes , we can assign a score based on the principle of parsimony: fewer mutations are more likely.

localid="1659249441524" score(T)=(u.v)E(T)(numberofpositionsonwhichs(u)ands(v)disagree)

Finding the highest-score tree is a difficult problem. Here we will consider just a small part of it: suppose we know the structure of the tree, and we want to fill in the sequences s(u) of the internal nodes u. Here’s an example with k=4 and n=5:


(a) In this particular example, there are several maximum parsimony reconstructions of the internal node sequences. Find one of them.

(b) Give an efficient (in terms of n and k ) algorithm for this task. (Hint: Even though the sequences might be long, you can do just one position at a time.)

Short Answer

Expert verified

(a) One of the reconstructions of the internal nodes sequences:


(b)Algorithm for the given task.

δi,j=1ifi=j0otherwiseL=left-subtreeTR=right-subtreeTST,C=mina,bSL,a+SR,b+δc,a+δc,b

Step by step solution

01

Step 1:Find one of the reconstruction

(a)

Consider the given tree,start mutation(change in sequence) from one position at a time, this also removes ambiguity of making sequence. Now according to mutation, there can be many evolution trees. One of the possible evolutionary tree is as follows,


Therefore, one of the reconstruction is obtained.

02

Step 2:Give an efficient algorithm

(b)

Consider one bit at time to define the algorithm. Define the state S(T,c), Where cA,C,G,T. To represent the problem domain,T is the subtree and the minimum score when the root node of T is c. So the following algorithm is,

δI,J=1ifi=j0otherwiseL=left-subtreeTR=right-subtreeTST,C=mina,bSL,a+SR,b+δc,a+δc,b

Since inner nodes has two sons, the number of inner nodes that can be obtained is the number of leaf nodes minus 1. The state number can be calculated in s=O(n) time.

And the transition cost of the state is O (1). Thus, the whole algorithm complexity is O (nk)

Therefore, the efficient algorithm has been obtained with O (nk) complexity.

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

Let us define a multiplication operation on three symbols a,b,caccording to the following table; thus ab=b,ba=c, and so on. Notice that the multiplication operation defined by the table is neither associative nor commutative.

Find an efficient algorithm that examines a string of these symbols, say bbbbac, and decides whether or not it is possible to parenthesize the string in such a way that the value of the resulting expression is . For example, on input bbbbacyour algorithm should return yes because((b(bb))(ba))c=a.

Consider the following 3-PARTITION problem. Given integersa1,...,an, we want to determine whether it is possible to partition of {1,...,n} into three disjoint subsets I,J,Ksuch that

aiiI=ajjJ=akkk=13aii1 .

For example, for input(1,2,3,4,4,5,8) the answer is yes, because there is the partition(1,8),(4,5),(2,3,4). On the other hand, for input(2,2,3,5) the answer is no. Devise and analyze a dynamic programming algorithm3-PARTITION for that runs in time polynomial in n and in Σiai.

Given two strings x=x1x2···xnand y=y1y2···ym, we wish to find the length of their longest common substring, that is, the largest k for which there are indices i and j with xixi+1···xi+k-1=yjyj+1···yj+k-1. Show how to do this in time0(mn)

Exon chaining.Each gene corresponds to a subregion of the overall genome (the DNA sequence); however, part of this region might be “junk DNA.” Frequently, a gene consists of several pieces called exons, which are separated by junk fragments called introns. This complicates the process of identifying genes in a newly sequenced genome.

Suppose we have a new DNA sequence and we want to check whether a certain gene (a string) is present in it. Because we cannot hope that the gene will be a contiguous subsequence, we look for partial matches—fragments of the DNA that are also present in the gene (actually, even these partial matches will be approximate, not perfect). We then attempt to assemble these fragments.

Let x 1Kndenote the DNA sequence. Each partial match can be represented by a triple li,ri,wi, where xliKriis the fragment and is a weight

representing the strength of the match (it might be a local alignment score or some other statistical quantity). Many of these potential matches could be false, so the goal is to find a subset of the triples that are consistent (nonoverlapping) and have a maximum total weight.

Show how to do this efficiently.

Here is yet another variation on the change-making problem (Exercise 6.17). Given an unlimited supply of coins of denominations x1,x2,x3.........xnwe wish to make change for a value v using at most k coins; that is, we wish to find a set ofkcoins whose total value is v. This might not be possible: for instance, if the denominations are 5 and 10 and k=6, then we can make change for 55 but not for 65. Give an efficient dynamic-programming algorithm for the following problem. Input: ; x1,x2,x3.........xn;k;v.Question: Is it possible to make change for v using at most k coins, of denominations x1,x2,x3.........xn?

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free