Comparison of Pax6 and eyeless

In this exercise, you will examine the sequences for both Pax6 and eyeless and consider the differences and similarities between them. First, download the sequences for Pax6 (82069480) and eyeless (12643549) from the NCBI Entrez Protein site using their accession numbers, given in parentheses. Go to the BLAST homepage (www.ncbi.nlm.nih.gov/blast) and choose "Align two sequences using BLAST (bl2seq)" under the "Specialized BLAST" heading. Instead of searching a large database, as is typical with BLAST, we will only be aligning two sequences with one another. Paste your sequences for Pax6 and eyeless into the boxes for Sequence 1 and Sequence 2 and make sure that you choose blastp as the program. When all of this is done, push the Align button.

In your BLAST alignments there is a line between "Query" and "Sbjct" that helps guide the eye along the alignment. If "Query" and "Sbjct" agree identically, the matching letter is repeated in the middle. If they do not match exactly but the amino acids are compatible in some sense (a favorable mismatch), a "+" is displayed on the middle line to indicate a positive score. A blank on the middle line indicates an unfavorable mismatch or a gap. The numbers at the beginning and end of the "Query" and "Sbjct" lines tell you the position in the sequence.

(a) Choose one of the alignments returned by BLAST and give a tally of the number of (i) identical amino acids; (ii) favorable mismatches; (iii) unfavorable mismatches; (iv) gaps.

(b) Give two examples of unfavorable mismatches and two examples of favorable mismatches in your chosen alignment. Based on what you know of the chemistry and structure of the amino acids, why might these amino acid pairs give rise to negative and positive scores, respectively?

(c) Choose one unfavorable mismatch pair and one favorable mismatch pair from your chosen alignment. What codons may give rise to each of these amino acids? What is the minimum number of mutations necessary in the DNA to produce this particular unfavorable mismatch? How many DNA mutations would be required to produce the particular favorable mismatch you chose?

Short Answer

The numbers of identical amino acids, favorable mismatches, unfavorable mismatches and gaps can vary depending on the specific BLAST alignment chosen. Examples of favorable mismatches would be amino acids with similar chemical properties, while amino acids with significantly different properties would constitute an unfavorable mismatch. Depending on the specific mismatch pair selected, the minimum number of mutations necessary in the DNA sequence to produce these mismatches can also be determined.

Step by step solution

01

Insert Sequences into BLAST

Download the sequences for Pax6 (accession number: 82069480) and eyeless (accession number: 12643549) from the NCBI Entrez Protein site. Go to the BLAST homepage and choose 'Align two sequences using BLAST (bl2seq)'. Paste the sequences for Pax6 and eyeless into boxes for Sequence 1 and Sequence 2 respectively, and choose blastp as the program. Then press the Align button.
02

Analyze BLAST Alignment

After alignment, examine the middle line between the Query (Pax6) and Sbjct (eyeless). Tally the number of identical amino acids, favorable mismatches, unfavorable mismatches, and gaps. On the middle line, an identical match repeats the letter, a favorable mismatch shows '+', and an unfavorable mismatch or a gap shows ' ' (a blank). A gap can be told apart from an unfavorable mismatch by the '-' that appears in the Query or Sbjct line at that position.
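
The tally described in this step can be sketched in Python. This is a minimal illustration, not part of the original solution: the function name `tally_alignment` is hypothetical, and the three short strings below are made up for demonstration (they are not taken from the real Pax6/eyeless alignment). It assumes the Query, middle, and Sbjct lines have been copied out with equal length and that gaps appear as '-' in the sequence lines.

```python
def tally_alignment(query: str, midline: str, sbjct: str) -> dict:
    """Count identities, favorable/unfavorable mismatches, and gaps
    in one block of a BLAST pairwise alignment."""
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("all three alignment lines must have the same length")

    counts = {"identical": 0, "favorable": 0, "unfavorable": 0, "gaps": 0}
    for q, m, s in zip(query, midline, sbjct):
        if q == "-" or s == "-":
            counts["gaps"] += 1          # gap in either sequence
        elif m == "+":
            counts["favorable"] += 1     # positive-scoring mismatch
        elif m == " ":
            counts["unfavorable"] += 1   # mismatch with no '+' on the middle line
        else:
            counts["identical"] += 1     # middle line repeats the letter
    return counts


# Made-up example lines (hypothetical, for illustration only).
example_query = "MKV-LDE"
example_mid   = "MK+  +E"
example_sbjct = "MKIAQEE"

print(tally_alignment(example_query, example_mid, example_sbjct))
```

For a real alignment, which BLAST prints in several blocks, the per-block counts would be summed.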
03

Compare Identical and Non-Identical Matches

Give examples of favorable and unfavorable mismatches found in the BLAST alignment results. A mismatch between amino acids with similar properties, such as charge or polarity, is a favorable mismatch and is marked with a '+' on the middle line. A mismatch between amino acids with very different properties is an unfavorable mismatch and is marked with a ' ' (space). Discuss why these amino acid pairs receive positive or negative scores, based on the chemistry and structure of the respective amino acids.
04

Determine Codons and DNA Mutations Required

Finally, choose an unfavorable and a favorable mismatch pair and look up the codons that encode each amino acid. Determine the minimum number of mutations in the DNA sequence necessary to produce each mismatch: compare every codon of the first amino acid with every codon of the second and count the positions at which they differ. Because a codon has three bases, the minimum is between one and three. A single-base change that results in a different amino acid is called a missense mutation.
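
The codon comparison in this step can be sketched as follows. This is an illustrative sketch rather than the textbook's method: the helper names `codons_for` and `min_mutations` are hypothetical, and the table is the standard genetic code written as DNA codons (T instead of U). It counts base substitutions only and ignores insertions and deletions.

```python
from itertools import product

# Standard genetic code, DNA alphabet, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list:
    """Return all codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_mutations(aa1: str, aa2: str) -> int:
    """Smallest number of single-base substitutions that turns some codon
    for aa1 into some codon for aa2."""
    return min(
        sum(b1 != b2 for b1, b2 in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


print(codons_for("L"), codons_for("I"))
print(min_mutations("L", "I"))  # leucine vs. isoleucine
print(min_mutations("D", "K"))  # aspartic acid vs. lysine
```

With this table, leucine to isoleucine needs one substitution (for example CTT to ATT), while aspartic acid to lysine needs two (for example GAT to AAA). The pairs you analyze should come from your own alignment.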

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Understanding Sequence Alignment
Sequence alignment is a fundamental technique used in bioinformatics to compare and align two or more biological sequences, such as DNA, RNA, or proteins. It is utilized to identify regions of similarity that may indicate functional, structural, or evolutionary relationships between the sequences. During the alignment, the individual components, be they nucleotides or amino acids, are matched against each other. This alignment allows researchers to assess the degree of sequence conservation and to predict the biological features and functions of the sequences in question.

When using BLAST (Basic Local Alignment Search Tool) for sequence alignment, as illustrated in the exercise with Pax6 and eyeless protein sequences, it's imperative to understand the visualization of the output. The 'Query' represents the sequence you provided, while 'Sbjct' is the sequence BLAST compares it to. Identical amino acids are marked by the same letter on the middle line, signifying a perfect match, which suggests strong evolutionary conservation and functional similarity. On the other hand, mismatches may be present - these are amino acids that do not align perfectly but might still have similar properties.

For this exercise, it helps to have an amino acid chart at hand and to keep in mind that amino acids with similar characteristics, such as size or polarity, often substitute for each other without drastic effects on protein function. These are called 'conservative substitutions' and are marked with a '+' in the BLAST alignment. In contrast, 'non-conservative substitutions' involve amino acids with different characteristics and might affect protein function; they are marked by a space in the alignment.
A Closer Look at Amino Acid Mismatches
Amino acid mismatches occur in sequence alignments when corresponding positions of two aligned sequences do not have the same amino acid. These mismatches can be categorized as either 'favorable' or 'unfavorable', which speaks to the compatibility and potential functional consequence of the substitution. A favorable mismatch is marked with '+' and means that even though the amino acids are not identical, they have similar physicochemical properties such as size, charge, or hydrophobicity. They are often functionally interchangeable in proteins, meaning the protein’s function is not significantly altered despite the mismatch.

In the educational exercise provided, students are asked to identify examples of both favorable and unfavorable mismatches. This requires knowledge of amino acid characteristics. For instance, if a hydrophobic amino acid like leucine is substituted by another hydrophobic amino acid like isoleucine, it's usually a favorable mismatch. Conversely, if an acidic amino acid like aspartic acid were to replace a basic amino acid like lysine, this could be considered an unfavorable mismatch due to their opposing charges leading to potential functional effects on the protein structure.
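
The two examples above can be checked against a substitution matrix. The sketch below assumes the alignment was scored with BLOSUM62, the usual blastp default, and hard-codes only a handful of entries from that matrix for illustration; the names `BLOSUM62_EXCERPT` and `midline_symbol` are hypothetical, and a full analysis would need the complete matrix.

```python
# A few BLOSUM62 scores for illustrative pairs only (not the full matrix).
# Keys are unordered pairs of one-letter amino acid codes.
BLOSUM62_EXCERPT = {
    frozenset("LI"): 2,   # leucine / isoleucine: both hydrophobic
    frozenset("DE"): 2,   # aspartate / glutamate: both acidic
    frozenset("KR"): 2,   # lysine / arginine: both basic
    frozenset("ST"): 1,   # serine / threonine: both small, hydroxyl-bearing
    frozenset("FY"): 3,   # phenylalanine / tyrosine: both aromatic
    frozenset("DK"): -1,  # aspartate / lysine: opposite charges
    frozenset("GW"): -2,  # glycine / tryptophan: very different size
}


def midline_symbol(aa1: str, aa2: str) -> str:
    """Return the character BLAST would show on the middle line:
    the letter for an identity, '+' for a positive score, ' ' otherwise.
    Raises KeyError for pairs missing from the excerpt."""
    if aa1 == aa2:
        return aa1
    score = BLOSUM62_EXCERPT[frozenset((aa1, aa2))]
    return "+" if score > 0 else " "


for pair in [("L", "I"), ("D", "K"), ("G", "W"), ("A", "A")]:
    print(pair, repr(midline_symbol(*pair)))
```

Under these scores, leucine/isoleucine is a favorable mismatch and aspartic acid/lysine an unfavorable one, in line with the chemical reasoning in the text.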

Understanding these mismatches is important because it allows students to infer potential effects on protein function and stability. When analyzing a BLAST alignment, noting these mismatches gives insight into the evolutionary pressures and functional constraints that have shaped the sequences. It also connects to molecular mechanisms of disease, where such substitutions can have pathological consequences.
Genetic Mutations and Code Changes
Genetic mutations are changes in the nucleotide sequence of DNA, which can lead to variations in the amino acid sequence of proteins encoded by the DNA. These changes can occur as substitutions, insertions, or deletions of base pairs. A 'missense mutation' is one such type of substitution that results in a different amino acid being incorporated into the protein. Understanding the mutational process that results in amino acid mismatches is critical when performing a BLAST sequence alignment exercise.

In the textbook exercise, students are asked to choose an unfavorable and a favorable mismatch pair and then identify the codons that could produce these amino acids. This step underscores the relationship between the nucleotide sequence of DNA and the amino acid sequence of the protein it encodes. Analyzing the required mutations gives insight into how changes at the DNA level can have minor or major effects on protein function. For instance, if an unfavorable mismatch requires several nucleotide changes, that suggests a lower likelihood of such a substitution arising, while a favorable mismatch resulting from a single nucleotide change might be more common.

Students could improve their understanding by researching the genetic code to see how specific codon changes translate to amino acid substitutions. Factoring in the redundancy of the genetic code, where multiple codons can encode the same amino acid, adds a layer of complexity, illustrating how some mutations are silent, while others can be radical. This aspect of the exercise not only solidifies their grasp of molecular genetics but also emphasizes the precision required for maintaining the proper function of proteins.

Most popular questions from this chapter

Open reading frames in E. coli In this problem, we will search the E. coli genome for open reading frames. The actual genome sequence of E. coli is available on the book's website. (a) Write a program that scans the DNA sequence and records the distance between start and stop codons in each of the three ORFs on the forward strand. You may skip the calculation for the reverse strand. You can find an example of this code implemented in Matlab on the book's website. (b) Plot the distribution of ORF lengths \(L\) and compare it with that expected for random DNA calculated in Problem 4.7. (c) Estimate a cut-off value \(L_{\text{cut}}\), above which the ORFs are statistically significant, that is, the number of observed ORFs with \(L>L_{\text{cut}}\) is much greater than expected by chance. (Problem courtesy of Sharad Ramanathan.)

The molecular clock In eukaryotes, the majority of individual point mutations are thought to be "neutral" and have little or no effect on phenotype. Only a small fraction of the genome codes for proteins and critical DNA regulatory sequences. Even within coding regions, the redundancy of the genetic code is sufficient to render many mutations "synonymous" (that is, they do not change the amino acid, and hence the protein, encoded by the DNA). The slow accumulation of neutral mutations between two populations can be used as a "molecular clock" to estimate the length of time that has passed since the existence of their last common ancestor. In these estimates, it is common to make the simplifying approximations that (1) most mutations are neutral and (2) the rate of accumulation of neutral mutations is just the average point mutation rate per generation (that is, ignoring other kinds of mutations such as deletions, inversions, etc., as well as variations in and correlations among mutations). (a) With a crude estimate of the point mutation rate of humans of \(10^{-8}\) per base pair per generation, what fraction of the possible nucleotide differences would you expect there to be between chimpanzees and humans given that the fossil record and radiochemical dating indicate their lineages diverged about six million years ago? Compare your estimate with the observed result from sequencing of about \(1.5\%\). (b) Some parasitic organisms (lice are an example) have specialized and co-evolved with humans and chimps separately. A natural hypothesis is that the most recent common ancestor of the human and chimp parasites existed at the same time as that of the human and chimp themselves. How might you test this from DNA sequence data and other information? What are likely to be the largest causes of uncertainty in the estimates? (Problem courtesy of Daniel Fisher.)

Mutations of bacteria in our gut (a) The populations of E. coli in the guts of a collection of humans can be large enough that multiple mutations can occur simultaneously in one bacterium. Suppose that a very particular combination of \(k\) point mutations is required for a pathogenic strain to emerge and that these must all arise in one cell division (as could be the case if the subsets of these mutations are deleterious). With the point mutation rate per base pair per cell division of \(\mu\), what is the probability \(m_{k}\) that this occurs in a single cell division? The simplest assumption is that the probabilities of the different mutations are independent. (b) In a human large intestine, the density of bacteria is estimated to be about \(10^{11.5}\) per milliliter, of which a fraction of about \(10^{-4}\) are E. coli. Estimate how many E. coli per person this implies. In a population of \(N\) humans, with \(n\) E. coli in each of their guts, in \(T\) generations of the E. coli, estimate the total probability \(P_{k}\) that the particular combination of \(k\) mutations occurs at least once. (c) With the population of Silicon Valley over one year, what are the chances this occurs for \(k=2\)? For \(k=3\)? Some crucial factors in your estimate are \(\mu \approx 10^{-10}-10^{-9}\) mutations per base pair per cell division and the generation time of E. coli: the standard lab result is that E. coli divide every 20 minutes. A low-end estimate for the division rate of E. coli in human guts is about once every few days. Why is this more realistic? Given these and other uncertainties, how big are the uncertainties in your estimates of \(P_{2}\) and \(P_{3}\)? (Problem courtesy of Daniel Fisher.)

Mutual information by another name In the chapter, we introduced the concept of mutual information as the average decrease in the missing information associated with one variable when the value of another variable is known. In terms of probability distributions, this can be written mathematically as \\[I=\sum_{y} p(y)\left[-\sum_{x} p(x) \log _{2} p(x)+\sum_{x} p(x | y) \log _{2} p(x | y)\right]\\] where the expression in square brackets is the difference in missing information, \(S_{x}-S_{x | y}\), associated with the probability of \(x\), \(p(x)\), and with the probability of \(x\) conditioned on \(y\), \(p(x | y)\). Using the relation between the conditional probability \(p(x | y)\) and the joint probability \(p(x, y)\), \\[p(x | y)=\frac{p(x, y)}{p(y)}\\] show that the formula for mutual information given in Equation 21.77 can be used to derive the formula used in the chapter (Equation 21.17), namely \\[I=\sum_{x, y} p(x, y) \log _{2}\left[\frac{p(x, y)}{p(x) p(y)}\right]\\]

Protein mutation rates Random mutations lead to amino acid substitutions in proteins that are described by the Poisson probability distribution \(p_{s}(t)\). Namely, the probability that \(s\) substitutions at a given amino acid position in a protein occur over an evolutionary time \(t\) is \\[p_{s}(t)=\frac{e^{-\lambda t}(\lambda t)^{s}}{s !}\\] where \(\lambda\) is the rate of amino acid substitutions per site per unit time. For example, some proteins like fibrinopeptides evolve rapidly, and \(\lambda_{F}=9\) substitutions per site per \(10^{9}\) years. Histones, on the other hand, evolve slowly, with \(\lambda_{H}=0.01\) substitutions per site per \(10^{9}\) years. (a) What is the probability that a fibrinopeptide has no mutations at a given site in 1 billion years? What is this probability for a histone? (b) We want to compute the average number of mutations \(\langle s\rangle\) over time \(t\), \\[ \langle s\rangle=\sum_{s=0}^{\infty} s p_{s}(t) \\] First, using the fact that probabilities must sum to 1, compute the sum \(\sigma=\sum_{s=0}^{\infty}(\lambda t)^{s} / s !\). Then, write an expression for \(\langle s\rangle\), making use of the identity \\[\sum_{s=0}^{\infty} s \frac{(\lambda t)^{s}}{s !}=(\lambda t) \sum_{s=1}^{\infty} \frac{(\lambda t)^{s-1}}{(s-1) !}=\lambda t \sigma\\] (c) Using your answer in (b), determine the ratio of the expected number of mutations in a fibrinopeptide to that of a histone, \(\langle s\rangle_{F} /\langle s\rangle_{H}\). (Adapted from Problem 1.16 of K. Dill and S. Bromberg, Molecular Driving Forces, 2nd ed. Garland Science, 2011.)
