
The Effect of Unequal Transversion Rates on the Accuracy of Evolutionary Parsimony ¹

W. C. Navidi and L. Beckett-Lemus

Department of Mathematics, University of Southern California

Evolutionary parsimony is an easy-to-use method of phylogenetic inference that is based on nucleic acid sequences and that does not require the assumption that evolutionary processes in the various sites on the molecule are identical. It does, however, require a parameter constraint, known as the "balanced transversion" assumption. We show that the accuracy of the procedure is fairly insensitive to moderate violations of this assumption, and that the procedure thus is applicable under more general conditions than previously thought.

Introduction

The method of evolutionary parsimony, developed by Lake (1987), is a procedure to reconstruct a phylogenetic tree on the basis of aligned nucleic acid sequences. The method is primarily applicable to groups of four species and chooses the unrooted tree that is most compatible with the sequence data. The method does not choose a tree if the data are inadequate to distinguish among the three possible unrooted trees.

One advantage of evolutionary parsimony over many other methods of phylogenetic inference based on nucleic acid sequences is that the method does not require the assumption that the evolutionary process is identical at every site on the sequences being studied. The validity of evolutionary parsimony does, however, require (as do most other commonly used methods) the assumption that the sites evolve independently. It also requires a "balanced transversion" assumption which is not required by other methods and which will be described more fully below.

The main purpose of this paper is to investigate the accuracy of evolutionary parsimony in cases where the balanced-transversion assumption is violated. We find the accuracy not to be greatly affected except in extreme cases.
A Stochastic Model for Evolutionary Parsimony

To measure the frequency with which a phylogenetic inference procedure gives a correct result, one must model evolution as a stochastic process. The model we use is similar to models described by Barry and Hartigan (1987) and Cavender and Felsenstein (1987). A complete description is given in Navidi et al. (1991). We present here a somewhat condensed version of that description.

There are four species, labeled I-IV. Figure 1 shows one of the three possible unrooted trees, which we will call tree 1. Tree 2 is obtained from figure 1 by interchanging species II and III, and tree 3 is obtained by interchanging species II and IV.

1. Key words: evolutionary parsimony, phylogenetic inference, linear invariants.

Address for correspondence and reprints: William C. Navidi, Department of Mathematics, University of Southern California, Los Angeles, California 90089.

Mol. Biol. Evol. 9(6):1163-1175. 1992.
© 1992 by The University of Chicago. All rights reserved.
0737-4038/92/0906-0011$02.00

Downloaded from https://academic.oup.com/mbe/article/9/6/1163/1073680 by guest on 14 October 2023


FIG. 1.-Four-species unrooted tree

Restrict attention to a single site on the molecule. We arbitrarily choose one of the internal nodes [labeled with an asterisk (*)] as the initial node, and we specify a direction for each branch, as indicated by the arrows (see fig. 1). Let $r = (r_A, r_G, r_C, r_U)$, where $r_A$, $r_G$, $r_C$, and $r_U$ are the probabilities of observing A, G, C, and U, respectively, at the initial node. For each branch $i$, define the substitution probability $m_i(A, G)$ to be the conditional probability of observing a G at the ending node of the branch, given that an A was present at the beginning node. Make a similar definition for each of the other 15 pairs of bases A, G, C, and U. The Markov matrix $M_i$ for the $i$th branch is given by

$$
M_i = \begin{bmatrix}
m_i(A, A) & m_i(A, G) & m_i(A, C) & m_i(A, U) \\
m_i(G, A) & m_i(G, G) & m_i(G, C) & m_i(G, U) \\
m_i(C, A) & m_i(C, G) & m_i(C, C) & m_i(C, U) \\
m_i(U, A) & m_i(U, G) & m_i(U, C) & m_i(U, U)
\end{bmatrix}. \tag{1}
$$

Denote the base at the beginning node of branch 5 by $b_5$, and denote the base at the ending node by $b_6$. Then the probability of observing bases $b_1$, $b_2$, $b_3$, and $b_4$ at nodes 1, 2, 3, and 4, respectively, is given by expression (2).
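The probability of a four-base configuration is obtained by summing over the unobserved bases at the two internal nodes. The following is a minimal Python sketch of that sum, not the authors' expression (2). It assumes that branches 1 and 2 leave the initial node (base b5) and that branches 3 and 4 leave the other internal node (base b6), which matches the grouping of species I and II in tree 1 but cannot be confirmed without figure 1. The helper names `k3p` and `config_probability` are hypothetical.

```python
from itertools import product

BASES = "AGCU"  # row and column order used for every matrix below


def k3p(a, b, c, d):
    """Illustrative substitution matrix: a = no change, b = transition,
    c and d = the two kinds of transversion (base order A, G, C, U)."""
    return [[a, b, c, d],
            [b, a, d, c],
            [c, d, a, b],
            [d, c, b, a]]


def config_probability(config, r, mats):
    """Probability of observing `config` (e.g. "AGCU") at species I-IV.

    r    : initial-node probabilities in the order A, G, C, U
    mats : matrices for branches 1..5; branch 5 is the middle branch
    ASSUMPTION: branches 1 and 2 start at the initial node (b5),
    branches 3 and 4 start at the other internal node (b6).
    """
    s1, s2, s3, s4 = (BASES.index(ch) for ch in config)
    m1, m2, m3, m4, m5 = mats
    total = 0.0
    for b5, b6 in product(range(4), repeat=2):  # sum over internal bases
        total += (r[b5] * m1[b5][s1] * m2[b5][s2]
                  * m5[b5][b6] * m3[b6][s3] * m4[b6][s4])
    return total


if __name__ == "__main__":
    mats = [k3p(0.8, 0.1, 0.06, 0.04)] * 5
    r = [0.25] * 4
    print(config_probability("AACC", r, mats))
```

Summing `config_probability` over all 256 configurations should give 1, which is a convenient sanity check on any set of matrices.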

We make the following assumptions:

ASSUMPTION 1. The evolutionary processes in the various sites act independently of one another.

ASSUMPTION 2. All the sites evolve according to the same tree.

ASSUMPTION 3. Given the base at a site at an internal node, the collections of bases formed by removing that node are distributed independently of each other.

ASSUMPTION 4. The Markov matrices for the outer branches (branches 1-4 in fig. 1) are of the following form:

$$
\begin{array}{c|cccc}
  & A & G & C & U \\ \hline
A & e & f & g & g \\
G & h & i & j & j \\
C & k & k & l & m \\
U & n & n & p & q
\end{array} \tag{3}
$$

Assumption 4 is the balanced-transversion assumption. Because it is applied to an unrooted tree, the biological meaning that this assumption has in the branch that contains the most recent common ancestor is different from its meaning in the other branches. In a branch not containing the most recent common ancestor, the direction of the branch from interior to exterior node follows the flow of time. In such a branch, the balanced-transversion assumption states that, if the net effect of the evolutionary process during the time interval spanned by the two nodes is to change the original base by a transversion, then either of the two possible transversions is equally likely. In contrast, in the branch containing the most recent common ancestor, the bases at the nodes represent outcomes of evolutionary processes in two distinct lineages, and the balanced-transversion assumption refers to a comparison of outcomes of these two processes. Since, in an unrooted tree, we may not know which branch contains the most recent common ancestor, we may not know precisely what biological assumption we are making in any given branch.

Fortunately, it is possible to weaken assumption 4 in such a way as to retain the validity of evolutionary parsimony while removing the ambiguity in the biological meaning of the assumption. By algebraic manipulation of expression (2), Beckett-Lemus (1991) showed that Lake's invariants (described below), and thus the method of evolutionary parsimony, remain valid whenever the Markov matrices in any three of the outer branches are of the form of matrix (3). Since at least three of the outer branches do not contain the most recent common ancestor, assumption 4 may be replaced by assumption 4'.

ASSUMPTION 4'. The Markov matrices for the outer branches not containing the most recent common ancestor are of the form of matrix (3).
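The form of matrix (3) amounts to four equalities, one per row: in each row the two transversion entries are equal. A minimal Python sketch of that check follows. The function name `is_balanced`, the tolerance, and the nested-list representation in base order A, G, C, U are illustrative assumptions.

```python
from math import isclose

# Index of each base in the row/column order A, G, C, U.
A, G, C, U = range(4)


def is_balanced(matrix, tol=1e-12):
    """Return True if a 4x4 substitution matrix has the form of matrix (3).

    From a purine (A or G) the transversions go to C and U;
    from a pyrimidine (C or U) they go to A and G.
    """
    return (isclose(matrix[A][C], matrix[A][U], abs_tol=tol)
            and isclose(matrix[G][C], matrix[G][U], abs_tol=tol)
            and isclose(matrix[C][A], matrix[C][G], abs_tol=tol)
            and isclose(matrix[U][A], matrix[U][G], abs_tol=tol))


if __name__ == "__main__":
    balanced = [[0.80, 0.10, 0.05, 0.05],
                [0.10, 0.80, 0.05, 0.05],
                [0.05, 0.05, 0.80, 0.10],
                [0.05, 0.05, 0.10, 0.80]]
    unbalanced = [[0.80, 0.10, 0.06, 0.04],
                  [0.10, 0.80, 0.04, 0.06],
                  [0.06, 0.04, 0.80, 0.10],
                  [0.04, 0.06, 0.10, 0.80]]
    print(is_balanced(balanced), is_balanced(unbalanced))
```

Under assumption 4' the same check would be applied only to the outer branches that do not contain the most recent common ancestor.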
Either assumption 4 or assumption 4' is sufficient for the validity of evolutionary parsimony when the matrix parameters represent substitution probabilities rather than substitution rates. For an example of a parameterization using substitution rates for which assumption 4 is satisfied yet evolutionary parsimony is not valid, see the paper by Jin and Nei (1990).

This model describes the evolutionary process at a single site. The values of $r$ and of the matrix parameters may vary from site to site because no assumption that the sites are identically distributed is used. At each site, each species contributes an A, G, C, or U. Thus each site yields an ordered configuration of bases that falls into one of 256 categories: AAAA, ..., UUUU. The position in the configuration depends on the species; for example, the configuration AGCU means that species I, II, III, and IV have bases A, G, C, and U, respectively, at that site.

The Method of Evolutionary Parsimony

The method of evolutionary parsimony was first described by Lake (1987). It involves six subsets of the 256 configurations, as follows:

Let $P_1$ be the number of sites that have one of the following configurations: AACC, AAUU, GGCC, GGUU, CCAA, CCGG, UUAA, UUGG, AGCU, AGUC, GACU, GAUC, CUAG, CUGA, UCAG, or UCGA.

Let $B_1$ be the number of sites that have one of the following configurations: AGCC, AGUU, GACC, GAUU, CUAA, CUGG, UCAA, UCGG, AACU, AAUC, GGCU, GGUC, CCAG, CCGA, UUAG, or UUGA.

Let $P_2$ be the number of sites that have one of the following configurations: ACAC, AUAU, GCGC, GUGU, CACA, CGCG, UAUA, UGUG, ACGU, AUGC, GCAU, GUAC, CAUG, CGUA, UACG, or UGCA.

Let $B_2$ be the number of sites that have one of the following configurations: ACGC, AUGU, GCAC, GUAU, CAUA, CGUG, UACA, UGCG, ACAU, AUAC, GCGU, GUGC, CACG, CGCA, UAUG, or UGUA.

Let $P_3$ be the number of sites that have one of the following configurations: ACCA, AUUA, GCCG, GUUG, CAAC, CGGC, UAAU, UGGU, ACUG, AUCG, GCUA, GUCA, CAGU, CGAU, UAGC, or UGAC.

Let $B_3$ be the number of sites that have one of the following configurations: ACCG, AUUG, GCCA, GUUA, CAAU, CGGU, UAAC, UGGC, ACUA, AUCA, GCUG, GUCG, CAGC, CGAC, UAGU, or UGAU.
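The six lists above follow a regular pattern, which the sketch below encodes in Python. The rule is inferred from the lists rather than quoted from the source: for tree i, the species are split into two pairs, each pair must lie within one base class (purine or pyrimidine), and the two pairs must lie in different classes. The site counts toward P_i if zero or two of the pairs show a transition difference, and toward B_i if exactly one does. The names `PAIRINGS` and `classify` are hypothetical.

```python
from itertools import product

PURINES = frozenset("AG")

# 0-based positions of species I-IV paired together in trees 1, 2, 3.
PAIRINGS = {1: ((0, 1), (2, 3)),
            2: ((0, 2), (1, 3)),
            3: ((0, 3), (1, 2))}


def classify(config):
    """Return ('P', i) or ('B', i) for a 4-letter configuration, else None."""
    purine = [base in PURINES for base in config]
    for tree, ((x0, x1), (y0, y1)) in PAIRINGS.items():
        within_same_class = purine[x0] == purine[x1] and purine[y0] == purine[y1]
        pairs_differ = purine[x0] != purine[y0]
        if within_same_class and pairs_differ:
            # Inside a class, two different bases differ by a transition.
            transitions = (config[x0] != config[x1]) + (config[y0] != config[y1])
            return ("B" if transitions == 1 else "P", tree)
    return None


if __name__ == "__main__":
    tally = {}
    for cfg in product("AGCU", repeat=4):
        label = classify("".join(cfg))
        tally[label] = tally.get(label, 0) + 1
    print(tally)  # each of the six subsets should contain 16 configurations
```

Enumerating all 256 configurations with this rule assigns 16 to each of the six subsets, in agreement with the length of each list.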

Lake (1987) showed that, if tree 1 is correct, then $E[P_2 - B_2] = 0$ and $E[P_3 - B_3] = 0$. Similarly, if tree 2 is correct, then $E[P_1 - B_1] = 0$ and $E[P_3 - B_3] = 0$, and, if tree 3 is correct, then $E[P_1 - B_1] = 0$ and $E[P_2 - B_2] = 0$. Determining the correct tree, therefore, is equivalent to determining the value of $i$ for which $E[P_i - B_i] \neq 0$.

Lake (1987) suggested the following procedure: For $i = 1, 2, 3$, compute the statistic

$$
S_i = \frac{(P_i - B_i)^2}{P_i + B_i}.
$$

If the number of sites is sufficiently large, then, if tree $i$ is incorrect, $S_i$ will have an approximate $\chi^2_1$ distribution, whereas, if tree $i$ is correct, $S_i$ will tend to be larger. Lake's procedure is to declare tree $i$ to be correct if $S_i$ is significant at the 5% level while $S_j$, $j \neq i$, is not. Otherwise, no tree is preferred. This procedure is two sided, because $S_i$ will be significant if $(P_i - B_i)$ is greatly different from 0 in either direction. It is asymptotic, because the $\chi^2$ distribution is justified by the central limit theorem.

Navidi et al. (1991) showed that, under the following biologically reasonable assumption, a one-sided exact binomial test may be used, which enhances the power of the procedure.

ASSUMPTION 5. In each branch of the tree, the probability of no difference between the nodes at either end is greater than the probability of a transition difference.

The one-sided exact procedure, referred to as "Procedure A" by Navidi et al. (1991), is as follows:

1. Choose a critical value $\alpha$ (e.g., 5%), and compute $P_1$, $P_2$, and $P_3$.

2. Compute the one-sided upper-tail significance level of $P_i$ on the basis of the binomial distribution with number of trials $P_i + B_i$ and success probability 1/2.

3. Declare a tree to be correct if its significance level is less than $\alpha$ while the significance levels of the other two trees are greater than $\alpha$. Make no decision otherwise.

Throughout the rest of this paper, we will make assumption 5 and use the one-sided exact procedure with critical value $\alpha = 5\%$.
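Both decision rules described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' code. The function names are hypothetical, the input is assumed to be a dictionary mapping each tree index to its pair of counts (P_i, B_i), and 3.841 is the standard upper 5% point of the chi-square distribution with 1 degree of freedom.

```python
from math import comb

CHI2_1DF_5PCT = 3.841  # upper 5% point of chi-square with 1 degree of freedom


def lake_statistic(p, b):
    """Lake's two-sided statistic S_i = (P_i - B_i)^2 / (P_i + B_i)."""
    return (p - b) ** 2 / (p + b)


def upper_tail(p, b):
    """One-sided exact level: Pr(X >= P_i) for X ~ Binomial(P_i + B_i, 1/2)."""
    n = p + b
    return sum(comb(n, k) for k in range(p, n + 1)) / 2 ** n


def procedure_a(counts, alpha=0.05):
    """Return the single tree whose level is below alpha, or None.

    counts: {tree index: (P_i, B_i)}
    """
    significant = [tree for tree, (p, b) in counts.items()
                   if upper_tail(p, b) < alpha]
    return significant[0] if len(significant) == 1 else None


if __name__ == "__main__":
    counts = {1: (52, 28), 2: (8, 8), 3: (9, 7)}  # made-up counts for illustration
    print(procedure_a(counts))
    print({t: lake_statistic(p, b) > CHI2_1DF_5PCT for t, (p, b) in counts.items()})
```

The exact test looks only at the upper tail, so a large excess of B_i over P_i never counts as evidence for tree i, whereas the two-sided statistic would flag it.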

Table 1

Proportion of Simulations: Transversion Ratios

                               PROPORTION OF SIMULATIONS
MODEL   TRANSVERSION RATIO   Correct Tree   Incorrect Tree   No Tree
A       1:1                  0.977          0                0.023
A       2:1                  0.965          0                0.035
A       3:1                  0.952          0                0.048
A       4:1                  0.911          0                0.089
A       9:1                  0.771          0                0.229
A       19:1                 0.669          0                0.331
B       1:1                  0.788          0.012            0.200
B       2:1                  0.766          0.013            0.221
B       3:1                  0.723          0.014            0.263
B       4:1                  0.642          0.023            0.335
B       9:1                  0.348          0.022            0.630
B       19:1                 0.210          0.020            0.770
C       1:1                  0.230          0.043            0.727
C       2:1                  0.242          0.073            0.685
C       3:1                  0.271          0.139            0.590
C       4:1                  0.268          0.160            0.572
C       9:1                  0.207          0.226            0.567
C       19:1                 0.130          0.199            0.671
D       1:1                  0.732          0.017            0.251
D       2:1                  0.707          0.023            0.270
D       3:1                  0.662          0.032            0.306
D       4:1                  0.536          0.053            0.411
D       9:1                  0.216          0.059            0.725
D       19:1                 0.078          0.041            0.881
E       1:1                  0.690          0.018            0.292
E       2:1                  0.716          0.022            0.262
E       3:1                  0.717          0.024            0.259
E       4:1                  0.704          0.022            0.274
E       9:1                  0.526          0.018            0.456
E       19:1                 0.380          0.007            0.613

NOTE.-One thousand trials were conducted at each ratio. All sequences had 1,000 nucleotide positions. The column labeled "No Tree" gives the proportion of trials in which no tree or more than one tree was significant at the 5% level. In model A, $a_i = 0.9$ and $b_i = 0.05$ for branches 1, 2, 3, and 4, while $a_5 = 0.8$ and $b_5 = 0.1$. In model B, $a_i = 0.8$ and $b_i = 0.1$ for all branches. In model C, $a_i = 0.7$ and $b_i = 0.15$ for branches 1, 2, 3, and 4, while $a_5 = 0.8$ and $b_5 = 0.1$. In model D, $a_i = 0.7$ and $b_i = 0.15$ for branches 1 and 3, $a_i = 0.9$ and $b_i = 0.05$ for branches 2 and 4, and $a_5 = 0.8$ and $b_5 = 0.1$. In model E, $a_i = 0.7$ and $b_i = 0.15$ for branches 1 and 2, $a_i = 0.9$ and $b_i = 0.05$ for branches 3 and 4, and $a_5 = 0.8$ and $b_5 = 0.1$. In all models, the initial probabilities $r_A$, $r_G$, $r_C$, and $r_U$ were each taken to be 0.25.

Table 2

Proportion of Simulations: Balanced and Unbalanced Transversion Ratios in 10,000 Trials

                     PROPORTION OF SIMULATIONS
TRANSVERSION RATIO   Correct Tree   Incorrect Tree   No Tree
Balanced             0.558          0.028            0.415
Unbalanced           0.551          0.055            0.394

NOTE.-All sequences had 1,000 nucleotide positions. The column labeled "No Tree" gives the proportion of trials in which no tree or more than one tree was significant at the 5% level. The true parameter values were generated randomly, subject to the constraint $0.5 \le a_i < 1$, and to the constraint $c_i \ge d_i$ in the unbalanced case and $c_i = d_i$ in the balanced case. In the unbalanced case, the expected values of $c_i$ and $d_i$ are 0.09375 and 0.03125, respectively, and the expected value of $c_i/(c_i + d_i)$ is 0.75. In the balanced case, both $c_i$ and $d_i$ have expected value 0.0625. In both cases, the expected values of $a_i$ and $b_i$ are 0.75 and 0.125, respectively, and the expected values of $r_A$, $r_G$, $r_C$, and $r_U$ are each 0.25.

A Sample Calculation

We now examine the effect that the violation of assumption 4 has on the accuracy of evolutionary parsimony. We begin with a calculation in the simple case where no constraints are placed on $r$ and where the matrices $M_1$-$M_5$ are all the same and of the form

$$
\begin{array}{c|cccc}
  & A & G & C & U \\ \hline
A & a & b & c & d \\
G & b & a & d & c \\
C & c & d & a & b \\
U & d & c & b & a
\end{array} \tag{4}
$$

Assumption 4 is satisfied only when $c = d$. Matrix form (4) was first presented by Kimura (1981), although in Kimura's model the parameters $a$, $b$, $c$, and $d$ are allowed to vary from branch to branch, whereas, for the sake of computational simplicity, we are requiring them to be the same in each branch. Although our model is unrealistically simple, it provides insight into the behavior of more realistic models.

Table 3

Proportion of Simulations: Balanced and Unbalanced Transversion Ratios in 1,000 Trials

                     PROPORTION OF SIMULATIONS
TRANSVERSION RATIO   Correct Tree   Incorrect Tree   No Tree
Balanced             0.301          0.041            0.658
Unbalanced           0.268          0.060            0.672

NOTE.-All sequences had 1,352 nucleotide positions. The column labeled "No Tree" gives the proportion of trials in which no tree or more than one tree was significant at the 5% level. The true parameter values were generated from a maximum-likelihood fit to data, subject to the constraint $c_i = d_i$ in the balanced case.

Assume, without loss of generality, that tree 1 is correct. Let $N$ be the number of sites. Direct calculation using equation (2) shows that

$$
\begin{aligned}
E(P_1) &= N\{2(a+b)[(a^2+b^2)(c^2+d^2)+4abcd] + (c+d)[(a^2+b^2)^2+4(a^2b^2+c^2d^2)+(c^2+d^2)^2]\}, \\
E(P_1+B_1) &= N[(a+b)^4(c+d) + 2(a+b)^3(c+d)^2 + (c+d)^5], \\
E(P_2) &= 2N[(ac+bd)^2+(ad+bc)^2], \\
E(P_2+B_2) &= 2N[(a+b)(c+d)]^2, \\
E(P_3) &= E(P_2), \\
E(P_3+B_3) &= E(P_2+B_2).
\end{aligned} \tag{5}
$$

For $i = 1, 2, 3$, define

$$
\tau_i = \frac{E(P_i)}{E(P_i+B_i)} \tag{6}
$$

and

$$
\hat{\tau}_i = \frac{P_i}{P_i+B_i}. \tag{7}
$$

With tree 1 correct, $\tau_1 > 1/2$, while $\tau_2 = \tau_3 = 1/2$ if the transversions are balanced. When transversions are unbalanced, in general $\tau_1$, $\tau_2$, and $\tau_3$ are all different from 1/2. The reliability of evolutionary parsimony in a given instance depends roughly on the differences $\tau_i - 1/2$, where the differences are measured in units of standard deviations of $\hat{\tau}_i$. The farther $\tau_1$ is from 1/2, and the closer $\tau_2$ and $\tau_3$ are to 1/2, the more reliable evolutionary parsimony will be.

The exact standard deviations of the $\hat{\tau}_i$ are tedious to calculate, because both numerator and denominator in equation (7) are random. However, an approximation is obtained by conditioning on $P_i+B_i = E(P_i+B_i)$. The conditional standard deviation of $\hat{\tau}_i$ is given by

$$
\sigma_i = \sqrt{\frac{\tau_i(1-\tau_i)}{E(P_i+B_i)}}. \tag{8}
$$

For $i = 1, 2, 3$, define

$$
z_i = \frac{\tau_i - 1/2}{\sigma_i}. \tag{9}
$$

The quantity $z_i$ is the difference between $\tau_i$ and 1/2, in units of $\sigma_i$. The larger the value of $z_i$, the more likely it is that $P_i - B_i$ will be significantly different from 0. Thus, when $z_1$ is large while $z_2$ and $z_3$ are near 0, tree 1 is likely to be chosen as the correct tree. When all $z_i$ are large or near 0, evolutionary parsimony will often fail to choose a tree, and, when a tree is chosen, it is more likely to be incorrect.

We now compute the values of $z_1$, $z_2$, and $z_3$ in some specific examples. In each example we take $N$, the number of sites, to be 1,000.

EXAMPLE 1. Consider the case $a = 0.8$, $b = 0.1$, $c = d = 0.05$. Since $c = d$, assumption 4 is satisfied, and the method of evolutionary parsimony is valid. Equations (5), (6), (8), and (9) show that $E(P_1+B_1) = 80.2$, $\tau_1 = 0.650$, and $E(P_2+B_2) = E(P_3+B_3) = 16.2$, $\tau_2 = \tau_3 = 0.5$, so $z_1 = 2.82$ and $z_2 = z_3 = 0$.

EXAMPLE 2. Now consider $a = 0.8$, $b = 0.1$, $c = 0.06$, and $d = 0.04$. This represents a moderate deviation from assumption 4. Here equations (5), (6), (8), and (9) show that $E(P_1+B_1) = 80.2$, $\tau_1 = 0.652$, $E(P_2+B_2) = E(P_3+B_3) = 16.2$, and $\tau_2 = \tau_3 = 0.512$, so $z_1 = 2.86$ and $z_2 = z_3 = 0.10$.

EXAMPLE 3. Let $a = 0.8$, $b = 0.1$, $c = 0.08$, and $d = 0.02$. This represents a substantial deviation from assumption 4. Now equations (5), (6), (8), and (9) show that $E(P_1+B_1) = 80.2$, $\tau_1 = 0.669$, $E(P_2+B_2) = E(P_3+B_3) = 16.2$, $\tau_2 = \tau_3 = 0.609$, so $z_1 = 3.22$ and $z_2 = z_3 = 0.90$.

EXAMPLE 4. Finally, let $a = 0.8$, $b = 0.1$, $c = 0.095$, and $d = 0.005$. This represents an extreme deviation from assumption 4. Now equations (5), (6), (8), and (9) show that $E(P_1+B_1) = 80.2$, $\tau_1 = 0.694$, and $E(P_2+B_2) = E(P_3+B_3) = 16.2$, $\tau_2 = \tau_3 = 0.745$, so $z_1 = 3.76$ and $z_2 = z_3 = 2.26$.

Comparing the results of examples 1 and 2 suggests that a moderate deviation from the balanced-transversion assumption (e.g., a 3:2 ratio rather than 1:1) will not have much effect on the accuracy of evolutionary parsimony, since the values of $z_i$ differ only slightly in the two examples. In example 3, representing a 4:1 ratio, the values of the $z_i$ have increased. The frequency with which an incorrect tree is chosen may increase somewhat in these circumstances, as may the frequency with which no tree is chosen. The value of $z_1$ is still considerably larger than $z_2$ or $z_3$, so the effect may not be great. In example 4, representing a 19:1 ratio, all the $z_i$ are considerably larger than 0, and the gap between $z_1$ and the others has narrowed substantially. We may expect a large decrease in the accuracy of evolutionary parsimony in this case. The simulations described below provide more precise information about these examples and others.
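The four examples can be recomputed with a short Python sketch. It assumes a single matrix of the form (4) on all five branches with tree 1 correct. The expression used for E(P1+B1) is a derived form under that model, checked only against the value 80.2 quoted in the examples; it should be read as an assumption rather than as the authors' printed formula. The function names are hypothetical, and unrounded results may differ from the quoted values in the last digit.

```python
from math import sqrt


def expected_counts(a, b, c, d, n_sites):
    """Return {i: (E(P_i), E(P_i + B_i))} for i = 1, 2, 3 (tree 1 correct)."""
    p, q = a + b, c + d  # probabilities of staying in / leaving the base class
    e_p1 = n_sites * (2 * p * ((a*a + b*b) * (c*c + d*d) + 4*a*b*c*d)
                      + q * ((a*a + b*b) ** 2
                             + 4 * (a*a*b*b + c*c*d*d)
                             + (c*c + d*d) ** 2))
    # ASSUMPTION: derived form, agrees with E(P1+B1) = 80.2 in the examples.
    e_t1 = n_sites * (p**4 * q + 2 * p**3 * q**2 + q**5)
    e_p2 = 2 * n_sites * ((a*c + b*d) ** 2 + (a*d + b*c) ** 2)
    e_t2 = 2 * n_sites * (p * q) ** 2
    return {1: (e_p1, e_t1), 2: (e_p2, e_t2), 3: (e_p2, e_t2)}


def z_values(a, b, c, d, n_sites=1000):
    """Return {i: z_i}, the standardized distance of tau_i from 1/2."""
    result = {}
    for i, (e_p, e_total) in expected_counts(a, b, c, d, n_sites).items():
        tau = e_p / e_total
        sigma = sqrt(tau * (1 - tau) / e_total)
        result[i] = (tau - 0.5) / sigma
    return result


if __name__ == "__main__":
    for c, d in [(0.05, 0.05), (0.06, 0.04), (0.08, 0.02), (0.095, 0.005)]:
        z = z_values(0.8, 0.1, c, d)
        print(f"c={c}, d={d}: z1={z[1]:.2f}, z2=z3={z[2]:.2f}")
```

Because c + d is 0.1 in every example, E(P1+B1) and E(P2+B2) stay fixed and only the split between P_i and B_i moves as the ratio c:d grows.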

Simulations

We designed three groups of simulation studies to assess the impact of unbalanced transversions on the accuracy of evolutionary parsimony. The first group consists of five simulations (models A-E) involving Kimura's three-parameter model, described in the previous section. The substitution probability matrices are all of the form of matrix (4), but the values of the parameters $a$, $b$, $c$, and $d$ are allowed to vary from branch to branch. Denote the values in the $i$th branch (for branch numbering, see fig. 1) by $a_i$, $b_i$, $c_i$, and $d_i$. Table 1 gives the results of five simulations. Each simulation was performed for varying ratios of the transversion probabilities $c_i$ and $d_i$.

In model A, twice as many substitutions occur in the middle branch as occur in the outer branches, with the result that the evolutionary-parsimony procedure usually clearly favors the correct tree. When the transversion ratio is very large, the results are more likely to be inconclusive, but the frequency with which an incorrect tree is chosen remains negligible.

In model B, substitutions occur equally often in all five branches. The data therefore less often indicate the correct tree and occasionally indicate an incorrect tree. The accuracy of the procedure is scarcely affected by a transversion ratio of 3:1. When the ratio is 4:1, the frequency with which an incorrect tree is chosen increases somewhat, but the procedure is still reasonably accurate. More extreme ratios significantly increase the frequency with which the procedure is inconclusive but do not have much effect on the frequency with which an incorrect tree is chosen.

In model C, substitutions occur 50% more often in the outer branches than in the middle branch. Phylogenetic inference procedures, including evolutionary parsimony, tend to be less reliable in these circumstances. The frequency with which an incorrect tree is chosen (~4%) is rather high even when the transversions are balanced, especially when compared with the frequency with which the correct tree is chosen (~23%). A transversion ratio of 2:1 has some effect, and the accuracy becomes noticeably worse when the transversion ratio is 3:1.

In models D and E, two of the outer branches are 50% longer than the middle branch, and two are half as long. In model D, one short branch and one long branch are on each side of the middle branch, while, in model E, both long branches are on one side, and both short branches are on the other. The accuracy of model D declines when the ratio is ~3:1, but it is still reasonably high at this level. In the case of model E, the procedure remains accurate even when the ratio is quite large. The results from models D and E give a good indication of the performance of evolutionary parsimony in the situation where two of the outer branches are much longer than the other two. Only in model D is the probability of choosing a wrong tree >2.5%, and then only when the transversion ratio is >2:1.

Taken together, these simulation results indicate that, in situations where evolutionary parsimony is reasonably accurate, it remains so when the transversions are unbalanced, as long as the ratio is no greater than ~3:1. Ratios as high as 9:1 can noticeably reduce the accuracy of the procedure.
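One simulated data set for the first group of simulations can be generated along the following lines. This Python sketch is an illustration rather than the authors' program. It uses the model B parameters from the note to table 1 (a = 0.8 and b = 0.1 on every branch, so c + d = 0.1), splits c and d according to the chosen transversion ratio, and assumes that species I and II attach to the initial node and species III and IV to the other internal node. The function names are hypothetical.

```python
import random

BASES = "AGCU"


def k3p(a, b, c, d):
    """Three-parameter substitution matrix in base order A, G, C, U."""
    return [[a, b, c, d],
            [b, a, d, c],
            [c, d, a, b],
            [d, c, b, a]]


def model_b(ratio):
    """Matrices for branches 1..5 with a = 0.8, b = 0.1 and c:d = ratio:1."""
    transversion_total = 0.1
    c = transversion_total * ratio / (ratio + 1)
    return [k3p(0.8, 0.1, c, transversion_total - c)] * 5


def evolve(rng, base, matrix):
    """Draw the base at the ending node of a branch given its beginning base."""
    return rng.choices(BASES, weights=matrix[BASES.index(base)])[0]


def simulate_site(rng, mats):
    """Return the configuration at species I-IV for one site."""
    m1, m2, m3, m4, m5 = mats
    start = rng.choice(BASES)       # initial probabilities all 0.25
    other = evolve(rng, start, m5)  # base at the second internal node
    # ASSUMPTION: branches 1, 2 leave `start`; branches 3, 4 leave `other`.
    return (evolve(rng, start, m1) + evolve(rng, start, m2)
            + evolve(rng, other, m3) + evolve(rng, other, m4))


def simulate_alignment(rng, mats, n_sites=1000):
    """Return a list of n_sites independent site configurations."""
    return [simulate_site(rng, mats) for _ in range(n_sites)]


if __name__ == "__main__":
    sites = simulate_alignment(random.Random(1), model_b(3))
    print(sites[:5])
```

Repeating this for one thousand data sets per ratio, and applying the one-sided exact procedure to each, would give proportions comparable in spirit to those in table 1.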
It is interesting that, in all models, when the transversion ratio becomes very high, the frequency with which an incorrect tree is chosen declines somewhat, while the frequency with which the procedure is inconclusive increases.

The second group of simulations takes into account the realistic expectation that transversion ratios will vary from site to site on the nucleic acid molecule, from branch to branch on the tree, and within rows of the substitution matrix for each branch. A very general model of evolution would allow the substitution process at each site to be described by a distinct parameter set containing five matrices plus the parameter vector $r$. Thus a set of four aligned sequences of length 1,000 would require 5,000 matrices and 1,000 $r$ vectors to describe the evolutionary processes that account for the differences among them. Our next simulation is based on such a model and was carried out as follows: One thousand parameter sets were generated. Each parameter set contained five matrices of the form

$$
\begin{array}{c|cccc}
  & A & G & C & U \\ \hline
A & a_1 & b_1 & c_1 & d_1 \\
G & b_2 & a_2 & d_2 & c_2 \\
C & c_3 & d_3 & a_3 & b_3 \\
U & d_4 & c_4 & b_4 & a_4
\end{array}
$$

where, in each row, $a_i$ was generated randomly on the interval $(0.5, 1)$, $b_i$ was generated randomly on the interval $(0, 1-a_i)$, $c_i$ was generated randomly on the interval $[(1-a_i-b_i)/2,\ 1-a_i-b_i]$, and $d_i = 1-a_i-b_i-c_i$. The condition $a_i > 0.5$ was chosen to make the parameter values biologically more realistic. The condition on $c_i$ ensures that $0.5 \le c_i/(c_i+d_i) \le 1$ and that $c_i \ge d_i$. In addition to the five matrices, each parameter set contained four identically distributed random values $r_A$, $r_G$, $r_C$, and $r_U$, which were generated by choosing three independent uniform random values in $[0, 1]$ and setting $r_A$, $r_G$, $r_C$, and $r_U$ equal to the lengths of the four line segments thus produced.

Each parameter set was used to generate a random configuration at a single site. Thus a total of 1,000 random configurations were generated, representing four aligned sequences of length 1,000. The method of evolutionary parsimony was applied to the 1,000 configurations to choose a tree. This process was repeated 10,000 times, with the same 1,000 parameter sets used each time. This simulation involved a very general model. The only assumption used, except for the biologically realistic constraints on the parameter values, was that the evolutionary processes at the various sites act independently of each other.

For purposes of comparison, the simulation was repeated under the same conditions except that the transversions were required to be balanced; that is, $c_i = d_i$. This was accomplished by generating $a_i$ and $b_i$ as above and setting $c_i = d_i = (1-a_i-b_i)/2$. Under these conditions the assumptions of evolutionary parsimony are satisfied, so comparing the two results gives an indication of the effect of unbalanced transversions. Table 2 shows the comparison. Transversion imbalance slightly decreased the frequency with which the correct tree was chosen (from 0.558 to 0.551), and it increased the frequency with which the incorrect tree was chosen, from ~3% to 5.5%. Under these general conditions, evolutionary parsimony is not very powerful, since it fails to choose a tree ~40% of the time. The procedure is somewhat less reliable in the presence of unbalanced transversions, but the decrease is not great.

We repeated this simulation three more times, using different randomly generated parameter sets each time. In each case, the results came out approximately the same as those for the simulation reported in table 2, which shows that those results are not particular to the parameters used in that simulation.

We also did a version of this simulation in which the value $c_i/(c_i+d_i)$ varied between 0 and 1. This was accomplished by generating $c_i$ on the interval $(0, 1-a_i-b_i)$. In this case there was almost no difference between the balanced and unbalanced cases. A possible explanation is that the average value of the $c_i$ was approximately equal to