Gene 586 (2016) 105–114
Gene
Research paper
Codon usage trend in mitochondrial CYB gene
Arif Uddin, Supriyo Chakraborty ⁎
Department of Biotechnology, Assam University, Silchar 788011, Assam, India
Article info

Article history:
Received 4 August 2015
Received in revised form 11 March 2016
Accepted 2 April 2016
Available online 6 April 2016

Keywords:
Codon usage
MT-CYB gene
Mutation pressure
Natural selection
Pisces
Aves
Mammals

Abstract

Here we reported the pattern of codon usage and the factors which influenced the codon usage pattern in mitochondrial cytochrome B (MT-CYB) gene among pisces, aves and mammals. The F1 axis of correspondence analysis showed highly significant positive correlation with nucleobases A3, C and C3 and significant negative correlation with T and T3 while F2 of correspondence analysis showed significant positive correlation with C and C3 and significant negative correlation with A and A3. From the neutrality plot, it was evident that the GC12 was influenced by mutation pressure and natural selection with a ratio of 0.10/0.90 = 0.11 in pisces, 0.024/0.976 = 0.0245 in aves and in mammals 0.215/0.785 = 0.273, which indicated that the role of natural selection was more than mutation pressure on structuring the bases at the first and second codon positions. Natural selection played the major role; but compositional constraint and mutation pressure also played a significant role in codon usage pattern. Analysis of codon usage pattern has contributed to the better understanding of the mechanism of distribution of codons and the evolution of MT-CYB gene.
© 2016 Elsevier B.V. All rights reserved.
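The ratios quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes, as the paper's neutrality analysis describes, that the slope of the regression of GC12 on GC3 is read as the relative neutrality and one minus the slope as the relative constraint. The function names `regression_slope` and `neutrality_ratio` are hypothetical helpers introduced only for this illustration; they are not the authors' Perl or SPSS workflow.

```python
def regression_slope(gc3, gc12):
    """Ordinary least-squares slope of GC12 (y) regressed on GC3 (x)."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    return sxy / sxx


def neutrality_ratio(slope):
    """Split a slope into (relative neutrality, relative constraint, ratio)."""
    constraint = 1.0 - slope
    return slope, constraint, slope / constraint


# Slopes as reported in the text for the three classes.
for label, slope in [("pisces", 0.10), ("aves", 0.024), ("mammals", 0.215)]:
    neutrality, constraint, ratio = neutrality_ratio(slope)
    print(f"{label}: {neutrality:.3f}/{constraint:.3f} = {ratio:.4f}")
```

The computed ratios (about 0.111, 0.0246 and 0.274) agree with the reported 0.11, 0.0245 and 0.273 up to rounding in the last digit.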
Abbreviations: MT-CYB, mitochondrial cytochrome b gene; RSCU, relative synonymous codon usage; CAI, codon adaptation index; ENC, effective number of codons.
⁎ Corresponding author.
E-mail addresses: [email protected] (A. Uddin), [email protected] (S. Chakraborty).
http://dx.doi.org/10.1016/j.gene.2016.04.005
0378-1119/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Synonymous codons encoding a particular amino acid are not used with equal frequency despite the degeneracy of the genetic code, a phenomenon known as codon bias (Ikemura, 1981). It is an evolutionary relic. The frequencies of codon usage are found to be species specific and also vary across genomes and within the same genome. Its role is significant in understanding the evolution of genomes (Jenkins and Holmes, 2003). Codon usage pattern is affected by various factors such as compositional bias (GC% and GC skew), mutation pressure, natural selection, gene length, expression level, replication, RNA stability, hydrophobicity and hydrophilicity of the protein (Akashi, 1997; Moriyama and Powell, 1998; Powell and Moriyama, 1997; Powell et al., 2003). Among these, the compositional constraints in the presence of mutation pressure and natural selection are the major factors which vary across species (Sharp et al., 1986, 1993). The modifications of biochemical mechanism, i.e. more frequent changes of certain bases than others, cause mutational biases (Francino and Ochman, 2001; Green et al., 2003). Mutation pressure is mainly responsible for codon usage pattern in some prokaryotes and in many mammals with high AT or GC contents (Sharp et al., 1993; Zhao et al., 2007). However, in Drosophila and in some plants, the codon usage pattern is mainly governed by translational selection (Liu et al., 2004). The nonsynonymous substitution is driven by selection because it alters the amino acids and thus affects the protein's biochemical nature (Plotkin and Kudla, 2011).

In fast-growing organisms with huge population size, the codon usage pattern is mainly driven by selection (Green et al., 2003; Ikemura, 1982, 1985; Sharp and Li, 1987). However, the effect of natural selection on codon usage in the mammalian genome is considered to be low (Duret, 2002; Sharp et al., 1995). This is due to small population size in many mammalian species, and the codon usage pattern is due to the effect of genetic drift (Keightley et al., 2005; Sharp et al., 1995). As an exception, in non-mammalian species highly expressed genes with high codon usage bias are under selection pressure to diminish the error in expression level (Hershberg and Petrov, 2008). Essentially, the efficiency of gene expression is due to the redundancy of genetic code tuned by selective forces (Gingold and Pilpel, 2011). Moreover, preferred codon usage lowers the proofreading expenses by reducing the time and energy required to discard the non-cognate tRNAs (Bulmer, 1991). Use of unpreferred codons would increase proofreading expenses and would result in a net decline in the protein levels.

The association between the codon bias and the level of gene expression has been experimentally established in Escherichia coli (Andersson and Kurland, 1990). Moreover, the in vitro expression proficiency has been shown to be significantly increased by using the preferred codons of the host cell in heterologous genes in cultured eukaryotic cells (Kim et al., 1997).

The mitochondrial genes are a subset of the frequently expressed genes in eukaryotes, and the mitochondrial genome is usually an ideal molecular marker for species identification, systematic phylogeny, and evolutionary studies. The choice of mitochondrial DNA in all these studies is primarily
due to its small size, easy amplification and conserved gene content, lack of recombination, maternal inheritance pattern and high evolutionary rate (Harrison, 1989). The metazoan mitochondrial DNA is circular, 16 kb in size, covalently closed and consists of 37 genes. The majority of the mitochondrial proteins are encoded by nuclear genes while only 2 rRNAs, 22 tRNAs and 13 proteins involved in the respiratory chain are encoded by mitochondrial genomes (Wolstenholme, 1992). Further, the genetic code of mitochondria differs from the standard genetic code. The standard genetic code consists of 64 codons, wherein 61 sense codons encode 20 standard amino acids and three codons, namely TAA, TAG and TGA, act as termination signals. In the mitochondrial genetic code there are four termination codons, namely TAA, TAG, AGA and AGG. Patterns of codon usage in nuclear genomes are extensively studied whereas studies on mitochondrial genomes or genes are very scanty.

Oxidative phosphorylation is one of the most important biochemical processes operating in the mitochondria, in which the aerobic eukaryotic cell uses oxygen to synthesize ATP. The 13 protein-coding genes of mitochondria are universal and encode protein subunits of the different complexes of oxidative phosphorylation. Oxidative phosphorylation is the multienzymatic system which creates the proton gradient required for ATP synthesis (or heat generation). The seven mitochondrially encoded proteins (Nd1, Nd2, Nd3, Nd4, Nd4l, Nd5, Nd6) contribute to complex I, in which Nd1 and Nd2 play an essential structural role between the membrane-embedded and peripheral arms of the complex whereas the role of Nd2, Nd4, and Nd5 is to transfer electrons (da Fonseca et al., 2008). Nuclear encoded proteins form complex II, while complex III includes the product of the CYB gene, the only mitochondrially encoded protein of this complex.

The mitochondrial cytochrome B (MT-CYB) gene contains conserved as well as rapidly evolving codon positions and variable regions, and so this gene is also used in systematics (Meyer and Wilson, 1990; Moritz et al., 1992). The catalytic core of complex III of the electron transport chain is formed by the CYB protein along with cytochrome c1, and it helps in the assembly and function of the complex. Moreover, the pattern of codon usage in MT-CYB gene is yet to be reported although it encodes a vital protein of complex III, coded by mitochondrial DNA. In addition, many reports have indicated the phylogenetic usefulness of MT-CYB gene in different vertebrates (Degli Esposti et al., 1993; Irwin et al., 1991).

The mitochondrial respiratory chain plays a central role in satisfying the energy demand of an organism. It will be interesting to analyze the pattern of codon usage in MT-CYB gene among pisces, aves and mammals residing at different habitats with diverse energy requirements. The pisces, aves and mammals are three classes of chordates, which live in three entirely different environments, namely aquatic, aerial and terrestrial. The mode of respiration and the demand of energy in these chordates are also different (Ellington, 2001). Therefore, understanding the patterns of synonymous codon usage in these three classes of chordates would improve our understanding of the mechanisms underlying the distribution of codons and their differential usage in MT-CYB gene and would elucidate the factors affecting the codon usage pattern.

Analysis of codon usage is a useful technique to understand the genetic and evolutionary relationship of different species belonging to diverse habitats. Moreover, mitochondrial genes are very significant and suitable tools for such studies. In the current study, we investigated the codon usage pattern in MT-CYB gene among 15 species each of pisces, aves and mammals thriving in different habitats to understand the pattern of codon usage. Moreover, this study would give insight into the factors influencing the codon usage pattern among the species under study.

2. Materials and methods

2.1. Sequence data

The coding sequences of MT-CYB gene for 15 different species each of pisces, aves and mammals (in FASTA format) were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database (http://www.ncbi.nlm.nih.gov/Genbank/). The study was carried out on 45 species. The sequences having correct start and stop codons with an exact multiple of three bases were used in this analysis. The details of the accession numbers of MT-CYB gene from 45 species of pisces, aves and mammals are shown in S1.

2.2. Compositional properties

The compositional properties of MT-CYB gene such as overall nucleotide composition (A%, C%, T% and G%), nucleotide composition at the third position of each codon (A3%, C3%, T3% and G3%), overall GC content and GC content at the 1st, 2nd and 3rd codon positions, AT, GC, purine, pyrimidine, amino and keto skew were analyzed for pisces, aves and mammals using a Perl script developed by SC (corresponding author).

2.3. Measures of synonymous codon usage bias

Some of the most important and widely used indices of the codon usage bias that were analyzed in this study are discussed below.

2.3.1. Relative synonymous codon usage (RSCU)

Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the expected frequency if all synonymous codons of a particular amino acid are used evenly. An RSCU value >1.0 indicates that the corresponding codon is used more frequently than the expected frequency whereas an RSCU value <1.0 indicates that the particular codon is used less frequently. Besides, a codon with RSCU value >1.6 was treated as over-represented while a codon with RSCU value <0.6 was treated as under-represented (Behura and Severson, 2012; Sharp and Li, 1986a).

\[ \mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}} \]

where $X_{ij}$ is the frequency of occurrence of the jth codon for the ith amino acid and $n_i$ is the number of codons for the ith amino acid (ith codon family).

2.3.2. Effective number of codons (ENC)

The effective number of codons (ENC) is a commonly used parameter to measure the usage bias of synonymous codons (Wright, 1990). The ENC value ranges from 20 (when only one codon is used for each amino acid) to 61 (when all codons are used randomly). A higher ENC value means low codon usage bias and vice-versa. ENC values <35 are generally considered to indicate significant codon usage bias. The ENC is measured as

\[ \mathrm{ENC} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}} \]

where $s$ represents the given (G + C)3% value (Wright, 1990).

2.3.3. Codon adaptation index (CAI)

The codon adaptation index (CAI) is a very extensively used parameter to measure the codon usage bias and the gene expression level. Its value ranges from 0 to 1, with a high value indicating a higher proportion of the most abundant codons coupled with high expression level and vice-versa. CAI is a measure of the relative adaptedness of the codon usage of a gene to the codon usage of the highly expressed genes
(Sharp and Li, 1987). The relative adaptiveness (ω) of each codon is the ratio of the usage of that codon to that of the most abundant codon within the same synonymous family. The CAI is calculated as

\[ \mathrm{CAI} = \exp\left(\frac{1}{L}\sum_{k=1}^{L} \ln \omega_k\right) \]

where $\omega_k$ is the relative adaptiveness of the kth codon and $L$ is the number of synonymous codons in the gene.

2.4. Hierarchical clustering

The RSCU values of codons from different species of pisces, aves and mammals were clustered by the hierarchical clustering method using the XLSTAT software.

2.5. Principal component analysis and clustering

Principal component analysis was used to explore the major trend in codon usage in MT-CYB gene among different species of pisces, aves and mammals using the RSCU value. It was performed by using the SPSS software.

2.6. Neutrality plot

Synonymous mutation generally occurs in the codon's 3rd position while nonsynonymous mutations mostly occur in the codon's 1st and 2nd positions. The nonsynonymous mutation alters the gene function or gene activity as a result of the altered amino acid sequence. The mutation is expected to be homogeneous if there is no external pressure on DNA, and there should be no preference for base composition in the three codon positions. However, due to the presence of selection pressure, the base preferences in the three codon positions are different (Sueoka, 1988). When the neutrality plot is drawn between GC12 and GC3, points in diagonal distribution suggest no significant difference among the three codon positions, and that selection pressure is weak. Therefore, the neutrality plot is used to resolve the neutral degree when selection pressure plays a major role in evolution (Sueoka, 1988, 1999).

2.7. Correspondence analysis (COA)

Correspondence analysis is a statistical method used to study the major trends in codon usage variation in coding sequences; it distributes the codons in axes with these trends (Greenacre, 1984; Shields and Sharp, 1987).

2.8. PR2-bias plot analysis

The parity rule 2 (PR2) plot analysis was performed to explore the role of mutation and selection pressure on codon usage pattern of genes. The PR2 plot is presented with AT-bias [A / (A + T)] as the ordinate and GC-bias [G / (G + C)] as the abscissa (Sueoka, 1995). In this plot, the centre refers to A = T and G = C (PR2), i.e. both coordinates are 0.5, where there is no bias between the two complementary strands of DNA for mutation and selection rates (substitution rates).

2.9. Software used

The above mentioned codon usage indices were estimated in a Perl program developed by SC (corresponding author) to measure the CUB on the selected coding sequences of MT-CYB gene in different species of pisces, aves and mammals. Correlation analysis was performed to identify the relationship between overall nucleotide composition and each base at the 3rd codon position. All the statistical analyses were completed using the SPSS software.

Ethics statement: Not applicable. The study is based on DNA sequences retrieved from the publicly available database of NCBI, USA. The study involves only soft computing and no wet lab experiment.

3. Results and discussion

3.1. Nucleotide composition analysis in MT-CYB gene of different species of pisces, aves and mammals

Previous studies reported that nucleotide composition may affect the codon usage pattern of a gene (Knight et al., 2001). In the present study, we therefore analyzed the compositional properties of CYB gene in different species of pisces, aves and mammals. From Table 2, it was suggested that the distribution of A, T, G, and C% among the codons was unequal in different species of pisces, aves and mammals, with more preference of T/C (mean ± SD, 29.74 ± 4.37 and 29.12 ± 4.46 respectively) ending codons in pisces, C/A (mean ± SD, 34.43 ± 1.50 and 27.74 ± 1.33 respectively) ending codons in aves and A/C (mean ± SD, 29.93 ± 1.05 and 29.9 ± 2.58 respectively) ending codons in mammals (S2). The nucleobase G ending codons were fewer in all three classes. The overall GC content was lower than the AT content in pisces, aves and mammals, i.e. the MT-CYB gene is AT rich as shown in S3. On the other hand, analysis of nucleotide composition at the 3rd position of codons gives a clearer picture of the choice of nucleobase in different species of pisces, aves and mammals. Surprisingly, in pisces and aves, the choice of nucleobase C/A (mean ± SD, 36.60 ± 10.33 and 30.96 ± 5.88 in pisces; 47.08 ± 4.09 and 37.04 ± 3.72 in aves) was the highest, while in mammals A/C (mean ± SD, 40.28 ± 2.55 and 38.9 ± 5.21) was more preferred than others. The nucleobase G was the lowest in all three classes. From S3, it was evident that the greatest difference of GC contents was found between the 1st and 2nd codon positions. These three classes, viz. pisces, aves and mammals, showed variation in the GC content which suggests that the biological function might also vary. Mirsafian et al. reported that the nucleotide composition of the albumin superfamily also exhibits low GC content (<44.63%). They also reported that two albumin gene families, ALB and AFP, show similar nucleotide composition suggesting that they share structural and biological functional similarities. Although AFM and VDBP are in the same albumin superfamily, they show discrepancy in their nucleotide composition indicating that their biological functions differ in comparison to the other members of the albumin superfamily (Mirsafian et al., 2014). Nucleotide composition has a close relationship with gene function (Garcia et al., 2011). These results indicate that compositional constraint might affect the codon usage in MT-CYB gene. AT ending codons were also found to be more frequent in Plasmodium falciparum (P. falciparum) (Peixoto et al., 2004; Zhao et al., 2000). Butt et al. reported that GC ending codons are more preferred than AT ending codons in the codon usage pattern of CHIKV genomes (Butt et al., 2014).

3.2. Relation between codon usage bias and expression level

The effective number of codons (ENC) was used to quantify the degree of the codon usage bias in MT-CYB gene. The ENC values in different species of pisces, aves and mammals for CYB gene were (mean ± SD) 58.33 ± 2.94, 59.66 ± 0.61, and 58.33 ± 2.12 respectively (Table S3). The elevated ENC value indicates that the codon usage bias is low and that codon bias is maintained almost at the same level for MT-CYB gene among the species in our current study. The values of ENC in MT-ATP8 in mammals range from 42 to 60, so the ENC value of MT-CYB gene was relatively higher than that of MT-ATP8 in mammals (Uddin and Chakraborty, 2014). The ENC values ranged from 51 to 60 in Bombyx mori with an average of 29.47 (Wei et al., 2014). Liu Xudong et al. reported that the codon usage bias was low in H9N2 virus (Liu et al., 2010). The low codon usage bias might be helpful for efficient
replication in vertebrates with different cell types having different preferences of codon (Jenkins and Holmes, 2003).

The CAI is a widely used measure for predicting the gene expression level, and its higher value reveals an elevated gene expression level and vice-versa. CAI measures the gene expression level with respect to a reference set of genes (Sharp and Li, 1987). The mean ± SD of CAI in pisces, aves and mammals were 0.8463 ± 0.040, 0.7737 ± 0.040, and 0.7687 ± 0.02 respectively, which suggests that the expression level of CYB gene in all three classes was high (Table 3). The expression level of CYB gene in pisces was much higher than in aves and mammals, but the expression level was almost similar in aves and mammals. So pisces, which thrive in aquatic habitat and use gills as respiratory organ, possibly require more CYB protein than aves and mammals, while aves and mammals, which thrive in two different habitats but use lungs as respiratory organ, need almost a similar amount of CYB gene product. Jia et al. also reported high expression level in genes of B. mori similar to our findings (Jia and Higgs, 2008). Sharp et al. reported that genes which are highly expressed are likely to use optimal codons for rapid translational efficiency and accuracy (Sharp et al., 1993).

Correlation was performed between ENC and CAI to estimate the relationship between the nucleotide composition and the codon selection in MT-CYB gene (Vicario et al., 2007). Positive correlation was found between ENC and CAI in pisces (r = 0.307, p > 0.05) and aves (r = 0.196, p > 0.05), while negative correlation was found in mammals (r = −0.188, p > 0.05), which suggests that nucleotide composition bears a weak relationship with codon selection. Karlin and Mrázek reported that the codon usage bias and expression level are not correlated to each other. Several studies indicate that the codon bias in MT-CYB gene is influenced by natural selection at the translation level in Homo sapiens, Caenorhabditis elegans, and Drosophila melanogaster but not attributed to translational selection (Duret and Mouchiroud, 1999; Karlin and Mrázek, 1996; Stenico et al., 1994).

3.3. Codon usage among MT-CYB gene in pisces, aves and mammals

We analyzed correlation between codon usage and GC3 to understand the general difference in codon usage (codon usage of each codon in S4, S5, and S6, Supplementary material) and GC bias. From Fig. 1(a) and (b), it was found that the majority of the AT ending codons showed negative correlation except in aves and that most of the GC ending codons showed positive correlation in pisces, aves and mammals. These indicate that GC ending codons would have increasing usage with increasing GC3 content and, similarly, AT ending codons would show decreasing usage with increasing GC3 bias. Thus, the analysis of codon usage pattern has tremendous importance in understanding the molecular organization of the gene and provides significant insights into the molecular biology of CYB gene (Hassan et al., 2010).

3.4. Relative synonymous codon usage analysis of MT-CYB gene in pisces, aves and mammals

We performed relative synonymous codon usage analysis of 60 codons for MT-CYB gene in pisces, aves and mammals. The overall RSCU values for the 60 codons in MT-CYB gene indicated that A and C occurred more frequently than T and G at the third codon position in pisces, aves and mammals, supporting the result of Dass et al. and Zhang et al. (Dass and Sudandiradoss, 2012; Zhang et al., 2013). It was observed that 32, 27 and 29 codons in pisces, aves and mammals respectively were more frequently used among 60 codons (mean RSCU values in each of pisces, aves and mammals). The RSCU values of 60 sense codons again support the conclusion that MT-CYB gene has a weak codon usage bias. Furthermore, we separated the RSCU data into four groups: (a) RSCU value >1.6: over-represented codons, (b) RSCU value >1: more frequently used codons, (c) RSCU value <1: less frequently used codons, and (d) RSCU value <0.6: under-represented codons. From the heat map, the over-represented codons, the more frequently used codons, the less frequently used codons and the under-represented codons are clearly evident as shown in Fig. 2. Based on RSCU and nucleotide composition analysis, we deduced that the existence of preferred codons in coding sequences of CYB gene has been mostly influenced by compositional constraints, supporting the result of Butt et al. (2014).

3.5. Trends of codon usage variation in MT-CYB gene among pisces, aves and mammals

Principal component analysis (PCA) for MT-CYB gene was carried out in this study. PCA detected one major trend in axis 1 and the other major trend in axis 2 accounting for the total variation (Fig. 3). From this figure, it is clear that the pattern of points in the plots was different in the three classes. These results reveal that the pattern of codon usage was different among different classes, further suggesting that the codon usage was genetically quite distinct among pisces, aves and mammals for MT-CYB gene (Chen et al., 2013; Hair et al., 2006).

3.6. Effect of mutation pressure in shaping the codon usage pattern in MT-CYB gene

Mutation pressure and natural selection are the two major evolutionary forces which influence the codon usage pattern of genomes (Zhou et al., 2005). Therefore we performed correlation analysis between the overall nucleotide composition and the nucleotide composition at the 3rd codon position to explore whether the evolutionary process in CYB gene was driven by mutation pressure alone or by both mutation pressure and natural selection. In pisces, a highly significant positive correlation was found between A and A3% (r = 0.985**, p < 0.01), T and T3% (r = 0.991**, p < 0.01), C and C3% (r = 0.994**, p < 0.01), and GC and GC3% (r = 0.982**, p < 0.01), and similarly in aves, between A and A3% (r = 0.948**, p < 0.05), T and T3% (r = 0.961**, p < 0.01), C and C3% (r = 0.977**, p < 0.01), and GC and GC3% (r = 0.966**, p < 0.05). In mammals, significant correlation was observed between A and A3% (r = 0.948**, p < 0.05), T and T3% (r = 0.961**, p < 0.05), C and C3% (r = 0.977**, p < 0.01), and GC and GC3% (r = 0.966**, p < 0.01), but negative correlation for most of the other nucleotide comparisons. These results indicate that the compositional constraint arising from mutation pressure determines the pattern of codon usage in MT-CYB gene.

Furthermore, in pisces, positive correlation was observed between ENC and GC% (0.799, p < 0.01), ENC and GC2 (0.602, p < 0.05) and ENC and GC3 (0.814**, p < 0.01). In aves, no significant correlation was found among them. Positive correlation was observed between ENC and GC% (0.939**, p < 0.01), ENC and GC1 (0.909**, p < 0.01) and ENC and GC3 (0.933**, p < 0.01). A previous study in the mitochondrial genome of ribbon worms also found negative correlation between GC12 and GC3 (Chen et al., 2014). Zhicheng et al. also reported significant correlation between nucleotide composition and its 3rd position of codons in TTSuV1 virus (Zhang et al., 2013). These findings jointly support our hypothesis that compositional constraint under mutation pressure significantly contributes to the codon usage pattern in MT-CYB gene. Liu and Chen reported that mutation pressure is the main determining factor in codon usage pattern of H9N2 subtype virus based on ENC–GC3 plot and correlation analysis (Liu et al., 2010). Chen and Chen (2014) reported that compositional constraint and/or mutational pressure and natural selection are major factors influencing the codon usage bias patterns of duck hepatitis A virus (HAV) (Chen and Chen, 2014).

3.7. Natural selection influences the codon usage bias as a major role in MT-CYB gene

To determine the extent of mutation pressure against natural selection in the codon usage pattern in mitochondrial MT-CYB gene, the neutrality plot was drawn. Neutrality plot is the regression of GC12
Fig. 1. (a) Heat maps of correlation coefficient between codon usage and GC3 in different species of pisces, aves and mammals for AT ending codons. The type and degree of correlation are indicated by different colors: green indicates negative correlation, red indicates positive correlation and black indicates stop codons and no correlation. (b) Heat maps of correlation coefficient between codon usage and GC3 in different species of pisces, aves and mammals for GC ending codons.
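The quantity behind each cell of such a heat map can be sketched as a Pearson correlation, across species, between the frequency of one codon and the GC3 content of the same sequences. The snippet below is a minimal illustration under that assumption; the helper names and the three toy sequences are made up for the example and are not data from the study.

```python
from math import sqrt


def codons(seq):
    """Split a coding sequence into complete codons (trailing bases are dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc3(seq):
    """Fraction of codons whose third base is G or C."""
    triplets = codons(seq)
    return sum(c[2] in "GC" for c in triplets) / len(triplets)


def codon_frequency(seq, codon):
    """Fraction of codons in seq that equal the given codon."""
    triplets = codons(seq)
    return triplets.count(codon) / len(triplets)


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy sequences (hypothetical): one "species" per string.
sequences = ["CTACTACTACTA", "CTACTACTCCTC", "CTCCTCCTCCTA"]
gc3_values = [gc3(s) for s in sequences]
for codon in ("CTC", "CTA"):
    freqs = [codon_frequency(s, codon) for s in sequences]
    print(codon, round(pearson(freqs, gc3_values), 3))
```

In this toy set the C-ending codon rises with GC3 and the A-ending codon falls, which is the pattern the red and green cells of the figure encode.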
Fig. 2. Hierarchical clustering of heat map for MT-CYB in pisces, aves and mammals. Each rectangular box on the map represents the RSCU value of a codon (shown in rows) corresponding to different species (shown in columns). The color and the degree of intensity represent the RSCU value, with separate colors for RSCU value <0.6, RSCU value <1, RSCU value >1 and RSCU value >1.6.
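The values plotted in such a heat map can be reproduced in outline as follows. This is a minimal sketch, assuming RSCU is the observed codon count divided by the mean count over the synonymous family and using the four thresholds named in the text. Only two fourfold families (Ala and Pro) are listed to keep the example short; a real analysis would need the full vertebrate mitochondrial code table. The names `FAMILIES`, `rscu` and `classify`, the toy sequence, and the label for a value of exactly 1 are assumptions made for the illustration.

```python
from collections import Counter

# Two fourfold-degenerate families only (illustrative subset, not the full code table).
FAMILIES = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
}


def rscu(seq, families=FAMILIES):
    """RSCU per codon: observed count / (family total / family size)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU is undefined
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


def classify(value):
    """Map an RSCU value to the four groups used in the text."""
    if value > 1.6:
        return "over-represented"
    if value > 1.0:
        return "more frequently used"
    if value < 0.6:
        return "under-represented"
    if value < 1.0:
        return "less frequently used"
    return "used as expected"  # exactly 1.0; this label is an addition for completeness


toy = "GCC" * 3 + "GCA" * 2 + "GCT" + "CCA" * 4  # hypothetical sequence
for codon, value in sorted(rscu(toy).items()):
    print(codon, round(value, 3), classify(value))
```

Within each family the RSCU values sum to the family size, which is a quick sanity check on any implementation.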
(average of GC1 and GC2) on GC3. The regression coefficient of GC12 on GC3 for MT-CYB gene is 0.10 in pisces, which reveals that the relative neutrality is 10% and relative constraint is 90% for GC3. The GC12 was influenced by mutation pressure and natural selection with a ratio of 0.10/0.90 = 0.11 in pisces. In aves, the regression coefficient of GC12 on GC3 is 0.024, indicating a relative neutrality of 2.4% and a relative constraint of 97.6% for GC3. The GC12 in aves was affected by mutation pressure and natural selection with a ratio of 0.024/0.976 = 0.0245. In mammals, the regression coefficient of GC12 on GC3 for MT-CYB gene is 0.215, which suggests that the relative neutrality is 21.5% and relative constraint is 78.5% for GC3. The GC12 was influenced by mutation pressure and natural selection with a ratio of 0.215/0.785 = 0.273 (Sueoka, 1988) in mammals. These results suggest that natural selection played a major role while mutation pressure played a minor role in shaping the codon usage pattern of MT-CYB gene. Further, as shown in Fig. 4(a), (b), (c), some points are in diagonal distribution but others are not, suggesting that the relationship of GC12 versus GC3 is due to both mutational bias and natural selection (Hebert et al., 2003). Chen (2013) also reported that natural selection was more important than mutation pressure in structuring the first and the second codon positions in the codon usage pattern of DNA and RNA viral genomes (Chen, 2013). Chen (2014) reported that four major subtypes of influenza A virus (IAV) showed distinct clustering patterns, suggesting that different subtypes of IAV possess different preferential codons. The subtype clustering pattern indicated that natural selection is important, which could be further evidenced by the GC12 versus GC3 plot (Chen, 2014).

3.8. Relation between skewness and codon usage bias

In our study we found that the AT skew in most of the species of pisces, aves and mammals was positive while GC skew was negative in some species (S7), which suggests that in most species A was abundant over T but in others C was abundant over G. This suggests an asymmetrical compositional pattern between the two strands of DNA (Baisnée et al., 2002). Base composition is connected to the transcription process, which is revealed by skewness (Beletskii and Bhagwat, 2001). We performed correlation analysis among ENC, CAI and skewness. In pisces, except for amino skew, the codon bias showed highly significant positive correlation with AT skew and purine skew, and negative correlation with GC, pyrimidine and keto skew. In aves, significant positive correlation was found with purine skew and amino skew. In mammals, significant positive correlation was found between codon bias and AT skew, while significant negative correlation was found with GC, pyrimidine, keto and amino skew as shown in Table 1. These results suggest that in all the three classes skewness might affect the codon bias.

3.9. Correlation of codon usage bias with GRAVY and length of protein

In order to explore the association of the codon usage bias with hydrophobicity (GRAVY), aromaticity and length of protein in MT-CYB gene, Pearson's correlation analysis was carried out. From Table 2, it is evident that aromaticity of protein showed significant negative correlation with ENC, which suggests that aromaticity values of protein are associated with codon usage bias in mitochondrial MT-CYB gene, i.e. an increase in aromatic amino acids in the protein is accompanied by a lower ENC and hence a higher codon usage bias of the gene. Jia et al. reported that aromaticity of protein had a significant positive correlation with the codon usage bias in B. mori (Jia et al., 2015).

3.10. Correspondence analysis in MT-CYB gene

Correspondence analysis (COA) is the most widely used multivariate analysis (Mardia et al., 1979). It is used to study the major trends in sequence variation and distribute the coding sequences along these trends. Each coding sequence was represented as a 60-dimensional vector, each dimension corresponding to the RSCU value of each codon. In mitochondrial MT-CYB the 1st axis contributed 34.53% of the total variation, and the 2nd axis contributed 12.79% of the total variation, which makes the 1st axis the major contributor to the synonymous codon usage pattern as shown in Fig. 5. The positions of most of the codons are close to the axes, which indicates that compositional constraint under mutation might correlate with the codon usage pattern in MT-CYB gene. However, several codons are in scattered distribution, indicating that other factors such as natural selection might also affect the codon usage pattern.

Different factors that shape the pattern of synonymous codon usage bias include gene function, compositional constraint, natural selection, translational selection and mutation pressure (Sharp and Li, 1986b; Wei et al., 2014). Neuraminidase (NA) and hemagglutinin (HA) genes have different codon usage bias as revealed from correspondence analysis (Liu et al., 2010). An earlier report suggested that selective pressure was not random in mtDNA (Blier et al., 2001). Correlation analysis was done between F1 (axis 1), F2 (axis 2) and A, T, G, C, GC, A3, T3,
Fig. 3. Principal component analysis for MT-CYB gene.
Table 2
Correlation analysis among GRAVY, ARO, ENC, GC3, GC and the first two principal axes of
correspondence analysis.

              F1        F2        GC        GC3       ENC
CAI    r     .025      .243      .108      .104      .053
       p     .872      .108      .480      .495      .728
ARO    r     .306*     .257     −.377*    −.347*    −.455**
       p     .041      .088      .011      .020      .002
Gravy  r     .424**    .017     −.090     −.171     −.188
       p     .004      .914      .557      .260      .217
Laa    r    −.094     −.437**    .096      .036      .084
       p     .538      .003      .531      .813      .582

** p < 0.01.
* p < 0.05.
significant correlation with compositional properties. These results in-
dicate that compositional constraint influenced the codon usage pattern
in mitochondrial MT-CYB gene supporting the finding of Butt et al.
(2014).
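
As an illustration of how a correlation coefficient (r) between an axis of the correspondence analysis and a compositional property can be obtained, the following minimal sketch computes a product-moment correlation. It assumes that the reported r values are Pearson coefficients, which is not stated in this section; the function name `pearson_r` and the sample values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustration values (NOT data from the study):
# axis-1 scores of five sequences and their T3 content in percent.
axis1_scores = [-0.8, -0.3, 0.1, 0.4, 0.9]
t3_percent = [18.0, 22.0, 25.0, 27.0, 33.0]
r = pearson_r(axis1_scores, t3_percent)
print(f"r = {r:.3f}")
```

The significance level (p) reported alongside each r would additionally require a test against the t distribution, which is omitted here.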
Moreover, correlation analysis was done among GRAVY (hydropathicity
of protein), ARO (aromaticity), CAI, Laa (length of amino acids), F1,
F2, ENC, GC and GC3. ARO showed significant positive correlation with
F1 and significant negative correlation with GC, GC3 and ENC. GRAVY
showed significant positive correlation with the F1 axis of COA. Laa
showed significant negative correlation with the F2 axis of COA
(Table 2), which might be due to the effect of natural selection on the
synonymous codon usage pattern, supporting the finding of Wei et al. (2014).
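
The two protein indices used in this correlation analysis can be sketched as follows. This is a minimal illustration under stated assumptions: GRAVY is taken as the mean Kyte-Doolittle hydropathy value over all residues, and ARO as the fraction of aromatic residues (Phe, Trp, Tyr). Both definitions are common conventions rather than definitions given in this section, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumption: aromatic residues are Phe, Trp and Tyr.
AROMATIC = frozenset("FWY")


def _standard_residues(protein):
    """Keep only the 20 standard residues; other symbols are ignored."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return residues


def gravy(protein):
    """Mean hydropathy of the protein (grand average of hydropathicity)."""
    residues = _standard_residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein):
    """Fraction of residues that are aromatic (F, W or Y)."""
    residues = _standard_residues(protein)
    return sum(1 for aa in residues if aa in AROMATIC) / len(residues)
```

A positive GRAVY value indicates a predominantly hydrophobic protein, which would be expected for a membrane protein such as cytochrome b.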
3.11. PR2-bias plot analysis
To find out whether the biased codon preferences are restricted to
highly biased genes, the association between A and T content and G
and C content in four-fold degenerate codon families (alanine, arginine,
glycine, leucine, proline, serine, threonine and valine) was analyzed by
PR2 plot. Fig. 6 shows that A and T were used more frequently than
G and C in four-fold degenerate codon families in the CYB gene among
pisces, aves and mammals. Differences between C and G contents and
between A and T contents were observed for the CYB gene in all three
groups. The frequencies of AT were not equal to those of GC at the 3rd
codon position, which suggests that codon choice in the MT-CYB gene is
influenced by both mutation pressure and natural selection.
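
The coordinates of one sequence in a PR2-bias plot, A3/(A3 + T3) against G3/(G3 + C3) over the four-fold degenerate codon families listed above, can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and the choice to count only the four-codon blocks of leucine (CTN) and serine (TCN) is an assumption.

```python
# First two bases of the four-fold degenerate families named in the text:
# Ala (GCN), Arg (CGN), Gly (GGN), Leu (CTN), Pro (CCN), Ser (TCN),
# Thr (ACN), Val (GTN). Assumption: for Leu and Ser only these
# four-codon blocks are counted.
FOURFOLD_PREFIXES = frozenset({"GC", "CG", "GG", "CT", "CC", "TC", "AC", "GT"})


def pr2_coordinates(cds):
    """Return (A3/(A3+T3), G3/(G3+C3)) for four-fold degenerate codons.

    A ratio is None when its denominator is zero.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third = {"A": 0, "T": 0, "G": 0, "C": 0}
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in third:
            third[codon[2]] += 1
    at_total = third["A"] + third["T"]
    gc_total = third["G"] + third["C"]
    a3_ratio = third["A"] / at_total if at_total else None
    g3_ratio = third["G"] / gc_total if gc_total else None
    return a3_ratio, g3_ratio
```

A sequence that obeys the parity rule (A = T and G = C at the third position) lies at (0.5, 0.5); displacement from that point reflects the combined bias discussed above.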
3.12. Conclusion
The current study on the analysis of synonymous codon usage
among pisces, aves and mammals exhibited low codon usage bias of
MT-CYB gene. This may be due to the interaction of mutation pressure
and natural selection with high level of MT-CYB gene expression. The
most frequent codons in MT-CYB gene of pisces, aves and mammals
favored A or C at the 3rd codon position which firmly indicates the role of

Fig. 4. (a) Neutrality plot between GC12 and GC3% in pisces, regression equation y =
2.943X − 84.54; R² = 0.532. (b) Neutrality plot between GC12 and GC3% in aves,
regression equation y = 0.887X + 10.23; R² = 0.0018. (c) Neutrality plot between
GC12 and GC3% in mammals, regression equation y = −0.122X − 52.15; R² = 0.0004.
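
A neutrality plot of the kind shown in Fig. 4 regresses GC12 on GC3 across sequences. The sketch below is a minimal illustration rather than the authors' implementation: it computes GC12 and GC3 (in percent) for a coding sequence and fits an ordinary least-squares line. Function names are hypothetical. Following the reading used in the abstract, the slope is taken as the relative share of mutation pressure and (1 − slope) as the share of natural selection.

```python
def gc12_gc3(cds):
    """Return (GC12 %, GC3 %) for a coding sequence read in frame."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    n_codons = usable // 3
    gc_counts = [0, 0, 0]  # G or C at codon positions 1, 2, 3
    for i in range(0, usable, 3):
        for pos in range(3):
            if seq[i + pos] in ("G", "C"):
                gc_counts[pos] += 1
    gc1, gc2, gc3 = (100.0 * c / n_codons for c in gc_counts)
    return (gc1 + gc2) / 2.0, gc3


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared
```

In use, `gc12_gc3` would be applied to each coding sequence of a group, and `fit_line` to the resulting GC3 values (x) and GC12 values (y).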
G3, C3 and GC3 as shown in Table 3. The F1 (axis 1) showed highly sig-
nificant positive correlation with T, G, T3 and G3 but significant negative
correlation with A, C, GC, A3, C3, and GC3. The F2 (axis 2) had no
Table 1
Correlation coefficients among ENC, GC, AT, purine, pyrimidine, keto and amino skew.

                     GC_skw    AT_skw    PU_skw    PY_skw    Ko_skw    Am_skw
Pisces   ENC   r    −.870**    .753**    .537*    −.922**   −.742**   −.465
               p     .000      .001      .039      .000      .002      .081
Aves     ENC   r    −.450     −.141      .591*     .344      .204      .616*
               p     .093      .617      .020      .210      .466      .014
Mammals  ENC   r    −.557*     .684**   −.483     −.910**   −.862**   −.826**
               p     .031      .005      .068      .000      .000      .000

** p < 0.01.
* p < 0.05.

Fig. 5. Correspondence analysis of codon usage patterns of MT-CYB gene.
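
The AT and GC skew values that enter correlations such as those in Table 1 can be computed as in the following sketch. It assumes the usual definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), which agree with the interpretation in Section 3.8 that a positive AT skew means A is more abundant than T and a negative GC skew means C is more abundant than G. The function names are hypothetical, and the purine, pyrimidine, keto and amino skews are not covered because their definitions are not given in this section.

```python
def _skew(x, y):
    """(x - y) / (x + y), or None when both counts are zero."""
    total = x + y
    return (x - y) / total if total else None


def at_gc_skew(seq):
    """Return AT skew and GC skew of a nucleotide sequence as a dict."""
    s = seq.upper().replace("U", "T")
    a, t, g, c = (s.count(base) for base in "ATGC")
    return {"AT_skew": _skew(a, t), "GC_skew": _skew(g, c)}


print(at_gc_skew("AAATGCCC"))
```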
Table 3
Correlation coefficients between the first two principal axes and nucleotide constraints in MT-CYB gene.

          A         T         G         C         GC        A3        T3        G3        C3        GC3
F1  r   −.503**    .696**    .650**   −.637**   −.338*    −.500**    .699**    .641**   −.666**   −.402**
    p    .000      .000      .000      .000      .023      .000      .000      .000      .000      .006
F2  r   −.142      .185      .229     −.192     −.085     −.192      .133      .297*    −.138     −.006
    p    .353      .225      .130      .206      .580      .206      .385      .047      .364      .971

** p < 0.01.
* p < 0.05.
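
The correspondence analysis summarized in Table 3 starts from one RSCU vector per coding sequence. The sketch below shows one way such a vector could be built. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), whose 60 sense codons would match the 60-dimensional vector mentioned in Section 3.10; that mapping, the function name and the convention of assigning 0.0 to codons of an absent amino acid are assumptions, not details confirmed by the source.

```python
from collections import Counter

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with '*' marking stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SENSE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa != "*")


def rscu_vector(cds):
    """Return RSCU values ordered as in SENSE_CODONS.

    RSCU = observed codon count divided by the mean count of all
    synonymous codons for the same amino acid.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon in SENSE_CODONS:
        families.setdefault(CODON_TABLE[codon], []).append(codon)
    rscu = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Assumption: amino acid absent from the sequence -> 0.0
            rscu[c] = counts[c] * len(codons) / total if total else 0.0
    return [rscu[c] for c in SENSE_CODONS]
```

Stacking these vectors row by row gives the sequence-by-codon matrix on which a correspondence analysis would then be run.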
Fig. 6. (A), (B), (C). PR2-bias plot A3 / (A3 + T3) against G3 / (G3 + C3) for pisces, aves and mammals respectively in CYB gene.
compositional constraint in the presence of mutation pressure. The
study revealed that natural selection played a dominant role among
the factors affecting the codon usage pattern while mutation pressure
had a minor role in codon usage pattern in MT-CYB. The present
investigation enhances our understanding of the mechanisms underlying
codon usage and increases our knowledge on the evolution of MT-CYB
gene among the species under study.

Disclosure

The first author is thankful to UGC for providing the MANF-SRF. No
fund was received from DBT or DST, Govt. of India for this research
work.

Conflict of interests

The authors declare that there is no conflict of interests regarding
the publication of this manuscript.

Author contributions

SC and AU conceived and designed the experiments. SC and AU
performed the experiments. AU analyzed the data and wrote the
manuscript. Both authors have read and approved the final manuscript.

Acknowledgments

We are thankful to Assam University, Silchar, Assam India for
providing the necessary facilities to carry out the work.

Appendix A. Supplementary data

Supplementary data to this article can be found online at
http://dx.doi.org/10.1016/j.gene.2016.04.005.

References

Akashi, H., 1997. Codon bias evolution in Drosophila. Population genetics of mutation–selection drift. Gene 205, 269–278.
Andersson, S., Kurland, C., 1990. Codon preferences in free-living microorganisms. Microbiol. Rev. 54, 198–210.
Baisnée, P.-F., Hampson, S., Baldi, P., 2002. Why are complementary DNA strands symmetric? Bioinformatics 18, 1021–1033.
Behura, S.K., Severson, D.W., 2012. Comparative analysis of codon usage bias and codon context patterns between dipteran and hymenopteran sequenced genomes. PLoS One 7, e43111.
Beletskii, A., Bhagwat, A.S., 2001. Transcription-induced cytosine-to-thymine mutations are not dependent on sequence context of the target cytosine. J. Bacteriol. 183, 6491–6493.
Blier, P.U., Dufresne, F., Burton, R.S., 2001. Natural selection and the evolution of mtDNA-encoded peptides: evidence for intergenomic co-adaptation. Trends Genet. 17, 400–406.
Bulmer, M., 1991. The selection–mutation-drift theory of synonymous codon usage. Genetics 129, 897–907.
Butt, A.M., Nasrullah, I., Tong, Y., 2014. Genome-wide analysis of codon usage and influencing factors in chikungunya viruses. PLoS One 9, e90905.
Chen, Y., 2013. A comparison of synonymous codon usage bias patterns in DNA and RNA virus genomes: quantifying the relative importance of mutational pressure and natural selection. BioMed Res. Int. http://dx.doi.org/10.1155/2013/406342.
Chen, Y., 2014. Natural selection determines synonymous codon usage patterns of neuraminidase (NA) gene of the different subtypes of influenza A virus in Canada. J. Viruses http://dx.doi.org/10.1155/2014/329049.
Chen, Y., Chen, Y.-F., 2014. Analysis of synonymous codon usage patterns in duck hepatitis A virus: a comparison on the roles of mutual pressure and natural selection. Virus Dis. 25, 285–293.
Chen, H.-T., Gu, Y.-X., Liu, Y.-S., 2013. Analysis of synonymous codon usage in dengue viruses. J. Anim. Vet. Adv. 12, 88–98.
Chen, H., Sun, S., Norenburg, J.L., Sundberg, P., 2014. Mutation and selection cause codon usage and bias in mitochondrial genomes of ribbon worms (Nemertea). PLoS One 9 (1), e85631.
Dass, J.F.P., Sudandiradoss, C., 2012. Insight into pattern of codon biasness and nucleotide base usage in serotonin receptor gene family from different mammalian species. Gene 503, 92–100.
Degli Esposti, M., De Vries, S., Crimi, M., Ghelli, A., Patarnello, T., Meyer, A., 1993. Mitochondrial cytochrome b: evolution and structure of the protein. Biochim. Biophys. Acta Bioenerg. 1143, 243–271.
Duret, L., 2002. Evolution of synonymous codon usage in metazoans. Curr. Opin. Genet. Dev. 12, 640–649.
Duret, L., Mouchiroud, D., 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. 96, 4482–4487.
Ellington, W.R., 2001. Evolution and physiological roles of phosphagen systems. Annu. Rev. Physiol. 63, 289–325.
da Fonseca, R.R., Johnson, W.E., O'Brien, S.J., Ramos, M.J., Antunes, A., 2008. The adaptive evolution of the mammalian mitochondrial genome. BMC Genomics 9, 119.
Francino, M.P., Ochman, H., 2001. Deamination as the basis of strand-asymmetric evolution in transcribed Escherichia coli sequences. Mol. Biol. Evol. 18, 1147–1150.
Garcia, J.A., Fernández-Guerra, A., Casamayor, E.O., 2011. A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes. Mol. Phylogenet. Evol. 61, 650–658.
Gingold, H., Pilpel, Y., 2011. Determinants of translation efficiency and accuracy. Mol. Syst. Biol. 7, 481.
Green, P., Ewing, B., Miller, W., Thomas, P.J., Green, E.D., 2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat. Genet. 33, 514–517.
Greenacre, M.J., 1984. Theory and Applications of Correspondence Analysis. Academic Press, London.
Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E., Tatham, R.L., 2006. Multivariate Data Analysis. Pearson Prentice Hall, New York.
Harrison, R.G., 1989. Animal mitochondrial DNA as a genetic marker in population and evolutionary biology. Trends Ecol. Evol. 4, 6–11.
Hassan, S., Mahalingam, V., Kumar, V., 2010. Synonymous codon usage analysis of thirty two mycobacteriophage genomes. Adv. Bioinforma. http://dx.doi.org/10.1155/2009/316936.
Hebert, P.D., Cywinska, A., Ball, S.L., 2003. Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B Biol. Sci. 270, 313–321.
Hershberg, R., Petrov, D.A., 2008. Selection on codon bias. Annu. Rev. Genet. 42, 287–299.
Ikemura, T., 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146, 1–21.
Ikemura, T., 1982. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes: differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 158, 573–597.
Ikemura, T., 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
Irwin, D.M., Kocher, T.D., Wilson, A.C., 1991. Evolution of the cytochrome b gene of mammals. J. Mol. Evol. 32, 128–144.
Jenkins, G.M., Holmes, E.C., 2003. The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res. 92, 1–7.
Jia, W., Higgs, P.G., 2008. Codon usage in mitochondrial genomes: distinguishing context-dependent mutation from translational selection. Mol. Biol. Evol. 25, 339–351.
Jia, X., Liu, S., Zheng, H., Li, B., Qi, Q., Wei, L., Zhao, T., He, J., Sun, J., 2015. Non-uniqueness of factors constraint on the codon usage in Bombyx mori. BMC Genomics 16, 356.
Karlin, S., Mrázek, J., 1996. What drives codon choices in human genes? J. Mol. Biol. 262, 459–472.
Keightley, P.D., Lercher, M.J., Eyre-Walker, A., 2005. Evidence for widespread degradation of gene control regions in hominid genomes. PLoS Biol. 3, e42.
Kim, C.H., Oh, Y., Lee, T.H., 1997. Codon optimization for high-level expression of human erythropoietin (EPO) in mammalian cells. Gene 199, 293–301.
Knight, R.D., Freeland, S.J., Landweber, L.F., 2001. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2 (research0010).
Liu, Q., Feng, Y., Zhao, Xa., Dong, H., Xue, Q., 2004. Synonymous codon usage bias in Oryza sativa. Plant Sci. 167, 101–105.
Liu, X., Wu, C., Chen, A.Y.-H., 2010. Codon usage bias and recombination events for neuraminidase and hemagglutinin genes in Chinese isolates of influenza A virus subtype H9N2. Arch. Virol. 155, 685–693.
Mardia, K.V., Kent, J.T., Bibby, J.M., 1979. Multivariate Analysis. Academic Press.
Meyer, A., Wilson, A.C., 1990. Origin of tetrapods inferred from their mitochondrial DNA affiliation to lungfish. J. Mol. Evol. 31, 359–364.
Mirsafian, H., Mat Ripen, A., Singh, A., Teo, P.H., Merican, A.F., Mohamad, S.B., 2014. A comparative analysis of synonymous codon usage bias pattern in human albumin superfamily. Sci. World J. http://dx.doi.org/10.1155/2014/639682.
Moritz, C., Schneider, C.J., Wake, D.B., 1992. Evolutionary relationships within the Ensatina eschscholtzii complex confirm the ring species interpretation. Syst. Biol. 41, 273–291.
Moriyama, E.N., Powell, J.R., 1998. Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 26, 3188–3193.
Peixoto, L., Fernandez, V., Musto, H., 2004. The effect of expression levels on codon usage in Plasmodium falciparum. Parasitology 128, 245–251.
Plotkin, J.B., Kudla, G., 2011. Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 12, 32–42.
Powell, J.R., Moriyama, E.N., 1997. Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. 94, 7784–7790.
Powell, J.R., Sezzi, E., Moriyama, E.N., Gleason, J.M., Caccone, A., 2003. Analysis of a shift in codon usage in Drosophila. J. Mol. Evol. 57, S214–S225.
Sharp, P.M., Li, W.-H., 1986a. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38.
Sharp, P.M., Li, W.-H., 1986b. Codon usage in regulatory genes in Escherichia coli does not reflect selection for ‘rare’ codons. Nucleic Acids Res. 14, 7737–7749.
Sharp, P.M., Li, W.-H., 1987. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.
Sharp, P.M., Averof, M., Lloyd, A.T., Matassi, G., Peden, J.F., 1995. DNA sequence evolution: the sounds of silence. Philos. Trans. R. Soc. B Biol. Sci. 349, 241–247.
Sharp, P.M., Stenico, M., Peden, J.F., Lloyd, A.T., 1993. Codon usage: mutational bias, translational selection, or both? Biochem. Soc. Trans. 21, 835.
Sharp, P.M., Tuohy, T.M., Mosurski, K.R., 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125–5143.
Shields, D.C., Sharp, P.M., 1987. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res. 15, 8023–8040.
Stenico, M., Lloyd, A.T., Sharp, P.M., 1994. Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 22, 2437–2446.
Sueoka, N., 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. 85, 2653–2657.
Sueoka, N., 1995. Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J. Mol. Evol. 40, 318–325.
Sueoka, N., 1999. Two aspects of DNA base composition: G + C content and translation-coupled deviation from intra-strand rule of A = T and G = C. J. Mol. Evol. 49, 49–62.
Uddin, A., Chakraborty, S., 2014. Mutation pressure dictates codon usage pattern in mitochondrial Atpase8 in some mammalian species.
Vicario, S., Moriyama, E.N., Powell, J.R., 2007. Codon usage in twelve species of Drosophila. BMC Evol. Biol. 7, 226.
Wei, L., He, J., Jia, X., Qi, Q., Liang, Z., Zheng, H., Ping, Y., Liu, S., Sun, J., 2014. Analysis of codon usage bias of mitochondrial genome in Bombyx mori and its relation to evolution. BMC Evol. Biol. 14, 262.
Wolstenholme, D.R., 1992. Animal mitochondrial DNA: structure and evolution. Int. Rev. Cytol. 141, 173–216.
Wright, F., 1990. The ‘effective number of codons’ used in a gene. Gene 87, 23–29.
Zhang, Z., Dai, W., Dai, D., 2013. Synonymous Codon Usage in TTSuV2: Analysis and Comparison With TTSuV1.
Zhao, X., Huo, K., Li, Y., 2000. Synonymous codon usage in Pichia pastoris. Sheng wu gong cheng xue bao 16, 308–311.
Zhao, S., Zhang, Q., Chen, Z., Zhao, Y., Zhong, J., 2007. The factors shaping synonymous codon usage in the genome of Burkholderia mallei. J. Genet. Genomics 34, 362–372.
Zhou, T., Gu, W., Ma, J., Sun, X., Lu, Z., 2005. Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. Biosystems 81, 77–86.