Thursday 9 June 2016

A new InterPro member database

CDD joins InterPro

We are pleased to announce that the NCBI Conserved Domain Database (CDD) has joined the InterPro consortium as a member database, and that integration of its data into InterPro has begun.  This is the first new member database to be integrated into InterPro since HAMAP was included back in 2009. Seven years have passed since a new database was added, so this is a first for the current InterPro team.

There are some similarities between CDD and InterPro, with both resources aggregating data from Pfam, TIGRFAMs and SMART. However, CDD also utilises COGs and PRKs, whereas InterPro incorporates eight other resources.  Unlike InterPro, the CDD team also curate their own models, using position-specific scoring matrices (PSSMs) to represent protein domains, and it is these models that have been prioritised for integration into InterPro.

What does CDD bring?

This release contains all 11,273 CDD models, 318 of which are already integrated into InterPro.  It will take time to curate the remaining entries into the InterPro hierarchy, ensuring that we are consistent with both CDD's and our own relationship trees, and to assign GO terms for InterPro2GO. The NCBI models are often functionally specific: for example, multiple CDD entries may cover the same set of sequences as a single Pfam profile hidden Markov model (HMM).

Having such algorithmic and database diversity helps capture as much knowledge as possible about the function of a protein.  CDD uses a derivative of RPS-BLAST that performs the appropriate assignment of proteins to database entries. This software, rpsbproc (also known as CD-search), has been incorporated into the latest version of InterProScan, making CDD accessible via our web services too.


Figure 1. InterProScan result for UniProt protein A0AM81. CDD entry cd00770 extends coverage of the protein and adds more functionally specific information about the C-terminal aminoacyl-tRNA synthetase domain (in this case, that it is a serine-tRNA ligase domain).
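For readers who run InterProScan locally, the CDD matches can be pulled out of the tab-separated output with a few lines of Python. This is a minimal sketch of our own, not part of InterProScan: it assumes the InterProScan 5 TSV layout (protein accession in column 1, analysis name in column 4, signature accession in column 5, match start and stop in columns 7 and 8) and that CDD hits carry the analysis name `CDD`. The function name `cdd_matches` is hypothetical. Output of this kind can be produced with something like `interproscan.sh -i proteins.fasta -f tsv -appl CDD`, but please check the options against the documentation for the version you run.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed InterProScan 5 TSV columns (0-based): protein accession, MD5, length,
# analysis, signature accession, signature description, start, stop, ...
COL_PROTEIN, COL_ANALYSIS, COL_SIGNATURE, COL_START, COL_STOP = 0, 3, 4, 6, 7


def cdd_matches(lines: Iterable[str]) -> Dict[str, List[Tuple[str, int, int]]]:
    """Collect CDD matches per protein as (signature, start, stop), sorted by start."""
    hits: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for line in lines:
        row = line.rstrip("\r\n").split("\t")
        # Skip blank or truncated lines and matches from other member databases.
        if len(row) <= COL_STOP or row[COL_ANALYSIS] != "CDD":
            continue
        hits[row[COL_PROTEIN]].append(
            (row[COL_SIGNATURE], int(row[COL_START]), int(row[COL_STOP]))
        )
    return {protein: sorted(found, key=lambda h: h[1]) for protein, found in hits.items()}
```

An open results file can be passed straight in, e.g. `cdd_matches(open("results.tsv"))` (the file name is hypothetical); each protein then maps to its CDD signatures in order along the sequence.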

What Next?

With summer fast approaching, we are taking our usual break from database releases, and the next InterPro release is not anticipated until the beginning of September.  In the meantime, we have a number of planned infrastructural changes that will broaden the scope of InterPro further. We will describe these changes in detail once they are ready!

Rob Finn
on behalf of the InterPro team

Thursday 14 April 2016

Navigating the ever-changing ocean of biological knowledge



The removal of annotation from biological databases is often taken to mean that the annotation was wrong in the first place. Why else would diligent biocurators remove information that had been painstakingly added to database entries? In our recent paper, 'GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations', we look at some of the diverse, data-driven changes that can underlie the deletion or update of Gene Ontology annotations in the InterPro database, and highlight some of the consequent effects of these changes on UniProt protein annotations. We also explain why these changes don't necessarily mean that the original annotations were unreliable. Instead, we argue that they signify a curation effort committed to annotation accuracy, attempting to navigate an ever-changing ocean of biological knowledge.

Alex Mitchell
on behalf of the InterPro team

Wednesday 24 February 2016

Zika Virus and Microcephaly



You have probably been as horrified and saddened as I have to see the shocking abnormality that affects newborn babies whose mothers have been infected with the Zika virus.  The babies' skulls and brains have not grown properly, so the babies have abnormally small heads, a condition known as "microcephaly".  The standard definition is a head circumference two (or three) standard deviations below the average for age and sex [1,2] (Fig. 1).
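The definition amounts to a standard (z) score: z = (measured circumference - reference mean) / reference standard deviation, with microcephaly when z is at or below -2 (or -3). A minimal sketch, assuming the age- and sex-matched reference mean and standard deviation are supplied by the user (in practice they come from growth charts). The function names are hypothetical, and the numbers in the usage lines are made-up placeholders, not clinical reference values.

```python
def head_circumference_z(hc_cm: float, ref_mean_cm: float, ref_sd_cm: float) -> float:
    """Standard (z) score of a head circumference against an age- and sex-matched reference."""
    if ref_sd_cm <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (hc_cm - ref_mean_cm) / ref_sd_cm


def is_microcephalic(hc_cm: float, ref_mean_cm: float, ref_sd_cm: float,
                     sd_cutoff: float = 2.0) -> bool:
    """True if the circumference is at least `sd_cutoff` SDs below the reference mean.

    Use sd_cutoff=3.0 for the stricter (three standard deviation) definition.
    """
    return head_circumference_z(hc_cm, ref_mean_cm, ref_sd_cm) <= -sd_cutoff


# Placeholder reference values for illustration only (NOT real growth-chart data).
z = head_circumference_z(32.0, ref_mean_cm=35.0, ref_sd_cm=1.25)
print(round(z, 2))                                        # -2.4
print(is_microcephalic(32.0, 35.0, 1.25))                 # True
print(is_microcephalic(32.0, 35.0, 1.25, sd_cutoff=3.0))  # False
```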

Fig. 1. Diagram showing the size of a baby's head with microcephaly compared with a normal baby's head.  From https://prezi.com/iwv4kvehmhbv/microcephaly-then-and-now/.

Origin and spread of the Zika virus

Zika virus has been known since the 1940s, and originally occurred in the equatorial regions of Africa.  It is named after the Zika Forest near Entebbe, Uganda.  Analysis of the sequenced genomes points to an origin in central Africa (a strain isolated in Uganda in 1947 is the oldest), followed by spread elsewhere in Africa (Senegal (1984), Nigeria (1968) and the Central African Republic (1976)) and then eastwards to Malaysia (1966), Cambodia (2010), Micronesia (2007), French Polynesia (2013) and finally Suriname and Brazil (2015) [http://virological.org/t/initial-Zika-phylogeography/202].  The virus is transmitted by mosquitoes such as Aedes aegypti (Fig. 2) and A. albopictus.  These mosquitoes are active during the day, mainly at dawn and dusk and when the weather is cloudy, and the females transmit the virus from person to person when they take a blood meal.  A. aegypti, known as the yellow fever mosquito, is particularly distinctive, with white rings around the leg joints and white markings on the body.  This mosquito originated in Africa but has since spread throughout the tropics [3].  There is also evidence that Zika virus can be transmitted sexually via the semen of an infected man [4].

Fig. 2. An Aedes aegypti mosquito (photo taken by Muhammad Mahdi Karim in Dar es Salaam, Tanzania, 2009).


Zika fever, which has mild influenza-like symptoms, had been thought to be a trivial disease.  Now there are several questions that require answers.  Is there a causal link between viral infection and microcephaly, or is the association coincidental?  If the virus does cause microcephaly, is this an effect of viral enzymes, or a consequence of the body's own immune system attacking more than just the virus?

Microcephaly in Brazil

Microcephaly is not a new condition, and can result from chromosomal abnormalities as well as from environmental conditions that affect brain growth.  Mutations in the genes MCPH1, which encodes the protein microcephalin, and ASPM, which encodes abnormal spindle-like microcephaly-associated protein, can cause primary microcephaly when homozygous [5-7].  Microcephaly is also associated with other viral diseases, such as chickenpox [8], but cases are rare because few women catch the disease while pregnant, thanks to the immunity they acquired from infection in childhood.  It is possible, of course, that the same is true of Zika virus: this would explain why microcephaly is not prevalent in Africa, where women acquire immunity as girls, and also the dramatic increase in the condition in Brazil, where the virus arrived recently and pregnant women have no immunity.  The rates of Zika infection and microcephaly in Brazil really are alarming.  It has been estimated that 1.5 million cases of Zika fever occurred in Brazil between April 2015 and January 2016, along with 3718 cases of microcephaly (38 of which led to death) [9].  That is one case of microcephaly per 403 infections, and one per 793 births (the population of Brazil is 204 million and the annual birth rate is 14.46 per 1000 [https://www.cia.gov/library/publications/the-world-factbook/geos/br.html]).  This is considerably higher than the known incidence of microcephaly in the UK, where the Zika virus is absent: approximately 1 in 10,000 births [http://www.rightdiagnosis.com/m/microcephaly/basics.htm].
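The rates quoted above are simple ratios, and the arithmetic is easy to reproduce. A minimal sketch using only the figures given in this post; like the text, it divides the annual number of births (population × birth rate) by the number of microcephaly cases. The helper name `one_in` is our own.

```python
def one_in(events: float, population: float) -> float:
    """Express a rate as 'one event per N' and return N."""
    return population / events


zika_cases = 1_500_000        # estimated Zika fever cases, April 2015 - January 2016 [9]
microcephaly_cases = 3718     # reported microcephaly cases over the same period [9]
population = 204_000_000      # population of Brazil (CIA World Factbook)
birth_rate_per_1000 = 14.46   # annual births per 1000 people
uk_per_birth = 10_000         # UK background: roughly 1 in 10,000 births

annual_births = population * birth_rate_per_1000 / 1000  # about 2.95 million
per_infection = one_in(microcephaly_cases, zika_cases)    # about 403
per_birth = one_in(microcephaly_cases, annual_births)     # about 793

print(f"one case per {per_infection:.0f} infections")
print(f"one case per {per_birth:.0f} births")
print(f"about {uk_per_birth / per_birth:.1f} times the UK background incidence")
```

On these figures the Brazilian rate per birth comes out at roughly twelve to thirteen times the UK background rate.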

Zika virus polyprotein

The Zika virus is a flavivirus, a group that includes the viruses that cause yellow fever, dengue fever, Japanese encephalitis and West Nile fever.  These viruses contain single-stranded RNA as their genetic material, and the RNA encodes a single polyprotein.  This polyprotein consists of several enzymes and structural proteins, and processing by the viral serine endopeptidase is required to separate the individual proteins.  By submitting the Zika virus polyprotein to InterProScan, it is possible to identify all the components.  These are shown in Fig. 3.  There is no component of unknown function, nor any that would be expected to affect brain development directly.

Fig. 3.  Zika virus polyprotein domains identified by InterProScan.



How processing proceeds in the Zika virus polyprotein is unknown, but some of the cleavage sites have been mapped in both yellow fever virus and West Nile virus [10, 11].  All known cleavages are performed by the viral serine endopeptidase, although one of them can also be performed by unrelated host serine endopeptidases that normally process host protein precursors [12].  The specificities of the viral and host endopeptidases are similar: cleavage follows a pair of basic residues (lysine or arginine) and precedes glycine, serine or threonine.  A pairwise alignment of the West Nile and Zika virus polyprotein sequences shows that the known cleavage sites are conserved (Fig. 4).
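The specificity described above can be turned into a simple pattern search. The following is a minimal sketch of our own, not a published predictor: it reports every position where two lysine/arginine residues are followed by glycine, serine or threonine, and splits a sequence at chosen sites, as polyprotein processing releases the individual proteins. Real protease specificity depends on more than these three residues (and on substrate structure), so matches are candidate sites only. The function names are hypothetical; the example window is taken from Fig. 5.

```python
import re
from typing import List

# Assumed rule (from the text): K/R at P2 and P1, then G/S/T at P1'.
# Zero-width lookarounds let overlapping sites all be reported.
SITE = re.compile(r"(?<=[KR][KR])(?=[GST])")


def cleavage_sites(seq: str) -> List[int]:
    """Return 0-based cut positions i, meaning the bond between seq[i-1] and seq[i].

    The same number i is the 1-based residue number of the P1 residue.
    """
    return [m.start() for m in SITE.finditer(seq.upper())]


def cleave(seq: str, cuts: List[int]) -> List[str]:
    """Split a sequence into fragments at the given cut positions."""
    bounds = [0] + sorted(set(cuts)) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# The MCPH1 375 window from Fig. 5, with the '+' removed.
window = "PPKEKCKRKRSTRRSIMPRL"
sites = cleavage_sites(window)
print(sites)                  # [10, 14]
print(cleave(window, sites))  # ['PPKEKCKRKR', 'STRR', 'SIMPRL']
```

Because the pattern is zero-width, overlapping sites such as the two in ...KRKRSTRRS... are both reported, which matches the adjacent MCPH1 375 and 379 entries in Fig. 5.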

Fig. 4. Conservation of polyprotein cleavage sites
Sites of cleavage are indicated by an arrow (↓).  Residues conserved between West Nile virus (W Nile) and Zika virus, highlighted in pink in the original figure, are marked here by ':' (identical) or '.' (similar).  Residue numbers are shown above and below each sequence.
      60        70        80        90       100      110        
W Nile APTRAVLDRWRGVNKQTAMKHLLSFKKELGTLTSAINRRSTKQKKRGGTAGFTILLGLIA
        :. ....::  :.:. :.: : .:: ..::.   :: :.::  :::  .:. ..:.:..
Zika   KPSTGLINRWGKVGKKEAIKILTKFKADVGTMLRIINNRKTK--KRGVETGI-VFLALLV
      60        70        80        90       100          110     

     180       190       200         210 ↓     220       230      
W Nile AAGNDPEDIDCWCTKSSVYVRYGRCTK--TRHSRRSRRSLTVQTHGESTLANKKGAWLDS
           .:::.::::....... :: ::.  : ..::::::.:. .:. . : .....::.:
Zika   EPQYEPEDVDCWCNSTAAWIVYGTCTHKTTGETRRSRRSITLPSHASQKLETRSSTWLES
        180       190       200       210       220       230     

     1370     1380      1390      1400      1410      1420       
W Nile DPNRKRGWPATEVMTAVGLMFAIVGGLAELDIDSMAIPMTIAGLMFVAFVISGKSTDMWI
         ..::.:: .::::::::. ::::::.. ::: :: ::.  ::. :..:.::::.::.:
Zika   TASKKRSWPPSEVMTAVGLICAIVGGLTKTDID-MAGPMAAIGLLVVSYVVSGKSVDMYI
       1370      1380      1390       1400      1410      1420    


     1490       1500    ↓ 1510      1520      1530      1540      
W Nile ILPSVIGFW-ITLQYTKRGGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMV
       : : . . : . ..  ::.:..:: :::.: :::.::.:::::::: :::: :.:::::
Zika   I-PFAAAAWFVYIKSGKRSGAMWDIPSPREVKKGETTAGVYRIMTRKLLGSTQVGAGVMH
         1490      1500      1510      1520      1530      1540   

      2090      2100      2110      2120       2130      2140     
W Nile ITKLGERKILRPRWADARVYSDHQALKSFKDFASGKRS-QIGLVEVLGRMPEHFMGKTWE
        ::.::.:::.::: :::. ::: .:::::.::.:::.   ::.:..: .: :.  .  :
Zika   WTKFGEKKILKPRWMDARICSDHASLKSFKEFAAGKRTIATGLIEAFGMLPGHMTERFQE
         2090      2100      2110      2120      2130      2140   

            2510      2520        2530      2540      2550 
                                          
W Nile HIMRGGWLSCLSITWTLIKNMEKPGL--KRGGAKGRTLGEVWKERLNHMTKEEFTRYRKE
       .:.::..:.  :. .:. .:    :.  ::::..:.:.:: ::::::.::  ::  :..
Zika   NIFRGSYLAGPSLIYTVTRNA---GIMKKRGGGNGETVGEKWKERLNRMTALEFYAYKRS
    2500      2510      2520         2530      2540      2550     
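Checking conservation by eye is error-prone, so one could instead look for alignment columns where both sequences carry the basic-basic-(G/S/T) pattern. A minimal sketch of that idea (our own check, not the method used to prepare Fig. 4), applied to the second block of the figure. Columns are 0-based positions within the 60-column block rather than residue numbers, a gap ('-') inside the pattern breaks it, and the function names are hypothetical.

```python
from typing import List

BASIC, SMALL = set("KR"), set("GST")


def motif_columns(aligned: str) -> List[int]:
    """Columns c where aligned[c-2] and aligned[c-1] are K/R and aligned[c] is G/S/T."""
    return [
        c for c in range(2, len(aligned))
        if aligned[c - 2] in BASIC and aligned[c - 1] in BASIC and aligned[c] in SMALL
    ]


def conserved_motif_columns(row_a: str, row_b: str) -> List[int]:
    """Columns where both rows of a pairwise alignment carry the motif."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return sorted(set(motif_columns(row_a)) & set(motif_columns(row_b)))


# Second block of Fig. 4, copied from the alignment above.
wnv = "AAGNDPEDIDCWCTKSSVYVRYGRCTK--TRHSRRSRRSLTVQTHGESTLANKKGAWLDS"
zika = "EPQYEPEDVDCWCNSTAAWIVYGTCTHKTTGETRRSRRSITLPSHASQKLETRSSTWLES"
print(motif_columns(wnv))                  # [35, 38, 54]
print(conserved_motif_columns(wnv, zika))  # [35, 38]
```

Here both RRS motifs around the arrowed site are present in both viruses, whereas the KKG at column 54 occurs in West Nile virus only.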

Is microcephalin a substrate for the Zika virus endopeptidase?

Could the viral endopeptidase also be processing host proteins at similar sites?  At least 24 human proteins are known to be cleaved by viral endopeptidases.  Cleavage of eukaryotic translation initiation factors and polyadenylate-binding protein 1 switches off the host cell's own protein synthesis, ensuring that only viral proteins are made, and the endopeptidases from retroviruses, enteroviruses and foot-and-mouth disease virus all cleave these proteins [13-17].   Nuclear pore glycoprotein p62 is also cleaved by the rhinovirus endopeptidase picornain 2A peptidase, and this disrupts trafficking from the nucleus to the cytoplasm [18].  Both microcephalin (http://www.uniprot.org/uniprot/Q8NEM0) and ASPM (http://www.uniprot.org/uniprot/Q8IZT6) have regions that conform to the predicted specificity of the Zika virus endopeptidase (Fig. 5), so either could be a substrate and be inactivated by cleavage.  If cleavage of these proteins has the same effect as mutations in their genes, then it could lead to microcephaly.

Fig. 5. Potential cleavage sites in microcephalin and ASPM ('+' marks the potential cleavage site)
MCPH1  66  QSTWDKAQKR+GVKLVSVLWV
MCPH1 375  PPKEKCKRKR+STRRSIMPRL
MCPH1 379  KCKRKRSTRR+SIMPRLQLCR
MCPH1 467  MSDFSCVGKK+TRTVDITNFT
MCPH1 486  TAKTISSPRK+TGNGEGRATS
MCPH1 639  LIKPHEELKK+SGRGKKPTRT

ASPM  148  NAEEQKKKKR+SLWDTIKKKK
ASPM  243  ATCLPLSVRR+STTYSSLHAS
ASPM  431  VPQSPEDWRK+SEVSPRIPEC
ASPM  576  TTASVARKRK+SDGSMEDANV
ASPM  616  SEPKTSAVKK+TKNVTTPISK
ASPM  639  NREKLNLKKK+TDLSIFRTPI
ASPM  655  RTPISKTNKR+TKPIIAVAQS
ASPM 1081  FLKHTKSIKK+TISLLSCHSD
ASPM 1098  HSDDLINKKK+GKRDSGSFEQ
ASPM 1584  DRVRFLNLKK+TIIKFQAHVR
ASPM 2095  QHKEYLNLKK+TAIKIQSVYR
ASPM 2184  ASFRGVRVRR+TLRKMQTAAT
ASPM 2287  MRRRFLSLKK+TAILIQRKYR
ASPM 2712  RAKVDYETKK+TAIVVIQNYY
ASPM 3081  ERIKYIEFKK+STVILQALVR
ASPM 3252  IREENKLYKR+TALALHYLLT

Conclusions

The incidence of microcephaly in babies born to mothers infected by the Zika virus in Brazil is not only alarmingly high, but much higher than the background incidence of microcephaly in the UK; there seems to be little doubt that the condition and Zika fever are related.  It is not known whether this relationship arises because the disease is new to Brazil, so that mothers have no immunity and microcephaly results from the body's own immune response (as has been observed previously with chickenpox), or because of a viral toxin.  If the latter, then it is possible that the proteins encoded by genes in which mutations are known to cause microcephaly are susceptible to digestion by the Zika virus polyprotein processing enzyme, which is predicted to have a specificity similar to that of host prohormone convertases: inactivating these proteins may have the same result as mutations in the genes.  Further research, which might include characterization of the viral endopeptidase, is required to understand the mechanisms causing microcephaly.  If the symptoms are due to the immune response, then microcephaly might be a transitory phenomenon, and once the population builds up immunity, such cases could become very rare.

References

1. Leviton, A., Holmes, L. B., Allred, E. N. & Vargas, J. (2002). Methodologic issues in epidemiologic studies of congenital microcephaly. Early Hum. Dev. 69:91-105. doi:10.1016/S0378-3782(02)00065-8. PMID:12324187.
2. Opitz, J. M. & Holt, M. C. (1990). Microcephaly: general considerations and aids to nosology. J. Craniofac. Genet. Dev. Biol. 10:75-204. PMID:2211965.
3. Mousson, L., Dauga, C., Garrigues, T., Schaffner, F., Vazeille, M. & Failloux, A. (2005). Phylogeography of Aedes (Stegomyia) aegypti (L.) and Aedes (Stegomyia) albopictus (Skuse) (Diptera: Culicidae) based on mitochondrial DNA variations. Genetics Research 86:1-11. doi:10.1017/S0016672305007627. PMID:16181519.
4. Musso, D., Roche, C., Robin, E., Nhan, T., Teissier, A. & Cao-Lormeau, V. M. (2015) Potential sexual transmission of Zika virus. Emerg. Infect. Dis. 21:359-361. doi:10.3201/eid2102.141363. PMID:25625872.
5. Jackson, A. P., Eastwood, H., Bell, S. M., Adu, J., Toomes, C., Carr, I. M., Roberts, E., Hampshire, D. J., et al. (2002). Identification of Microcephalin, a Protein Implicated in Determining the Size of the Human Brain. Am. J. Human Genetics 71:136-142. doi:10.1086/341283. PMC:419993. PMID:12046007.
6. Jackson, A. P., McHale, D. P., Campbell, D. A., Jafri, H., Rashid, Y., Mannan, J., Karbani, G., Corry, P., et al. (1998). Primary Autosomal Recessive Microcephaly (MCPH1) Maps to Chromosome 8p22-pter. Am. J. Human Genetics 63:541-546. doi:10.1086/301966. PMC:1377307. PMID:9683597.
7. Bond, J., Roberts, E., Mochida, G.H., Hampshire, D.J., Scott, S., Askham, J.M., Springell, K., Mahadevan, M., Crow, Y.J., Markham, A.F., Walsh, C.A. & Woods, C.G. (2002) ASPM is a major determinant of cerebral cortical size. Nat. Genet. 32:316-320.  PMID:14574646.
8. Mirlesse, V. & Lebon, P. (2003) [Chickenpox during pregnancy]. Arch. Pediatr. 10:1113-1118. PMID:14643554.
9. World Health Organization (8 January 2016) Microcephaly - Brazil.
10. Chappell, K. J., Stoermer, M. J., Fairlie, D. P. & Young, P. R. (2006) Insights to substrate binding and processing by West Nile Virus NS3 protease through combined modeling, protease mutagenesis, and kinetic studies. J. Biol. Chem. 281:38448-38458. PMID:17052977.
11. Shiryaev, S. A., Ratnikov, B. I., Chekanov, A. V., Sikora, S., Rozanov, D. V., Godzik, A., Wang, J., Smith, J. W., Huang, Z., Lindberg, I., Samuel, M. A., Diamond, M. S. & Strongin, A. Y. (2006) Cleavage targets and the D-arginine-based inhibitors of the West Nile virus NS3 processing proteinase. Biochem. J.  393:503-511. PMID:16229682.
12. Remacle, A. G., Shiryaev, S. A., Oh, E. S., Cieplak, P., Srinivasan, A., Wei, G., Liddington, R. C., Ratnikov, B. I., Parent, A., Desjardins, R., Day, R., Smith, J. W., Lebl, M. & Strongin, A. Y. (2008) Substrate cleavage analysis of furin and related proprotein convertases. A comparative study. J. Biol. Chem. 283:20897-20906. PMID:18505722.
13. Alvarez, E., Menéndez-Arias, L. & Carrasco, L. (2003) The eukaryotic translation initiation factor 4GI is cleaved by different retroviral proteases. J. Virol. 77:12392-12400.
14. Gradi, A., Foeger, N., Strong, R., Svitkin, Y. V., Sonenberg, N., Skern, T., Belsham, G. J. (2004) Cleavage of eukaryotic translation initiation factor 4GII within foot-and-mouth disease virus-infected cells: identification of the L-protease cleavage site in vitro. J. Virol. 78:3271-3278.
15. Gradi, A., Svitkin, Y. V., Sommergruber, W., Imataka, H., Morino, S., Skern, T. & Sonenberg, N. (2003) Human rhinovirus 2A proteinase cleavage sites in eukaryotic initiation factors (eIF) 4GI and eIF4GII are different. J. Virol. 77:5026-5029. PMID:15016848.
16. Foeger, N., Schmid, E. M. & Skern, T. (2003) Human rhinovirus 2 2Apro recognition of eukaryotic initiation factor 4GI. Involvement of an exosite. J. Biol. Chem. 278:33200-33207. PMID:12791690.
17. Kuyumcu-Martinez, N. M., Joachims, M. & Lloyd, R. E. (2002) Efficient cleavage of ribosome-associated poly(A)-binding protein by enterovirus 3C protease. J. Virol. 76:2062-2074. PMID:11836384.
18. Park, N., Skern, T. & Gustin, K. E. (2010) Specific cleavage of the nuclear pore complex protein Nup62 by a viral protease. J. Biol. Chem. 285:28796-28805. doi:10.1074/jbc.M110.143404. PMID:20622012.