Thursday, 9 June 2016

A new InterPro member database

CDD joins InterPro

We are pleased to announce that the NCBI Conserved Domain Database (CDD) has joined the InterPro consortium as a member database, and has begun to be integrated into the resource.  This is the first new member database to be integrated into InterPro since HAMAP was included in 2009, seven years ago, and it is a first for the current InterPro team.

While there are some similarities between CDD and InterPro, with both resources aggregating data from Pfam, TIGRFAMs and SMART, CDD also utilises COGs and PRKs, whereas InterPro incorporates eight other resources.  Unlike InterPro, the CDD team also curate their own models, using position-specific scoring matrices (PSSMs) to represent protein domains, and it is these models that have been prioritised for integration into InterPro.

What does CDD bring?

This release contains all 11,273 CDD models, with 318 already integrated into InterPro.  It will take time for the rest of the entries to be curated into the InterPro hierarchy, ensuring that we are consistent with both CDD’s and our own relationship trees, as well as assigning GO terms for InterPro2GO. The NCBI models are often functionally specific, with multiple CDD entries covering the same sequence set as a single Pfam profile hidden Markov model (HMM), for example.

Having such algorithmic and database diversity helps capture as much knowledge as possible about the function of a protein.  CDD uses a derivative of RPS-BLAST that performs the appropriate assignment of proteins to database entries. This software, rpsbproc (also known as CD-search), has been incorporated into the latest version of InterProScan, making CDD accessible via our web services too.

Figure 1. InterProScan result for UniProt protein A0AM81. CDD entry cd00770 extends coverage of the protein and adds more functionally specific information about the C-terminal aminoacyl-tRNA synthetase domain (in this case, that it is a serine-tRNA ligase domain).

What Next?

With summer fast approaching, we have our typical break from database releases, with the next InterPro release not anticipated until the beginning of September.  In the meantime, we have a number of planned infrastructural changes that will broaden the scope of InterPro further... We will describe these changes in detail once they are ready!

Rob Finn
on behalf of the InterPro team

Thursday, 14 April 2016

Navigating the ever-changing ocean of biological knowledge

The removal of annotation from biological databases is often taken to mean that the annotation was wrong in the first place. Why else would diligent biocurators remove information that had been painstakingly added to database entries? In our recent paper, 'GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations', we look at some of the diverse, data-driven changes that can underlie the deletion or update of Gene Ontology annotations in the InterPro database, and highlight some of the consequent effects of these changes on UniProt protein annotations. We also explain why these changes don't necessarily mean that the original annotations were unreliable. Instead, we argue that they signify a curation effort committed to annotation accuracy, attempting to navigate an ever-changing ocean of biological knowledge.

Alex Mitchell
on behalf of the InterPro team

Wednesday, 24 February 2016

Zika Virus and Microcephaly

You have probably been as horrified and saddened as me to see the shocking abnormality that affects newborn babies whose mothers have been infected with the Zika virus.  The skulls and brains of the babies have not grown properly, and the babies appear to have small heads, a condition known as "microcephaly".  The standard definition is that the circumference of the head is two (or three) standard deviations below average for age and sex [1,2] (Fig. 1).  

Fig. 1. Diagram to show size of a baby’s head with microcephaly compared to a normal baby’s head.  From

Origin and spread of the Zika virus

Zika virus has been known since the 1940s, and originally occurred in the equatorial regions of Africa.  It is named after the Zika Forest near the Ugandan town of Entebbe.  Analysis of the various sequenced genomes has shown an origin in central Africa (a strain from Uganda isolated in 1947 being the oldest), followed by spread elsewhere in Africa (Senegal, 1984; Nigeria, 1968; and the Central African Republic, 1976) and then eastwards to Malaysia (1966), Cambodia (2010), Micronesia (2007), French Polynesia (2013) and finally Suriname and Brazil (2015) [].  The virus is transmitted by mosquitoes such as Aedes aegypti (Fig. 2) and A. albopictus.  These mosquitoes are active during the day, mainly at dawn and dusk and when the weather is cloudy, and transmit the virus from patient to patient when the females take a blood meal.  A. aegypti is known as the yellow fever mosquito, and is particularly distinctive with white rings around the leg joints and white markings on the body.  This mosquito originated in Africa but has since spread throughout the tropics [3].  There is also evidence that Zika virus can be transmitted sexually via the semen of an infected man [4].

Fig. 2. An Aedes aegypti mosquito (photo taken by Muhammad Mahdi Karim in Dar es Salaam, Tanzania, 2009).

Zika fever, which has mild influenza-like symptoms, had been thought to be a trivial disease.  Now there are several questions that require answers.  Is there a causal link between microcephaly and viral infection, or are the symptoms coincidental?  If the disease causes the symptoms, is this an effect of viral enzymes, or a consequence of the body's own immune system attacking more than just the virus?

Microcephaly in Brazil

Microcephaly is not a new condition, and can result from chromosomal abnormalities as well as environmental conditions that can affect brain growth.  Mutations in the genes MCPH1, which encodes the protein microcephalin, and ASPM, which encodes abnormal spindle-like microcephaly-associated protein, can cause primary microcephaly when the mutation is homozygous [5-7].  Microcephaly is associated with other viral diseases, such as chickenpox [8], but cases are rare because women seldom catch the disease when pregnant, owing to the immunity they acquired during childhood infection.  It is possible, of course, that the same may be true of Zika virus, which would explain why microcephaly is not prevalent in Africa, because women acquire immunity as girls, and would also explain the dramatic increase in the condition in Brazil, where the disease arrived recently and pregnant women have no immunity.  The rates of Zika infection and microcephaly in Brazil really are alarming.  It has been estimated that 1.5 million cases of Zika fever occurred in Brazil between April 2015 and January 2016, along with 3718 cases of microcephaly (38 of which led to death) [9], which is one case per 403 infections, and one case per 793 births (the population of Brazil is 204 million and the annual birth rate is 14.46 per 1000 []).  This is considerably higher than the known incidence of microcephaly in the UK (where the Zika virus is absent): approximately 1 in 10,000 births [].
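The ratios quoted above can be reproduced from the stated figures. The following is a minimal sketch of that arithmetic in Python; the function and variable names are illustrative only and, as in the text, a full year of births is set against cases recorded between April 2015 and January 2016, so the per-birth figure should be read as a rough estimate.

```python
# Figures quoted in the text; all names here are illustrative.
ZIKA_CASES = 1_500_000        # estimated Zika fever cases, April 2015 to January 2016
MICROCEPHALY_CASES = 3718     # reported microcephaly cases over the same period
POPULATION = 204_000_000      # population of Brazil
BIRTHS_PER_1000 = 14.46       # annual birth rate per 1000 people
UK_BIRTHS_PER_CASE = 10_000   # approximately 1 case per 10,000 births in the UK


def annual_births(population, births_per_1000):
    """Estimate births per year from a population and a per-1000 birth rate."""
    return population * births_per_1000 / 1000


def units_per_case(total, cases):
    """Return how many infections (or births) there are per reported case."""
    return total / cases


births = annual_births(POPULATION, BIRTHS_PER_1000)
infections_per_case = units_per_case(ZIKA_CASES, MICROCEPHALY_CASES)
births_per_case = units_per_case(births, MICROCEPHALY_CASES)

print(f"Estimated annual births: {births:,.0f}")
print(f"One case per {infections_per_case:.0f} infections")
print(f"One case per {births_per_case:.0f} births")
print(f"Ratio to UK incidence: {UK_BIRTHS_PER_CASE / births_per_case:.1f}x")
```

On these figures, the Brazilian estimate is roughly an order of magnitude above the UK incidence.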

Zika virus polyprotein

The Zika virus is a flavivirus, a group that includes the viruses that cause yellow fever, dengue fever, Japanese encephalitis and West Nile fever.  These viruses contain single-stranded RNA as their genetic material, and the RNA encodes a single polyprotein.  This polyprotein consists of several enzymes and structural proteins, and processing by an endogenous serine endopeptidase is required to separate the individual proteins.  By submitting the Zika virus polyprotein to InterProScan, it is possible to identify all the components.  These are shown below.  There is no component with an unknown function or one expected to affect brain development directly.

Fig. 3.  Zika virus polyprotein domains identified by InterProScan.

How polyprotein processing progresses in the Zika virus polyprotein is unknown, but some of the cleavage sites have been mapped in both the yellow fever virus and West Nile virus [10, 11].  All known cleavages are performed by the endogenous serine endopeptidase, but one cleavage can be performed by unrelated host serine endopeptidases normally responsible for processing host protein precursors [12].  The specificity for both the viral and host endopeptidases is similar: cleavage follows a pair of basic residues (lysine or arginine) and precedes glycine, serine or threonine.  A pairwise alignment of the West Nile and Zika virus polyprotein sequences shows that the known cleavage sites are conserved (Fig. 4).
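The stated specificity can be expressed as a simple pattern search. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, the pattern encodes nothing beyond the rule given above (two basic residues followed by glycine, serine or threonine), and a match indicates a candidate site, not a demonstrated cleavage.

```python
import re

# Lookahead so that overlapping candidates (e.g. "KKRG") are all reported.
# P2 and P1 are lysine (K) or arginine (R); P1' is glycine (G), serine (S)
# or threonine (T). Cleavage would occur between P1 and P1'.
CANDIDATE_SITE = re.compile(r"(?=[KR][KR][GST])")


def candidate_cleavage_sites(sequence):
    """Return 1-based positions of the P1 residue for each candidate site.

    A returned position n means the motif predicts cleavage between
    residue n and residue n + 1 of the input sequence.
    """
    seq = sequence.upper()
    # m.start() is the 0-based index of P2, so P1 is at 1-based index start + 2.
    return [m.start() + 2 for m in CANDIDATE_SITE.finditer(seq)]


if __name__ == "__main__":
    toy = "MAKRGSLLRRTAKKAQRKS"  # hypothetical sequence, not a real protein
    for pos in candidate_cleavage_sites(toy):
        print(f"candidate cleavage after residue {pos}: "
              f"{toy[pos - 2:pos]} | {toy[pos]}")
```

A motif search of this kind will over-predict, since accessibility and the wider sequence context are ignored; it is useful mainly for shortlisting sites to examine further.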

Fig. 4. Conservation of polyprotein cleavage sites.
Sites of cleavage are indicated by an arrow.  Residues highlighted in pink are conserved between West Nile virus (W Nile) and Zika virus.  Residue numbers are shown above and below each sequence.
[Pairwise alignment, shown as six blocks covering approximately residues 60-110, 180-230, 1370-1420, 1490-1540, 2090-2140 and 2500-2550 of the polyproteins; arrows mark cleavage sites in the 180-230 and 1490-1540 blocks.]

Is microcephalin a substrate for the Zika virus endopeptidase?

Could it be that the viral endopeptidase is processing host proteins at similar sites?  There are at least 24 human proteins known to be cleaved by viral endopeptidases.  Cleaving eukaryotic translation initiation factors and polyadenylate-binding protein 1 switches off the host cell's own protein synthesis mechanism, ensuring that only viral proteins are made, and the endopeptidases from retroviruses, enteroviruses and foot-and-mouth disease virus all cleave these proteins [13-17].  Nuclear pore glycoprotein p62 is also cleaved by the rhinovirus endopeptidase picornain 2A peptidase, and this disrupts trafficking from the nucleus to the cytoplasm [18].  Both microcephalin and ASPM have regions that conform to the specificity of the Zika virus endopeptidase (Fig. 5), so either could be a potential substrate and be inactivated by cleavage.  If cleavage of these proteins has the same effect as mutations in the genes, then cleavage could lead to microcephaly.

Fig. 5 Potential cleavage sites in microcephalin and ASPM



The incidence of microcephaly in babies born to mothers infected by the Zika virus in Brazil is not only alarmingly high, but much higher than the background mutation rate that causes microcephaly in the UK; there seems to be little doubt that the condition and Zika fever are related.  Whether this relationship arises because the disease is new to Brazil, mothers have no immunity and microcephaly results from the body’s own immune response, as has been observed previously in chickenpox, or because of the presence of a viral toxin, is not known.  If the latter, then it is possible that the proteins derived from genes in which mutations are known to cause microcephaly are susceptible to digestion by the Zika virus polyprotein processing enzyme, which is predicted to have a specificity similar to that of host prohormone convertases: inactivating the proteins may have the same results as mutations in the genes.  Further research is required to understand the mechanisms causing microcephaly, which might include characterization of the viral endopeptidase.  If the symptoms are due to the response of the immune system, then microcephaly might be a transitory phenomenon, and once the population builds up immunity, such cases could become very rare in the future.


1. Leviton, A., Holmes, L. B., Allred, E. N. & Vargas, J. (2002). Methodologic issues in epidemiologic studies of congenital microcephaly. Early Hum. Dev. 69:91-105. doi:10.1016/S0378-3782(02)00065-8. PMID:12324187.
2. Opitz, J. M. & Holt, M. C. (1990). Microcephaly: general considerations and aids to nosology. J. Craniofac. Genet. Dev. Biol. 10:75-204. PMID:2211965.
3. Mousson, L.,  Dauga, C., Garrigues, T., Schaffner, F., Vazeille, M.  & Failloux, A. (2005). Phylogeography of Aedes (Stegomyia) aegypti (L.) and Aedes (Stegomyia) albopictus (Skuse) (Diptera: Culicidae) based on mitochondrial DNA variations. Genetics Research 86:1-11. doi:10.1017/S0016672305007627. PMID:16181519.
4. Musso, D., Roche, C., Robin, E., Nhan, T., Teissier, A. & Cao-Lormeau, V.M.  (2015) Potential sexual transmission of Zika virus.  Emerg Infect Dis 21:359-61. doi: 10.3201/eid2102.141363. PMID:25625872.
5. Jackson, A. P., Eastwood, H., Bell, S. M., Adu, J., Toomes, C., Carr, I. M., Roberts, E., Hampshire, Daniel J., et al. (2002). Identification of Microcephalin, a Protein Implicated in Determining the Size of the Human Brain. Am. J. Human Genetics 71:136-142. doi:10.1086/341283. PMC:419993. PMID:12046007.
6. Jackson, A. P., McHale, D. P., Campbell, D. A., Jafri, H., Rashid, Y., Mannan, J., Karbani, G., Corry, P., et al. (1998). Primary Autosomal Recessive Microcephaly (MCPH1) Maps to Chromosome 8p22-pter. Am. J. Human Genetics 63:541-546. doi:10.1086/301966. PMC:1377307. PMID:9683597.
7. Bond, J., Roberts, E., Mochida, G.H., Hampshire, D.J., Scott, S., Askham, J.M., Springell, K., Mahadevan, M., Crow, Y.J., Markham, A.F., Walsh, C.A. & Woods, C.G. (2002) ASPM is a major determinant of cerebral cortical size. Nat. Genet. 32:316-320.  PMID:14574646.
8. Mirlesse, V. & Lebon, P. (2003) [Chickenpox during pregnancy]. Arch. Pediatr. 10:1113-1118. PMID:14643554.
9. World Health Organization (8 January 2016) Microcephaly - Brazil.
10. Chappell, K. J., Stoermer, M. J., Fairlie, D. P. & Young, P. R. (2006) Insights to substrate binding and processing by West Nile Virus NS3 protease through combined modeling, protease mutagenesis, and kinetic studies. J. Biol. Chem. 281:38448-38458. PMID:17052977.
11. Shiryaev, S. A., Ratnikov, B. I., Chekanov, A. V., Sikora, S., Rozanov, D. V., Godzik, A., Wang, J., Smith, J. W., Huang, Z., Lindberg, I., Samuel, M. A., Diamond, M. S. & Strongin, A. Y. (2006) Cleavage targets and the D-arginine-based inhibitors of the West Nile virus NS3 processing proteinase. Biochem. J.  393:503-511. PMID:16229682.
12. Remacle, A. G., Shiryaev, S. A., Oh, E. S., Cieplak, P., Srinivasan, A., Wei, G., Liddington, R. C., Ratnikov, B. I., Parent, A., Desjardins, R., Day, R., Smith, J. W., Lebl, M. & Strongin, A. Y. (2008) Substrate cleavage analysis of furin and related proprotein convertases. A comparative study. J. Biol. Chem. 283:20897-20906. PMID:18505722.
13. Alvarez, E., Menéndez-Arias, L. & Carrasco, L. (2003) The eukaryotic translation initiation factor 4GI is cleaved by different retroviral proteases. J. Virol. 77:12392-12400.
14. Gradi, A., Foeger, N., Strong, R., Svitkin, Y. V., Sonenberg, N., Skern, T., Belsham, G. J. (2004) Cleavage of eukaryotic translation initiation factor 4GII within foot-and-mouth disease virus-infected cells: identification of the L-protease cleavage site in vitro. J. Virol. 78:3271-3278.
15. Gradi, A., Svitkin, Y. V., Sommergruber, W., Imataka, H., Morino, S., Skern, T. & Sonenberg, N. (2003) Human rhinovirus 2A proteinase cleavage sites in eukaryotic initiation factors (eIF) 4GI and eIF4GII are different. J. Virol. 77:5026-5029. PMID:15016848.
16. Foeger, N., Schmid, E. M. & Skern, T. (2003) Human rhinovirus 2 2Apro recognition of eukaryotic initiation factor 4GI. Involvement of an exosite. J. Biol. Chem. 278:33200-33207. PMID:12791690.
17. Kuyumcu-Martinez, N. M., Joachims, M. & Lloyd, R. E. (2002) Efficient cleavage of ribosome-associated poly(A)-binding protein by enterovirus 3C protease. J. Virol. 76:2062-2074. PMID:11836384.
18. Park, N., Skern, T. & Gustin, K. E. (2010) Specific cleavage of the nuclear pore complex protein Nup62 by a viral protease.  J. Biol Chem. 285:28796-805. doi:10.1074/jbc.M110.143404. PMID:20622012.

Tuesday, 7 April 2015

What's new in InterPro release 50.0 and 51.0

Faster InterPro member database processing:
InterPro releases 50.0 and 51.0 have brought some important developments from an InterPro production point of view, which we thought would be worth sharing. Release 50.0 saw the incorporation of a new version of PIRSF, which, importantly, has been migrated to use the HMMER3.1b analysis algorithm. This version of HMMER runs approximately one thousand times faster than the previous version used by PIRSF (HMMER2.0), helping to ensure that InterPro can continue to calculate UniProtKB match data in a timely manner. In a related development, as part of InterPro release 51.0, we debuted a sequence database pre-filtering heuristic to reduce the amount of time it takes to calculate matches against the HAMAP database (the heuristic is based on HMMER3.0, but the analysis still uses the core HAMAP algorithm, and is all implemented within the InterProScan software).  This again speeds up our protein match generation process and helps to safeguard against future data growth. At the start of 2014, PIRSF and HAMAP were identified as the slowest databases to calculate matches against but, after work from both the database maintainers and the InterPro team, this is no longer the case.

A leaner UniProtKB: 
At the same time, the number of proteins in UniProtKB has decreased significantly, with some 47 million sequences from highly redundant bacterial proteomes having been deleted (for details, see here, described half way down the page).

Faster and fitter InterPro production:
The majority of these developments have taken place under the hood, so it is unlikely that you will have been aware of our fitter and faster production system. What we hope you will notice, however, are more regular InterPro releases and more frequent member database updates in future, as these and other optimisations come into effect.

Alex Mitchell
on behalf of the InterPro team

Tuesday, 31 March 2015

The sweetest thing

By Hsin-Yu Chang

A famous cola company launched a new product contained in a gleaming green can last year. As a regular cola drinker, I was intrigued by the packaging. After doing some research, I discovered that this variety of cola contains a sweetener called Stevia.
Figure 1. Stevia rebaudiana
Ethel Aardvark, Wikimedia

Stevia is extracted from a plant, Stevia rebaudiana, found in Brazil and Paraguay. The leaves of the Stevia plant have been used for hundreds of years in both countries to sweeten local teas and medicines. The sweet taste is mainly from steviol glycoside compounds, which have up to 150 times the sweetness of sugar, but zero calories [1].

The story of Stevia gave me, a protein database curator, the idea to search for the sweetest proteins to date. I found one such protein, thaumatin (IPR001938), produced by Thaumatococcus daniellii (also known as Katemfe), a shrub from West Africa. Thaumatin is around 2,000 times sweeter than sugar [2]!

Similar to Stevia, Katemfe plants have been used by the locals for a long time; they use its leaves for wrapping food and its fruits for sweetening breads, palm wine and sour food. Their sweet proteins, thaumatin I and thaumatin II, were first identified in the 1970s in the search for non-toxic, non-calorific 'natural' sweeteners to replace synthetic ones [3].

Figure 2. Katemfe plant
From Engler et al. Marantaceae, vol. 48: [Heft 11], p. 40, fig. 8 (1902).

Why do plants like Katemfe produce extremely sweet proteins? The answer may lie in the plant defence systems. Under environmental stresses or pathogen attack, plants can produce proteins that help them stay alive. In the case of Katemfe, attack by a viroid (a sub-viral pathogen) induces thaumatin production. Thaumatin has also been shown to have antifungal activities, which suggests it may be part of a defence mechanism that prevents further pathogen attacks [4].

In fact, thaumatin shares a conserved site (IPR017949) with a group of pathogenesis-related proteins, also known as thaumatin-like proteins (TLPs) [5], including tobacco salt-induced protein osmotin [6] and maize antifungal protein zeamatin [7]. Like thaumatin, this group of proteins plays an essential part in plant defence against either environmental stress or pathogen attack [8].

Another question is, why does thaumatin taste sweet to us? This is down to the sweetness receptors in the taste buds on our tongues. The sweet molecules (chemicals or proteins) are perceived by G-protein-coupled receptors, consisting of two subunits, T1R2 and T1R3. Certain amino acid residues in these subunits affect their ability to recognise the sweet molecules [9]. Interestingly, apes and Old World monkeys can perceive thaumatin as a sweet protein, while New World monkeys and rodents cannot [2].  In other words, the sweet taste of thaumatin for us humans could be just an evolutionary coincidence.

Figure 3. Sweet receptor. The peptide region involved in the response to thaumatin is shown in red [2].

So far, several chemical sweeteners have been commercialised, such as aspartame, sucralose and saccharin, and many more products may yet emerge. We all know that our sugar consumption causes health problems like obesity, diabetes and tooth decay. To avoid such health issues, scientists have searched far and wide to find alternatives. Natural sweeteners, such as Stevia and thaumatin, have provided new options for us. However, with so many different products on the market, as a consumer, I am still sitting on the fence to see which ones provide the best health benefits.

Figure 4. What should be in yours?
Additional information:
I. Katemfe:
Katemfe is a 3-4 metre tall shrub from the rain forests of West Africa. It bears light purple flowers and a soft fruit containing shiny black seeds. The fruit is covered in a fleshy red aril, the part that contains thaumatin.

II. E numbers:
Thaumatin has been approved in the European Union as a sweetener, known as E957. It is usually used in processed foods and has a slight licorice aftertaste.

III. Calories:
Although thaumatin contains 4 calories/gram (compared with 3.87 calories/gram for sucrose), the amount needed in food or drink is extremely small, due to its high potency.
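To put that in numbers, the sketch below works through the arithmetic using only the figures quoted in this post (around 2,000 times the sweetness of sugar, 4 calories/gram for thaumatin and 3.87 calories/gram for sucrose). The 10 gram sugar portion and the assumption that sweetness scales linearly with amount are illustrative simplifications, not measured values.

```python
SWEETNESS_VS_SUGAR = 2000     # thaumatin is around 2,000 times sweeter than sugar
THAUMATIN_CAL_PER_G = 4.0     # calories per gram, as quoted above
SUCROSE_CAL_PER_G = 3.87      # calories per gram, as quoted above


def equivalent_thaumatin(sugar_grams):
    """Return (grams, calories) of thaumatin matching the sweetness of sugar_grams.

    Assumes sweetness scales linearly with the amount used.
    """
    grams = sugar_grams / SWEETNESS_VS_SUGAR
    return grams, grams * THAUMATIN_CAL_PER_G


sugar_grams = 10.0  # hypothetical portion of sugar
thaumatin_grams, thaumatin_cal = equivalent_thaumatin(sugar_grams)
sugar_cal = sugar_grams * SUCROSE_CAL_PER_G

print(f"{sugar_grams} g sugar: {sugar_cal:.1f} calories")
print(f"{thaumatin_grams * 1000:.0f} mg thaumatin: {thaumatin_cal:.2f} calories")
```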

IV. Other sweet proteins:
Besides thaumatin, there are a few other sweet proteins such as monellin (IPR015283), pentadin, mabinlin and brazzein [10,11].

V. Further reading:
How did stevia get mainstream?  -By Tom Heyden

Are sweeteners really bad for us? -By Claudia Hammond

1. Cardello HM, Da Silva MA, Damasio MH., Measurement of the relative sweetness of stevia extract, aspartame and cyclamate/saccharin blend as compared to sucrose at different concentrations. Plant Foods Hum Nutr. 54(2):119-30., 1999. [PMID:10646559]

2. Masuda T, Taguchi W, Sano A, Ohta K, Kitabatake N, Tani F., Five amino acid residues in cysteine-rich domain of human T1R3 were involved in the response for sweet-tasting protein, thaumatin. Biochimie. 95(7):1502-5., 2013. [PMID:23370115]

3. van der Wel H, Loeve K., Isolation and characterization of thaumatin I and II, the sweet-tasting proteins from Thaumatococcus daniellii Benth. Eur J Biochem. 31(2):221-5., 1972. [PMID:4647176]

4. Rodrigo I, Vera P, Frank R, Conejero V., Identification of the viroid-induced tomato pathogenesis-related (PR) protein P23 as the thaumatin-like tomato protein NP24 associated with osmotic stress. Plant Mol Biol. 16(5):931-4., 1991. [PMID:1859873]

5. Liu JJ, Sturrock R, Ekramoddoullah AK., The superfamily of thaumatin-like proteins: its origin, evolution, and expression towards biological function. Plant Cell Rep. 29(5):419-36., 2010. [PMID:20204373]

6. Subramanyam K, Arun M, Mariashibu TS, Theboral J, Rajesh M, Singh NK, Manickavasagam M, Ganapathi A., Overexpression of tobacco osmotin (Tbosm) in soybean conferred resistance to salinity stress and fungal infections. Planta. 236(6):1909-25., 2012. [PMID:22936305]

7. Schimoler-O'Rourke R, Richardson M, Selitrennikoff CP., Zeamatin inhibits trypsin and alpha-amylase activities. Appl Environ Microbiol. 67(5):2365-6., 2001. [PMID:11319124]

8. Monteiro S, Barakat M, Piçarra-Pereira MA, Teixeira AR, Ferreira RB. Osmotin and thaumatin from grape: a putative general defense mechanism against pathogenic fungi. Phytopathology. 93(12):1505-12, 2003. [PMID:18943614]

9. Masuda T, Mikami B, Tani F., Atomic structure of recombinant thaumatin II reveals flexible conformations in two residues critical for sweetness and three consecutive glycine residues. Biochimie. 106:33-8, 2014. [PMID:25066915]

10. Faus I, Recent developments in the characterization and biotechnological production of sweet-tasting proteins. Appl Microbiol Biotechnol. 53(2):145-51., 2000. [PMID:10709975]

11. Masuda T, Kitabatake N., Developments in biotechnological production of sweet proteins. J Biosci Bioeng. 102(5):375-89., 2006. [PMID:17189164]

Wednesday, 26 November 2014

In the pipeline – streamlined InterPro production

You may have noticed that InterPro has had fewer releases than usual this year. It is not that we haven’t been working as hard as ever, integrating member database signatures into InterPro entries and adding Gene Ontology terms - we have! But a number of things have been going on behind the scenes, which we thought you might be interested in knowing about.

Sequence growth 
InterPro release 1.0, back in 2000, was built using a version of Swiss-Prot/TrEMBL that contained just over 300 thousand sequences. Our current InterPro release (49.0) is built using over 77 million Swiss-Prot/TrEMBL sequences. That is a massive amount of sequence growth - and even more remarkable is the fact that almost half of these sequences have been added in the last year.
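As a rough illustration of the scale involved, the sketch below computes the fold increase between the two figures quoted above, along with the average yearly growth rate that would produce it. It assumes a 14-year span (2000 to 2014) and smooth compound growth, which is a simplification: as noted above, almost half of the sequences arrived in the last year alone.

```python
SEQUENCES_RELEASE_1 = 300_000      # "just over 300 thousand" (InterPro 1.0, 2000)
SEQUENCES_RELEASE_49 = 77_000_000  # "over 77 million" (InterPro 49.0)
YEARS = 14                         # assumed span, 2000 to 2014


def fold_increase(start, end):
    """Return how many times larger end is than start."""
    return end / start


def average_annual_growth(start, end, years):
    """Return the constant yearly growth rate that takes start to end."""
    return (end / start) ** (1 / years) - 1


fold = fold_increase(SEQUENCES_RELEASE_1, SEQUENCES_RELEASE_49)
rate = average_annual_growth(SEQUENCES_RELEASE_1, SEQUENCES_RELEASE_49, YEARS)

print(f"Fold increase: {fold:.0f}x")
print(f"Average annual growth: {rate:.0%}")
```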

A new InterPro production pipeline
As you might imagine, processing this number of sequences can cause all kinds of problems for computational pipelines that were developed when sequence data volumes were orders of magnitude smaller. To make sure that we can handle the kind of data volume growth we have been seeing - and expect to see in the future - we have been busy rebuilding our production pipeline. The new system is built entirely on InterProScan, which, for a variety of complicated historical reasons, the previous version was not. This change helps streamline the production process, removes a number of bottlenecks, and generally makes many things associated with data production a lot less complicated.

Further pipeline developments and a new data centre 
To put these changes in place, we have had to focus a lot of our efforts on pipeline development, with knock-on effects on our release schedule. As a consequence, while we have maintained our usual rate of database integrations, these have been squeezed into slightly fewer InterPro releases. And, as a further complication, we have also recently moved all of our data (in the form of hard drives on the back of a truck - no, really!) to a new data centre, as part of EMBL-EBI’s consolidation of its Web infrastructure. This has impacted our release schedule further still. However, we believe that we are now much better placed to calculate and provide match data for our users. We think we are also better prepared for future data production challenges - as the number of protein sequences hits 100 million, and beyond.

Alex Mitchell
on behalf of the InterPro team