Journal of Applied Bioinformatics & Computational Biology, ISSN: 2329-9533


Research Article, J Appl Bioinforma Comput Biol Vol: 8 Issue: 1

Phylogenetic and In Silico Analysis of Niemann-pick type C1 (NPC1) Transporter Required for Ebola Viral Entry

Onyeka Solomon Chukwudozie*

Department of Cell Biology and Genetics, University of Lagos, Yaba, Lagos, Nigeria

*Corresponding Author: Onyeka Solomon Chukwudozie
Department of Cell Biology and Genetics, University of Lagos, Nigeria
Tel: 08188293195, 07018188570
E-mail: [email protected]

Received: January 08, 2019 Accepted: February 19, 2019 Published: February 27, 2019

Citation: Chukwudozie OS (2019) Phylogenetic and In Silico Analysis of Niemann-pick type C1 (NPC1) Transporter Required for Ebola Viral Entry. J Appl Bioinforma Comput Biol 8:1.


Ebola virus (EBOV) has recently emerged as a highly virulent organism of concern to humans. The pathology behind the hemorrhagic fever within an EBOV host remains largely unclear, and demands rather novel therapeutics to effectively eradicate this virus. The Niemann-pick type C1 (NPC1) protein cholesterol transporter is required for EBOV entry and internalization into a mammalian host cell. On this premise, an organism’s NPC1 can decide the virus-species specificity. Here, the phylogenetic relationships among twelve selected chordates were examined to determine their predicted ligand binding site (LBS). LBS delineates the functional glycosylated part of the NPC1 transporter and mediates in molecular processes. The 13-helix transmembrane of NPC1 and its sub-domains were also assessed, alongside potential sites of direct interaction with the Ebola glycoprotein. The study revealed that eight functional glycosylated residues: Y420, Q421, Y423, P424, S425, G426, D428 and N452 in the human NPC1 domain C (NPC1-C) surface can form protein complexes with EBOV cleaved glycoprotein (GPcl). Mutation analysis of the NPC1-C in these chordates also revealed multiple deletions and insertions in snakes, but single amino acid changes in the bat, rat and carnivore species. The mutation attributes of the chordates can remarkably contribute to their refractory capacity to EBOV infection. More studies on the orthologs of NPC1 are required to delineate the key sequences that aid or inhibit how susceptible organisms are to Ebola virus infection.

Keywords: Cholesterol; Ebola virus; Niemann-pick type C1 (NPC1); Glycosylation; Filovirus.




Ebola virus (EBOV) is a filovirus that belongs to the Filoviridae family, whose members have negative-sense RNA genomes [1,2]. The Ebolavirus genus contains 5 species: Bundibugyo (BDBV), Reston (RESTV), Sudan (SUDV), Tai Forest (TAFV) and Zaire ebolavirus (ZEBOV) [3]. EBOV has gained importance within the last decade for its fatal outbreaks, which claimed thousands of human lives in Africa and some other parts of the world, particularly in the tropics. To date, a validated vaccine is yet to be developed to combat EBOV, because there is very little knowledge about the receptor interactions between EBOV and its host organism. Ebola viral genomes are usually 19 kb long [2]. The virus particle contains 7 genes that encode its 7 functional proteins: the virion proteins (VP) VP24, VP30, VP35 and VP40, nucleoprotein (NP), glycoprotein (GP), and polymerase (L) [2]. These proteins are responsible for virus assembly and budding in host cells, which sustain the viral life cycle and replication. The VPs are particularly important because they suppress host immunity [2]. The VPs are therefore potential drug targets for EBOV eradication.

Niemann-Pick type C1 (NPC1) is a membrane protein that facilitates intracellular cholesterol transport in mammals. In humans, the NPC1 gene is located on chromosome 18 at the 18q11 location [4,5]. The mammalian NPC1 cholesterol transporter aids EBOV spread and infection. Studies have shown that EBOV gains access into human cells through macropinocytosis and binds NPC1 at small luminal domain sites [6,7]. After internalization into the cell, EBOV-containing vesicles are delivered to the endosomes, where the proteases cathepsin B/L selectively remove the GP1 cap and mucin domain to expose the NPC1‐binding sites [8]. GP2 then drives the fusion of the viral and endosomal membranes to release the viral genetic material into the host cell cytoplasm and initiate viral replication. In this process, only the NPC1 luminal domain C is required for EBOV glycoprotein binding [6].

Cote et al. [7] investigated cells from human patients that lack the NPC1 transporter and exposed them to EBOV under laboratory conditions for 21 days. They demonstrated that cells that survived this period were largely impervious to EBOV, and concluded that Ebola relies solely on NPC1 to penetrate its host cells [7]. This strengthens the role of NPC1 as a critical filovirus receptor in mediating EBOV infection of host organisms, especially through its provision of a direct binding site for the viral envelope glycoprotein. The second lysosomal domain of NPC1 also co-facilitates the EBOV-host binding interaction [9]. NPC1 is therefore an indispensable therapeutic target in the development of an Ebola antiviral drug aimed at preventing host cell invasion and subsequent replication in the host.

It is commonly speculated that cats, dogs and carnivores in general are immune to EBOV infection [10]. Yet, this hypothesis has not been clinically tested by researchers in this field. As a result, it is largely unverified whether carnivores are among the potentially unaffected hosts of EBOV. On the other hand, Megachiroptera (fruit bats) are suspected to be the major reservoirs of EBOV [11,12]. Fruit bats are asymptomatic to EBOV infection. The roots of this ability are not clear, but it may have arisen from evolutionary adaptations within an ancestral Chiroptera clade that gave rise to all modern fruit bats, including recurrent mutations that continue to confer EBOV resistance on this group. The current study is therefore timely to elucidate the evolutionary implications and to include carnivores, reptiles, aves, rodents and other mammals in investigations of their susceptibility and immunity to EBOV infection. The aims of this study are to (i) detect the functional protein residues in the NPC1 transporter involved in the EBOV glycoprotein binding complex, (ii) evaluate the mutational shifts in the test chordates' NPC1 protein sequences that contribute to either their susceptibility or refractory capacity to EBOV infection, and (iii) investigate the phylogenetic relationships among the selected test chordates.

Materials and Methods

Protein retrieval, alignment and structural prediction

The chordates selected for this study were human, bat, horse, dog, cat, snake, pig, pigeon, gorilla, chimpanzee, monkey and rat. The NPC1 protein sequences of the test chordates were retrieved from the UniProt Protein Knowledge Base (UniProtKB database). Protein sequences were aligned with the Constraint-based Multiple Alignment Tool (COBALT), with attention to conserved and mutated sequence positions. Amino acid similarities at different sites were evaluated using BLOSUM62 scores. Proteins whose structures were only partially solved or not published in the protein database (PDB) were modelled using the Raptor X software, with attendant p-values to evaluate the relative quality of the models. The lower the p-value, the higher the quality of the predicted model. For alpha proteins, a p-value below 10^-3 is a good indicator of model quality, whereas for beta proteins, a p-value below 10^-4 is a good indicator. Raptor X uses a non-linear scoring function to combine homologous and structural information for a given template-sequence alignment [13]. Finally, membrane protein topology was predicted using Phyre2.

Global Distance Test (GDT) score was applied to the standard prediction of the protein model, and defined as [13]:

$$\mathrm{GDT} = 1 \cdot N(1) + 0.75 \cdot N(2) + 0.5 \cdot N(4) + 0.25 \cdot N(8)$$

Where N(x) is the number of residues with estimated modelling error (in Å) smaller than x.
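As a minimal sketch of the score defined above, the following Python function counts the residues under each error cutoff and applies the four weights. The function name `gdt_score` and the sample error values are hypothetical and chosen only for illustration; they are not output from Raptor X.

```python
def gdt_score(errors):
    """Weighted count 1*N(1) + 0.75*N(2) + 0.5*N(4) + 0.25*N(8).

    `errors` holds one estimated modelling error (in angstroms) per residue;
    N(x) is the number of residues whose error is smaller than x.
    """
    def n(cutoff):
        return sum(1 for e in errors if e < cutoff)

    return 1.0 * n(1) + 0.75 * n(2) + 0.5 * n(4) + 0.25 * n(8)


# Hypothetical per-residue errors for a six-residue toy model.
toy_errors = [0.5, 0.9, 1.5, 3.0, 6.0, 9.0]
print(gdt_score(toy_errors))  # N(1)=2, N(2)=3, N(4)=4, N(8)=5
```

Because the cutoffs are nested, a residue with a small error contributes to every term, so well-modelled residues weigh most heavily in the total.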

For a protein with more than 100 residues, a GDT above 50 is a good indicator, whereas for a protein with fewer than 100 residues, a GDT above 50 indicates that only a small part of the model is good or reliable. To ascertain protein structure quality, both p-values and GDT were considered. The PROTTER online server and the 3D ligand binding site prediction server were used to predict the ligand binding sites at the functional sites of the proteins. Ramachandran plots were then obtained from the Swiss dock online server. The subcellular location of the protein was assessed with the aid of DeepLoc-1.0 [25].
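The two quality criteria can be combined into one check. The sketch below is an illustration under stated assumptions: the function name `model_is_reliable` is hypothetical, the thresholds are the ones quoted in the text, and the check is meant only for proteins longer than 100 residues. The first call uses the p-value and GDT reported in the Results for the M. lucifugus model, which is mostly alpha-helical; the second uses made-up values for contrast.

```python
def model_is_reliable(p_value, gdt, mainly_alpha):
    """Apply the thresholds quoted in the text to one predicted model.

    Intended for proteins longer than 100 residues, where the text treats
    GDT > 50 as a good indicator. The p-value cutoff is 1e-3 for mainly
    alpha proteins and 1e-4 for mainly beta proteins.
    """
    p_cutoff = 1e-3 if mainly_alpha else 1e-4
    return p_value < p_cutoff and gdt > 50


# Values reported in this study for the M. lucifugus NPC1 model.
print(model_is_reliable(3.53e-20, 860, mainly_alpha=True))

# Hypothetical weak model, for contrast.
print(model_is_reliable(0.01, 30, mainly_alpha=True))
```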

Protein docking and scoring

The protein-protein complex docking and its corresponding algorithmic scoring were done using the pyDOCK server (https://). The visualization was conducted with the aid of Jmol. The PDB code of the crystallized protein structure already present in the protein database (http:) was inserted into the required box field, before selecting the binding amino residues and chain ID of the protein molecules. Protein molecules were treated as rigid structures, and the electrostatic and desolvation scores of the protein-protein interaction (PPI) were then considered [14]. In most cases, ten models were predicted so that the most accurate result could be selected.
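To illustrate how a set of rigid-body docking models might be ranked, the sketch below combines the energy terms into one total and sorts the models. It rests on assumptions: the total is taken as electrostatics plus desolvation plus 0.1 times the van der Waals term, which is the weighting I understand the published pyDock scoring function to use, and a lower total is treated as better. The class name, function name and all numeric values are hypothetical and are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class DockingModel:
    name: str
    electrostatics: float
    desolvation: float
    vdw: float

    def total(self, vdw_weight=0.1):
        # Assumed weighting: the van der Waals term is down-weighted.
        return self.electrostatics + self.desolvation + vdw_weight * self.vdw


def rank_models(models):
    """Return models sorted from lowest (best) to highest total score."""
    return sorted(models, key=lambda m: m.total())


# Hypothetical scores for three docking models.
models = [
    DockingModel("model_2", -10.0, 2.0, 50.0),
    DockingModel("model_1", -18.0, 1.5, 70.0),
    DockingModel("model_3", -5.0, -1.0, 10.0),
]
best = rank_models(models)[0]
print(best.name, round(best.total(), 3))
```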

Comparison of organisms

The glycoprotein of the EBOV and the NPC1 protein structures of M. lucifugus (bat), F. catus (cat), C. familiaris (dog) and H. sapiens (human) were further assessed to study the similarities between the organisms' protein structures, which could portray their molecular evolutionary relationships.


This study mainly investigated EBOV infection using the NPC1 protein sequence in the 12 selected chordates. The total amino acid sequences of the test chordates ranged from 1-1353 amino acid residues (Table 1). Ophiophagus hannah (cobra) was the outgroup of the test chordates, as it was characterized by frequent deletions at various sequence positions, particularly between positions 800 and 1100 (Figure 1). Felis catus (cat) and Canis familiaris (dog) showed close evolutionary relatedness to Myotis lucifugus, whereas the other chordates were more distantly related to it. Primate species, namely Gorilla gorilla (gorilla), Pan troglodytes (chimpanzee), Homo sapiens (human) and Chlorocebus sabaeus (green monkey), were all closely related to one another and distant from the other chordates (Figure 2).

Figure 1: NPC1 protein sequence alignment of 12 test chordates. Sequence alignment between the organisms was similar, except in Ophiophagus hannah (cobra) that showed several deletions at specific amino positions. Note: Red indicates fully aligned and conserved sites (monomorphic sequence), Ash indicates polymorphic sequences, Black highlights the chordates, Spaces are deletions.
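The three column categories used in Figure 1 (conserved, polymorphic, deletion) can be derived mechanically from any multiple sequence alignment. The sketch below shows one way to do so; the function name `classify_columns` and the three short toy sequences are hypothetical and are not taken from the NPC1 alignment.

```python
def classify_columns(aligned):
    """Label each alignment column as 'conserved', 'polymorphic' or 'gap'.

    `aligned` maps a sequence name to its gapped sequence; all sequences
    must have the same length, and '-' marks a deletion.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")

    labels = []
    for column in zip(*seqs):
        if "-" in column:
            labels.append("gap")
        elif len(set(column)) == 1:
            labels.append("conserved")
        else:
            labels.append("polymorphic")
    return labels


# Hypothetical seven-column toy alignment.
toy = {
    "species_a": "MTYQGNA",
    "species_b": "MTYEGHA",
    "species_c": "MTFK-NA",
}
print(classify_columns(toy))
```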

Figure 2: Phylogenetic relationships of species' NPC1 proteins. The sister group comprising F. catus and C. familiaris showed little protein sequence divergence from M. lucifugus, whereas O. hannah showed high evolutionary divergence from all of the chordates. The primates showed close evolutionary relatedness with little divergence among themselves, but extreme distances from M. lucifugus. Note: M. lucifugus (bat) was the template organism used for comparisons.

S/N   Species                          Sequence length (amino acid residues)
1     Homo sapiens (Human)             1278
2     Myotis lucifugus (Bat)           1283
3     Rattus norvegicus (Rat)          1278
4     Canis familiaris (Dog)           1277
5     Pan troglodytes (Chimpanzee)     1277
6     Chlorocebus sabaeus (Monkey)     1278
7     Patagioenas monilis (Pigeon)     1287
8     Equus caballus (Horse)           1277
9     Gorilla gorilla (Gorilla)        1277
10    Sus scrofa (Pig)                 1277
11    Felis catus (Cat)                1276
12    Ophiophagus hannah (Snake)       1196

Table 1: Studied chordate’s NPC1 proteins.

The analysis of the subcellular location of the NPC1 transporter in the chordates revealed that the main compartment of the protein in the cell is the lysosome (Figure 3a and 3b).

Figure 3a and 3b: NPC1 subcellular locations and positional importance. i: Hierarchical tree of the subcellular location of NPC1 proteins localized within the membrane compartments of cellular lysosomes and endosomes. The probability that the NPC1 protein is located at the lysosome is higher than for the other cellular organelles. The tree also revealed that it is an extracellular intermediate compartment involved in a secretory pathway. ii: Total amino sequences and positional importance. Positions within 300 to 350 showed a higher peak than other positions.

The structure of the EBOV glycoprotein was predicted because its structural function is important for the binding of the virus to the NPC1 protein of its target host. The Ramachandran plots (Figure 4) and membrane topology were obtained (Figure 4-9).

Figure 4a, 4b and 4c: Ebola virus glycoprotein, Ramachandran plot for its non-glycine and proline residues, and the protein membrane topology. i: The protein structure was predicted with the best template, 5jq3A. The secondary structure of the EBOV glycoprotein comprises 20% helix, 15% beta-sheet and 63% loop. The solvent attributes of the protein are 58% exposed, 19% medium and 22% buried. ii: The Ramachandran plot generated by Swiss Dock showed 94.4% of residues in favored regions. The favored regions are depicted in green, while the allowed regions are in mint color. The disallowed region is shown in white, and the residues found there are the outliers, which at 1.44% are A204 ASN, A211 SER, A203 VAL and A205 ALA. At 0.83%, the outliers are A258 GLU and A257 ASN. The Q-mean was -1.51, the Cß interaction energy -1.46, solvation -1.12 and torsion -1.07. The G-parameters showed that the structure is in its best representation. iii: The membrane topology shows a 676-residue fusion glycoprotein with two transmembrane-spanning regions and 27 signal peptides.

Figure 5: Disordered and secondary prediction of the EBOV glycoprotein. The disordered region is 48 %, 27 % of alpha helix constitute the protein structure, 14 % of beta strand and 6 % of the transmembrane helix.

The protein structure of M. lucifugus consists of 56% alpha helix, 6% beta strand and 26% transmembrane (TM) helix, and 19% of the protein is disordered. The glycosylated site of the M. lucifugus sequence, which depicts the functional site, was predicted at H452 (histidine residue at sequence position 452) with a p-value of 3.53e-20 and a GDT of 860 (83%) (Figure 6a). The solvent accessibility of the protein was 32% exposed (E), 41% medium (M) and 25% buried (B).

Figure 6a): M. lucifugus 3D NPC1 protein structure. 6b): The ligand binding site of M. lucifugus NPC1 protein depicting its functional glycosylated site.

Figure 7: Felis catus 3D NPC1 protein structure. The protein structure of Felis catus had Disordered (20%), Alpha helix (54%), Beta strand (6%) and TM helix (28%) respectively. The functional binding residue of the Felis catus sequence was N452 (Asparagine at position 452) with a p-value of 4.30e-20 and a GDT score of 861(84%). The solvent accessibility was 32% exposed, 42% medium and 25% buried respectively.

Figure 8: Canis familiaris 3D NPC1 protein structure. The 3D protein structure of Canis familiaris revealed that 19% of the protein was disordered, with a prevalence of alpha helix (54%), 7% beta strand and 27% TM helix. The functional binding residue of the Canis familiaris sequence was predicted with a p-value of 3.25e-20 and a GDT score of 855 (83%). The solvent accessibility was 32% exposed, 42% medium and 25% buried.

Figure 9a: Homo sapiens 3D NPC1 protein structure (PDB: 5U73). The protein structure had 213 (16%) disordered positions; 48% helix, 7% beta-sheet and 43% loop were predicted for the secondary structure of the protein. The solvent accessibility was 32% exposed, 42% medium and 25% buried.

Figure 9b :Transmembrane helix structure of the human NPC1 protein depicting its topology. The membrane protein topology of NPC1 is 13 transmembrane spanning helices. The extracellular and cytoplasmic sides of the membrane are labelled at the beginning and end of each membrane helix illustrated with a number indicating the residues of the sequence. Colour annotations were also cited at the top far right of the structure.

Positions 420 to 428 constitute the human NPC1-C binding region, and the amino acid residues that mediate its interaction with the EBOV glycoprotein complex were identified. The EBOV cleaved glycoprotein (GPcl) residues from positions 79 to 170 were involved in the binding complex with NPC1-C. The predicted binding residues of the human NPC1-C are Y420, Q421, Y423, P424, S425, G426, D428 and N452 (Figure 10a and 10b), while those of the EBOV GPcl are V79, P80, T83, W86, G87, F88, L111, E112, I113, V141, G145, P146, C147, A152, I170, K114, G118, S142, G143 and T144 (Figure 10a).

Figure 10a :Cartoon representations of the pre-small/secreted or cleaved Ebola glycoprotein (GPcl) with binding amino residues. The disulfide bond is represented in yellow, loops in white and the helices in cyan.

Figure 10b :Crystal structure of the human NPC1-luminal C domain. The NPC1-C is required for the Ebola viral cell internalization, alongside its binding residues and glycosylated sites. The helices (α and η) are colored in magenta while the β strands are represented in cyan. The loops are colored in white while the disulfide bonds are represented in yellow. NPC1-C displays a helical core structure surrounded by several β strands with two extended loops.

Some of the residue contacts between NPC1-C and GPcl, as predicted by the ligand protein contacts and contacts of structural units (LPC CSU) online server, are summarized in Table 2.

EBOV GPcl residue    NPC1-C residue(s)
V79                  Y423
S142                 Q421
F88                  D428
V141                 P424
T114                 Y420, Q421
S142                 Q421, Y424
G87                  F503, Q421
P80                  D501
T83                  D501
K114                 Q421

Table 2: The protein interaction between EBOV GPcl and NPC1-C.
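The contact pairs in Table 2 can be inverted to ask which NPC1-C residue has the most distinct GPcl partners. The sketch below transcribes the table rows as printed; the variable and function names are hypothetical, and the count is a simple tally, not a measure of binding energy.

```python
from collections import defaultdict

# (GPcl residue, NPC1-C residues) rows transcribed from Table 2 as printed.
TABLE_2 = [
    ("V79", ["Y423"]),
    ("S142", ["Q421"]),
    ("F88", ["D428"]),
    ("V141", ["P424"]),
    ("T114", ["Y420", "Q421"]),
    ("S142", ["Q421", "Y424"]),
    ("G87", ["F503", "Q421"]),
    ("P80", ["D501"]),
    ("T83", ["D501"]),
    ("K114", ["Q421"]),
]


def partners_by_npc1_residue(rows):
    """Map each NPC1-C residue to the set of GPcl residues contacting it."""
    partners = defaultdict(set)
    for gp_residue, npc1_residues in rows:
        for npc1_residue in npc1_residues:
            partners[npc1_residue].add(gp_residue)
    return dict(partners)


partners = partners_by_npc1_residue(TABLE_2)
# Sort by number of distinct partners (descending), then by residue name.
ranked = sorted(partners, key=lambda r: (-len(partners[r]), r))
for residue in ranked:
    print(residue, sorted(partners[residue]))
```

In this tally Q421 has the largest number of distinct GPcl partners, which is consistent with the attention given to position 421 in the mutation analysis.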


Biologically, viruses do not have a well-defined metabolic pathway, and thus rely on other organisms, specifically target hosts, for their survival and replication. This study investigated the interaction between EBOV and the NPC1-C domain in twelve chordates. EBOV replicates rapidly after invading its host cell through interactions with the NPC1 transporter. The study corroborated the capacity of Myotis lucifugus (bats) to act as more resistant host organisms for EBOV than other chordates. The major result of this study strongly suggests that a non-synonymous substitution, or perhaps a single mutation in the amino acid composition of the bat protein, may be responsible for their ability to harbor EBOV without fatal implications. Furthermore, the phylogeny of the 12 chordates revealed that the sister group F. catus and C. familiaris was closely related to M. lucifugus. Several workers have proposed that the cells of carnivores also harbor EBOV without any fatal consequences to the organisms [10]. Yet, only a few studies have tested these propositions. In this study, I demonstrate that cats and dogs are equally asymptomatic to EBOV infection, which may arise from their close evolutionary relatedness to bats and high protein sequence similarity. This validates the proposition that carnivore cells are not infected by the virus.

The phylogenetic analysis also revealed that O. hannah (cobra) was an outgroup of the chordates, showing several deletions in its protein sequence. This potentially signifies that cobras might also exhibit strong resistance to EBOV infection. Mammalian cells are generally prone to filovirus infections, but reptilian and amphibian cells are reported to be refractory to such infections [15,16]. A study showed that wild-type (WT) EBOV and Sudan virus (SUDV) did not infect VH-2 cells derived from Russell’s viper Daboia russellii [17]. Emily et al. [17] reported that VH-2 cells are resistant to EBOV infection because Russell’s viper NPC1 ortholog bound poorly to the EBOV spike glycoprotein. Summarily, the phylogenetic analysis suggests that EBOV infection may strongly affect only primate species, which exhibited high evolutionary relatedness to each other.

The membrane topology of the NPC1 protein is a 13-helix structure with three luminal domains, namely a cholesterol‐binding N‐terminal domain A, domain C (loop 2) and domain I. There is also a sterol sensing domain in the NPC1 structure. Only the NPC1 luminal domain C is required for viral glycoprotein binding [6]. This study corroborated this view with the protein structure generated (Figure 10c). Interestingly, it was also shown recently that EBOV assembly at the plasma membrane is cholesterol‐dependent, requiring cholesterol to stabilize the virus particle [18]. Recently, a crystal structure of a complex between the middle luminal domain (MLD) of NPC1 and the cleaved glycoprotein (GP) of the EBOV was reported [9,19]. In this study, I demonstrated that the second luminal domain (C) of M. lucifugus, F. catus, C. familiaris and H. sapiens NPC1 is required for direct contact with the cleaved form of EBOV GP. In brief, this signifies that NPC1 domain C binding is important for EBOV invasion and internalization into a primate cell environment.

Figure 10c: The protein-protein interaction complex of the EBOV GPcl and the human NPC1-C. The complex also revealed that loop 2 of the NPC1 structure mediates the GPcl binding. The electrostatic energy for the binding complex was scored at -17.940, the desolvation value was 1.411 and the van der Waals force was estimated at 72.099%.

Several mutations in the functional glycosylated amino acid composition of the NPC1-C domain were considered, as the precise functional sites represent the species-specific relationship with the viral particle. The residues at the various positions that constitute the NPC1-C domain in the studied chordates revealed the protein conformational changes involved in recognizing the EBOV glycoprotein. Positions 420, 421, 424, 425 and 452 were not conserved in all of the studied chordates (Figure 11). At position 421, several chordates exhibited mutations from the wild type, which has a glutamine residue. Across all of the studied positions, mutations were more frequent in snakes than in the other chordates (Figure 11), and this observation signifies that a mutation at position 421 could reflect the refractory abilities of an organism towards EBOV infection.

Figure 11: Amino acid residues of the twelve chordates' NPC1-C domain and mutational changes at different positions. At position 420, tyrosine was common to all of the organisms except snake, which had phenylalanine. Glutamine was common at position 421 but was mutated to glutamic acid in bat and rat, serine in pigeon, histidine in pig and lysine in snake. Position 423 was monomorphic in all of the chordates and was occupied by tyrosine. Tyrosine residues were also common at position 424, except in snake, which had valine, while phenylalanine was exhibited by all chordates at position 425 except snake, which had lysine. Positions 423 and 428 were conserved in all of the chordates. The glycosylated amino residue at position 452 of the studied chordates tends towards monomorphism, as it was dominated by asparagine. However, the presence of histidine in M. lucifugus and serine in R. norvegicus rendered the site polymorphic, indicating that it is not fully conserved.
Note: *Conserved site of the predicted N-glycosylated sites of the protein. The various positions are indicated by the arrow
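Since BLOSUM62 is the similarity measure named in Materials and Methods, a short sketch can show how the substitutions listed for positions 420, 421 and 452 score against it. The dictionary below holds only the handful of standard BLOSUM62 entries needed for these pairs; the names `BLOSUM62_SUBSET`, `SUBSTITUTIONS`, `score` and `describe` are hypothetical. A higher score marks a more conservative exchange in general protein alignments; it does not by itself say whether glycosylation or GPcl binding is affected.

```python
# Standard BLOSUM62 values for the residue pairs discussed in the caption.
BLOSUM62_SUBSET = {
    frozenset("YF"): 3,
    frozenset("QE"): 2,
    frozenset("QK"): 1,
    frozenset("QH"): 0,
    frozenset("QS"): 0,
    frozenset("NH"): 1,
    frozenset("NS"): 1,
}

# (position, common residue, variant residue, species) from the Figure 11 caption.
SUBSTITUTIONS = [
    (420, "Y", "F", ["snake"]),
    (421, "Q", "E", ["bat", "rat"]),
    (421, "Q", "S", ["pigeon"]),
    (421, "Q", "H", ["pig"]),
    (421, "Q", "K", ["snake"]),
    (452, "N", "H", ["bat"]),
    (452, "N", "S", ["rat"]),
]


def score(ref, alt):
    """Look up the symmetric BLOSUM62 score for one residue pair."""
    return BLOSUM62_SUBSET[frozenset((ref, alt))]


def describe(substitutions):
    """Format each substitution with its BLOSUM62 score and species."""
    return [
        f"{ref}{pos}{alt}: BLOSUM62 {score(ref, alt):+d} ({', '.join(species)})"
        for pos, ref, alt, species in substitutions
    ]


for line in describe(SUBSTITUTIONS):
    print(line)
```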

The twelve chordates all had a conserved glycosylated asparagine at position 452 of the protein sequence (N452), except Myotis lucifugus, which had histidine in place of asparagine, and Rattus norvegicus, which had serine in place of asparagine (Figure 11-12). This connotes that the wild type NPC1 protein, which has an asparagine residue at position 452, is mutated in both bats and rats. The amino acid sequences of the other chordates apart from M. lucifugus and R. norvegicus might have undergone mutation through the course of evolution. A mutated form of NPC1 is liable to behave like some NPC1 inhibitors, which can prevent viral infection [20], and the Ebola virus was completely non-infectious in Npc1−/− mice [18]. This could also indicate that the NPC1 middle luminal C domain has undergone a form of protein modification in the chordates that exhibited a mutant type. This modification could alter their functional state, which includes recognition of ligands and enzymatic activities. The biosynthesis of asparagine in mammalian cells is aided by asparagine synthetase, and the inhibition of asparagine synthetase, through either a chemical or a genetic approach, has been shown to impair the replication of some viruses, such as vaccinia virus [22]. Anil et al. [22] concluded that vaccinia virus infection enhanced asparagine synthetase expression in the cell. The VP35 protein of EBOV is known to antagonize the innate immune system by binding to dsRNA; it is therefore possible that a single amino acid substitution could inhibit VP35's ability to antagonize the immune system [23].

Figure 12 :Schematic representation of NPC1 three luminal domains. It depicts the A, C, I sites including the sterol sensing domains (SSD). Also shown, is the direct contact of the Ebola virus glycoprotein with the NPC1 domain C necessary for its entry into the cell, precisely at loop 2.


The Ebola virus outbreak in tropical West Africa and other parts of the world was a threat to human lives, which propelled the scientific community to investigate the molecular processes that govern its virulence. This study therefore used computational modeling to analyze the proteomic properties of the selected chordates' NPC1 protein structures in order to understand the organisms' immunity and vulnerability to the virus. The in silico analysis of the NPC1-C domain revealed that it is required for the binding of the Ebola virus glycoprotein. The chordates' susceptibility and refractory abilities to the viral infection can be studied with residues at positions 420, 421, 423, 424, 425, 426, 428 and 452, as these amino acid positions are localized at the NPC1-C interface, which forms a binding complex with the EBOV GPcl. A clinical approach to this study will further help to narrow down the multi-faceted mechanisms by which the Ebola virus gains access to the host cell. Also, the proper study of three-dimensional (3D) structures, which helps to interpret available experimental data and sequence variation among the Ebola virus species, will not only allow researchers to understand detailed mechanisms of cell entry, virus assembly and immune suppression, but also provide promising leads for structure-based drug design.


I sincerely appreciate the undiminished effort of Ndiribe, C. (Ph.D.), University of Lagos, Nigeria, for her preliminary and continuous comments on the development of this manuscript. I also extend my gratitude to Treeline Innovations agency for their technical support in retrieving sequences from UniProt and other databases. Finally, I thank Mr. Okugbesan Adetokunbo for his constant support in seeing to the completion of this manuscript.

Conflict of Interest

I declare that there is no conflict of interest

