Journal of Virology & Antiviral Research ISSN: 2324-8955

Research Article, J Virol Antivir Res Vol: 4 Issue: 3

Control of Host Gene Expression by a Herpesvirus Transcription Factor

Jay C Brown*
Department of Microbiology Immunology and Cancer Biology, University of Virginia Health System, Charlottesville, USA
Corresponding author: Jay C Brown
Department of Microbiology Immunology and Cancer Biology, University of Virginia Health System, Charlottesville, USA
Tel: 434-924-1814
E-mail: [email protected]
Received: July 20, 2015 Accepted: August 20, 2015 Published: August 25, 2015
Citation: Brown JC (2015) Control of Host Gene Expression by a Herpesvirus Transcription Factor. J Virol Antivir Res 4:3. doi:10.4172/2324-8955.1000140


Herpes simplex virus and other alpha-herpesviruses encode a transcription factor, VP16, able to activate expression of genes containing its binding site (TAATGARAT) in the promoter region. VP16 protein is present inside the infectious virion and it enters the host cell with the virus nucleocapsid. Once inside the cell, VP16 is able to cause the prompt expression of virus genes adjacent to its binding sites. I have examined the possibility that host gene expression may also be affected by the presence of VP16, a situation that could have important effects on alpha-herpesvirus replication. Bioinformatic methods were used to examine five human genome regions (each 16-73Mb in length) for the presence of genes with upstream TAATGARAT sequences. A total of fourteen characterized genes were identified, indicating that VP16 has the potential to activate their expression. The identified genes varied considerably in function, and did not appear to support a common theme or goal. The presence of an upstream TAATGARAT sequence was found to be well conserved in chimpanzee, where 11 of 14 homologous genes had upstream TAATGARAT sequences. Conservation was poor, however, in three other species examined: mouse (2 of 14 genes), horse (2) and chicken (1). The observed pattern of conservation is interpreted to suggest that alpha-herpesviruses evolved the ability to benefit from expression of TAATGARAT-containing host genes and that this process was complete at or before the time chimpanzees and humans diverged evolutionarily (~7Mya).

Keywords: Alpha-herpesvirus; Gene expression; Transcription factor; Bioinformatics; TAATGARAT; Herpesvirus evolution; VP16; Human genome




Herpes simplex virus 1 (HSV1) encodes a transcription factor, VP16, able to activate expression of HSV1 immediate early (IE) genes. An unusual mechanism is involved. VP16 protein is present inside the infectious virion and it enters the host cell along with the virus DNA, capsid and tegument [1,2]. 600-700 VP16 molecules are involved [3]. After entering the cytoplasm, VP16 traffics to the nucleus where it binds the virus DNA and acts to initiate expression of IE genes [4]. IE gene products are themselves transcription factors (ICP4 and ICP8) that cause expression of HSV1 genes also activated early in virus replication.
The specificity of VP16 for immediate early genes is conferred by the presence of the VP16 binding site, TAATGARAT, in the IE gene promoter [4]. TAATGARAT sequences (TAATGAAAT or TAATGAGAT) are recognized by VP16 in a complex with two host-encoded proteins, HCF-1 and Oct1 [5,6]. Genes lacking an upstream TAATGARAT sequence are not activated, but activation can be conferred on a control gene lacking a TAATGARAT site by introducing one experimentally [4,7]. All alpha-herpesviruses encode a homolog of HSV1 VP16 and also have TAATGARAT sequences in the promoter regions of immediate early genes [8]. Beta- and gammaherpesviruses lack homologs of HSV1 VP16.
I have been pursuing the observation that HSV1 VP16 (encoded by UL48, an essential gene for HSV1 replication) is able to activate expression of host as well as virus genes as long as a TAATGARAT sequence is present. The study was motivated by the idea that TAATGARAT-containing host genes could have important consequences for virus growth or perhaps for the host response to virus infection.
A bioinformatic approach was employed. Five different regions of the human genome, each 16Mb or more in length, were examined for the presence of genes containing upstream TAATGARAT sequences. The functions of such genes were then evaluated individually with the goal of understanding how they might affect HSV1 replication or the host response to replication. A total of fourteen characterized human genes were identified and some of their properties are reported here.

Materials and Methods

DNA sequences were retrieved from the NCBI database and examined with Genome Workbench (gbench). Human sequences were derived from build 38 (GRCh38 Primary Assembly HSCHR11_CTG1) Annotation Release 106. Accession Numbers for individual chromosomes are shown in Supplementary Table 1. Target regions of the human genome as shown in Table 1 were downloaded from the NCBI web site (http:// /) and manipulated with locally written Python scripts or with Emboss Explorer (http://cys.genomics.purdue.edu/emboss).
Table 1: Regions of the human genome examined for genes with proximal TAATGARAT sequences.
A Python script based on lines.find was used to determine the positions of TAATGARAT sequences in human genome segments [9]. The positions of TAATGAAAT and TAATGAGAT sequences were determined on both DNA strands of target human DNA. Lists of TAATGARAT sequences generated in this way were examined visually to identify sequences in close proximity to annotated gene start sites. TAATGARAT sequences upstream of the start site were given minus numbers to indicate the distance to the gene start; plus numbers refer to TAATGARAT’s between the gene start and the protein initiating ATG. TAATGARAT sequences were ignored if they were downstream of the ATG or more than 1600bp upstream of the gene start. The upstream distance (1600bp) was chosen as a compromise between an average eukaryotic promoter length and the length of the longest ones [10]. Human genome regions were randomized using the shuffleseq program in Emboss Explorer.
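The search described above can be sketched as follows. This is a minimal illustration and not the author's script: it assumes the sequence is held in memory as a plain string, uses str.find to locate both forms of the motif, and searches the opposite strand by looking for the reverse complement of each motif on the forward strand. The function names (reverse_complement, find_all, find_taatgarat) are hypothetical.

```python
# Illustrative sketch only; names and reporting conventions are assumptions.
MOTIFS = ("TAATGAAAT", "TAATGAGAT")          # the two forms of TAATGARAT
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq, motif):
    """Return 0-based start positions of every occurrence of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are not missed.
        start = seq.find(motif, start + 1)
    return positions


def find_taatgarat(seq):
    """Return sorted (position, strand, motif) tuples for both strands.

    position is the 0-based forward-strand coordinate of the leftmost base
    of the site; strand is "+" or "-".
    """
    seq = seq.upper()
    hits = []
    for motif in MOTIFS:
        hits.extend((pos, "+", motif) for pos in find_all(seq, motif))
        rc = reverse_complement(motif)
        hits.extend((pos, "-", motif) for pos in find_all(seq, rc))
    return sorted(hits)
```

For a genome segment read from a FASTA file, the header line would be dropped and the remaining lines joined into one string before calling find_taatgarat.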
Studies with non-human species began with the list of human genes having proximal TAATGARAT sequences (Table 2). Functions of the NCBI web site were used to identify and localize homologs in the genomes of chimpanzee, horse, mouse and chicken. Relevant Accession Numbers are shown in Supplementary Table 1. Upstream TAATGARAT sites were identified by displaying the chromosome sequence with Genome Workbench and using the find function to identify TAATGARAT sequences proximal to the gene start. Expected TAATGARAT frequencies were calculated assuming all human genome regions studied have a 41% GC content, so that A and T each occur with probability 0.295 and G and C each with probability 0.205. Expected per-position values used for TAATGAAAT and TAATGAGAT were (0.295)^8 × (0.205) or 0.00117% and (0.295)^7 × (0.205)^2 or 0.00081%, respectively.
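The expected frequencies can be checked with a short calculation. The sketch below assumes that bases occur independently and that a 41% GC content means G and C each occur with probability 0.205 and A and T each with probability 0.295. Under that assumption the products come out close to the quoted 0.00117% and 0.00081%. The function name motif_probability is illustrative.

```python
# Illustrative check of the expected per-position motif frequencies.
# Assumption: independent bases, with G = C = GC/2 and A = T = (1 - GC)/2.
GC_CONTENT = 0.41

BASE_PROB = {
    "G": GC_CONTENT / 2,
    "C": GC_CONTENT / 2,
    "A": (1 - GC_CONTENT) / 2,
    "T": (1 - GC_CONTENT) / 2,
}


def motif_probability(motif, base_prob=BASE_PROB):
    """Probability that a given position on one strand starts the motif."""
    p = 1.0
    for base in motif:
        p *= base_prob[base]
    return p


for motif in ("TAATGAAAT", "TAATGAGAT"):
    # Printed as a percentage, to match the units used in the text.
    print(f"{motif}: {100 * motif_probability(motif):.5f}%")
```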
Table 2: Human genes with TAATGARAT-proximal sequences.


Human genes with proximal TAATGARAT sequences
Table 1 shows a list of the five human genome regions examined. Each was derived from a different chromosome and the five varied in length from 16.5Mb to 73.2Mb. The aggregate length was 188.6Mb or approximately 6% of the human genome. The five regions were chosen because each contains gene-rich regions. Together the five regions contain a total of 2846 genes and 9622 TAATGARAT sequences. Randomized versions of the five regions contained fewer TAATGARAT’s than the parent human sequences, but the difference was modest (7909 randomized vs 9622 human).
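The randomized control can be illustrated with a small stand-in for the shuffle step. The sketch below is not the EMBOSS shuffleseq implementation: it permutes the bases of a sequence with Python's random module, which preserves base composition, and then counts TAATGARAT sites on both strands so the shuffled and original counts can be compared. The function names and the seed parameter are assumptions made for illustration.

```python
import random
import re

# Lookahead patterns so that overlapping sites are each counted once.
_FORWARD = re.compile(r"(?=TAATGA[AG]AT)")   # TAATGARAT on the forward strand
_REVERSE = re.compile(r"(?=AT[CT]TCATTA)")   # its reverse complement


def count_sites(seq):
    """Count TAATGARAT sites on both strands of an uppercase DNA string."""
    forward = sum(1 for _ in _FORWARD.finditer(seq))
    reverse = sum(1 for _ in _REVERSE.finditer(seq))
    return forward + reverse


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of seq with the same base composition."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


def compare_with_shuffled(seq, seed=None):
    """Return (sites in the original sequence, sites in one shuffled copy)."""
    return count_sites(seq), count_sites(shuffle_sequence(seq, seed))
```

A single shuffle gives one sample. Repeating with several seeds would show how much the randomized count varies from run to run.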
As described in Materials and Methods, a Python script was used to determine the locations of TAATGARAT sequences in the five target genome regions. Each TAATGARAT sequence was then examined individually to determine whether it was located upstream (within 1600bp) of a gene start site. Genes with upstream TAATGARAT sequences were entered into Table 2 [11-24] for genes with known functions or Table 3 for uncharacterized genes and LINC RNA’s.
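The proximity rule can be written as a small filter. In the study the lists were examined visually, so the sketch below only illustrates the stated criteria and is not the procedure actually used. It assumes each site, gene start and initiating ATG is given as a single forward-strand coordinate and that the gene's strand is known. The function name classify_site and its arguments are hypothetical.

```python
MAX_UPSTREAM_BP = 1600  # sites farther upstream than this are ignored


def classify_site(site_pos, gene_start, atg_pos, strand="+"):
    """Return the signed distance from the gene start, or None if ignored.

    Negative values are upstream of the gene start; positive values lie
    between the gene start and the initiating ATG. All coordinates are
    forward-strand positions; strand is the strand of the gene.
    """
    sign = 1 if strand == "+" else -1
    offset = sign * (site_pos - gene_start)
    atg_offset = sign * (atg_pos - gene_start)
    if offset < -MAX_UPSTREAM_BP:
        return None   # more than 1600 bp upstream of the gene start
    if offset > atg_offset:
        return None   # downstream of the initiating ATG
    return offset


# Example: a site 1000 bp upstream of a forward-strand gene is kept as -1000.
print(classify_site(9000, 10000, 10500))
```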
Table 3: Human non-gene features with TAATGARAT-proximal sequences.
The above analysis yielded a total of 14 TAATGARAT-containing, characterized genes among the five genome regions tested (Table 2). Eight other features were also identified, 6 uncharacterized genes and 2 LINC RNA’s (Table 3). At least one TAATGARAT-containing gene was found in each of the five chromosome regions examined.
Overall, genes with upstream TAATGARAT sequences were found to be rare. For instance, among the 2846 genes present in the human genome regions examined, only 22 (14 characterized genes plus 8 other features) had TAATGARAT sequences <1600bp upstream of a gene start site. Similar upstream TAATGARAT’s were also rare compared to the expected number based on a random DNA sequence. While 22 TAATGARAT-containing genes were observed in the human genome sequences examined, 90 such genes are expected among the 2846 genes if each has 1600 randomized nucleotides of upstream human DNA (see Materials and Methods for calculation). The result suggests TAATGARAT sequences are depleted in gene-containing regions of the human genome.
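The expected value of about 90 can be reproduced from the per-position frequencies given in Materials and Methods. The sketch below assumes the calculation multiplies the combined frequency (0.00117% + 0.00081%) by a 1600-nucleotide window and by 2846 genes, counting one strand and ignoring edge effects. That assumption is an inference from the numbers, not something the text states. The constant and function names are illustrative.

```python
# Per-position frequencies quoted in Materials and Methods (as fractions).
P_TAATGAAAT = 0.00117 / 100
P_TAATGAGAT = 0.00081 / 100

N_GENES = 2846      # genes in the five regions examined
WINDOW_BP = 1600    # upstream window per gene
OBSERVED = 22       # genes and features observed with an upstream site


def expected_upstream_sites(n_genes=N_GENES, window_bp=WINDOW_BP):
    """Expected number of upstream sites across all genes (one strand)."""
    per_position = P_TAATGAAAT + P_TAATGAGAT
    return per_position * window_bp * n_genes


expected = expected_upstream_sites()
print(f"expected {expected:.1f}, observed {OBSERVED}, "
      f"observed/expected {OBSERVED / expected:.2f}")
```

Because the expected number of sites per gene is small (about 0.03), the expected number of sites and the expected number of genes with at least one site are nearly the same.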
A variety of functions are observed among the 14 characterized genes with upstream TAATGARAT sequences (Table 2). For example, there are genes involved in vesicle transport (COPB1, DCTN1), amino acid uptake (SLC43A1), receptor phosphorylation (ADRBK1), gene expression (PHOX2B) and electron transport (COX7B2). The diversity of genes observed has defied efforts to identify a common goal or theme they might serve. I therefore favor the view that TAATGARAT-containing host genes are unlikely to relate to a single, unified function.
Animal homologs of human genes with proximal TAATGARAT sequences
The presence of human genes with upstream TAATGARAT sequences suggested other species may have similar genes. Genes with upstream TAATGARAT’s are expected particularly in species infected by alpha-herpesviruses as these express a VP16 homolog. This expectation was tested beginning with chimpanzee, horse, mouse and chicken genomes. Like humans, chimpanzees, horses and chickens are all able to be infected in nature with multiple alpha-herpesvirus species. Mouse, however, serves as an experimental control as mice are not naturally infected with any well characterized alpha-herpesvirus. Tests were carried out with the 14 characterized human genes found to have upstream TAATGARAT sequences (Table 2). Animal homologs of the human genes were identified and their sequences examined for the presence of upstream TAATGARAT sequences.
The results showed that all 14 human genes found to have an upstream TAATGARAT have a homolog in the chimpanzee genome (Table 4). All 14 chimpanzee genes are present on the same chromosome as the human homolog. Eleven of the fourteen were found to have upstream TAATGARAT sequences in chimpanzee as they do in humans. Those lacking TAATGARAT sequences were CYYR1, COPB1 and OR5M3 (Table 5). The distance between the gene start and the TAATGARAT sequence was found to be similar in human and chimpanzee genes; average values were 961 ± 327 bp (n=13) in human and 998 ± 382 bp (n=9) in chimpanzee.
Table 4: Chimpanzee genes with TAATGARAT-proximal sequences.
Table 5: Conservation of upstream TAATGARAT sequences in representative host species infected by multiple alpha-herpesviruses.
The results for horse, mouse and chicken are summarized in Table 5. They show that the horse and mouse genomes both have homologs for all 14 human TAATGARAT-containing genes identified in this study. No homologs for OR5M3, SLC43A1 or COX7B2, however, could be identified in the chicken genome. In contrast to the situation with chimpanzee, most of the 14 human TAATGARAT-containing genes were found to lack TAATGARAT sequences in horse, mouse and chicken. Counts were 2 of 14 in horse, 2 in mouse and 1 in chicken. The counts in horse and chicken are suggested to be at or near the background level as neither is above the value observed in mouse (2 of 14 genes), a species not infected in nature by an alpha-herpesvirus.


Host or virus evolution?
Viruses are well known for their ability to co-evolve with their host species. Relevant host evolution is directed to suppressing virus growth or eliminating it altogether. The virus part of the relationship, however, is more subtle. The virus must use the host to propagate itself while minimizing the extent of host pathogenesis involved. In principle, the presence of upstream TAATGARAT sequences in host genes could be a result of either host or virus adaptation. Evolution of the virus, for instance, could have resulted in adaptation of VP16 to make use of host sequences upstream of genes able to potentiate virus replication. In contrast, the host could adapt its promoter regions so that the presence of VP16 causes activation of anti-viral functions.
Although both processes could be active depending on the specific gene, I favor the view that the identity of TAATGARAT-containing human genes supports the former view, that virus evolution has caused adaptation of VP16 to promote expression of host genes that enhance virus growth. One group of the identified genes, in particular, suggests itself for a role in potentiating HSV1 replication. Dynactin (DCTN1) and coatomer protein (COPB1) could be involved in virus glycoprotein transport in vesicles. IL10 receptor binding protein (IL10RB) could be involved in downregulation of the immune response to infection and aspartate carbamylase (CAD) could help meet the need for pyrimidines in virus DNA replication. The SGCZ gene region encodes a microRNA (miR383) with the ability to antagonize interferon production in infected cells [25], a function that could promote HSV1 replication.
Virus-induced shut-off of host cell protein synthesis
Infection of a host cell by HSV1 involves a down-regulation of host gene expression [26,27]. Messenger RNA’s are degraded by a virus-encoded ribonuclease, and an effective mechanism is activated to prevent spliced host mRNA’s from reaching the cytoplasm where they can be translated [28]. In view of the robust mechanisms available for inhibition of host gene expression, it was surprising to observe the presence of human genes with upstream TAATGARAT sequences and therefore with the potential to be activated by HSV1-encoded VP16. Why would the virus activate expression of genes whose mRNA’s are destined for prompt degradation?
I suggest the explanation may have to do with the mechanism of gene activation. In contrast to host genes, virus genes are expressed at a high level following infection. It may be that expression of host genes by way of the VP16-TAATGARAT system causes them to take on the character of virus genes and avoid the down-regulation that affects most host genes. In support of this view, there exists experimental evidence for up-regulation of two TAATGARAT-containing human genes (COPB1 and U3 snRNA) following infection with HSV1 [29-31].
An alternative explanation emphasizes the role of the virus-encoded ribonuclease (UL41 gene). Called the virion host shutoff protein, this enzyme is an important part of the mechanism by which HSV1 down-regulates host protein synthesis and favors translation of virus-encoded mRNA’s [32-34]. UL41 protein is a component of the HSV1 tegument. Like VP16, UL41 is introduced promptly after infection into the host cell cytoplasm where it causes degradation of messenger RNA’s resulting in the severe attenuation of host protein synthesis mentioned above. Late in infection, however, UL41 activity is itself attenuated [35]. It is reasonable to think therefore that attenuation of UL41 ribonuclease activity could provide an opportunity for expression of host genes with upstream TAATGARAT sequences as described here. The above explanation suggests that host genes activated by upstream TAATGARAT sequences would be expressed late in HSV1 infection, a suggestion that can be tested experimentally.
Evolution of human genes with TAATGARAT-proximal sequences
The observation of human genes with upstream TAATGARAT sequences as described here suggested other species might have similar genes, and this was found to be the case. Of 14 human genes with upstream TAATGARAT sequences, 11 chimpanzee homologs were also found to have them (Table 5). This result indicates that TAATGARAT sequences in the 11 human genes arose evolutionarily at or before the time human and chimpanzee species diverged (~7 million years ago; Mya).
The results with other animal species allow one to put a lower limit on the time TAATGARAT sequences arose in the human genes examined. Two was the highest number of TAATGARAT-containing homologs in horse, the most highly evolved of the three relevant species examined (Table 5). This observation suggests that the upstream TAATGARAT sequences in most of the 14 human genes arose after the evolutionary divergence of human and horse (~90 Mya). It emphasizes that upstream TAATGARAT sequences evolved relatively recently (i.e. < 90 Mya) in the 200-million-year history of the herpesvirus family [36].

Competing interests

The author declares no conflict of interest.


The author is thankful to Jennifer Thompson for help during early stages of this investigation. For comments on early drafts of the manuscript I gratefully acknowledge Juliet Spencer, Fred Homa, Robert Visalli and Laura Hanson.

