Journal of Genes and Proteins


Review Article, J Genes Proteins Vol: 1 Issue: 1

Metagenomics Prospective in Bio-mining The Microbial Enzymes

Santosh Thapa, Hui Li, Joshua O'Hair, Sarabjit Bhatti and Suping Zhou*

Department of Agricultural and Environmental Sciences, Tennessee State University, USA

*Corresponding Author : Suping Zhou
Research Professor, Department of Agricultural and Environmental Sciences, Tennessee State University, Nashville, TN, 37209, USA

Received: December 14, 2017 Accepted: December 20, 2017 Published: December 27, 2017

Citation: Thapa S, Li H, O'Hair J, Bhatti HS, Zhou S (2017) Metagenomics Prospective in Bio-mining The Microbial Enzymes. J Genes Proteins 1:1.


Microbes are regarded as the richest biochemical repertoire of highly diverse and novel functional biocatalysts, which are crucial for developing eco-friendly biotechnological applications. Given that fewer than 1% of the microorganisms in natural environments can be cultured with traditional laboratory techniques, scientists have barely scratched the surface of microbial genetic diversity. The metagenomic approach provides a wealth of information on the structure, composition, function and biotechnological potential of previously unclassified, industrially relevant enzymes. In this mini-review, we discuss the importance of novel biocatalysts predicted from diverse microbial metagenomes and their impact on future research toward new industrial applications, the implications of Next Generation Sequencing (NGS) and advanced bioinformatic tools, and the future prospects of the metagenomic approach.

Keywords: Metagenomics; Biotechnological; Microbial; Genomics; Bio-catalyst; Next generation sequencing


Ideal Bio-catalysts and the untapped microbial resource

Biological enzymes and catalysts have tremendous potential in biotechnological applications. Their uses span a broad spectrum, from active ingredients in laundry detergents and applications in the pulp and feed industries to the synthesis of stereochemically challenging chiral synthons in the biopharmaceutical industry [1-3]. In addition, the versatility of these enzymes extends their use to the biodegradation of natural polymers such as starch, cellulose and proteins, as well as other chemicals. To date, apart from enzymes derived from yeasts and filamentous fungi, access to biocatalysts and industrial enzymes has been limited. Given these constraints, it is reasonable to seek suitable natural biocatalysts by combining the exploration of microbial diversity with in vitro evolution technology [3].

In 2003, global sales of industrial enzymes were estimated at around $2.3 billion [3]. Sales were distributed among the pulp, leather and fine-chemical industries ($222 million), the textile industry ($237 million), agriculture ($376 million), the food industry ($634 million) and the detergent industry ($689 million). Today, a broad range of chemical products, synthetic fibers, fuels and solvents used in the agricultural, pharmaceutical, biofuel, therapeutic, food, and pulp and feed industries are still derived from petroleum. However, the oil crises and embargoes of the 1970s, environmental concerns such as global warming, and energy security and independence have strengthened the case for efficient alternatives, including precise, multifunctional enzymatic biocatalysts [4]. The discovery and development of novel biotechnological enzymes is expected to benefit industrial chemical conversion substantially by reducing energy consumption and generating fewer toxic by-products [5], which should in turn expand the market for industrial enzymes. A tailored enzyme catalyst should therefore possess attributes such as hygroscopicity, high chemo- and stereoselectivity, multifunctionality, specificity and stability under process conditions. Such enzymes could close existing functional gaps and promote sustainable, eco-benign practices.
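
The 2003 sector figures can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the values quoted in the text [3]. Its variable and function names are illustrative, and the sector labels follow the text rather than the original market report. It computes each sector's share of the approximately $2.3 billion total and shows that the five listed sectors add up to $2,158 million, which leaves roughly $142 million that the text does not attribute to any sector.

```python
# 2003 industrial enzyme sales by sector (US$ million), as quoted in the text [3].
SECTOR_SALES_MUSD = {
    "detergent": 689,
    "food": 634,
    "agriculture": 376,
    "textile": 237,
    "pulp, leather and fine chemicals": 222,
}
TOTAL_MARKET_MUSD = 2300  # approximate global total (~$2.3 billion)


def sector_shares(sales: dict[str, int], total: float) -> dict[str, float]:
    """Return each sector's share of the total market, in percent."""
    return {name: 100.0 * value / total for name, value in sales.items()}


listed_sum = sum(SECTOR_SALES_MUSD.values())   # sectors named in the text
unattributed = TOTAL_MARKET_MUSD - listed_sum  # remainder not assigned to a sector

if __name__ == "__main__":
    for name, pct in sector_shares(SECTOR_SALES_MUSD, TOTAL_MARKET_MUSD).items():
        print(f"{name:<34} {pct:5.1f}%")
    print(f"listed sectors: ${listed_sum} M; unattributed: ${unattributed} M")
```

On these figures, detergents (about 30%) and food (about 28%) together make up more than half of the market.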

Microbes dominate every ecosystem on Earth. Microbial communities of bacteria, fungi, archaea and protists make up the largest share of terrestrial and oceanic biomass. Their prodigious diversity is illustrated by around 166,244 bacterial and 24,249 fungal operational taxonomic units (OTUs) in the Dryland data set and 49,102 bacterial OTUs in the Scotland data set [6], and by approximately 25,000 distinct microbial genotypes in just one milliliter of seawater from a marine ecosystem [7]. The rumen microbial community of ruminants is one of the most complex microbial ecosystems, consisting of diverse populations of bacteria, fungi, archaea and protozoa. Bacteria reach about 10¹¹ cells/ml of rumen contents and include both Gram-negative and Gram-positive groups [8]. The enzymatic activity of these obligate anaerobes plays a significant role in breaking down plant polymers into their monomeric units and ultimately into volatile fatty acids [9]. Fungal biomass accounts for about 8-20% of the total rumen microbial biomass [10], and the anaerobic fungi in the rumen of herbivores produce hydrolytic enzymes involved in plant fiber digestion [11]. Global microbial species diversity is estimated at 10⁶–10⁷ species [12]. Such microbial biomass should therefore represent a virtually inexhaustible source of genomic innovation [7]. However, the genetic makeup, complex diversity and population dynamics of these microbes are still not fully understood [13,14]. Novel functional genes and their metabolic products can be recovered through systematic exploitation of this untapped genetic pool, and the success of this effort hinges in part on metagenomics.
The genetic information associated with this microbial diversity could offer unparalleled insight into the molecular origin, design and evolution of genes, reveal novel gene variants, and support their exploitation in catalytic science, environmental remediation, and the polymer, bioactive and biopharmaceutical industries [15].
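
OTU counts like those above are usually obtained by clustering marker-gene reads, such as 16S rRNA amplicons, at a fixed similarity threshold, commonly 97%. The following is a minimal, illustrative sketch of greedy centroid clustering. It assumes the reads are already aligned to equal length and scores identity as the fraction of matching positions. Real pipelines rely on proper pairwise alignment and quality filtering. The function names `identity` and `greedy_otus` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Greedy centroid clustering into OTUs.

    Each read joins the first existing OTU whose centroid (its first member)
    it matches at >= `threshold` identity; otherwise it founds a new OTU.
    """
    otus: list[list[str]] = []
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:  # no centroid was similar enough
            otus.append([read])
    return otus


if __name__ == "__main__":
    ref = "ACGT" * 25      # hypothetical 100-nt read
    near = "TT" + ref[2:]  # 2 mismatches -> 98% identity to ref
    far = "G" * 100        # 25% identity to ref
    print(len(greedy_otus([ref, near, far])))  # 2 OTUs
```

Each read is compared only against existing centroids, so the result depends on input order. Clustering tools commonly sort reads by abundance first for this reason.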

Metagenomics insights, progression and sequencing

Metagenomics (also referred to as ecological or community genomics) [16] has great potential to impact agriculture, bioactive compounds, biomarkers, polymer science, pharmaceuticals and many other areas of biotechnological production [17]. In particular, it combines functional screening and sequence-based analysis of bulk microbial DNA recovered from environments ranging from moderate to extreme [18].

Most microorganisms in the environment are unculturable, so most of the enzymatic and biocatalytic potential locked within these uncatalogued microbial consortia has been inaccessible [19]. Until metagenomic approaches were developed, only about 1-10% of microbial genomes were accessible. Advances in molecular metagenomics opened key access to the genetic content of microbes in diverse habitats, independent of their cultivability. This made it possible to exploit target genes and genomes biotechnologically and to screen microbial genome sequences functionally in metagenomic libraries, which may serve as a repertoire of novel enzymes and functional proteins [12,15,20,21]. The metagenomic approach comprises extracting bulk microbial DNA from environmental samples or enrichment cultures, cloning the metagenome into a heterologous host to generate a metagenomic library, and then either screening the library for genes of interest or expressing the cloned DNA and screening for enzymatic activities of interest [22-24]. Metagenomic approaches thus overcome, or at least extend, the inherent limitations of isolation and cultivation in culture-based techniques [25].
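
The library-construction step raises a practical question: how many clones are needed to capture a given genome from a mixed community? One standard estimate, not taken from the cited studies, is the Clarke-Carbon formula N = ln(1 - P) / ln(1 - f). Here P is the desired probability of capturing any given sequence and f is the fraction of the target DNA carried by one clone. The sketch below assumes unbiased cloning and DNA contributed in proportion to relative abundance. The function name and the insert size, genome size and abundance in the usage note are hypothetical.

```python
import math


def clones_required(insert_bp: int, genome_bp: int,
                    relative_abundance: float = 1.0,
                    probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the clones needed so that any given stretch of
    a target genome is present in the library with the stated probability.

    Assumes unbiased cloning and that the target genome contributes DNA in
    proportion to its relative abundance in the community.
    """
    # Average fraction of the target genome carried by one random clone.
    f = insert_bp * relative_abundance / genome_bp
    # Solve probability = 1 - (1 - f) ** N for N; log1p keeps small f accurate.
    return math.ceil(math.log(1.0 - probability) / math.log1p(-f))


if __name__ == "__main__":
    # Hypothetical values: 40 kb (fosmid-sized) inserts, 4 Mb genome.
    print(clones_required(40_000, 4_000_000))                           # pure culture
    print(clones_required(40_000, 4_000_000, relative_abundance=0.01))  # 1% of community
```

For a hypothetical 4 Mb genome cloned as 40 kb inserts, about 459 clones suffice at 99% probability in pure culture. If the same genome makes up only 1% of the community, the estimate rises to roughly 46,000 clones, which helps explain why rare community members are hard to recover by library screening alone.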

Heinrich Winterberg first reported microbial unculturability in 1898 [26]. Nevertheless, advances in genomics and molecular biology laid the foundation for the metagenomic approach only in late 1998 [27]. The finding by Staley and Konopka in 1985 [28] that the bulk of microbes were inaccessible was not regarded as conclusive by microbiologists. Later, a study by Torsvik et al. [29] provided strong evidence that culture-dependent techniques could not capture the entire range of microbes. Torsvik et al. [30] further found that the microorganisms cultured and isolated so far account for only about 1% of the total microbial population on the planet.

Woese's proposal in 1985 that 16S rRNA serves as a molecular chronometer with a high degree of functional constancy transformed microbiology at that time [31]. Because of its size, its multicopy nature and its ubiquity in all bacteria, the 16S rRNA gene has been extensively exploited for molecular characterization. This approach was used to isolate, fractionate and clone whole genomic DNA into a bacteriophage lambda vector for subsequent analysis [32]. Despite its sophistication, the approach remained inadequate for exploiting the metabolic and catalytic activities of microbial consortia. In 1995, Healy et al. [33] carried out function-based screening of metagenomic libraries, which they termed “zoolibraries”, and recovered cellulase and xylanase genes.

Metagenomic screening of DNA from uncultured microbes is now a well-developed discovery technology [34-36]. The incorporation of advanced functional genomics, bioinformatics tools, systems and synthetic biology, functional screening methods such as SIGEX (substrate-induced gene expression) [37] and METREX (metabolite-regulated expression) [38], Next Generation Sequencing (NGS) and High Throughput Screening (HTS) into metagenomic studies has further enabled the exploitation of hidden microbial communities [39]. These tools are invaluable complements for fully understanding the functions of microbial communities and their interactions within their niches [15]. The drastic decrease in the operating cost of NGS has made it affordable to generate megabases of sequence data, facilitating metagenomics worldwide [40]. Similarly, annotation pipelines have supported the analysis of microbial DNA from a wide range of habitats, such as the Sargasso Sea [41], the Sorcerer II Global Ocean Sampling expedition [42], soil [19,43-45], the ruminant gut [54], hot springs [47-51], glacier ice [52] and Antarctic desert soil [53].

In 2004, Tyson et al. [54] reported the first study of this kind, sequencing 76 Mbp of DNA from an acid mine drainage biofilm, and the study also provided insights into the metabolic pathways of the biofilm community. Sequencing more than 1 Gbp of metagenomic DNA from the Sargasso Sea proved more challenging [41]. The identification of about 1.2 million putative genes from that effort demonstrated the potential of this approach for gene discovery [36]. However, low sequencing coverage and the high biodiversity of the Sargasso Sea sample hindered complete genome assembly. Greater sequencing coverage and the use of small-, medium- and large-insert libraries could enable whole-genome assembly [36,55,56]. The reliability of prokaryotic genome assembly is further affected by nucleotide polymorphism, gene rearrangements and horizontal gene transfer [57]. Sequencing eukaryotic metagenomes poses additional challenges owing to larger genome sizes and the presence of introns and junk DNA. Metatranscriptomics and cDNA libraries can address these issues to some extent.
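
The link between sequencing coverage, diversity and assembly can be made concrete. Assume reads are sampled uniformly and in proportion to each organism's relative abundance. A genome's expected depth is then c = (total sequenced bases × relative abundance) / genome size, and the Lander-Waterman (Poisson) model estimates the fraction of that genome hit by at least one read as 1 - e^(-c). The Python sketch below uses hypothetical numbers, not values from the Sargasso Sea study, and ignores the read overlaps that assembly actually requires.

```python
import math


def genome_coverage(total_bp: float, relative_abundance: float, genome_bp: float) -> float:
    """Expected mean depth for one genome: its share of the sequenced bases
    divided by its genome size (reads assumed sampled in proportion to abundance)."""
    return total_bp * relative_abundance / genome_bp


def fraction_covered(depth: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of a genome hit by >= 1 read."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    total = 1e9  # ~1 Gbp of sequence, the order of magnitude quoted for the Sargasso Sea
    for abundance in (0.1, 0.01, 0.001):  # hypothetical relative abundances
        depth = genome_coverage(total, abundance, 2e6)  # hypothetical 2 Mb genome
        print(f"abundance {abundance:>6}: depth {depth:6.1f}x, "
              f"covered {fraction_covered(depth):.1%}")
```

With 1 Gbp of sequence, a hypothetical 2 Mb genome at 0.1% abundance receives only about 0.5× depth, so roughly 39% of it is touched by any read. A genome at 10% abundance reaches about 50× depth. In a highly diverse sample, many genomes fall into the low-depth regime.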

The widespread use of high-throughput NGS technology in metagenomics has displaced Sanger sequencing as the main source of sequence data. Unlike Sanger sequencing, NGS can capture even very rare, low-abundance microbes in various metagenomes [58,59]. Earlier metagenomic studies relied mainly on traditional methods such as denaturing gradient gel electrophoresis (DGGE) [60], terminal restriction fragment length polymorphism (T-RFLP) analysis [61], and Sanger sequencing of 16S rRNA gene clone libraries [7]. The latter approach dominated the study of microbial genetics in natural habitats, and the use of E. coli as the host cell increased its potential [39]. NGS has shown great promise in de novo genome sequencing and resequencing by producing large-scale sequence data sets. For example, the ENCODE project produced over 15 trillion bases of raw data, and the 1000 Genomes Project over 20,000 Gb (20 trillion bases) [62,63]. Similarly, the Earth Microbiome Project and the Human Microbiome Project generated over two petabytes of sequence data and over five terabytes of genomic data, respectively [64,65].
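
A simple probability model shows why sequencing depth matters for rare taxa. Assume each read is drawn independently and without bias from the community. A taxon at relative abundance p then appears at least once among n reads with probability 1 - (1 - p)^n. The sketch below compares a Sanger-scale clone library with an NGS-scale run. Its function names and library sizes are hypothetical, and it ignores PCR, extraction and classification biases.

```python
import math


def detection_probability(abundance: float, n_reads: int) -> float:
    """Chance that at least one of n_reads comes from a taxon at this relative
    abundance, assuming reads are drawn independently and without bias."""
    return 1.0 - (1.0 - abundance) ** n_reads


def reads_for_detection(abundance: float, probability: float = 0.95) -> int:
    """Smallest read count that samples the taxon with at least `probability`."""
    return math.ceil(math.log(1.0 - probability) / math.log1p(-abundance))


if __name__ == "__main__":
    rare = 1e-4  # hypothetical taxon at 0.01% relative abundance
    for n in (1_000, 1_000_000):  # Sanger-scale clone library vs. NGS-scale run
        print(f"{n:>9} reads: P(detect) = {detection_probability(rare, n):.3f}")
    print("reads for 95% detection:", reads_for_detection(rare))
```

For a taxon at 0.01% abundance, a library of 1,000 clones detects it with only about a 10% chance, while 1,000,000 reads make detection practically certain. About 30,000 reads are needed for a 95% chance.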

Although metagenomics has proven highly effective in unlocking the microbial world as a source of multifunctional enzymes, until recently the scarcity of suitable enzymes and of appropriate hosts for efficient gene expression limited various biotransformation processes. Likewise, the low sensitivity and low throughput of activity-based metagenomic screening, together with natural genomic heterogeneity and cross-strain assemblies, remain critical issues [16]. The volume of sequence data deposited in metagenomic databases is outpacing timely analysis [66]. Fluorescence-activated cell sorting (FACS), phenotypic microarrays (PM) [67], community isotype arrays (CIArray) [68], fluorescence in situ hybridization (FISH) and fluorescence microscopy are effective for biological identification at the single-cell level [69]. High-throughput screening methods such as SIGEX [37], PIGEX (product-induced gene expression) [70] and METREX [38] have also proven very helpful in overcoming these limitations. To date, there is no gold standard for metagenomic data analysis. The Next Generation Sequencing Simulator for Metagenomics (NeSSM), developed by Jia et al. [71] in 2013, is reported to account for both sequencing errors and sequencing coverage bias. Improvements in simulation systems and algorithms further enhance the extraction and analysis of metagenomic sequence data [72,73].
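
Simulators such as NeSSM are used to benchmark metagenomic pipelines against data whose true origin is known. The toy simulator below is not NeSSM's algorithm. It is a minimal sketch, built around a hypothetical `simulate_reads` function, that draws reads from reference genomes in proportion to assumed relative abundances and injects random substitution errors. It models uneven coverage only through abundance. It does not reproduce positional or GC-content coverage biases, indels, or platform-specific error profiles.

```python
import random

BASES = "ACGT"


def simulate_reads(genomes: dict[str, str], abundances: dict[str, float],
                   n_reads: int, read_len: int, error_rate: float,
                   seed: int = 0) -> list[tuple[str, str]]:
    """Toy metagenomic read simulator (not NeSSM).

    Picks a source genome for each read in proportion to its relative abundance,
    takes a read from a uniformly random position, and applies independent
    substitution errors at `error_rate` per base. Returns (genome_name, read) pairs.
    """
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]  # abundance-weighted pick
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)    # uniform start position
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:                    # substitution error
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append((name, "".join(read)))
    return reads


if __name__ == "__main__":
    # Hypothetical two-member community with a 9:1 abundance ratio.
    toy_genomes = {"a": "ACGT" * 100, "b": "GGCA" * 100}
    sample = simulate_reads(toy_genomes, {"a": 0.9, "b": 0.1},
                            n_reads=5, read_len=20, error_rate=0.01)
    for source, read in sample:
        print(source, read)
```

Every read carries the name of its source genome, so the output can be passed to a binning or classification tool and the assignments scored directly against the ground truth.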

Concluding Remarks and Future Prospects

Metagenomics is a promising avenue of microbial genomics research. Information gathered from metagenomic libraries is of great significance for exploring the potential of microbial enzymatic networks and for linking sequence data to molecular structure and functional properties [12]. The multifunctional attributes of novel biocatalysts obtained through metagenomics are likely to attract both the scientific community and industrial experts in white (industrial) and red (medical) biotechnology [74]. Easy access to DNA extraction methods for all kinds of environments, falling sequencing costs, advances in NGS platforms, and readily available bioanalytical algorithms and simulation systems have brought metagenomics into an exciting new phase [75]. Despite dramatic advances in functional screening capabilities, characterizing most biocatalysts at industrial scale remains a major impediment. Developing a wide range of alternative host-vector systems could partly ease the problem of heterologous expression of metagenomic DNA in functional screening. Verastegui et al. [76] reported in 2014 that combining metagenomics with enrichment technologies could further increase the screening hit rate. Another challenge is the lack of reference genomes for the functional annotation of sequence assemblies. Sequencing reference genomes from uncharted branches of the tree of life could address this issue [41]. In addition, combining metagenomics with single-cell genomics could improve gene discovery from uncultured microbial diversity (the so-called “microbial dark matter”). Researchers have recently highlighted the potential of single-cell genomics to deliver biomolecules of industrial significance [77,78].
The storage, processing and distribution of most metagenomic sequence data sets remain open questions for the scientific community. Robust storage systems and data management are therefore needed to support the search for biocatalytic genes. Analytical modeling, such as multi-step simulation-based bioinformatics approaches, combined with systems biology tools and high-throughput sequencing, will be essential for better understanding the biological complexity of uncharacterized microbes.

