Volume 16, Issue 1 p. 49-59
Minireview

Software platforms to facilitate reconstructing genome-scale metabolic networks

Joshua J. Hamilton

Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI, 53706 USA

Jennifer L. Reed

Corresponding Author

Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI, 53706 USA

For correspondence. E-mail [email protected]; Tel. 608 262 0188; Fax 608 262 5434.
First published: 22 October 2013

Summary

System-level analyses of microbial metabolism are facilitated by genome-scale reconstructions of microbial biochemical networks. A reconstruction provides a structured representation of the biochemical transformations occurring within an organism, as well as the genes necessary to carry out these transformations, as determined by the annotated genome sequence and experimental data. Network reconstructions also serve as platforms for constraint-based computational techniques, which facilitate biological studies in a variety of applications, including evaluation of network properties, metabolic engineering and drug discovery. Bottom-up metabolic network reconstructions have been developed for dozens of organisms, but until recently, the pace of reconstruction has failed to keep up with advances in genome sequencing. To address this problem, a number of software platforms have been developed to automate parts of the reconstruction process, thereby alleviating much of the manual effort previously required. Here, we review four such platforms in the context of established guidelines for network reconstruction. While many steps of the reconstruction process have been successfully automated, some manual evaluation of the results is still required to ensure a high-quality reconstruction. Widespread adoption of these platforms by the scientific community is underway and will be further enabled by exchangeable formats across platforms.

Introduction

Genome-scale network reconstructions (GENREs) and other metabolic network descriptions collect and codify current knowledge about the metabolism of an organism. A GENRE is an organism-specific structured collection of biochemical transformations and associated genes obtained from the genome annotation and primary literature (Feist et al., 2009a). The past decade has seen enormous growth in the number of published GENREs for a wide range of organisms [e.g. Escherichia coli (McCloskey et al., 2013), Saccharomyces cerevisiae (Osterlund et al., 2012), Shewanella oneidensis MR-1 (Fredrickson et al., 2008) and Geobacter spp. (Mahadevan et al., 2011)], and guidelines for developing a high-quality GENRE have recently been published (Thiele and Palsson, 2010).

These reconstructions serve as knowledge bases for the target organism and also serve as a platform for the development of genome-scale metabolic models (GEMs) (Price et al., 2004). GEMs provide a mathematical representation of an organism's metabolism and enable its phenotype to be evaluated and manipulated computationally. GEMs have also been used to drive and support experimental efforts in a variety of other applications (Oberhardt et al., 2009; Lewis et al., 2012), including network characterization (Oberhardt et al., 2009; Zomorrodi et al., 2012), metabolic engineering (Smolke, 2009; Zomorrodi et al., 2012), evolution (Papp et al., 2011), drug discovery (Chavali et al., 2012), contextualizing high-throughput data (Blazier and Papin, 2012; Reed, 2012) and elucidating microbial community interactions (Zengler and Palsson, 2012). For example, GEMs were used to design the first organism with direct biocatalytic routes to 1,4-butanediol (Yim et al., 2011), and to identify better antibiotics against Vibrio vulnificus (Kim et al., 2011). Other examples of GEM usage in these areas include a study of gene loss in the endosymbiont Buchnera aphidicola (Yizhak et al., 2011), predictions of cooperative and competitive potential in bacterial communities (Freilich et al., 2011) and the design of a uranium bioremediation strategy for contaminated groundwater (Zhuang et al., 2012). Mathematical techniques for analysing these GEMs have also been recently reviewed (Lewis et al., 2012; Zomorrodi et al., 2012), and while not the focus of this review, these computational techniques are useful for reconstructing metabolic networks.

Historically, network reconstruction has been a time- and labour-intensive process (Thiele and Palsson, 2010), and a number of tools have been developed to automate parts of the procedure. Most commonly, software tools have focused on developing draft reconstructions [such as metaSHARK (Pinney et al., 2005), AUTOGRAPH (Notebaart et al., 2006) and many others] or performing simulations [such as CellNetAnalyzer (Klamt et al., 2007), OptFlux (Rocha et al., 2010) and the COBRA Toolbox (Schellenberger et al., 2011a), among others], with very few tools providing support for refining draft reconstructions to obtain a final, well-curated reconstruction. Reviews of many tools for drafting GENREs or simulating GEMs have recently been published (Raman and Chandra, 2009; Copeland et al., 2012; Lakshmanan et al., 2012).

In this review, we provide an overview of network reconstruction based on recently published guidelines (Thiele and Palsson, 2010), and discuss four software platforms which provide support for all stages of the reconstruction process: the SuBliMinaL Toolbox (Swainston et al., 2011), the Model SEED (Henry et al., 2010), the RAVEN Toolbox (Agren et al., 2013) and Pathway Tools (Karp et al., 2010; Latendresse et al., 2012). We conclude with a discussion of possible future advances that would improve ease of use and interoperability of the four software platforms.

Overview of network reconstruction

Researchers have divided the reconstruction process into four stages, consisting of over 90 steps, during which an annotated genome is converted to a high-quality metabolic network reconstruction and distributed to the scientific community (Fig. 1). Reconstruction is an iterative process: stages may be repeated until the reconstruction's GEM predictions agree with experimental observations.

Fig. 1. Schematic of the network reconstruction process. Reconstruction is an iterative process, and stages 2–4 are repeated until the model predictions agree with experimental observations. Stage numbers refer to the guidelines published by Thiele and Palsson (2010).

Reconstruction begins when an annotated genome is converted into a draft reconstruction (stage 1), during which biochemical databases are used to identify the metabolic functions associated with a genome's content. Once a draft reconstruction has been obtained, it must be refined in light of physicochemical considerations and expert knowledge about the organism (stage 2). After a reconstruction is refined (or curated), it is then further evaluated in light of experimental evidence (stage 4). These evaluations are performed using a GEM derived from the reconstruction (stage 3), in the form of computational simulations. The results of these simulations feed back into stage 2, and the reconstruction is refined until the GEM correctly predicts experimental observations. This results in an iterative reconstruction process (Fig. 1), whose endpoint is determined by the desired scope and purpose of the reconstruction. One of the most comprehensive GENREs to date, for E. coli, has gone through four iterations since its initial publication in 2000 (Edwards and Palsson, 2000; Reed and Palsson, 2003; Feist et al., 2007; Orth et al., 2011). Other GENREs which have been subject to multiple rounds of iteration include Homo sapiens, with three reconstructions (Duarte et al., 2007; Hao et al., 2010; Thiele et al., 2013), and S. cerevisiae, with over a dozen reconstructions, including two in the past year (Heavner et al., 2013; Osterlund et al., 2013) [reviewed in (Osterlund et al., 2012)].

In the following sections, we describe the major steps in the reconstruction process, and discuss how each software platform facilitates that step. Figure 2 summarizes the discussion, listing each step in the reconstruction process and indicating the extent to which each software platform facilitates that step. Additional criteria relevant to selecting a software platform are provided in Table 1.

Fig. 2. Left: The published guidelines identify 96 steps and 4 stages in network reconstruction, most of which are summarized here. Right: Each software platform provides varying levels of support for each step. Automatic: the software performs this step automatically and updates the reconstruction without any user input. Assistance: the software provides support for the user to perform the step (e.g. a function that can be called by the user). Users can accept or reject suggested modifications to the reconstruction. Manual: the user must perform the step manually. Step numbers refer to the guidelines published by Thiele and Palsson (2010).

Table 1. Selected characteristics of software platforms for reconstruction and simulation of metabolic networks

| | SuBliMinaL | Model SEED | RAVEN | Pathway Tools | COBRA Toolbox |
|---|---|---|---|---|---|
| Input | Species name | Genome annotated in RAST | Annotated genome sequence | Annotated genome sequence | GENRE in SBML format |
| Reference database | KEGG, MetaCyc | SEED | KEGG | MetaCyc | N/A |
| Interface | Command line | Web | Matlab | Web, Pathway Tools software | Matlab |
| License | Free | Free | Free (requires a Matlab license) | Free for academic and government | Free (requires a Matlab license) |
| Output | SBML | SBML, Excel | SBML, Excel | PGDB, SBML, BioPax | SBML, Excel |
| Supports simulations | No | Yes | Yes | Yes | Yes |

Stage 1: draft reconstruction

In the first stage of network reconstruction, an annotated genome is used to assemble a collection of metabolic reactions. Draft reconstructions also contain gene-protein-reaction (GPR) associations, relationships that indicate which gene products carry out which biochemical transformations. Annotated genomes can be obtained from a variety of sources (step 1), including the National Center for Biotechnology Information (Wheeler et al., 2007), the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000; Kanehisa et al., 2012) and the SEED (Overbeek et al., 2005). A recent review describes a number of additional databases useful for network reconstruction (Garcia-Albornoz and Nielsen, 2013).

To obtain the appropriate metabolic reactions, metabolic genes are first identified [e.g. on the basis of enzyme names, enzyme commission (EC) numbers and gene ontology (GO) categories (Ashburner et al., 2000)] (step 2). The four software platforms connect genes to reactions (step 3) via biochemical reaction databases, such as KEGG (Kanehisa and Goto, 2000; Kanehisa et al., 2012), MetaCyc (Caspi et al., 2010), MetRxn (Kumar et al., 2012) and BiGG (Schellenberger et al., 2010), or via other existing GENREs [available through databases such as BioModels DB (Le Novère et al., 2006) or the BioMet Toolbox (Cvijovic et al., 2010), as well as individual websites (e.g. Feist et al., 2009b)]. The Model SEED contains its own curated reaction database. The draft reconstruction comprises all reactions and GPR associations retrieved in this way (step 4).

The RAVEN Toolbox draws on the KEGG database for its reconstruction. The user provides an annotated genome, and the RAVEN Toolbox uses protein homology to identify the KEGG Orthology (KO) ID that best matches each gene. The reactions and genes corresponding to that KO ID are then imported into the reconstruction. RAVEN users also have the option to use one or more template GENREs as their reaction database, in which case protein homology is used to identify and import reactions and genes from the template GENREs into the reconstruction. In either case, users should inspect the reconstruction to ensure that protein functions identified by homology match the functions in the annotation.

The Model SEED requires users to annotate their genome using RAST (Aziz et al., 2008), a novel annotation platform relying on manually curated subsystems and protein families (FIGfams). Users can inspect the RAST annotation and make changes based on their knowledge of the organism. Reactions are then obtained from the manually curated Model SEED database.

Pathway Tools requires users to provide an annotated genome, and draws reactions from the manually curated MetaCyc database. The software uses the PathoLogic algorithm (Karp et al., 2002) to retrieve reactions from MetaCyc and infers missing reactions based on the organism's complement of biochemical pathways. We recommend that users inspect these inferred reactions to ensure that their presence is supported by experimental evidence. Users are also given the opportunity to clarify unclear annotations. Both Pathway Tools and the Model SEED also maintain freely available databases of draft reconstructions for a variety of organisms.

Finally, the SuBliMinaL Toolbox incorporates data from both KEGG and MetaCyc. The SuBliMinaL Toolbox downloads all KEGG and MetaCyc pathways and reactions for a given organism, and merges them to create a draft reconstruction. Users are thus restricted to organisms found in these two databases. Users can also incorporate existing GENREs into this process. Because of a change in the way third-party applications must access KEGG, the SuBliMinaL Toolbox as published can no longer access the current KEGG databases. However, a working version of the SuBliMinaL Toolbox is available from the toolbox authors.

Stage 2: refinement/curation

In the second stage, the draft reconstruction must be evaluated and refined to ensure consistency with physical principles and experimental evidence. The inclusion of each reaction is carefully scrutinized, and GPR associations are validated. The reconstruction is updated to include notes denoting the evidence for and confidence in each reaction in the reconstruction. Finally, additional information is gathered to support subsequent steps of the reconstruction process.

The first task in stage 2 is to verify the substrates and products of each reaction (step 6). Reaction databases such as KEGG may represent reactions in a generic way (e.g. as acting on an acyl group, or as involving a nonspecific cofactor such as ‘electron carrier’) in order to capture a spectrum of catalytic activities. Manually curated or organism-specific databases are less likely to include such generic reactions. In the Model SEED, generic reactions have been updated with standard metabolites (e.g. NAD as a cofactor) where possible, though we encourage users to verify that the cofactors are correct for their organism. For all software platforms, generic reactions included in the draft reconstruction should be replaced with organism-specific ones.

In addition, many databases represent metabolites in their uncharged state, when the actual charged state depends on the intracellular pH. Thus, all reactions should be checked to ensure all metabolites are in their proper charged state, and the overall reaction is mass- and charge-balanced (steps 7–9). Reactions which are unbalanced may lead to the production of metabolites or energy (e.g. ATP) out of nothing. Both the Model SEED and Pathway Tools use manually curated reaction databases for which these steps have already been performed. Users are nonetheless encouraged to check that all reactions in their reconstruction are balanced. In contrast, users of the RAVEN Toolbox must update metabolites and reactions themselves because the KEGG database is not balanced. The RAVEN Toolbox helpfully contains a function to identify unbalanced reactions, which users can then balance manually. The SuBliMinaL Toolbox occupies a middle road, allowing users to call the cheminformatics software MarvinBeans (ChemAxon, Budapest, Hungary) to update metabolite charges and formulas (when possible). The SuBliMinaL Toolbox also attempts to balance reactions, and notifies the user of reactions which cannot be balanced. Unfortunately, the SuBliMinaL Toolbox provides no way for users to manually correct metabolites or reactions which cannot be corrected automatically, and so users need to make these changes after exporting their reconstruction from the software.
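The mass- and charge-balance check described above can be illustrated with a minimal Python sketch. It is not the function shipped with any of the reviewed platforms; the function names (`parse_formula`, `imbalance`), the metabolite identifiers and the data layout are all hypothetical, and the formulas and charges are illustrative values assumed for a near-neutral intracellular pH. The sketch sums each element and the net charge over a reaction and reports whatever does not cancel.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into element counts.

    Assumption: no parentheses, hydrates or generic groups (e.g. 'R').
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoichiometry, metabolites):
    """Return (element imbalance, charge imbalance) for one reaction.

    stoichiometry: metabolite ID -> coefficient (negative for substrates).
    metabolites:   metabolite ID -> (formula, charge).
    A balanced reaction yields ({}, 0).
    """
    elements = Counter()
    charge = 0
    for met, coeff in stoichiometry.items():
        formula, met_charge = metabolites[met]
        for element, n in parse_formula(formula).items():
            elements[element] += coeff * n
        charge += coeff * met_charge
    unbalanced = {e: n for e, n in elements.items() if n != 0}
    return unbalanced, charge


# Illustrative metabolite table (hypothetical IDs; charged forms assumed).
METABOLITES = {
    "glc": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}

# Hexokinase written with and without the released proton.
balanced = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
missing_proton = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}

print(imbalance(balanced, METABOLITES))        # ({}, 0)
print(imbalance(missing_proton, METABOLITES))  # ({'H': -1}, -1)
```

In the second case the reaction is one hydrogen and one unit of charge short on the product side, which is the kind of discrepancy that appears when a database stores metabolites in their uncharged state.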

Reactions should also be written in the proper direction, localized to the proper compartment (for eukaryotes) and associated with the proper metabolic pathway/subsystem (steps 10–12). The Model SEED and Pathway Tools both predict physiological directions of all reactions, whereas the RAVEN and SuBliMinaL Toolboxes assume reactions are bidirectional in the absence of any specifying information. Users should attempt to assign specific reaction directions on the basis of available evidence [such as thermodynamics (Fleming et al., 2009)]. The RAVEN and SuBliMinaL Toolboxes support multicompartment reconstructions, and predict compartments for reactions and genes based on amino acid sequences using WoLF PSORT (Horton et al., 2007), while the Model SEED and Pathway Tools are focused on bacterial (and archaeal) reconstructions, and so do not support reaction compartmentalization. In all cases, users should evaluate predicted reaction directions and compartments to ensure they make biological sense. The Model SEED, Pathway Tools and the RAVEN Toolbox support the assignment of reactions to subsystems, although subsystem definitions may differ across platforms. Should users of the SuBliMinaL Toolbox desire subsystem annotations, they must assign them manually.

Next, users should verify the GPR associations for all reactions in the reconstruction (step 13). Users should ensure that genes are associated with the proper biochemical reactions (e.g. on the basis of annotations or experimental evidence) and check that each GPR has the proper form (Fig. 3). In the simplest case, a single gene product carries out a single biochemical reaction (Fig. 3A). Additionally, multiple enzymes may carry out the same reaction (called isozymes, Fig. 3B); reactions may be carried out by a multimeric protein complex (Fig. 3C); or a single enzyme may carry out multiple reactions (Fig. 3D). A detailed GPR captures these types of interactions between genes and reactions. The RAVEN and SuBliMinaL Toolboxes generate lists of genes associated with each reaction, and allow users to define the detailed GPR structure themselves. If users wish to perform simulations involving genes (e.g. gene deletion phenotypes), they should determine detailed GPR structures. If a user of the RAVEN Toolbox provides template reconstructions (instead of using the KEGG database) with detailed GPR associations (e.g. a multimeric complex), then these associations are maintained in the draft reconstruction. The Model SEED and Pathway Tools automatically generate detailed GPR relationships, including enzyme complexes and isozymes.

Fig. 3. Examples of detailed gene-protein-reaction (GPR) associations.

A. Simple association, in which a single gene encodes a single enzyme.

B. Isozymes, in which multiple genes encode distinct proteins carrying out the same function.

C. Multimeric protein complex, in which multiple genes encoding distinct protein subunits come together to form an active enzyme.

D. One-to-many relationship, in which a single protein can carry out multiple reactions.
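The four GPR forms can be made concrete with a short Python sketch showing why detailed GPR structure matters for gene deletion simulations. The representation below (a list of gene sets, where each set is one enzyme and alternative sets are isozymes) is an assumption chosen for illustration, not the encoding used by any of the reviewed platforms, and the gene and reaction names are hypothetical.

```python
def reaction_is_active(gpr, deleted_genes):
    """Evaluate a GPR given a set of deleted genes.

    gpr is a list of gene sets: every gene in a set is required (AND,
    subunits of one enzyme), and alternative sets are isozymes (OR).
    An empty list means no gene association (e.g. a spontaneous
    reaction), which is treated here as always active.
    """
    if not gpr:
        return True
    deleted = set(deleted_genes)
    return any(not (enzyme & deleted) for enzyme in gpr)


def lost_reactions(gprs, deleted_genes):
    """Return the sorted IDs of reactions disabled by the deletions."""
    return sorted(
        rxn for rxn, gpr in gprs.items()
        if not reaction_is_active(gpr, deleted_genes)
    )


# Hypothetical GPRs mirroring the four cases in the figure.
GPRS = {
    "R_simple": [{"g1"}],            # A: one gene, one enzyme
    "R_isozymes": [{"g2"}, {"g3"}],  # B: g2 OR g3
    "R_complex": [{"g4", "g5"}],     # C: g4 AND g5
    "R_multi_a": [{"g6"}],           # D: g6 catalyses two reactions
    "R_multi_b": [{"g6"}],
    "R_spontaneous": [],
}

print(lost_reactions(GPRS, ["g2"]))        # []
print(lost_reactions(GPRS, ["g2", "g3"]))  # ['R_isozymes']
print(lost_reactions(GPRS, ["g4"]))        # ['R_complex']
print(lost_reactions(GPRS, ["g6"]))        # ['R_multi_a', 'R_multi_b']
```

A flat gene list cannot distinguish case B from case C: deleting one gene is harmless for isozymes but disables a complex, which is why the AND/OR structure must be resolved before gene deletion phenotypes are simulated.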

After all reactions in the draft reconstruction have been validated, it is important to provide information justifying the inclusion of each reaction in the reconstruction (steps 15–17). This information includes a confidence score indicating the strength of the supporting evidence for a reaction, any notes and references justifying the score, and a flag indicating any information obtained from related organisms. Unfortunately, with the exception of Pathway Tools, this information is not provided by most software platforms. Pathway Tools, however, has developed an evidence ontology (EO) (Karp et al., 2004) which users can use to document their reconstruction. Users should manually compile this information so that both they and others can immediately see the justification and evidence for each reaction in the reconstruction.

Additionally, all reactions and metabolites should be associated with a unique, unambiguous identifier, such as a ChEBI identifier (Degtyarenko et al., 2008) or an EC number (step 14). All four software platforms have some support for assigning these identifiers, although the platforms vary in their choice of identifiers and annotation methods.

In stage 4, a GEM derived from the reconstruction is used to computationally evaluate the reconstruction. The remaining steps of stage 2 are devoted to updating the reconstruction with information in preparation for these simulations (steps 19–37). The first of these steps involves the addition of spontaneous and transport reactions (steps 19–22). The Model SEED and Pathway Tools both add spontaneous reactions based on pathway completeness, with extracellular transport reactions added based on genomic evidence. Because the Model SEED and Pathway Tools emphasize prokaryotic systems, transport reactions between compartments are not added. The RAVEN Toolbox gives the user the ability to add spontaneous reactions as well as intra- and extracellular transporters, but does not do so automatically. Finally, the SuBliMinaL Toolbox allows users to add a default set of transporters, which the user is tasked with pruning based on experimental evidence. Because transporters are often poorly annotated, users should augment software predictions with literature evidence, adding (or removing) transporters as necessary.

Next, a biomass reaction should be added to the reconstruction (steps 24–33). The biomass reaction is a non-enzymatic reaction containing the macromolecules and other compounds which make up the dry weight of the cell (Feist and Palsson, 2010); the reaction is used to represent cellular growth when performing computational simulations. Both the Model SEED and the SuBliMinaL Toolbox generate biomass equations based on the organism's phylogeny and genomic content, although users should verify the equations and stoichiometric coefficients themselves based on experimental evidence. The RAVEN Toolbox and Pathway Tools require users to add the biomass equation manually, giving them the ability to define a biomass equation to their own specifications. The construction of biomass equations has been recently reviewed (Feist and Palsson, 2010).

At this point, users should also identify the growth requirements of their organism for use in subsequent simulations (step 37). The Model SEED assumes a rich medium in which any metabolite with a transporter is present. If this medium differs from the experimentally characterized medium, users may need to repeat steps in stages 2 and 4 manually on the proper medium.

Other reactions to be added in this stage of the reconstruction process include the ATP-maintenance reaction (representing cellular maintenance costs) and demand/sink reactions for metabolites whose biosynthesis or degradation pathways are unknown (steps 34–37). The RAVEN Toolbox and Pathway Tools enable users to add these reactions themselves, while users of the Model SEED and the SuBliMinaL Toolbox should add these reactions to their reconstruction prior to performing simulations.

The evaluation to be performed in stage 4 (and later applications) is often facilitated by visual analysis. Many tools have been developed for visualization of network reconstructions (Pavlopoulos et al., 2008), including Cytoscape (Shannon et al., 2003; Smoot et al., 2011). Of the platforms reviewed here, three support visualization (step 23). Pathway Tools dynamically generates pathway diagrams for each pathway in the reconstruction, while the Model SEED overlays reconstruction pathways on top of static KEGG pathway diagrams. The RAVEN Toolbox supports visualization on top of manually drawn CellDesigner maps (Funahashi et al., 2008). Manual construction of pathway maps can be very time-consuming, while KEGG maps may not reflect the unique features of a particular organism. Users should consider which, if any, visualization approach best suits their needs.

Stage 3: conversion to a GEM

In this stage, the refined GENRE is converted to a GEM that serves as a basis for the simulations of stage 4. Simulations can be performed using a variety of software platforms, including the popular COBRA Toolbox for Matlab (Schellenberger et al., 2011a), as well as many others [reviewed in (Raman and Chandra, 2009; Lakshmanan et al., 2012)].

GENREs and other systems-biology models are published and distributed in one or more standard formats, such as Systems Biology Markup Language (SBML) (Hucka et al., 2003) or BioPax (Demir et al., 2010). Each platform reviewed here supports exporting the reconstruction in one or more standard formats (Table 1) for import into simulation software. Furthermore, the Model SEED, the RAVEN Toolbox and Pathway Tools provide in-software support for simulations. Users of the SuBliMinaL Toolbox must use third-party software to perform the simulations necessary for stage 4. In the next section, we discuss the simulation capabilities of the Model SEED, the RAVEN Toolbox and Pathway Tools, while the published reconstruction guidelines (Thiele and Palsson, 2010) describe how COBRA can be used instead.

Stage 4: network evaluation

The fourth and final stage of network reconstruction consists of network evaluation and validation against experimental data. During this stage, simulations are performed on a GEM derived from the reconstruction. The fundamental algorithm upon which most simulations are based is flux-balance analysis (FBA; Orth et al., 2010), a constraint-based method for predicting the flow of metabolites through a metabolic network. FBA can be applied to a variety of physiological analyses, including predicting growth rates, by-product secretion rates and gene essentiality, and calculating theoretical yields (Orth et al., 2010).
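For readers unfamiliar with FBA, the optimization problem it solves is commonly written as the linear programme below. The symbols are notation introduced here for illustration and are not used elsewhere in this review: S is the stoichiometric matrix of the GEM (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (for example, a 1 for the biomass reaction and 0 elsewhere), and the bounds encode reaction directionality and the composition of the growth medium.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j .
\end{aligned}
```

The equality constraint imposes a steady state on every metabolite, so an unbalanced or misdirected reaction in the reconstruction translates directly into an unrealistic feasible flux distribution.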

The first evaluation step is to identify metabolic dead ends, those metabolites which cannot be created or consumed (step 45). Such metabolites point to gaps, or missing reactions, in the network which may need to be filled (steps 46–48). In particular, gaps associated with the production of biomass components (steps 60–66) or secretion products (steps 67–75), or which may cause blocked reactions (i.e. reactions that cannot carry any flux, steps 76–78), should be evaluated and filled.
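A purely topological version of the dead-end check can be sketched in a few lines of Python. This is a simplification under stated assumptions, not an implementation of GapFind or of any platform's own routine: it only flags metabolites that no reaction can produce or that no reaction can consume, and it does not detect metabolites that are blocked further downstream. The function name and the toy network are hypothetical.

```python
def dead_end_metabolites(reactions):
    """Find metabolites that can only be produced or only be consumed.

    reactions maps reaction ID -> (stoichiometry, reversible), where
    stoichiometry maps metabolite ID -> coefficient (negative for
    substrates). A reversible reaction can both produce and consume
    each of its metabolites.
    """
    produced, consumed = set(), set()
    for stoichiometry, reversible in reactions.values():
        for met, coeff in stoichiometry.items():
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    all_mets = produced | consumed
    return {
        "no_producer": sorted(all_mets - produced),
        "no_consumer": sorted(all_mets - consumed),
    }


# Toy network: A is exchanged with the medium, C accumulates, D is orphaned.
TOY_NETWORK = {
    "EX_A": ({"A": -1}, True),            # reversible exchange of A
    "R1": ({"A": -1, "B": 1}, False),     # A -> B
    "R2": ({"B": -1, "C": 1}, False),     # B -> C
    "R3": ({"D": -1, "B": 1}, False),     # D -> B
}

print(dead_end_metabolites(TOY_NETWORK))
# {'no_producer': ['D'], 'no_consumer': ['C']}
```

Each flagged metabolite points to a candidate gap: D needs a biosynthetic or uptake reaction, and C needs a consuming reaction, a secretion route or inclusion in the biomass equation.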

The GEM predictions should also be validated against available experimental data. Common validation steps include prediction of experimental growth rates (steps 84–94), gene deletion phenotypes (steps 79–80) or other important physiological properties (such as P/O ratio, or flux splits in metabolic pathways, steps 81–83).

Of the many steps outlined in stage 4, the Model SEED emphasizes the production of biomass precursors (steps 60–66). The Model SEED performs an ‘auto-completion’ process that identifies and adds the minimum number of reactions necessary to enable growth of the GEM. The Model SEED does not allow the user to specify a growth medium for this reaction addition step, instead employing a rich medium containing all metabolites for which the reconstruction has transporters. As a result, additional reactions may need to be added to match growth under other media conditions. The Model SEED can perform FBA simulations on other growth media, which users should use to ensure the GEM predicts growth on media known to support growth of the organism (steps 84–94). FBA also enables assessment of secretion products (steps 67–75) and other physiological properties (steps 81–83). Users wishing to perform the remaining evaluation steps using a Model SEED model will have to use third-party simulation software.
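The idea of adding the fewest reactions needed to enable growth, and its dependence on the assumed medium, can be illustrated with a toy Python sketch. This is explicitly not the Model SEED's auto-completion algorithm: it swaps in simple reachability (network expansion) and brute-force enumeration in place of a flux-based optimization, ignores stoichiometry, and is only practical for very small candidate sets. All names and the toy reactions are hypothetical.

```python
from itertools import combinations


def reachable(seed, reactions):
    """Network expansion: fire any reaction whose substrates are available.

    reactions is an iterable of (substrates, products) tuples.
    Returns the set of metabolites producible from the seed set.
    """
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def minimal_gap_fill(medium, draft, candidates, targets):
    """Return the smallest list of candidate reaction IDs that makes every
    target producible from the medium, or None if no subset suffices."""
    targets = set(targets)
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            network = list(draft.values()) + [candidates[n] for n in chosen]
            if targets <= reachable(medium, network):
                return list(chosen)
    return None


# Draft reconstruction with a gap between g6p and f6p.
DRAFT = {
    "R1": (("glc",), ("g6p",)),
    "R3": (("f6p",), ("pyr",)),
}
# Reactions available in a (hypothetical) reference database.
CANDIDATES = {
    "C1": (("g6p",), ("f6p",)),
    "C2": (("glc",), ("xyz",)),
    "C3": (("xyz",), ("pyr",)),
}

print(minimal_gap_fill({"glc"}, DRAFT, CANDIDATES, {"pyr"}))  # ['C1']
print(minimal_gap_fill({"xyz"}, DRAFT, CANDIDATES, {"pyr"}))  # ['C3']
```

The two calls return different reaction sets for the same target, which mirrors the point made above: reactions added under one assumed medium need not be the ones required for growth on another.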

Both Pathway Tools and the RAVEN Toolbox contain considerable support for performing the simulations needed for network evaluation. Both contain tools to identify dead end metabolites (step 45), blocked reactions (steps 76–78) and network gaps associated with biomass precursors and secretion products (steps 60–75). Both platforms also suggest reaction additions to fill the gaps (steps 46–48), while Pathway Tools will also suggest genomic evidence in support of different reaction candidates. Users of the RAVEN Toolbox must identify candidate genes themselves. The less automated approach of these two platforms (compared with the Model SEED) enables users to apply their knowledge of the organism to refine the reconstruction, rather than relying solely on computational algorithms.

The RAVEN Toolbox also provides support for many of the flux-balance simulations available in the COBRA Toolbox (Schellenberger et al., 2011a), while Pathway Tools integrates the MetaFlux tool for FBA and related simulations (Latendresse et al., 2012). This enables both software platforms to perform simulations of growth rate (steps 84–94), gene deletions (steps 79–80) and other physiological properties (steps 81–83).

Finally, we note that there are a number of algorithms to perform network evaluation and validation. These include: GapFind and GapFill for identifying and resolving dead ends and gaps (Kumar et al., 2007); FVA (flux variability analysis) for identifying blocked reactions (Mahadevan and Schilling, 2003); and SMILEY (Reed et al., 2006), GrowMatch (Kumar and Maranas, 2009), GeneForce (Barua et al., 2010) and CROP (Dreyfuss et al., 2013) for resolving growth phenotype inconsistencies. The RAVEN Toolbox, Model SEED and Pathway Tools implement their own variations of these algorithms, and we encourage users of these platforms to consult the software documentation before selecting an approach.

Future directions

While all four software platforms reviewed here facilitate network reconstruction, we believe that their adoption can be encouraged through additional software features. Currently, researchers must be aware of the potential pitfalls in automated reconstruction and must actively evaluate software inputs and outputs to ensure a high-quality reconstruction. The utility of these software platforms would be enhanced through features that guide users through reconstruction, rather than just facilitate it.

For example, these platforms could provide better support for mass- and charge-balancing reactions. Many reactions involve generic metabolites which must be replaced before a reaction can be balanced. Such efforts could be facilitated by automatic flagging of generic metabolites and reactions and providing users the opportunity to replace them. It is also important that metabolites be represented in their properly charged state. For example, the cheminformatics software MarvinBeans can be used to identify the proper charge for each metabolite. (The SuBliMinaL Toolbox implements this approach.) Users could then be notified of unbalanced reactions and given the opportunity to balance them. It is also important that reactions be assigned their proper direction, to prevent stoichiometrically balanced cycles (SBCs) and other unrealistic behaviours (such as free ATP production). Reaction directions can be predicted on the basis of thermodynamics, and one such method has been implemented in the COBRA Toolbox (Noor et al., 2013). Additionally, the Model SEED and Pathway Tools perform automatic filling of metabolic gaps. Users could be presented with the results of these gap-filling algorithms and be allowed to accept or reject them on the basis of experimental evidence. This more hands-on approach is currently implemented by the RAVEN Toolbox.

Reconstructions should be checked for SBCs in their GEMs (steps 51–58 in the reconstruction guidelines). An SBC is a set of reactions (e.g. A → B → C → A) for which the overall thermodynamic driving force is zero, and through which no net flux can therefore occur. Unfortunately, while there have been a number of studies on SBCs (Beard et al., 2002; 2004; Qian et al., 2003; Yang et al., 2005; Schellenberger et al., 2011b), no universal approach has yet been developed to identify and eliminate them. As techniques for handling SBCs are developed, these could be incorporated into the reconstruction and simulation platforms.
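The three-reaction cycle A → B → C → A gives a small worked example of why steady-state constraints alone do not exclude SBCs. The notation is introduced here for illustration: rows of S correspond to A, B and C, columns to the reactions v1 (A → B), v2 (B → C) and v3 (C → A), and μ denotes a chemical potential.

```latex
S =
\begin{pmatrix}
-1 &  0 &  1 \\
 1 & -1 &  0 \\
 0 &  1 & -1
\end{pmatrix},
\qquad
v = \alpha
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\;\Longrightarrow\;
S\,v = 0 \quad \text{for any } \alpha .

\Delta G_{1} + \Delta G_{2} + \Delta G_{3}
= (\mu_{B} - \mu_{A}) + (\mu_{C} - \mu_{B}) + (\mu_{A} - \mu_{C}) = 0 .
```

Any loop flux α satisfies the mass balance, so a constraint-based simulation may report an arbitrarily large flux around the cycle. The free energy changes sum to zero, however, so they cannot all be negative, and a net flux around the loop is thermodynamically infeasible.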

Another area likely to change is model dissemination. While all the platforms support export to SBML, they do not all export the same information or represent it in the same way. For example, the RAVEN Toolbox exports SBML in the format of the Yeast 1.0 GEM (Herrgård et al., 2008), while the Model SEED exports COBRA-compliant SBML (Schellenberger et al., 2011a). While this issue may be partially resolved by level 3 of the SBML standard (Olivier and Bergmann, 2011), it is nonetheless important to adhere to a common standard. It is also important that exported reconstructions contain notes, references and confidence scores supporting the inclusion of each reaction. Pathway Tools employs a powerful EO that may be suitable for this purpose (Karp et al., 2004). The lack of support for these features in current reconstruction software may be due to evolving SBML standards for representing this information.

Software availability and recommendations

The SuBliMinaL Toolbox (Swainston et al., 2011) emphasizes drafting and refining network reconstructions, and leaves network evaluation in the hands of the user. To this end, the SuBliMinaL Toolbox supports exporting reconstructions in SBML format, formatted in such a way that they can be loaded into the COBRA Toolbox. However, we have found that GENREs created in the SuBliMinaL Toolbox from the merger of KEGG and MetaCyc pathways require extensive curation. We find that this is largely due to the duplication of metabolites which lack shared identifiers across KEGG and MetaCyc. Furthermore, the SuBliMinaL Toolbox must be accessed via a command-line interface. For these reasons, we recommend the SuBliMinaL Toolbox for more advanced users looking to automate parts of their existing reconstruction workflow.

The Model SEED (Henry et al., 2010) also emphasizes the draft and refinement stages of network reconstruction, via a freely available web-based interface. The Model SEED performs more automation than the SuBliMinaL Toolbox, but the steps are not automatically vetted against experimental evidence and should be carefully evaluated. The reconstruction of Bacteroides thetaiotaomicron (Heinken et al., 2013) provides an example of the manual curation steps required to generate a high-quality reconstruction from Model SEED. The Model SEED is fully integrated with the SEED annotation environment (Overbeek et al., 2005), giving users easy access to high-quality genome and subsystem annotations which facilitate GENRE refinement. The Model SEED exports models in SBML format for use with the COBRA Toolbox. Because of its limited simulation capabilities (FBA only), we recommend the Model SEED for users willing to work with external simulation platforms such as COBRA.

The RAVEN Toolbox (Agren et al., 2013) is a Matlab-based toolbox providing support for all stages of the reconstruction process. The RAVEN Toolbox generates draft reconstructions from KEGG, and also enables users to use existing GENREs as templates. The toolbox implements many of the functions commonly used in network refinement and evaluation. Users can call each function individually, manually examine the results and update their reconstructions. This enables users to examine the relevant literature and accept or reject changes on an individual basis. This feature is especially useful, as KEGG metabolites can be generic and reactions are often unbalanced, necessitating extensive refinement. For this reason, we recommend using RAVEN for reconstructions where an existing SBML model can be used as a template (e.g. a GENRE of a closely related organism). While Matlab does have a graphical user interface (GUI), most operations are performed from a command line. To users willing to learn its interface, the RAVEN Toolbox provides a powerful platform for exploring the reconstruction process.

Pathway Tools (Karp et al., 2010; Latendresse et al., 2012) is a comprehensive software platform that provides support for all stages of the reconstruction process, as well as many other bioinformatics applications. It provides a standalone software environment with a GUI that is freely available to academic and government users. As with the RAVEN Toolbox, users can call and evaluate functions individually to update their reconstruction. For users who are new to network reconstruction, we recommend Pathway Tools for a first attempt. Pathway Tools stores reconstructions in the form of Pathway/Genome Databases, but can export them to a variety of other formats. As with all the platforms reviewed here, many application-oriented algorithms (e.g. for metabolic engineering) are unavailable through Pathway Tools and require additional software.

Concluding remarks

GENREs are proven tools for addressing biological questions in a variety of fields. A comprehensive set of guidelines has been developed to walk researchers through the development of high-quality reconstructions. A variety of software platforms automate many steps of the reconstruction process, enabling researchers to focus their efforts on those steps requiring experimental knowledge. However, these platforms are aimed primarily at researchers familiar and comfortable with the process of network reconstruction. To make the most of these automated technologies and generate high-quality reconstructions, researchers must be aware of their relative advantages and limitations. The software platforms discussed here greatly facilitate the reconstruction process and will enable the generation of high-quality reconstructions for any sequenced organism.

Acknowledgements

We thank Neil Swainston for providing an updated version of the SuBliMinaL Toolbox. This work was supported by the National Science Foundation through a Graduate Research Fellowship for JJH (DGE-0718123) and a Career Award to JLR (1053712).