PubGene
   
 

Tutorials

Get an introduction to PubGene applications and concepts through screen-capture demos.

Download our user guide (PDF, 2.2 MB) for a complete walk-through of PubGene Public functions.

FAQ

What does PubGene do?
PubGene helps you retrieve information on genes and proteins. The underlying structure of PubGene can be viewed as a "gene-centric" database: gene and protein names are cross-referenced to each other and to terms relevant to understanding their biological function, their importance in disease and their relationships to chemical substances. The result is a "literature network" that organizes information in an easily navigable form. No single researcher can be expected to stay informed about everything happening in genetics. Let PubGene help you find connections and speed up the discovery process! (See Jenssen et al. 2001 in PubMed.)

What is text mining?
Generally, text mining is the retrieval of information relevant to a specific purpose from a body of text. PubGene targets information on genes and proteins in the MEDLINE database. Specially developed algorithms mine the abstracts of 25 million PubMed articles for co-citations of genes or proteins and display the results as "Literature Networks": each node represents a gene or protein, and each connecting line is labeled with the number of articles in which that gene or protein pair is co-cited. Clicking the number on a line retrieves the corresponding list of articles.
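The co-citation counting described above can be illustrated with a minimal Python sketch. It assumes a hypothetical synonym dictionary (`GENE_SYNONYMS`), invented example abstracts, and simple whole-word, case-insensitive name matching. PubGene's own name-recognition algorithms are not described in this FAQ, so treat this only as an illustration of the counting step.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionary: canonical gene symbol -> names that may occur in text.
GENE_SYNONYMS = {
    "TP53": ["TP53", "p53"],
    "MDM2": ["MDM2"],
    "CDKN1A": ["CDKN1A", "p21"],
}


def genes_in(text, synonyms):
    """Return the canonical genes whose name or synonym appears in text."""
    found = set()
    for gene, names in synonyms.items():
        for name in names:
            # Whole-word, case-insensitive match of one name.
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
                found.add(gene)
                break
    return found


def co_citation_counts(abstracts, synonyms):
    """Count, for each gene pair, the abstracts in which both genes are named."""
    counts = Counter()
    for text in abstracts:
        # Sorting makes each unordered pair a single canonical key.
        genes = sorted(genes_in(text, synonyms))
        for pair in combinations(genes, 2):
            counts[pair] += 1  # counted at most once per abstract
    return counts


abstracts = [
    "p53 is degraded via MDM2.",
    "p53 induces p21 expression.",
    "MDM2 binds p53 and affects p21 levels.",
]
counts = co_citation_counts(abstracts, GENE_SYNONYMS)
print(counts[("MDM2", "TP53")])  # 2
```

Each key of `counts` corresponds to one line in the network, and its value to the number shown on that line.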

What are "Bio Networks"?
Bio Networks show how gene or protein names are co-cited in the literature. That is, names that appear in the same text form a pair and are said to be neighbors in the network. When a gene or protein is studied, there is a good chance that its name (or a synonym for it) will appear in articles together with the names of other genes or proteins. As a result, most genes that have been studied are connected to each other in a Bio Network, either directly or indirectly. Connections in the literature are a strong indicator of biological interaction. Networks can also be annotated with Gene Ontology (GO), Medical Subject Headings (MeSH) or chemical keywords. By organizing genes and proteins into networks, PubGene helps you visualize and navigate the information and gain understanding.

How do I find out what a gene (or protein) "does"?
Finding out what a gene or protein does will always require some effort. Experts are generally specialists in the function and interactions of a relatively small number of genes. PubGene extracts information associated with individual genes and proteins from MEDLINE records and their metadata. In this way, genes and proteins are linked to "keywords" such as Gene Ontology (GO) terms, Medical Subject Headings (MeSH), and drugs, toxins and other chemicals or compounds (Chem). The options within the BioAssociations tool let you view lists of such terms associated with individual genes or proteins. With these options you can find out what is known about the processes, functions, cellular compartmentalization, pathologies and responses to chemical treatment associated with a gene or protein.
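The gene-to-keyword linking described above can be pictured as an aggregation over annotated records. In this minimal Python sketch, each record is a hypothetical `(genes, keywords)` pair standing in for a MEDLINE citation: the gene names found in it and the terms indexed on it. The records are invented toy data, and the sketch uses raw counts, whereas BioAssociations ranks its lists by the significance of the association.

```python
from collections import Counter, defaultdict


def keyword_profiles(records):
    """Map each gene to a Counter of the keywords on records that mention it."""
    profiles = defaultdict(Counter)
    for genes, keywords in records:
        for gene in genes:
            # Every keyword on the record counts once toward this gene.
            profiles[gene].update(keywords)
    return profiles


# Toy records: (genes named in the citation, keywords indexed on it).
records = [
    ({"TP53", "MDM2"}, {"Apoptosis", "Neoplasms"}),
    ({"TP53"}, {"Apoptosis", "DNA Damage"}),
    ({"MDM2"}, {"Ubiquitination"}),
]
profiles = keyword_profiles(records)
print(profiles["TP53"]["Apoptosis"])  # 2
```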

Does PubGene also deal with DNA and protein sequences?
Many genes identified by sequencing efforts have yet to be described in terms of the function of their products. In such cases you may be able to infer the function of a gene of interest by looking at similar genes. DNA and protein sequences are also useful for experiments aimed at validating your results, e.g. for designing RT-PCR primers. Generally, to find similar genes you look for genes with similar structure (that is, similar nucleotide or amino acid sequences). PubGene can draw networks based on sequence similarity; this function is available under Advanced Options in Bio Networks. PubGene sequence networks are based on pairwise alignments of reference sequences in a precompiled database. By viewing a sequence network you can quickly see whether other genes with similar sequences have been described in terms of their biological or medical importance. You can also search sequence databases with the PubGene Sequence Homology tool, which aligns a DNA or protein sequence against the entries in one of several databases using the Smith-Waterman algorithm. PubGene employs the powerful PARALIGN method, which returns highly accurate and sensitive Smith-Waterman alignments in a matter of minutes, at speeds comparable to the faster but less sensitive heuristic BLAST and FASTA methods.
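For readers unfamiliar with the Smith-Waterman algorithm, a minimal Python sketch of its scoring recurrence follows. It assumes a simple linear gap penalty and hypothetical match/mismatch scores; practical protein searches use substitution matrices and affine gap penalties, and PARALIGN's speed comes from techniques that this plain O(mn) loop does not attempt to reproduce.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            # Flooring at 0 lets an alignment start anywhere: this is what makes it local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))  # 8: ACGT found inside the longer sequence
```

The zero floor is the key difference from global (Needleman-Wunsch) alignment: poorly matching flanks are dropped instead of penalized, which suits searching for a similar region within a longer database sequence.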

Can PubGene find genes using keywords?
PubGene tools are designed to let you find gene-to-gene (protein-to-protein) as well as keyword-to-gene (protein) connections. A keyword submitted to PubGene Bio Networks will produce a literature (or sequence) network of genes associated with that keyword. BioAssociations can generate lists of keywords associated with a gene (protein), or it can do the opposite: find a list of genes (proteins) relevant to a keyword. In both cases the entries in the list are ranked by the significance of the association.
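This FAQ does not say which statistic PubGene uses to rank associations by significance. The minimal Python sketch below assumes a one-sided hypergeometric test, a common choice for this kind of overlap: it asks how surprising it is that a gene and a keyword share so many articles, given how often each occurs. All counts and gene names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def rank_genes_for_keyword(N, keyword_count, gene_counts, overlap):
    """Rank genes by how unlikely their overlap with a keyword is by chance.

    N             -- total number of articles
    keyword_count -- articles indexed with the keyword
    gene_counts   -- {gene: articles mentioning the gene}
    overlap       -- {gene: articles mentioning the gene AND indexed with the keyword}
    """
    scored = [
        (hypergeom_sf(overlap.get(gene, 0), N, keyword_count, n_gene), gene)
        for gene, n_gene in gene_counts.items()
    ]
    scored.sort()  # smallest p-value (most significant) first
    return [(gene, p) for p, gene in scored]


ranking = rank_genes_for_keyword(
    N=1000,
    keyword_count=50,
    gene_counts={"GENE_A": 40, "GENE_B": 40},
    overlap={"GENE_A": 20, "GENE_B": 2},
)
print(ranking[0][0])  # GENE_A: 20 shared articles versus about 2 expected by chance
```

Exact sums of binomial coefficients become slow at the scale of millions of articles; a library routine such as `scipy.stats.hypergeom` would be the practical choice there.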

How do I find a connection between specific genes?
Some pairs of genes are not co-cited (mentioned together) in any MEDLINE record. You can still submit both gene names simultaneously to PubGene Bio Networks: networks produced by submitting two (or more) genes at once may show indirect connections between them (via a shared neighbor).
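How a shared neighbor can link two genes that are never co-cited can be sketched in Python. The sketch assumes co-citation counts are already available as a dictionary keyed by gene pairs (the gene names and counts are hypothetical). It ranks intermediate genes by the weaker of their two edges, which is one plausible choice and not necessarily PubGene's.

```python
from collections import defaultdict


def build_adjacency(pair_counts):
    """Turn {(gene_a, gene_b): count} into {gene: {neighbor: count}}."""
    adj = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        if n > 0:
            adj[a][b] = n
            adj[b][a] = n
    return adj


def indirect_links(adj, gene_a, gene_b):
    """Genes co-cited with both gene_a and gene_b, strongest link first.

    Each result is (neighbor, count with gene_a, count with gene_b).
    """
    shared = set(adj.get(gene_a, {})) & set(adj.get(gene_b, {}))
    links = [(g, adj[gene_a][g], adj[gene_b][g]) for g in shared]
    # Sort by the weaker of the two edges (descending), then by name for stable output.
    links.sort(key=lambda t: (-min(t[1], t[2]), t[0]))
    return links


pairs = {
    ("GENE_A", "LINK_X"): 5, ("GENE_B", "LINK_X"): 3,
    ("GENE_A", "LINK_Y"): 1, ("GENE_B", "LINK_Y"): 4,
    ("GENE_A", "LINK_Z"): 2,
}
adj = build_adjacency(pairs)
print(indirect_links(adj, "GENE_A", "GENE_B"))
```

GENE_A and GENE_B are never co-cited here, yet LINK_X and LINK_Y connect them; LINK_Z does not, because it is a neighbor of GENE_A only.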

What does the coloring of "nodes" in network images mean?
Each PubGene network is drawn in response to a specific query, and its color scheme reflects each node's "distance" from the submitted query term. If a single gene is the query, its node in the network is colored bright red. Neighbors of the query gene (genes co-cited with it) are a darker red, and neighbors of neighbors are black. If two genes are submitted in the same query, both are shown in bright red. Under default settings, a keyword query draws a network in which up to 10 nodes are colored bright red; these are the 10 genes most significantly associated with the keyword.
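The distance-based coloring can be expressed as a multi-source breadth-first search that starts from all query nodes at once, so each node gets its distance to the nearest query. In this Python sketch the network is a hypothetical adjacency dictionary, and the search stops at neighbors of neighbors, the farthest level the coloring above mentions. The color names come from the text, not from PubGene's rendering code.

```python
from collections import deque

# Colors as described in the FAQ text; the actual shades are not specified.
COLOR_BY_DISTANCE = {0: "bright red", 1: "dark red", 2: "black"}
MAX_DISTANCE = max(COLOR_BY_DISTANCE)


def color_nodes(adj, queries):
    """Color each node by its hop distance from the nearest query node."""
    dist = {q: 0 for q in queries}
    frontier = deque(dist)  # every query node starts at distance 0
    while frontier:
        node = frontier.popleft()
        if dist[node] == MAX_DISTANCE:
            continue  # do not expand past neighbors of neighbors
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                frontier.append(neighbor)
    return {node: COLOR_BY_DISTANCE[d] for node, d in dist.items()}


# Hypothetical network: Q is the query gene; D is three steps away from it.
adj = {
    "Q": {"A", "B"},
    "A": {"Q", "C"},
    "B": {"Q"},
    "C": {"A", "D"},
    "D": {"C"},
}
print(color_nodes(adj, ["Q"])["C"])  # black
```

Submitting two genes, e.g. `color_nodes(adj, ["Q", "D"])`, colors both bright red, as the text describes for two-gene queries.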

What are the numbers on the "edges" of the network?
Bio Networks have numbers on the "edges" between nodes (that is, the lines connecting genes to each other or genes to keywords). These numbers indicate the "strength" of the association between the nodes. Under default settings they are the number of MEDLINE records in which the connected genes are co-cited. Using Advanced Options you can switch graph drawing to Probabilistic; the values shown on network edges will then be decimals that can be read as p-values. Another option is to draw networks of sequence similarity, whose edges are labeled with e-values.
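For readers interpreting e-values on sequence network edges, the standard Karlin-Altschul statistic for local alignment scores (used, for example, by BLAST) is shown below. Whether PubGene computes its e-values in exactly this form is not stated here.

```latex
\begin{aligned}
E &= K\,m\,n\,e^{-\lambda S} \\
P(\text{at least one chance hit with score} \ge S) &= 1 - e^{-E}
\end{aligned}
```

Here S is the alignment score, m and n are the lengths of the query and of the database searched, and K and λ are constants determined by the scoring system and residue composition. E is the number of alignments scoring at least S expected by chance alone, so smaller e-values indicate similarity that is less likely to be coincidental.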

How do I pull up articles from a network image?
Click on a network edge. PubGene will open the Literature tool and present the corresponding list of MEDLINE records. Clicking on the PubMed ID of an entry opens a window showing the record, with the query terms highlighted.

Does PubGene have commercial products?
PubGene offers products by subscription. Visit http://www.pubgene.com to learn more.

Does PubGene perform customized projects?
PubGene is currently involved in a number of collaborative projects that meet the special needs of organizations. Visit www.pubgene.com to learn more.

 
   
PubGene, Forskningsveien 2A, PO Box 180 Vinderen, NO-0319 Oslo, Norway. Email: info at pubgene.com
Copyright © 2007 PubGene, Inc. All rights reserved.