KEGG API is a web service to use the KEGG system from your program via SOAP/WSDL.
We have been making the KEGG system available at GenomeNet. KEGG is a suite of databases including GENES, SSDB, PATHWAY, LIGAND, LinkDB, etc. for genome research and related research areas in molecular and cellular biology. These databases and associated computation services are available via WWW and the user interfaces are built on web browsers. Thus, the interfaces are designed to be accessed by humans, not by machines, which means that it is troublesome for the researchers who want to use KEGG in an automated manner. Besides, from the database developer's side, it is impossible to prepare all the CGI programs that satisfy a variety of users' needs.
In recent years, Internet technology for application-to-application communication, referred to as web services, has been improving rapidly. For example, Google, a popular Internet search engine, provides a web service called the Google Web API. The service enables users to develop software that accesses and manipulates a massive amount of web documents that are constantly refreshed. In the field of genome research, a similar kind of web service called DAS (Distributed Annotation System) has been used on several web sites, including Ensembl, WormBase, FlyBase, SGD, and TIGR.
Against this background, we have started developing a new web service called KEGG API using SOAP and WSDL. The service has been tested with Ruby (Ruby 1.8.2, or Ruby 1.6.8 with SOAP4R version 1.4.8.1) and Perl (SOAP::Lite version 0.55). Although the service has not been tested with clients written in other languages, it should work with any language that can handle SOAP/WSDL.
The BioRuby project prepared a Ruby library to handle the KEGG API, so users of the Ruby language should check out the latest release of the BioRuby distribution.
For the general information on KEGG API, see the following page at GenomeNet:
This guide explains how to use the KEGG API in your programs for searching and retrieving data from the KEGG database.
As always, the best way to become familiar with it is to look at an example. This document shows sample code written in several languages. After understanding the first example, try the other APIs.
Firstly, you have to install the SOAP related libraries for the programming language of your choice.
In the case of Perl, you need to install the following packages:
Here's a first example in Perl language.
#!/usr/bin/env perl
use SOAP::Lite;
$wsdl = 'http://soap.genome.jp/KEGG.wsdl';
$serv = SOAP::Lite->service($wsdl);
$offset = 1;
$limit = 5;
$top5 = $serv->get_best_neighbors_by_gene('eco:b0002', $offset, $limit);
foreach $hit (@{$top5}) {
  print "$hit->{genes_id1}\t$hit->{genes_id2}\t$hit->{sw_score}\n";
}
The output will be
eco:b0002    eco:b0002     5283
eco:b0002    ecj:JW0001    5283
eco:b0002    sfx:S0002     5271
eco:b0002    sfl:SF0002    5271
eco:b0002    ecc:c0003     5269
showing that eco:b0002 has Smith-Waterman score 5271 with sfl:SF0002 as a 4th hit among the entire KEGG/GENES database (here, "eco" means E. coli K-12 MG1655 and "sfl" means Shigella flexneri 2457T in the KEGG organism codes).
The method internally searches the KEGG/SSDB (Sequence Similarity Database) database which contains information about the amino acid sequence similarities among all protein coding genes in the complete genomes, together with information about best hits and bidirectional best hits (best-best hits). The relation of gene x in genome A and gene y in genome B is called bidirectional best hits, when x is the best hit of query y against all genes in A and vice versa, and it is often used as an operational definition of ortholog.
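The bidirectional best hit definition above can be sketched offline in Python 3. This is only an illustration under stated assumptions: the two best-hit tables and the gene names in them are made-up placeholders rather than KEGG data, and bidirectional_best_hits is a hypothetical helper, not a KEGG API method.

```python
def bidirectional_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (x, y) pairs where y is the best hit of x in genome B
    and x is the best hit of y in genome A."""
    pairs = []
    for x, y in best_a_to_b.items():
        # Keep the pair only when the reverse lookup points back to x.
        if best_b_to_a.get(y) == x:
            pairs.append((x, y))
    return sorted(pairs)


# Hypothetical best-hit tables (placeholder names, not real KEGG entries).
best_a_to_b = {"A:x1": "B:y1", "A:x2": "B:y1", "A:x3": "B:y3"}
best_b_to_a = {"B:y1": "A:x1", "B:y2": "A:x2", "B:y3": "A:x3"}

bbh = bidirectional_best_hits(best_a_to_b, best_b_to_a)
for x, y in bbh:
    print(x + "\t" + y)
```

Here A:x2 is dropped because its best hit B:y1 points back to A:x1, so only two pairs survive.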
The next example simply lists the PATHWAY entries for E. coli ("eco") in the KEGG database.
#!/usr/bin/env perl
use SOAP::Lite;
$wsdl = 'http://soap.genome.jp/KEGG.wsdl';
$results = SOAP::Lite
-> service($wsdl)
-> list_pathways("eco");
foreach $path (@{$results}) {
  print "$path->{entry_id}\t$path->{definition}\n";
}
This example colors the boxes corresponding to the E. coli genes b1002 and b2388 on a Glycolysis pathway of E. coli (path:eco00010).
#!/usr/bin/env perl
use SOAP::Lite;
$wsdl = 'http://soap.genome.jp/KEGG.wsdl';
$serv = SOAP::Lite -> service($wsdl);
$genes = SOAP::Data->type(array => ["eco:b1002", "eco:b2388"]);
$result = $serv -> mark_pathway_by_objects("path:eco00010", $genes);
print $result; # URL of the generated image
If you use KEGG API methods that require arguments of the ArrayOfstring data type, you need the following modifications, depending on the version of SOAP::Lite.
As you can see in the above example, you always need to convert a Perl array into a SOAP object explicitly in SOAP::Lite by
SOAP::Data->type(array => [value1, value2, .. ])
when you pass an array as the argument for any KEGG API method.
You should use version >= 0.69 as the versions between 0.61-0.68 contain bugs.
You need to add the following code to your program to pass an array of string and/or int data to the SOAP server.
sub SOAP::Serializer::as_ArrayOfstring{
  my ($self, $value, $name, $type, $attr) = @_;
  return [$name, {'xsi:type' => 'array', %$attr}, $value];
}
sub SOAP::Serializer::as_ArrayOfint{
  my ($self, $value, $name, $type, $attr) = @_;
  return [$name, {'xsi:type' => 'array', %$attr}, $value];
}
By adding the above, you can write
$genes = ["eco:b1002", "eco:b2388"];
instead of the following (writing as follows is also permitted).
$genes = SOAP::Data->type(array => ["eco:b1002", "eco:b2388"]);
You can test with the following script for the SOAP::Lite v0.69. If it works, a URL of the generated image will be returned.
#!/usr/bin/env perl
use SOAP::Lite +trace => [qw(debug)];
print "SOAP::Lite = ", $SOAP::Lite::VERSION, "\n";
my $serv = SOAP::Lite -> service("http://soap.genome.jp/KEGG.wsdl");
my $genes = ["eco:b1002", "eco:b2388"];
my $result = $serv->mark_pathway_by_objects("path:eco00010", $genes);
print $result, "\n";
# subroutines implicitly used in the above code
sub SOAP::Serializer::as_ArrayOfstring{
  my ($self, $value, $name, $type, $attr) = @_;
  return [$name, {'xsi:type' => 'array', %$attr}, $value];
}
sub SOAP::Serializer::as_ArrayOfint{
  my ($self, $value, $name, $type, $attr) = @_;
  return [$name, {'xsi:type' => 'array', %$attr}, $value];
}
If you are using Ruby 1.8.1 or later, you are ready to use KEGG API as Ruby already supports SOAP in its standard library.
If your Ruby is 1.6.8 or older, you need to install the following:
Here is sample code in Ruby with the same functionality as the first Perl example shown above.
#!/usr/bin/env ruby
require 'soap/wsdlDriver'
wsdl = "http://soap.genome.jp/KEGG.wsdl"
serv = SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
serv.generate_explicit_type = true
# if uncommented, you can see transactions for debug
#serv.wiredump_dev = STDERR
offset = 1
limit = 5
top5 = serv.get_best_neighbors_by_gene('eco:b0002', offset, limit)
top5.each do |hit|
  print hit.genes_id1, "\t", hit.genes_id2, "\t", hit.sw_score, "\n"
end
You may need to iterate to obtain all the results by increasing offset and/or limit.
#!/usr/bin/env ruby
require 'soap/wsdlDriver'
wsdl = "http://soap.genome.jp/KEGG.wsdl"
serv = SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
serv.generate_explicit_type = true
offset = 1
limit = 100
loop do
  results = serv.get_best_neighbors_by_gene('eco:b0002', offset, limit)
  # stop when nothing more is returned (nil or an empty array)
  break if results.nil? || results.empty?
  results.each do |hit|
    print hit.genes_id1, "\t", hit.genes_id2, "\t", hit.sw_score, "\n"
  end
  offset += limit
end
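The same offset/limit loop can be sketched in Python 3 without a live server. In this sketch, fetch_page is a hypothetical stand-in for a remote call such as get_best_neighbors_by_gene(genes_id, offset, limit); it serves slices of a local list of fake records, and the loop assumes that an exhausted result comes back as an empty list or as None.

```python
def fetch_all(fetch_page, limit=100):
    """Collect every record by advancing a 1-based offset in steps of `limit`."""
    offset = 1
    collected = []
    while True:
        page = fetch_page(offset, limit)
        if not page:  # stops on None or on an empty list
            break
        collected.extend(page)
        offset += limit
    return collected


# Hypothetical stand-in for the remote method: 250 fake records.
FAKE_HITS = ["hit%03d" % i for i in range(1, 251)]


def fake_fetch_page(offset, limit):
    """Return the slice that a 1-based offset and a limit would select."""
    return FAKE_HITS[offset - 1:offset - 1 + limit]


all_hits = fetch_all(fake_fetch_page, limit=100)
print(len(all_hits))
```

With a real client, a small wrapper that fixes the gene ID and forwards offset and limit to the remote method could be passed in place of fake_fetch_page.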
This iteration is handled automatically by the BioRuby library, which implements get_all_* methods for this purpose. BioRuby also provides a filtering function for selecting the needed fields from a complex data type.
#!/usr/bin/env ruby
require 'bio'
serv = Bio::KEGG::API.new
results = serv.get_all_best_neighbors_by_gene('eco:b0002')
results.each do |hit|
  print hit.genes_id1, "\t", hit.genes_id2, "\t", hit.sw_score, "\n"
end
# Same as above but using filter to select fields
fields = [:genes_id1, :genes_id2, :sw_score]
results.each do |hit|
  puts hit.filter(fields).join("\t")
end
# Different filters to pick additional fields for each amino acid sequence
fields1 = [:genes_id1, :start_position1, :end_position1, :best_flag_1to2]
fields2 = [:genes_id2, :start_position2, :end_position2, :best_flag_2to1]
results.each do |hit|
  print "> score: ", hit.sw_score, ", identity: ", hit.identity, "\n"
  print "1:\t", hit.filter(fields1).join("\t"), "\n"
  print "2:\t", hit.filter(fields2).join("\t"), "\n"
end
The equivalent of the second Perl example described above is
#!/usr/bin/env ruby
require 'bio'
serv = Bio::KEGG::API.new
list = serv.list_pathways("eco")
list.each do |path|
  print path.entry_id, "\t", path.definition, "\n"
end
and the equivalent of the last Perl example is as follows.
#!/usr/bin/env ruby
require 'bio'
serv = Bio::KEGG::API.new
genes = ["eco:b1002", "eco:b2388"]
result = serv.mark_pathway_by_objects("path:eco00010", genes)
print result # URL of the generated image
In the case of Python, you have to install
plus some extra packages required for SOAPpy (fpconst, PyXML, etc.).
Here is sample code that uses the KEGG API with Python.
#!/usr/bin/env python
from SOAPpy import WSDL
wsdl = 'http://soap.genome.jp/KEGG.wsdl'
serv = WSDL.Proxy(wsdl)
results = serv.get_genes_by_pathway('path:eco00020')
print results
In the case of Java, you need to obtain Apache Axis library version axis-1_2alpha or newer (axis-1_1 doesn't work properly for KEGG API)
and put required jar files in an appropriate directory.
For the binary distribution of the Apache axis-1_2alpha release, copy the jar files stored under the axis-1_2alpha/lib/ to the directory of your choice.
% cp axis-1_2alpha/lib/*.jar /path/to/lib/
You can use WSDL2Java coming with Apache Axis to generate classes needed for the KEGG API automatically.
To generate classes and documents for the KEGG API, download the script axisfix.pl and follow the steps below:
% java -classpath /path/to/lib/axis.jar:/path/to/lib/jaxrpc.jar:/path/to/lib/commons-logging.jar:/path/to/lib/commons-discovery.jar:/path/to/lib/saaj.jar:/path/to/lib/wsdl4j.jar:. org.apache.axis.wsdl.WSDL2Java -p keggapi http://soap.genome.jp/KEGG.wsdl
% perl -i axisfix.pl keggapi/KEGGBindingStub.java
% javac -classpath /path/to/lib/axis.jar:/path/to/lib/jaxrpc.jar:/path/to/lib/wsdl4j.jar:. keggapi/KEGGLocator.java
% jar cvf keggapi.jar keggapi/*
% javadoc -classpath /path/to/lib/axis.jar:/path/to/lib/jaxrpc.jar -d keggapi_javadoc keggapi/*.java
This program does the same job as the Python example (extended to accept a pathway_id as the argument).
import keggapi.*;
class GetGenesByPathway {
    public static void main(String[] args) throws Exception {
        KEGGLocator locator = new KEGGLocator();
        KEGGPortType serv = locator.getKEGGPort();
        String query = args[0];
        String[] results = serv.get_genes_by_pathway(query);
        for (int i = 0; i < results.length; i++) {
            System.out.println(results[i]);
        }
    }
}
This is another example which uses ArrayOfSSDBRelation data type.
import keggapi.*;
class GetBestNeighborsByGene {
    public static void main(String[] args) throws Exception {
        KEGGLocator locator = new KEGGLocator();
        KEGGPortType serv = locator.getKEGGPort();
        String query = args[0];
        SSDBRelation[] results = null;
        results = serv.get_best_neighbors_by_gene(query, 1, 50);
        for (int i = 0; i < results.length; i++) {
            String gene1 = results[i].getGenes_id1();
            String gene2 = results[i].getGenes_id2();
            int score = results[i].getSw_score();
            System.out.println(gene1 + "\t" + gene2 + "\t" + score);
        }
    }
}
Compile and execute this program (don't forget to include keggapi.jar file in your classpath) as follows:
% javac -classpath /path/to/lib/axis.jar:/path/to/lib/jaxrpc.jar:/path/to/lib/wsdl4j.jar:/path/to/keggapi.jar GetBestNeighborsByGene.java
% java -classpath /path/to/lib/axis.jar:/path/to/lib/jaxrpc.jar:/path/to/lib/commons-logging.jar:/path/to/lib/commons-discovery.jar:/path/to/lib/saaj.jar:/path/to/lib/wsdl4j.jar:/path/to/keggapi.jar:. GetBestNeighborsByGene eco:b0002
You may wish to set the CLASSPATH environment variable.
bash/zsh:
% for i in /path/to/lib/*.jar
  do
    CLASSPATH="${CLASSPATH}:${i}"
  done
% export CLASSPATH
tcsh:
% foreach i ( /path/to/lib/*.jar )
    setenv CLASSPATH ${CLASSPATH}:${i}
  end
For the other cases, consult the javadoc pages generated by WSDL2Java.
Users can use a WSDL file to create a SOAP client driver. The WSDL file for the KEGG API can be found at:
Related site:
Many of the KEGG API methods return a set of values in a complex data structure, as described below. This section summarizes all of these data types. Note that the returned values for the empty result will be
SSDBRelation data type contains the following fields:
genes_id1        genes_id of the query (string)
genes_id2        genes_id of the target (string)
sw_score         Smith-Waterman score between genes_id1 and genes_id2 (int)
bit_score        bit score between genes_id1 and genes_id2 (float)
identity         identity between genes_id1 and genes_id2 (float)
overlap          overlap length between genes_id1 and genes_id2 (int)
start_position1  start position of the alignment in genes_id1 (int)
end_position1    end position of the alignment in genes_id1 (int)
start_position2  start position of the alignment in genes_id2 (int)
end_position2    end position of the alignment in genes_id2 (int)
best_flag_1to2   best flag from genes_id1 to genes_id2 (boolean)
best_flag_2to1   best flag from genes_id2 to genes_id1 (boolean)
definition1      definition string of the genes_id1 (string)
definition2      definition string of the genes_id2 (string)
length1          amino acid length of the genes_id1 (int)
length2          amino acid length of the genes_id2 (int)
ArrayOfSSDBRelation data type is a list of the SSDBRelation data type.
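For client code that wants typed records, the SSDBRelation fields can be mirrored locally. The following is a minimal Python 3 sketch, not a type provided by any SOAP client library. The select and is_best_best helpers are hypothetical, the sample values are invented, and treating two set best flags as a best-best hit is an assumption based on the field descriptions above.

```python
from dataclasses import dataclass


@dataclass
class SSDBRelation:
    """Local mirror of the SSDBRelation fields (illustrative only)."""
    genes_id1: str
    genes_id2: str
    sw_score: int
    bit_score: float
    identity: float
    overlap: int
    start_position1: int
    end_position1: int
    start_position2: int
    end_position2: int
    best_flag_1to2: bool
    best_flag_2to1: bool
    definition1: str
    definition2: str
    length1: int
    length2: int

    def is_best_best(self):
        """Assumption: both best flags set means a bidirectional best hit."""
        return self.best_flag_1to2 and self.best_flag_2to1

    def select(self, fields):
        """Return the values of the named fields, in the order given."""
        return [getattr(self, name) for name in fields]


# Placeholder values for illustration only; IDs and numbers are invented.
hit = SSDBRelation("org1:geneA", "org2:geneB", 1200, 465.5, 0.72, 310,
                   1, 310, 5, 314, True, True,
                   "hypothetical protein A", "hypothetical protein B",
                   320, 318)
fields = ["genes_id1", "genes_id2", "sw_score"]
print("\t".join(str(value) for value in hit.select(fields)))
```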
MotifResult data type contains the following fields:
motif_id        motif_id of the motif (string)
definition      definition of the motif (string)
genes_id        genes_id of the gene containing the motif (string)
start_position  start position of the motif match (int)
end_position    end position of the motif match (int)
score           score of the motif match for TIGRFAM and PROSITE (float)
evalue          E-value of the motif match for Pfam (double)
Note: 'score' and/or 'evalue' is set to -1 if the corresponding value is not applicable.
ArrayOfMotifResult data type is a list of the MotifResult data type.
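Because -1 marks a 'score' or 'evalue' that is not applicable, client code may want to convert that sentinel before doing arithmetic or sorting. The Python 3 sketch below works on a plain dictionary that mimics a MotifResult; normalize_motif is a hypothetical helper, and the record values are invented.

```python
def normalize_motif(record):
    """Copy a MotifResult-like dict, replacing -1 in 'score'/'evalue' by None."""
    cleaned = dict(record)
    for key in ("score", "evalue"):
        if cleaned.get(key) == -1:
            cleaned[key] = None
    return cleaned


# Invented record: this match has an E-value but no applicable score.
raw = {
    "motif_id": "pf:Example",
    "definition": "example domain",
    "genes_id": "org:gene1",
    "start_position": 10,
    "end_position": 120,
    "score": -1,
    "evalue": 1.5e-20,
}
clean = normalize_motif(raw)
print(clean["score"], clean["evalue"])
```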
Definition data type contains the following fields:
entry_id    database entry_id (string)
definition  definition of the entry (string)
ArrayOfDefinition data type is a list of the Definition data type.
LinkDBRelation data type contains the following fields:
entry_id1  entry_id of the starting entry (string)
entry_id2  entry_id of the terminal entry (string)
type       one of the "original", "reverse", or "equivalent" (string)
path       link information across the databases (string) # obsolete
Notice: due to the incompatible change on the server side program which is used to calculate link information, path information could not be returned as of the KEGG API v6.2. We also plan to remove path field from the LinkDBRelation data type in the next release (v7.0).
ArrayOfLinkDBRelation data type is a list of the LinkDBRelation data type.
PathwayElement represents the object on the KEGG PATHWAY map. PathwayElement data type contains the following fields:
element_id unique identifier of the object on the pathway (int)
type type of the object ("gene", "enzyme" etc.) (string)
names array of names of the object (ArrayOfstring)
components array of element_ids of the group components (ArrayOfint)
ArrayOfPathwayElement data type is a list of the PathwayElement data type.
PathwayElementRelation represents the relationship between PathwayElements. PathwayElementRelation data type contains the following fields:
element_id1 unique identifier of the object on the pathway (int)
element_id2 unique identifier of the object on the pathway (int)
type type of relation ("ECrel", "maplink" etc.) (string)
subtypes array of objects involved in the relation (ArrayOfSubtype)
ArrayOfPathwayElementRelation data type is a list of the PathwayElementRelation data type.
Subtype is used in the PathwayElementRelation data type to represent the object involved in the relation. Subtype data type contains the following fields:
element_id unique identifier of the object on the pathway (int)
relation kind of relation ("compound", "inhibition" etc.) (string)
type type of relation ("+p", "--|" etc.) (string)
ArrayOfSubtype data type is a list of the Subtype data type.
StructureAlignment represents structural alignment of nodes between two molecules with score. StructureAlignment data type contains the following fields:
target_id     entry_id of the target (string)
score         alignment score (float)
query_nodes   indices of aligned nodes in the query molecule (ArrayOfint)
target_nodes  indices of aligned nodes in the target molecule (ArrayOfint)
ArrayOfStructureAlignment data type is a list of the StructureAlignment data type.
This section describes the APIs for retrieving general information about the latest version of the KEGG database.
list_databases
Returns a list of the names and definitions of the databases available on GenomeNet.
Return value:
ArrayOfDefinition (db, definition)
Related site:
list_organisms
Lists the organisms in the KEGG/GENES database. The 'org' code and the organism's full name are returned in the Definition data type.
Return value:
ArrayOfDefinition (org, definition)
Related site:
list_pathways(string:org)
Lists the pathway maps of the given organism in the KEGG/PATHWAY database. When the string "map" is passed as the argument, this method returns a list of the reference pathways.
Return value:
ArrayOfDefinition (pathway_id, definition)
Related site:
This section describes the wrapper methods for DBGET system developed at the GenomeNet. For more details on DBGET system, see:
Related site:
binfo(string:db)
Shows the version information of the specified database. When the string "all" is passed as the argument, this method returns the version information of all databases available on GenomeNet.
Return value:
string
Example:
# Show the information of the latest GenBank database.
binfo("gb")
bfind(string:str)
Wrapper method for the bfind command. bfind is used for searching entries by keywords. The user needs to specify a database, chosen from those supported by the DBGET system, before the keywords. The number of keywords given at a time is limited to 100.
Return value:
string
Example:
# Returns the IDs and definitions of entries which have definition
# including the word 'E-cadherin' and 'human' from GenBank.
bfind("gb E-cadherin human")
bget(string:str)
The bget command is used for retrieving database entries specified by a list of 'entry_id'. This method accepts all the bget command line options as a string. The number of entries retrieved at a time is limited to 100.
Return value:
string
Example:
# retrieve two KEGG/GENES entries
bget("eco:b0002 hin:tRNA-Cys-1")
# retrieve nucleic acid sequences in a FASTA format
bget("-f -n n eco:b0002 hin:tRNA-Cys-1")
# retrieve amino acid sequence in a FASTA format
bget("-f -n a eco:b0002")
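Since bget accepts at most 100 entries per call, a long ID list has to be split into several query strings. The Python 3 sketch below builds those strings locally without contacting the server. bget_queries is a hypothetical helper, and the entry IDs are placeholders rather than real KEGG entries.

```python
def bget_queries(entry_ids, size=100):
    """Yield space-separated query strings with at most `size` IDs each."""
    for start in range(0, len(entry_ids), size):
        yield " ".join(entry_ids[start:start + size])


# Placeholder IDs for illustration only.
entry_ids = ["org:gene%04d" % i for i in range(1, 251)]
queries = list(bget_queries(entry_ids))
print([len(query.split()) for query in queries])
```

Each resulting string could then be passed to bget in turn. Any command line options, such as "-f -n a", would have to be prepended to every string.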
btit(string:str)
Wrapper method for the btit command. btit is used for retrieving the definitions of the given database entries. The number of entries given at a time is limited to 100.
Return value:
string
Example:
# Returns the ids and definitions of four GENES entries "hsa:1798",
# "mmu:13478", "dme:CG5287-PA" and "cel:Y60A3A.14".
btit("hsa:1798 mmu:13478 dme:CG5287-PA cel:Y60A3A.14")
bconv(string:str)
The bconv command converts external IDs to KEGG IDs. Currently, the following external databases are available.
External database  Database prefix
-----------------  ---------------
NCBI GI            ncbi-gi:
NCBI GeneID        ncbi-geneid:
GenBank            genbank:
UniGene            unigene:
UniProt            uniprot:
Each line of the result holds a tab-separated pair: the given ID and the converted ID.
Return value:
string
Example:
# Convert NCBI GI and NCBI GeneID to KEGG genes_id
serv.bconv("ncbi-gi:10047086 ncbi-gi:10047090 ncbi-geneid:14751")
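The string returned by bconv can be turned into a lookup table on the client side. The Python 3 sketch below assumes the layout described above (given ID, a tab, then the converted ID on each line) and uses only the first two fields of a line. parse_bconv is a hypothetical helper, and the sample text is invented rather than real server output.

```python
def parse_bconv(text):
    """Map each given ID to the list of converted IDs found on its lines."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip lines without the expected pair
        mapping.setdefault(fields[0], []).append(fields[1])
    return mapping


# Invented sample in the assumed layout, not actual bconv output.
sample = "ncbi-gi:0000001\torg:geneA\nncbi-geneid:0000002\torg:geneB\n"
converted = parse_bconv(sample)
print(converted)
```

A list is kept per key so that an external ID that appears on more than one line does not silently overwrite an earlier conversion.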
Related site:
get_linkdb_by_entry(string:entry_id, string:db, int:offset, int:limit)
Retrieve the database entries linked from the user-specified database entry. The target database can also be specified.
Return value:
ArrayOfLinkDBRelation
Example:
# Get the entries of KEGG/PATHWAY database linked from the entry 'eco:b0002'.
get_linkdb_by_entry('eco:b0002', 'pathway', 1, 10)
get_linkdb_by_entry('eco:b0002', 'pathway', 11, 10)
Related site:
get_linkdb_between_databases(string:from_db, string:to_db, int:offset, int:limit)
Retrieve all links between entries of the two given databases.
Return value:
ArrayOfLinkDBRelation
Example:
# Get all links from "eco" (KEGG GENES) to "pathway" (KEGG PATHWAY)
# databases.
get_linkdb_between_databases("eco", "pathway", 1, 100)
# Print the contents of obtained links in Ruby language
links = get_linkdb_between_databases("eco", "pathway", 1, 100)
links.each do |link|
  puts link.entry_id1 # => "eco:b0084"
  puts link.entry_id2 # => "path:map00550"
  puts link.type # => "indirect"
  puts link.path # => "eco->ec->path"
end
Related site:
get_genes_by_enzyme(string:enzyme_id, string:org)
Retrieve all genes of the given organism which are assigned the given enzyme_id (EC number).
Return value:
ArrayOfstring (genes_id)
Example:
# Returns all the GENES entry IDs in the E. coli genome which are assigned
# EC number ec:1.2.1.1
get_genes_by_enzyme('ec:1.2.1.1', 'eco')
get_enzymes_by_gene(string:genes_id)
Retrieve all the EC numbers which are assigned to the given gene.
Return value:
ArrayOfstring (enzyme_id)
Example:
# Returns the EC numbers which are assigned to the E. coli gene b0002
get_enzymes_by_gene('eco:b0002')
get_enzymes_by_compound(string:compound_id)
Retrieve all enzymes which have a link to the given compound_id.
Return value:
ArrayOfstring (enzyme_id)
Example:
# Returns the ENZYME entry IDs which have a link to the COMPOUND entry,
# 'cpd:C00345'
get_enzymes_by_compound('cpd:C00345')
get_enzymes_by_glycan(string:glycan_id)
Retrieve all enzymes which have a link to the given glycan_id.
Return value:
ArrayOfstring (enzyme_id)
Example:
# Returns the ENZYME entry IDs which have a link to the GLYCAN entry,
# 'gl:G00001'
get_enzymes_by_glycan('gl:G00001')
get_enzymes_by_reaction(string:reaction_id)
Retrieve all enzymes which have a link to the given reaction_id.
Return value:
ArrayOfstring (enzyme_id)
Example:
# Returns the ENZYME entry IDs which have a link to the REACTION entry,
# 'rn:R00100'.
get_enzymes_by_reaction('rn:R00100')
get_compounds_by_enzyme(string:enzyme_id)
Retrieve all compounds which have a link to the given enzyme_id.
Return value:
ArrayOfstring (compound_id)
Example:
# Returns the COMPOUND entry IDs which have a link to the ENZYME entry,
# 'ec:2.7.1.12'.
get_compounds_by_enzyme('ec:2.7.1.12')
get_compounds_by_reaction(string:reaction_id)
Retrieve all compounds which have a link to the given reaction_id.
Return value:
ArrayOfstring (compound_id)
Example:
# Returns the COMPOUND entry IDs which have a link to the REACTION entry,
# 'rn:R00100'
get_compounds_by_reaction('rn:R00100')
get_glycans_by_enzyme(string:enzyme_id)
Retrieve all glycans which have a link to the given enzyme_id.
Return value:
ArrayOfstring (glycan_id)
Example:
# Returns the GLYCAN entry IDs which have a link to the ENZYME entry,
# 'ec:2.4.1.141'
get_glycans_by_enzyme('ec:2.4.1.141')
get_glycans_by_reaction(string:reaction_id)
Retrieve all glycans which have a link to the given reaction_id.
Return value:
ArrayOfstring (glycan_id)
Example:
# Returns the GLYCAN entry IDs which have a link to the REACTION entry,
# 'rn:R06164'
get_glycans_by_reaction('rn:R06164')
get_reactions_by_enzyme(string:enzyme_id)
Retrieve all reactions which have a link to the given enzyme_id.
Return value:
ArrayOfstring (reaction_id)
Example:
# Returns the REACTION entry IDs which have a link to the ENZYME entry,
# 'ec:2.7.1.12'
get_reactions_by_enzyme('ec:2.7.1.12')
get_reactions_by_compound(string:compound_id)
Retrieve all reactions which have a link to the given compound_id.
Return value:
ArrayOfstring (reaction_id)
Example:
# Returns the REACTION entry IDs which have a link to the COMPOUND entry,
# 'cpd:C00199'
get_reactions_by_compound('cpd:C00199')
get_reactions_by_glycan(string:glycan_id)
Retrieve all reactions which have a link to the given glycan_id.
Return value:
ArrayOfstring (reaction_id)
Example:
# Returns the REACTION entry IDs which have a link to the GLYCAN entry,
# 'gl:G00001'
get_reactions_by_glycan('gl:G00001')
This section describes the APIs for SSDB database. For more details on SSDB, see:
get_best_best_neighbors_by_gene(string:genes_id, int:offset, int:limit)
Search best-best neighbor of the gene in all organisms.
Return value:
ArrayOfSSDBRelation
Example:
# List up best-best neighbors of 'eco:b0002'.
get_best_best_neighbors_by_gene('eco:b0002', 1, 10)
get_best_best_neighbors_by_gene('eco:b0002', 11, 10)
get_best_neighbors_by_gene(string:genes_id, int:offset, int:limit)
Search best neighbors of the gene in all organisms.
Return value:
ArrayOfSSDBRelation
Example:
# List up best neighbors of 'eco:b0002'.
get_best_neighbors_by_gene('eco:b0002', 1, 10)
get_best_neighbors_by_gene('eco:b0002', 11, 10)
get_reverse_best_neighbors_by_gene(string:genes_id, int:offset, int:limit)
Search reverse best neighbors in all organisms.
Return value:
ArrayOfSSDBRelation
Example:
# List up reverse best neighbors of 'eco:b0002'.
get_reverse_best_neighbors_by_gene('eco:b0002', 1, 10)
get_reverse_best_neighbors_by_gene('eco:b0002', 11, 10)
get_paralogs_by_gene(string:genes_id, int:offset, int:limit)
Search paralogous genes of the given gene in the same organism.
Return value:
ArrayOfSSDBRelation
Example:
# List up paralogous genes of 'eco:b0002'.
get_paralogs_by_gene('eco:b0002', 1, 10)
get_paralogs_by_gene('eco:b0002', 11, 10)
get_motifs_by_gene(string:genes_id, string:db)
Search motifs in the specified gene. The value of 'db' can be 'pfam' for Pfam, 'pspt' for PROSITE pattern, 'pspf' for PROSITE profile or 'all' for all the above.
Return value:
ArrayOfMotifResult
Example:
# Returns all Pfam motifs in the E. coli gene 'b0002'
get_motifs_by_gene('eco:b0002', 'pfam')
get_genes_by_motifs([string]:motif_id_list, int:offset, int:limit)
Search all genes which contain all of the specified motifs.
Return value:
ArrayOfDefinition (genes_id, definition)
Example:
# Returns all genes which have Pfam 'DnaJ' and Prosite 'DNAJ_2' motifs.
list = ['pf:DnaJ', 'ps:DNAJ_2']
get_genes_by_motifs(list, 1, 10)
get_genes_by_motifs(list, 11, 10)
get_ko_by_gene(string:genes_id)
Search all KOs to which given genes_id belongs.
Return value:
ArrayOfstring (ko_id)
Example:
# Returns ko_ids to which GENES entry 'eco:b0002' belongs.
get_ko_by_gene('eco:b0002')
get_ko_by_ko_class(string:ko_class_id)
Return all KOs which belong to the given ko_class_id.
Return value:
ArrayOfDefinition (ko_id, definition)
Example:
# Returns ko_ids which belong to the KO class '01196'.
get_ko_by_ko_class('01196')
get_genes_by_ko_class(string:ko_class_id, string:org, int:offset, int:limit)
Retrieve all genes of the specified organism which are classified under the given ko_class_id.
Return value:
ArrayOfDefinition (genes_id, definition)
Example:
# Returns the first 100 human genes which belong to the KO class '00903'
get_genes_by_ko_class('00903', 'hsa', 1, 100)
get_genes_by_ko(string:ko_id, string:org)
Retrieve all genes of the specified organism which belong to the given ko_id.
Return value:
ArrayOfDefinition (genes_id, definition)
Example:
# Returns E. coli genes which belong to the KO 'K00001'
get_genes_by_ko('ko:K00001', 'eco')
# Returns genes of all organisms which are assigned to the KO 'K00010'
get_genes_by_ko('ko:K00010', 'all')
This section describes the APIs for PATHWAY database. For more details on PATHWAY database, see:
Related site:
mark_pathway_by_objects(string:pathway_id, [string]:object_id_list)
Mark the given objects on the given pathway map and return the URL of the generated image.
Return value:
string (URL)
Example:
# Returns the URL of the generated image for the given map 'path:eco00260'
# with objects corresponding to 'eco:b0002' and 'cpd:C00263' colored in red.
obj_list = ['eco:b0002', 'cpd:C00263']
mark_pathway_by_objects('path:eco00260', obj_list)
color_pathway_by_objects(string:pathway_id, [string]:object_id_list, [string]:fg_color_list, [string]:bg_color_list)
Color the given objects on the pathway map with the specified colors and return the URL of the colored image. In the KEGG pathway maps, a gene or enzyme is represented by a rectangle and a compound is shown as a small circle. 'fg_color_list' specifies the color of the text and border of the given objects, and 'bg_color_list' specifies the color of their background area. The order of colors in these lists corresponds to the order of objects in the 'object_id_list' list.
Return value:
string (URL)
Example:
# Returns the URL for the given pathway 'path:eco00260' with genes
# 'eco:b0514' colored in red with yellow background and
# 'eco:b2913' colored in green with yellow background.
obj_list = ['eco:b0514', 'eco:b2913']
fg_list = ['#ff0000', '#00ff00']
bg_list = ['#ffff00', 'yellow']
color_pathway_by_objects('path:eco00260', obj_list, fg_list, bg_list)
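Because the three lists are matched by position, it can be less error-prone to keep the colors next to each object and derive the lists from that. The Python 3 sketch below does this for the example above. build_color_lists is a hypothetical helper, and it relies on dictionaries preserving insertion order (Python 3.7 or later).

```python
def build_color_lists(colors_by_object):
    """Split {object_id: (fg, bg)} into three lists that share one order."""
    obj_list, fg_list, bg_list = [], [], []
    for object_id, (fg_color, bg_color) in colors_by_object.items():
        obj_list.append(object_id)
        fg_list.append(fg_color)
        bg_list.append(bg_color)
    return obj_list, fg_list, bg_list


# Same objects and colors as in the example above.
colors = {
    "eco:b0514": ("#ff0000", "#ffff00"),
    "eco:b2913": ("#00ff00", "yellow"),
}
obj_list, fg_list, bg_list = build_color_lists(colors)
print(obj_list, fg_list, bg_list)
```

The resulting lists are always the same length and can be passed on as object_id_list, fg_color_list and bg_color_list.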
color_pathway_by_elements(string:pathway_id, [int]:element_id_list, [string]:fg_color_list, [string]:bg_color_list)
Color the objects (rectangles and circles on a pathway map) corresponding to the given 'element_id_list' with the specified colors and return the URL of the colored image. 'fg_color_list' specifies the color of the text and border of the objects in 'element_id_list', and 'bg_color_list' specifies the color of their background area. The order of colors in these lists corresponds to the order of objects in the 'element_id_list' list.
This method is useful for specifying exactly which graphical object on the pathway is to be colored, because in some cases multiple genes are assigned to one rectangle, or one gene is assigned to more than one rectangle, on the pathway map. The 'element_id' is a unique numerical identifier on the pathway, defined by the KGML (XML representation of the KEGG PATHWAY) in the <entry> tag. The list of 'element_id's can be obtained with the 'get_elements_by_pathway' method.
For more details on KGML, see:
Return value:
string (URL)
Example:
# Returns the URL of the colored image of given pathway 'path:bsu00010' with
# * gene bsu:BG11350 (element_id 78, ec:3.2.1.86) colored in red on yellow
# * gene bsu:BG11203 (element_id 79, ec:3.2.1.86) colored in blue on yellow
# * gene bsu:BG11685 (element_id 51, ec:2.7.1.2) colored in red on orange
# * gene bsu:BG11685 (element_id 47, ec:2.7.1.2) colored in blue on orange
element_id_list = [ 78, 79, 51, 47 ]
fg_list = [ '#ff0000', '#0000ff', '#ff0000', '#0000ff' ]
bg_list = [ '#ffff00', '#ffff00', '#ffcc00', '#ffcc00' ]
color_pathway_by_elements('path:bsu00010', element_id_list, fg_list, bg_list)
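To see how an element_id relates to the <entry> tag of a KGML file, the following Python 3 sketch indexes entries by name with the standard library XML parser. The XML fragment is hand-written for illustration, reusing the IDs from the example above, and is not an excerpt of a real KGML file. It also assumes that the 'id' and 'name' attributes of <entry> carry the element_id and the object names. element_ids_by_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment (assumption), not a real KGML file.
KGML_FRAGMENT = """
<pathway name="path:bsu00010" org="bsu" number="00010">
  <entry id="78" name="bsu:BG11350" type="gene"/>
  <entry id="79" name="bsu:BG11203" type="gene"/>
  <entry id="51" name="bsu:BG11685" type="gene"/>
  <entry id="47" name="bsu:BG11685" type="gene"/>
</pathway>
"""


def element_ids_by_name(kgml_text):
    """Map each object name to the element_ids of the entries that list it."""
    root = ET.fromstring(kgml_text.strip())
    index = {}
    for entry in root.iter("entry"):
        element_id = int(entry.get("id"))
        # One entry may list several space-separated names.
        for name in entry.get("name", "").split():
            index.setdefault(name, []).append(element_id)
    return index


index = element_ids_by_name(KGML_FRAGMENT)
print(index["bsu:BG11685"])
```

The gene bsu:BG11685 maps to two element_ids here, which is the situation where coloring by element rather than by object makes a difference.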
get_html_of_marked_pathway_by_objects(string:pathway_id, [string]:object_id_list)
HTML version of the 'mark_pathway_by_objects' method. Mark the given objects on the given pathway map and return the URL of the HTML with the generated image as a clickable map.
Return value:
string (URL)
Example:
# Returns the URL of the HTML which can be passed to the web browser
# as a clickable map of the generated image of the given pathway
# 'path:eco00970' with three objects corresponding to 'eco:b4258',
# 'cpd:C00135' and 'ko:K01881' colored in red.
obj_list = ['eco:b4258', 'cpd:C00135', 'ko:K01881']
get_html_of_marked_pathway_by_objects('path:eco00970', obj_list)
get_html_of_colored_pathway_by_objects(string:pathway_id, [string]:object_id_list, [string]:fg_color_list, [string]:bg_color_list)
HTML version of the 'color_pathway_by_objects' method. Color the given objects on the pathway map with the specified colors and return the URL of the HTML containing the colored image as a clickable map.
Return value:
string (URL)
Example:
# Returns the URL of the HTML which can be passed to the web browser
# as a clickable map of colored image of the given pathway 'path:eco00970'
# with a gene 'eco:b4258' colored in gray/red, a compound 'cpd:C00135'
# colored in green/yellow and a KO 'ko:K01881' colored in blue/orange.
obj_list = ['eco:b4258', 'cpd:C00135', 'ko:K01881']
fg_list = ['gray', '#00ff00', 'blue']
bg_list = ['#ff0000', 'yellow', 'orange']
get_html_of_colored_pathway_by_objects('path:eco00970', obj_list, fg_list, bg_list)
get_html_of_colored_pathway_by_elements(string:pathway_id, [int]:element_id_list, [string]:fg_color_list, [string]:bg_color_list)
HTML version of the 'color_pathway_by_elements' method. Color the objects corresponding to the given 'element_id_list' on the pathway map with the specified colors and return the URL of the HTML containing the colored image as a clickable map.
Return value:
string (URL)
Example:
# Returns the URL of the HTML which can be passed to the web browser as a
# clickable map of colored image of the given pathway 'path:bsu00010' with
# * gene bsu:BG11350 (element_id 78, ec:3.2.1.86) colored in red on yellow
# * gene bsu:BG11203 (element_id 79, ec:3.2.1.86) colored in blue on yellow
# * gene bsu:BG11685 (element_id 51, ec:2.7.1.2) colored in red on orange
# * gene bsu:BG11685 (element_id 47, ec:2.7.1.2) colored in blue on orange
element_id_list = [ 78, 79, 51, 47 ]
fg_list = [ '#ff0000', '#0000ff', '#ff0000', '#0000ff' ]
bg_list = [ '#ffff00', '#ffff00', '#ffcc00', '#ffcc00' ]
get_html_of_colored_pathway_by_elements('path:bsu00010', element_id_list, fg_list, bg_list)
get_references_by_pathway(string:pathway_id)
Returns all PubMed IDs of the references associated with the specified pathway.
Return value:
ArrayOfint (pubmed_id)
Example:
# Returns a list of PMIDs associated with the reference pathway 'path:map00010'
get_references_by_pathway("path:map00010")
get_element_relations_by_pathway(string:pathway_id)
Search all relations of the objects on the specified pathway.
Return value:
ArrayOfPathwayElementRelation
Example:
# Returns a list of PathwayElementRelation on the pathway map 'path:bsu00010'
relations = get_element_relations_by_pathway('path:bsu00010')
# Print the contents of obtained relations in Ruby language
relations.each do |rel|
  puts rel.element_id1
  puts rel.element_id2
  puts rel.type
  rel.subtypes.each do |sub|
    puts sub.element_id
    puts sub.relation
    puts sub.type
  end
end
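Once the relations have been retrieved, they can be processed like any other Ruby objects. The sketch below groups pairs of element_ids by relation type. It does not contact the server: 'Relation' is a local stand-in that borrows only the field names element_id1, element_id2 and type from the example above, the helper 'pairs_by_type' is hypothetical, and the sample values are made up rather than real KEGG output.

```ruby
# Stand-in for the SOAP result objects; only the field names
# (element_id1, element_id2, type) are taken from the example above.
Relation = Struct.new(:element_id1, :element_id2, :type)

# Hypothetical helper: group pairs of element_ids by relation type.
def pairs_by_type(relations)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  relations.each do |rel|
    grouped[rel.type] << [rel.element_id1, rel.element_id2]
  end
  grouped
end

# Made-up sample data, not actual KEGG output.
sample = [
  Relation.new(1, 2, 'typeA'),
  Relation.new(2, 3, 'typeB'),
  Relation.new(3, 4, 'typeA')
]
grouped = pairs_by_type(sample)
puts grouped['typeA'].inspect
```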
get_elements_by_pathway(string:pathway_id)
Search all objects on the specified pathway. This method is intended to be used in combination with the color_pathway_by_elements method to distinguish graphical objects on the pathway that share the same name.
Return value:
ArrayOfPathwayElement
Example:
# Returns list of PathwayElement on the pathway map 'path:bsu00010'
get_elements_by_pathway('path:bsu00010')
# Find element_ids for genes 'bsu:BG11350', 'bsu:BG11203' and 'bsu:BG11685'
# in Ruby language
elems = serv.get_elements_by_pathway('path:bsu00010')
genes = [ 'bsu:BG11350', 'bsu:BG11203', 'bsu:BG11685' ]
elems.each do |elem|
  genes.each do |gene|
    if elem.names.include?(gene)
      puts gene, elem.element_id
    end
  end
end
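Since one gene can be drawn as several graphical objects, it may help to collect every element_id per gene rather than only printing them. The following sketch runs without a server connection: 'PathwayElement' is a local stand-in that borrows only the field names element_id and names from the example above, the helper 'element_ids_by_gene' is hypothetical, and the sample list repeats the gene and element_id pairs quoted in the color_pathway_by_elements example.

```ruby
# Stand-in for the SOAP result objects; only the field names
# (element_id, names) are taken from the example above.
PathwayElement = Struct.new(:element_id, :names)

# Hypothetical helper: map each gene to all element_ids that carry it.
def element_ids_by_gene(elems, genes)
  result = Hash.new { |hash, key| hash[key] = [] }
  elems.each do |elem|
    genes.each do |gene|
      result[gene] << elem.element_id if elem.names.include?(gene)
    end
  end
  result
end

# Sample data mirroring the pairs quoted in this document.
elems = [
  PathwayElement.new(78, ['bsu:BG11350']),
  PathwayElement.new(79, ['bsu:BG11203']),
  PathwayElement.new(51, ['bsu:BG11685']),
  PathwayElement.new(47, ['bsu:BG11685'])
]
genes = ['bsu:BG11350', 'bsu:BG11203', 'bsu:BG11685']
ids = element_ids_by_gene(elems, genes)
puts ids['bsu:BG11685'].inspect
```

With a real client, 'elems' would instead be the return value of get_elements_by_pathway.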
get_genes_by_pathway(string:pathway_id)
Search all genes on the specified pathway. The organism is determined by the organism code in the pathway ID (e.g. 'eco' in 'path:eco00020').
Return value:
ArrayOfstring (genes_id)
Example:
# Returns all E. coli genes on the pathway map '00020'.
get_genes_by_pathway('path:eco00020')
get_enzymes_by_pathway(string:pathway_id)
Search all enzymes on the specified pathway.
Return value:
ArrayOfstring (enzyme_id)
Example:
# Returns all E. coli enzymes on the pathway map '00020'.
get_enzymes_by_pathway('path:eco00020')
get_compounds_by_pathway(string:pathway_id)
Search all compounds on the specified pathway.
Return value:
ArrayOfstring (compound_id)
Example:
# Returns all E. coli compounds on the pathway map '00020'.
get_compounds_by_pathway('path:eco00020')
get_drugs_by_pathway(string:pathway_id)
Search all drugs on the specified pathway.
Return value:
ArrayOfstring (drug_id)
Example:
# Returns all drugs on the pathway map '07025'.
get_drugs_by_pathway('path:map07025')
get_glycans_by_pathway(string:pathway_id)
Search all glycans on the specified pathway.
Return value:
ArrayOfstring (glycan_id)
Example:
# Returns all E. coli glycans on the pathway map '00510'
get_glycans_by_pathway('path:eco00510')
get_reactions_by_pathway(string:pathway_id)
Retrieve all reactions on the specified pathway.
Return value:
ArrayOfstring (reaction_id)
Example:
# Returns all E. coli reactions on the pathway map '00260'
get_reactions_by_pathway('path:eco00260')
get_kos_by_pathway(string:pathway_id)
Retrieve all KOs on the specified pathway.
Return value:
ArrayOfstring (ko_id)
Example:
# Returns all ko_ids on the pathway map 'path:hsa00010'
get_kos_by_pathway('path:hsa00010')
Related site:
get_pathways_by_genes([string]:genes_id_list)
Search all pathways which include all the given genes. How the list of genes_id is passed depends on the language-specific implementation.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns all pathways including E. coli genes 'b0077' and 'b0078'
get_pathways_by_genes(['eco:b0077', 'eco:b0078'])
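The phrase "include all the given genes" means that a pathway is reported only if every gene in the list appears on it, i.e. the result is an intersection. The sketch below illustrates that reading with a local lookup table. It is not how the server is implemented: the table, its IDs and the helper 'pathways_with_all' are all made up for illustration.

```ruby
# Made-up lookup table: gene ID => pathways containing that gene.
PATHWAYS_OF = {
  'org:g1' => ['path:org00001', 'path:org00002'],
  'org:g2' => ['path:org00002', 'path:org00003']
}

# Hypothetical helper: pathways that contain every gene in the list
# (set intersection). An unknown gene contributes an empty list.
def pathways_with_all(genes, table)
  genes.map { |gene| table.fetch(gene, []) }.reduce { |a, b| a & b } || []
end

common = pathways_with_all(['org:g1', 'org:g2'], PATHWAYS_OF)
puts common.inspect
```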
get_pathways_by_enzymes([string]:enzyme_id_list)
Search all pathways which include all the given enzymes.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns all pathways including an enzyme '1.3.99.1'
get_pathways_by_enzymes(['ec:1.3.99.1'])
get_pathways_by_compounds([string]:compound_id_list)
Search all pathways which include all the given compounds.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns all pathways including compounds 'C00033' and 'C00158'
get_pathways_by_compounds(['cpd:C00033', 'cpd:C00158'])
get_pathways_by_drugs([string]:drug_id_list)
Search all pathways which include all the given drugs.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns all pathways including drugs 'D00204' and 'D01053'
get_pathways_by_drugs(['dr:D00204', 'dr:D01053'])
get_pathways_by_glycans([string]:glycan_id_list)
Search all pathways which include all the given glycans.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns all pathways including glycans 'G00009' and 'G00011'
get_pathways_by_glycans(['gl:G00009', 'gl:G00011'])
get_pathways_by_reactions([string]:reaction_id_list)
Retrieve all pathways which include all the given reaction_ids.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns all pathways including reactions 'rn:R00959', 'rn:R02740',
# 'rn:R00960' and 'rn:R01786'
get_pathways_by_reactions(['rn:R00959', 'rn:R02740', 'rn:R00960', 'rn:R01786'])
get_pathways_by_kos([string]:ko_id_list, string:org)
Retrieve all pathways of the specified organism (or of all organisms when 'org' is 'all') which include all the given KO IDs.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns all human pathways including 'ko:K00016' and 'ko:K00382'
get_pathways_by_kos(['ko:K00016', 'ko:K00382'], 'hsa')
# Returns pathways of all organisms including 'ko:K00016' and 'ko:K00382'
get_pathways_by_kos(['ko:K00016', 'ko:K00382'], 'all')
get_linked_pathways(string:pathway_id)
Retrieve all pathways which are linked from a given pathway_id.
Return value:
ArrayOfstring (pathway_id)
Example:
# Returns IDs of PATHWAY entries linked from 'path:eco00620'.
get_linked_pathways('path:eco00620')
This section describes the APIs for the GENES database. For more details on the GENES database, see:
get_genes_by_organism(string:org, int:offset, int:limit)
Retrieve the genes of the specified organism. The 'offset' and 'limit' arguments select which part of the full list is returned by one call.
Return value:
ArrayOfstring (genes_id)
Example:
# Retrieve one hundred H. influenzae genes at a time.
get_genes_by_organism('hin', 1, 100)
get_genes_by_organism('hin', 101, 100)
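To walk through the whole gene list, the two calls above can be generalized into a loop that advances 'offset' by 'limit'. The sketch below shows one way to do this. It assumes that 'offset' starts at 1, as in the example, and that a page shorter than 'limit' marks the end of the list; the latter is an assumption, not documented behavior. The helper 'fetch_all' is hypothetical, and the remote call is replaced by a block working on made-up IDs so that the code runs on its own.

```ruby
# Hypothetical helper: call fetch_page.call(offset, limit) repeatedly,
# starting at offset 1 as in the example above, until a page comes
# back shorter than 'limit' (assumed here to mark the end).
def fetch_all(limit, &fetch_page)
  results = []
  offset = 1
  loop do
    page = fetch_page.call(offset, limit)
    results.concat(page)
    break if page.length < limit
    offset += limit
  end
  results
end

# Stand-in for the remote service: 250 made-up gene IDs.
FAKE_GENES = (1..250).map { |i| format('xxx:G%04d', i) }

calls = []
genes = fetch_all(100) do |offset, limit|
  calls << offset
  FAKE_GENES[offset - 1, limit] || []
end
puts genes.length
```

With a real client, the body of the block would be a call such as serv.get_genes_by_organism('hin', offset, limit).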
This section describes the APIs for the GENOME database. For more details on the GENOME database, see:
get_number_of_genes_by_organism(string:org)
Get the number of genes coded in the specified organism's genome.
Return value:
int
Example:
# Get the number of genes in the E. coli genome.
get_number_of_genes_by_organism('eco')
This section describes the APIs for the LIGAND database.
Related site:
convert_mol_to_kcf(string:mol)
Convert data in MOL format into KCF format.
Return value:
string
Example:
convert_mol_to_kcf(mol_str)
search_compounds_by_name(string:name)
Returns a list of compounds having the specified name.
Return value:
ArrayOfstring (compound_id)
Example:
search_compounds_by_name("shikimic acid")
search_drugs_by_name(string:name)
Returns a list of drugs having the specified name.
Return value:
ArrayOfstring (drug_id)
Example:
search_drugs_by_name("tetracyclin")
search_glycans_by_name(string:name)
Returns a list of glycans having the specified name.
Return value:
ArrayOfstring (glycan_id)
Example:
search_glycans_by_name("Paragloboside")
search_compounds_by_composition(string:composition)
Returns a list of compounds containing the elements indicated by the composition. The order of the elements does not matter.
Return value:
ArrayOfstring (compound_id)
Example:
search_compounds_by_composition("C7H10O5")
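To see why the order of the elements does not matter, a composition string can be thought of as a table of element counts. The sketch below parses simple formulas such as "C7H10O5" into such a table. It is only an illustration: the helper 'parse_composition' is hypothetical, it handles plain element symbols with optional counts only, and it says nothing about how the server actually matches compositions.

```ruby
# Hypothetical helper: turn a formula such as "C7H10O5" into a hash
# of element symbol => count. An element without a number counts as 1.
def parse_composition(formula)
  counts = Hash.new(0)
  formula.scan(/([A-Z][a-z]?)(\d*)/) do |symbol, number|
    counts[symbol] += number.empty? ? 1 : number.to_i
  end
  counts
end

a = parse_composition('C7H10O5')
b = parse_composition('O5H10C7')
# Both spellings describe the same element counts.
puts a == b
```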
search_drugs_by_composition(string:composition)
Returns a list of drugs containing the elements indicated by the composition. The order of the elements does not matter.
Return value:
ArrayOfstring (drug_id)
Example:
search_drugs_by_composition("HCl")
search_glycans_by_composition(string:composition)
Returns a list of glycans containing the sugars indicated by the composition. The order of the sugars (each written in parentheses followed by a number) does not matter.
Return value:
ArrayOfstring (glycan_id)
Example:
search_glycans_by_composition("(Man)4 (GalNAc)1")
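The glycan composition string pairs each sugar name in parentheses with a count. As an illustration of that notation, the sketch below splits such a string into a table of sugar counts. The helper 'parse_glycan_composition' is hypothetical, and the parsing rule is inferred only from the example string above.

```ruby
# Hypothetical helper: turn a string such as "(Man)4 (GalNAc)1" into
# a hash of sugar name => count.
def parse_glycan_composition(text)
  counts = Hash.new(0)
  text.scan(/\(([^)]+)\)\s*(\d+)/) do |sugar, number|
    counts[sugar] += number.to_i
  end
  counts
end

x = parse_glycan_composition('(Man)4 (GalNAc)1')
y = parse_glycan_composition('(GalNAc)1 (Man)4')
# Both spellings describe the same sugar counts.
puts x == y
```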
search_compounds_by_mass(float:mass, float:range)
Returns a list of compounds having a molecular weight around 'mass', within the tolerance given by 'range'.
Return value:
ArrayOfstring (compound_id)
Example:
search_compounds_by_mass(174.05, 0.1)
search_drugs_by_mass(float:mass, float:range)
Returns a list of drugs having a molecular weight around 'mass', within the tolerance given by 'range'.
Return value:
ArrayOfstring (drug_id)
Example:
search_drugs_by_mass(150, 1.0)
search_glycans_by_mass(float:mass, float:range)
Returns a list of glycans having a molecular weight around 'mass', within the tolerance given by 'range'.
Return value:
ArrayOfstring (glycan_id)
Example:
search_glycans_by_mass(174.05, 0.1)
search_compounds_by_subcomp(string:mol, int:offset, int:limit)
Returns a list of compounds, together with their alignments, that share a common substructure as calculated by the subcomp program.
To confirm the alignment, you can obtain MOL formatted structural data of the matched compounds using the bget method with the "-f m" option.
Return value:
ArrayOfStructureAlignment
Example:
mol = bget("-f m cpd:C00111")
search_compounds_by_subcomp(mol, 1, 5)
Related site:
search_drugs_by_subcomp(string:mol, int:offset, int:limit)
Returns a list of drugs, together with their alignments, that share a common substructure as calculated by the subcomp program.
To confirm the alignment, you can obtain MOL formatted structural data of the matched drugs using the bget method with the "-f m" option.
Return value:
ArrayOfStructureAlignment
Example:
mol = bget("-f m dr:D00201")
search_drugs_by_subcomp(mol, 1, 5)
Related site:
search_glycans_by_kcam(string:kcf, string:program, string:option, int:offset, int:limit)
Returns a list of glycans, together with their alignments, that share a common substructure as calculated by the KCaM program.
The argument 'program' can be 'gapped' or 'ungaped'. The next argument 'option' can be 'global' or 'local'.
To confirm the alignment, you can obtain KCF formatted structural data of the matched glycans using the bget method with the "-f k" option.
Return value:
ArrayOfStructureAlignment
Example:
kcf = bget("-f k gl:G12922")
search_glycans_by_kcam(kcf, "gapped", "local", 1, 5)
Related site:
Last updated: December 27, 2006
This document is written and maintained by Toshiaki Katayama.
Copyright (C) 2003, 2004, 2005, 2006 Toshiaki Katayama <k@bioruby.org>