drugBase Documentation
drugBase is a database of drug-target interactions. The interaction data is obtained from MATADOR, a manually annotated resource describing drugs and targets.
SuperTarget - MATADOR
SuperTarget is a drug-target interaction database developed using heterogeneous data resources. The information on drugs is obtained from the SuperDrug database. The drug-target relations are extracted from PubMed abstracts and MeSH (Medical Subject Headings) terms. Additional drug-target interactions found in other existing drug-target interaction databases like DrugBank, TTD, SuperLigands and KEGG are also added to the SuperTarget database after verifying the literature. MATADOR is the manually curated version of drug-target relations in SuperTarget.
Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, Schneider R, Skoblo R, Russell RB, Bourne PE, Bork P and Preissner R.
SuperTarget and Matador: resources for exploring drug-target relationships
Mapping to Entrez Gene
The protein identifiers in MATADOR are based on an older version of STRING, a database of protein and other interactions. We mapped the STRING (v7.1) protein identifiers onto the corresponding Entrez Gene identifiers. Proteins that do not have a corresponding Entrez Gene identifier in STRING were mapped to Entrez Gene using HGNC gene symbols and NCBI RefSeq identifiers. In drugBase, 11 proteins from the MATADOR database do not have a corresponding Entrez Gene identifier.
drugBase is designed as a gene-centered drug-target database. Therefore, the drug-target interactions in MATADOR that are defined solely on the basis of MeSH terms (without any specific gene/protein) were not included in drugBase.
Implementation
Database
drugBase is built using CouchDB. CouchDB is an open source, non-relational, document-oriented database system. CouchDB's built-in web administration console communicates with the database using HTTP requests. The RESTful JSON API provided by CouchDB makes it possible to access the database from any environment that can make HTTP requests (see examples).
Interface
The search interface is built using geneSmash, a gene-centric CouchDB database. The geneSmash web service converts any of the supported input gene identifier types to Entrez Gene identifiers. These identifiers are then passed to drugBase to retrieve the relevant drug-target interactions.
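The two-step lookup described above (identifier to Entrez Gene via geneSmash, then Entrez Gene to drugs via drugBase) can be sketched in Perl as follows. This is a minimal sketch, not the site's actual implementation. It assumes the view paths that appear in the code examples later in this document (by_symbol and by_EntrezGene); HOSTNAME is the same placeholder used there, and the helper names escape_component and view_url are hypothetical.

```perl
use strict;
use warnings;

# Placeholder, as in the other examples in this document:
# substitute the base URL of the server.
my $HOST = "HOSTNAME";

# Percent-encode everything except RFC 3986 unreserved characters
# (ASCII input is assumed).
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Build a CouchDB view query URL. CouchDB expects the key as a JSON value,
# so a string key is wrapped in double quotes before it is encoded.
sub view_url {
    my ($db, $view, $key) = @_;
    return "$HOST/$db/_design/basic/_view/$view?key="
         . escape_component(qq("$key"));
}

# Step 1: look up the gene symbol in geneSmash.
my $step1 = view_url("genesmash", "by_symbol", "TP53");

# Step 2: query drugBase with the Entrez Gene identifier obtained in step 1
# (7157 is the Entrez Gene identifier of human TP53).
my $step2 = view_url("drugbase", "by_EntrezGene", "7157");

print "$step1\n$step2\n";
```

Encoding the quotation marks around the key keeps the URL valid even with HTTP clients that do not escape them automatically.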
Replication
CouchDB provides native support for database replication. You can use this facility to make (and maintain) a local copy of the drugBase database. If you want an identical interface, you should also replicate geneSmash along with drugBase. If you replicate the database, we request that you maintain links to drugBase, geneSmash and the University of Texas M.D. Anderson Cancer Center.
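As an illustration, a replication can be requested by sending a JSON document to the _replicate endpoint of your local CouchDB server. The Perl sketch below makes several assumptions: the local server address (localhost on CouchDB's default port 5984), the HOSTNAME placeholder for the remote server, and the helper names replication_request and replicate are all hypothetical. Depending on the CouchDB version and configuration, the request may also need administrator credentials, which are not shown.

```perl
use strict;
use warnings;
use JSON::PP;     # core module: encode_json / decode_json
use HTTP::Tiny;   # core module: minimal HTTP client

# Build the JSON body for CouchDB's POST /_replicate endpoint.
sub replication_request {
    my ($source_url, $target_db) = @_;
    return encode_json({
        source        => $source_url,
        target        => $target_db,
        create_target => JSON::PP::true,   # create the local database if missing
    });
}

# Ask a local CouchDB server to pull a remote database.
sub replicate {
    my ($local_server, $source_url, $target_db) = @_;
    my $response = HTTP::Tiny->new->post(
        "$local_server/_replicate",
        {
            headers => { 'Content-Type' => 'application/json' },
            content => replication_request($source_url, $target_db),
        },
    );
    die "Replication failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return decode_json($response->{content});
}

# Example calls (not run here; both addresses are placeholders):
# replicate("http://localhost:5984", "HOSTNAME/drugbase",  "drugbase");
# replicate("http://localhost:5984", "HOSTNAME/genesmash", "genesmash");
```

Running the request again later updates the local copy with changes made at the source since the previous replication.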
Web Site
You can search drugBase for drug-target relationships using various types of gene identifiers. You can use any one of the following:
- HGNC Gene Symbol
- Gene Symbol Alias
- Ensembl Identifier
- Entrez Gene Identifier
- NCBI Unigene Identifier
- Human Gene Expression Array Identifiers from
  - Affymetrix
  - Illumina
  - Agilent
The results provide links to the drug information from NCBI PubChem and to the ATC (Anatomical Therapeutic Chemical) code(s) associated with the drugs, provided by the WHO ATC classification system. The drug-target interaction information contains the MATADOR and/or DrugBank score and the type of interaction (Direct/Indirect). In some cases, links to the associated PubMed literature and MeSH terms are also provided.
Using drugBase in other programs
- The Perl code example below shows how to use drugBase in the Perl programming environment.
- The top two lines load the Perl packages LWP::Simple and JSON, which are used to handle HTTP requests and to convert JSON objects into Perl data structures, respectively. The first two lines of the getTargetingDrugs function declare and initialize variables. The next two lines use the Entrez Gene identifier argument to construct an appropriate URL to query drugBase. The next line actually makes the HTTP request to the drugBase server. The following three lines convert the JSON response into a Perl array of hashes. The final lines extract the relevant part of the response.
- This version of the code does not perform error checking on the result, so it is probably not suitable for production use as written. Failure can occur because the server is not available, because the gene identifier passed as an argument is not a valid Entrez Gene identifier, or because no known drug-target interaction is available for the gene; all three conditions should be checked.
- The R code example below shows how to use drugBase in the R statistical programming environment.
- The first line loads an R package that converts between JSON objects and R objects. The first two lines of the getTargetingDrugs function use the symbol argument to construct an appropriate URL. The next line actually makes the HTTP request to the geneSmash server and converts the JSON response into an R object. This step converts an alternate gene identifier (here, an HGNC symbol) into an Entrez Gene identifier. The Entrez Gene identifier obtained from geneSmash is then used to retrieve the drug-target data with a similar HTTP request to drugBase. The final lines extract the relevant part of the response.
- This version of the code does not perform error checking on the result, so it is probably not suitable for production use as written. Failure can occur because the server is not available, because the symbol passed as an argument is not a valid HGNC symbol, or because no known drug-target interaction is available for the gene; all three conditions should be checked.
Perl code example:
use LWP::Simple;   # provides get() for simple HTTP requests
use JSON;          # provides decode_json()
sub getTargetingDrugs {
    my ($EntrezGeneID, $couchdbView, $dataLink, $dData, %dData, @drugData, @drugList);
    $EntrezGeneID = $_[0];
    $couchdbView = "HOSTNAME/drugbase/_design/basic/_view/by_EntrezGene";
    $dataLink = $couchdbView . '?key="' . $EntrezGeneID . '"';
    $dData = get $dataLink;          # undef if the request fails (not checked here)
    $dData = decode_json $dData;
    %dData = %$dData;
    @drugData = @{$dData{"rows"}};
    for (my $i = 0; $i < scalar(@drugData); $i++) {
        push @drugList, $drugData[$i]->{"value"};
    }
    return @drugList;
}
my @drugList = getTargetingDrugs("10");
print "@drugList\n";
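The three failure conditions named in the notes could be checked along the following lines. This is a sketch under stated assumptions, not part of the original example: parseTargetingDrugs is a hypothetical helper, it uses the core module JSON::PP instead of JSON so that it runs without extra packages, and the sample response body is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; exports decode_json like the JSON package

# Turn a raw response body into a list of view values,
# reporting the failure conditions separately.
sub parseTargetingDrugs {
    my ($EntrezGeneID, $body) = @_;

    # 1. The argument is not a plausible Entrez Gene identifier
    #    (assumed here to be a positive integer).
    die "Not a valid Entrez Gene identifier\n"
        unless defined $EntrezGeneID && $EntrezGeneID =~ /^[1-9][0-9]*$/;

    # 2. LWP::Simple's get() returns undef when the request fails.
    die "No response from the drugBase server\n" unless defined $body;

    my $data = eval { decode_json($body) };
    die "Response is not a JSON object\n" unless ref($data) eq 'HASH';
    die "Response has no 'rows' list\n"   unless ref($data->{rows}) eq 'ARRAY';

    # 3. An empty list means no known drug-target interaction for the gene.
    return map { $_->{value} } @{ $data->{rows} };
}

# Made-up response body standing in for a live request; in real use the
# second argument would be the result of get($dataLink).
my @drugs = parseTargetingDrugs("10", '{"rows":[{"key":"10","value":"example"}]}');
print scalar(@drugs), " row(s) returned\n";
```

A well-formed identifier that does not exist in Entrez Gene also yields an empty list, so telling that case apart from "no known interaction" would need an additional lookup, for example in geneSmash.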
R code example:
library(rjson.krc)
# Function to extract drugs targeting a gene (using HGNC Gene Symbol)
getTargetingDrugs <- function(sym) {
  giUrl <- "HOSTNAME/genesmash/_design/basic/_view/by_symbol"
  glink <- paste(giUrl, "?key=\"", sym, "\"", sep = "")
  gene_data <- fromJSON(paste(readLines(glink), collapse = ""))
  Drug_Info <- NA
  if (length(gene_data[["rows"]]) != 0) {
    GeneID <- gene_data[["rows"]][[1]]$id
    # Extracting drug data for the gene from drugBase
    diUrl <- "HOSTNAME/drugbase/_design/basic/_view/by_EntrezGene"
    link <- paste(diUrl, "?key=\"", GeneID, "\"&include_docs=true", sep = "")
    JSON_data <- paste(readLines(link), collapse = "")
    Drug_data <- fromJSON(JSON_data)
    if (is.list(Drug_data) && (length(Drug_data$rows) > 0)) {
      Drug_data <- Drug_data$rows
      Drug_Info <- matrix(nrow = length(Drug_data), ncol = 5)
      colnames(Drug_Info) <- c("Pubchem_ID", "Drug_Name", "ATC_Code",
                               "MATADOR_Interaction", "DrugBank_Interaction")
      for (i in seq_along(Drug_data)) {
        row <- Drug_data[[i]]$doc
        Drug_Info[i, 1] <- unlist(row['chemical_id'])
        Drug_Info[i, 2] <- unlist(row['chemical_name'])
        Drug_Info[i, 3] <- ifelse(is.null(unlist(row['atc'])), NA, unlist(row['atc']))
        Drug_Info[i, 4] <- ifelse(is.null(unlist(row['matador_annotation'])),
                                  NA, unlist(row['matador_annotation']))
        Drug_Info[i, 5] <- ifelse(is.null(unlist(row['drugbank_annotation'])),
                                  NA, unlist(row['drugbank_annotation']))
      }
    }
  }
  Drug_Info
}
getTargetingDrugs("TP53")