drugBase Documentation

drugBase is a database of drug-target interactions. The interaction data is obtained from MATADOR, a manually annotated resource describing drugs and targets.

SuperTarget - MATADOR

SuperTarget is a drug-target interaction database developed using heterogeneous data resources. The information on drugs is obtained from the SuperDrug database. The drug-target relations are extracted from PubMed abstracts and MeSH (Medical Subject Headings) terms. Additional drug-target interactions found in other existing drug-target interaction databases like DrugBank, TTD, SuperLigands and KEGG are also added to the SuperTarget database after verifying the literature. MATADOR is the manually curated version of drug-target relations in SuperTarget.

Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, Schneider R, Skoblo R, Russell RB, Bourne PE, Bork P and Preissner R.
SuperTarget and Matador: resources for exploring drug-target relationships

Mapping to Entrez Gene

The protein identifiers in MATADOR are based on an older version of STRING, a database of protein and other interactions. We mapped the STRING (v7.1) protein identifiers onto the corresponding Entrez Gene identifiers. Proteins that do not have a corresponding Entrez Gene identifier in STRING are mapped to Entrez Gene using HGNC gene symbols and NCBI RefSeq identifiers. In drugBase, 11 proteins from the MATADOR database do not have a corresponding Entrez Gene identifier.

drugBase is designed as a gene-centered drug-target database. Therefore, drug-target interactions that MATADOR defines solely by MeSH terms (without any specific gene or protein) are not included in drugBase.

Implementation

Database

drugBase is built using CouchDB, an open source, non-relational, document-oriented database system. CouchDB’s built-in web administration console communicates with the database using HTTP requests. The RESTful JSON API provided by CouchDB allows the database to be accessed from any environment that can make HTTP requests (see the examples under "Using drugBase in other programs").
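
As a minimal sketch of what such an HTTP request looks like, the Python snippet below builds the URL for a CouchDB view query. The view path is the one used in the examples under "Using drugBase in other programs"; HOSTNAME is a placeholder for the server address, and the function name build_view_url is hypothetical. CouchDB expects the key parameter to be a JSON value, which is why the identifier is wrapped in double quotes before being URL-escaped.

```python
import json
from urllib.parse import urlencode

# Placeholder: replace HOSTNAME with the base URL of the drugBase server.
BASE_URL = "HOSTNAME"
VIEW_PATH = "/drugbase/_design/basic/_view/by_EntrezGene"


def build_view_url(entrez_gene_id, include_docs=False):
    """Build the URL that queries the by_EntrezGene view for one gene."""
    # View keys are JSON values, so a string key must carry its quotes.
    params = {"key": json.dumps(str(entrez_gene_id))}
    if include_docs:
        # Ask CouchDB to return the full document with each row.
        params["include_docs"] = "true"
    return BASE_URL + VIEW_PATH + "?" + urlencode(params)


print(build_view_url("10"))
# HOSTNAME/drugbase/_design/basic/_view/by_EntrezGene?key=%2210%22
```

The resulting URL can be fetched with any HTTP client; the response is a JSON object whose rows array holds the matching entries.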

Interface

The search interface is built using geneSmash, a gene-centric CouchDB database. The geneSmash web service converts any of the supported input gene identifier types to Entrez Gene identifiers, which are then passed to drugBase to retrieve the relevant drug-target interactions.

Replication

CouchDB provides native support for database replication. You can use this facility to make (and maintain) a local copy of the drugBase database. If you want an identical interface, you should also replicate geneSmash along with drugBase. If you replicate the database, we request that you maintain links to drugBase, geneSmash and the University of Texas M.D. Anderson Cancer Center.
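
One way to start such a replication is to POST a JSON request to the _replicate endpoint of your local CouchDB server. The Python sketch below only prepares that request. The local address, the database URLs and the function name build_replication_request are assumptions for illustration, HOSTNAME is a placeholder, and authentication (which recent CouchDB versions usually require for this endpoint) is left out.

```python
import json
import urllib.request


def build_replication_request(local_couch, source_db_url, target_db_url,
                              continuous=False):
    """Prepare (but do not send) a POST to the CouchDB _replicate endpoint."""
    body = {
        "source": source_db_url,   # remote database to copy from
        "target": target_db_url,   # local database to copy into
        "create_target": True,     # create the local database if missing
        "continuous": continuous,  # True keeps the local copy in sync
    }
    return urllib.request.Request(
        local_couch + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed addresses: a local CouchDB on its default port, remote HOSTNAME.
req = build_replication_request(
    "http://localhost:5984",
    "http://HOSTNAME/drugbase",
    "http://localhost:5984/drugbase",
)
print(req.get_method(), req.full_url)
# To run the replication: urllib.request.urlopen(req)
```

Calling the function a second time with the geneSmash database URLs would prepare the matching request for geneSmash.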

Web Site

You can search drugBase for drug-target relationships using various types of gene identifiers. You can use any one of the following:

HGNC Gene Symbol
Gene Symbol Alias
Ensembl Identifier
Entrez Gene Identifier
NCBI Unigene Identifier
Human Gene Expression Array Identifiers of
- Affymetrix
- Illumina
- Agilent

The results provide links to the drug information in NCBI PubChem and to the ATC (Anatomical Therapeutic Chemical) code(s) associated with each drug, as provided by the WHO ATC classification system. The drug-target interaction information contains the MATADOR and/or DrugBank score and the type of interaction (direct or indirect). In some cases, links to the associated PubMed literature and MeSH terms are also provided.

Using drugBase in other programs

The following code example shows how to use drugBase in the Perl programming environment:
```
use LWP::Simple;
use JSON;
# HOSTNAME below is a placeholder for the address of the drugBase server
sub getTargetingDrugs {
  my ($EntrezGeneID, $couchdbView, $dataLink, $dData, %dData, @drugData, @drugList);
  $EntrezGeneID = $_[0];
  $couchdbView = "HOSTNAME/drugbase/_design/basic/_view/by_EntrezGene";
  $dataLink = $couchdbView . '?key="' . $EntrezGeneID . '"';
  $dData = get $dataLink;          # HTTP GET; returns undef on failure
  $dData = decode_json $dData;     # JSON text to Perl hash reference
  %dData = %$dData;
  @drugData = @{$dData{"rows"}};   # one hash reference per matching row
  for my $row (@drugData) {
    push @drugList, $row->{"value"};
  }
  return @drugList;
}
my @drugList = getTargetingDrugs("10");
print "@drugList\n";
```
Some notes on the code:
- The top two lines load the Perl packages LWP::Simple and JSON, which handle the HTTP request and convert JSON objects to Perl data structures, respectively. The first two lines of the getTargetingDrugs function declare and initialize variables. The next two lines use the Entrez Gene identifier argument to construct an appropriate URL to query drugBase. The next line makes the HTTP request to the drugBase server. The following three lines convert the JSON response into a Perl array of hashes. The final lines extract the relevant part of the response.
- This version of the code does not perform error checking on the result, so it is probably not suitable for production code. Failure can occur because the server is not available, because the gene identifier passed as an argument is not a valid Entrez Gene identifier, or because no known drug-target interaction is available for the gene; all three conditions should be checked.
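
The three checks listed above could be sketched as follows, here in Python rather than Perl. This is an illustration under stated assumptions, not part of drugBase: the names get_targeting_drugs, http_get and fake_fetch are hypothetical, HOSTNAME is a placeholder, and the sample response is made up. Note that a well-formed identifier that does not exist in Entrez Gene cannot be told apart from a gene without known interactions by this query alone; both yield an empty list.

```python
import json
import re
import urllib.error
import urllib.request

# Placeholder: replace HOSTNAME with the address of the drugBase server.
VIEW_URL = "HOSTNAME/drugbase/_design/basic/_view/by_EntrezGene"


def http_get(url):
    """Default fetcher: return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def get_targeting_drugs(entrez_gene_id, fetch=http_get):
    """Return the view values for a gene, raising on each failure mode."""
    gene_id = str(entrez_gene_id)
    # Check 1: Entrez Gene identifiers are plain positive integers.
    if not re.fullmatch(r"[0-9]+", gene_id):
        raise ValueError("not an Entrez Gene identifier: %r" % gene_id)
    url = VIEW_URL + "?key=%22" + gene_id + "%22"
    # Check 2: the server may be unreachable or return a malformed body.
    try:
        data = json.loads(fetch(url))
    except (urllib.error.URLError, ValueError) as err:
        raise RuntimeError("drugBase request failed: %s" % err) from err
    if not isinstance(data, dict) or "rows" not in data:
        raise RuntimeError("unexpected response from drugBase")
    # Check 3: an empty list means no known interaction for this gene.
    return [row["value"] for row in data["rows"]]


def fake_fetch(url):
    """Stand-in for the server, returning a made-up view response."""
    return '{"rows": [{"id": "x", "key": "10", "value": "drug A"}]}'


print(get_targeting_drugs("10", fetch=fake_fetch))  # ['drug A']
```

Passing the fetcher as an argument keeps the checks testable without a running server; omit the fetch argument to query the real one.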

The following code example shows how to use drugBase in the R statistical programming environment:


```
library(rjson.krc)  # provides fromJSON()
# Function to extract drugs targeting a gene (using HGNC Gene Symbol)
getTargetingDrugs <- function(sym) {
  giUrl <- "HOSTNAME/genesmash/_design/basic/_view/by_symbol"
  glink <- paste(giUrl, "?key=\"", sym, "\"", sep = '')
  gene_data <- fromJSON(paste(readLines(glink), collapse = ''))
  Drug_Info <- NA
  if (length(gene_data[["rows"]]) != 0) {
    # The id of the first matching geneSmash row is used as the Entrez Gene ID
    GeneID <- gene_data[["rows"]][[1]]$id

    # Extracting drug data for the gene from drugBase
    diUrl <- "HOSTNAME/drugbase/_design/basic/_view/by_EntrezGene"
    link <- paste(diUrl, "?key=\"", GeneID, "\"&include_docs=true", sep = '')
    JSON_data <- paste(readLines(link), collapse = '')
    Drug_data <- fromJSON(JSON_data)
    if (is.list(Drug_data) && (length(Drug_data$rows) > 0)) {
      Drug_data <- Drug_data$rows
      Drug_Info <- matrix(nrow = length(Drug_data), ncol = 5)
      colnames(Drug_Info) <- c("Pubchem_ID", "Drug_Name", "ATC_Code",
                               "MATADOR_Interaction", "DrugBank_Interaction")
      for (i in seq_along(Drug_data)) {
        row <- Drug_data[[i]]$doc
        Drug_Info[i, 1] <- unlist(row['chemical_id'])
        Drug_Info[i, 2] <- unlist(row['chemical_name'])
        # Optional fields are set to NA when absent from the document
        Drug_Info[i, 3] <- ifelse(is.null(unlist(row['atc'])), NA, unlist(row['atc']))
        Drug_Info[i, 4] <- ifelse(is.null(unlist(row['matador_annotation'])),
                                  NA, unlist(row['matador_annotation']))
        Drug_Info[i, 5] <- ifelse(is.null(unlist(row['drugbank_annotation'])),
                                  NA, unlist(row['drugbank_annotation']))
      }
    }
  }
  Drug_Info
}
getTargetingDrugs("TP53")
```

Some notes on the code:

- The first line loads an R package that converts between JSON objects and R objects. The first two lines of the getTargetingDrugs function use the symbol argument to construct an appropriate URL. The next line makes the HTTP request to the geneSmash server and converts the JSON response into an R object; this step converts the gene symbol to an Entrez Gene identifier. The Entrez Gene identifier obtained from geneSmash is then used to retrieve drug-target data through a similar HTTP request to drugBase. The final lines extract the relevant parts of the response into a matrix.
- This version of the code does not perform error checking on the result, so it is probably not suitable for production code. Failure can occur because the server is not available, because the symbol passed as an argument is not a valid HGNC symbol, or because no known drug-target interaction is available for the gene; all three conditions should be checked.
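
For comparison, the field extraction at the end of the R function can be sketched in Python. The field names are the ones the R code reads from each document; the function name rows_to_records and the sample response are made up for illustration, so the values shown are not real drugBase data. Absent fields become None, much as the R code substitutes NA for missing optional fields.

```python
import json

# Document fields read by the R example above.
FIELDS = ("chemical_id", "chemical_name", "atc",
          "matador_annotation", "drugbank_annotation")


def rows_to_records(response_text):
    """Turn a view response (queried with include_docs=true) into dicts."""
    rows = json.loads(response_text).get("rows", [])
    records = []
    for row in rows:
        doc = row.get("doc") or {}
        # dict.get returns None for any field missing from the document.
        records.append({field: doc.get(field) for field in FIELDS})
    return records


# Made-up response with a single document that lacks the optional fields.
sample = json.dumps({
    "rows": [{
        "id": "doc1",
        "key": "10",
        "value": None,
        "doc": {"chemical_id": "0000", "chemical_name": "example drug"},
    }]
})

for record in rows_to_records(sample):
    print(record["chemical_name"], record["atc"])
```

A list of dictionaries keeps the field names attached to the values, which avoids relying on column positions as the matrix in the R code does.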