targetHub Documentation
targetHub is a database of miRNA-mRNA interactions. The interaction data is obtained from various external data sources and, in some cases, computed in-house by algorithms implemented for miRNA target prediction.
miRNA and 3'UTR data
miRBase (Version 18), a standard repository for miRNA data, is used as the reference for Human miRNA data in targetHub. 3'UTR sequence data of the Human genome is extracted from the UCSC Genome Browser site. Human mRNA transcripts in the 3'UTR data are annotated with Entrez Gene IDs so that targetHub data can be used in conjunction with geneSmash. Transcripts that do not map to standard chromosomes are filtered out of the sequence data. The filtered Human 3'UTR sequences and the mature miRNA sequences from miRBase are used to compute target predictions with algorithms such as miRanda.
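The chromosome filter described above can be sketched in Python. This is a minimal illustration under assumptions, not the pipeline used to build targetHub. It assumes "standard chromosomes" means chr1 to chr22, chrX, chrY and chrM in UCSC naming. It also assumes each 3'UTR record is a hypothetical (transcript_id, chrom, sequence) tuple.

```python
# Assumed definition of "standard": chr1..chr22, chrX, chrY and chrM (UCSC names).
STANDARD_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def keep_standard(records):
    """Keep (transcript_id, chrom, sequence) records that lie on a standard chromosome."""
    return [rec for rec in records if rec[1] in STANDARD_CHROMS]


# Hypothetical records; names with suffixes (haplotypes, unplaced contigs) are dropped.
utrs = [
    ("tx1", "chr17", "ACGT"),
    ("tx2", "chr6_hap_example", "GGCA"),
    ("tx3", "chrX", "TTAG"),
]
print([tx for tx, _, _ in keep_standard(utrs)])  # ['tx1', 'tx3']
```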
Targets for mature miRNA originating from the same stem-loop
The mature miRNAs originating from the same hairpin loop come from opposite strands (arms) of the hairpin. Because their sequences are complementary, their targets are completely different. The algorithms used to find miRNA targets mostly rely on the seed sequences of the mature miRNAs. The identifiers these algorithms use for mature miRNAs are not standard, as the nomenclature is still evolving. If two mature miRNA sequences are excised from the same stem-loop miRNA, they can be named in different ways:
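Comparing the seeds of the two arms shows why they end up with different targets. The sketch below is illustrative only. It assumes the seed is nucleotides 2 to 8 of the mature sequence and ignores G:U wobble pairing. The sequences are invented toy data rather than miRBase entries, and the helper names `seed` and `target_site` are hypothetical.

```python
def seed(mature_seq, start=2, end=8):
    """Seed region: 1-based, inclusive positions start..end of a mature miRNA."""
    return mature_seq[start - 1:end]


def target_site(seed_seq):
    """mRNA motif (5'->3') that pairs with a seed: its reverse complement (Watson-Crick only)."""
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[nt] for nt in reversed(seed_seq))


# Toy 5p and 3p mature sequences from one hypothetical hairpin (not real miRBase data).
arm5p = "UGAGCUACAGUGCUUCAUCUCA"
arm3p = "AGAUGAAGCACUGUAGCUCUU"
for arm, seq in [("5p", arm5p), ("3p", arm3p)]:
    print(arm, seed(seq), target_site(seed(seq)))
```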
- The miRNA/miRNA* notation distinguishes the two miRNAs of the same hairpin loop by expression level: miRNA* denotes the one with the lower expression level (example: hsa-miR-103a and hsa-miR-103a*).
- The current convention is to indicate the arm, 5p or 3p (example: hsa-miR-103a-1-5p). In Version 18 of miRBase, mature miRNA sequences from all Human precursors are designated with the -5p/-3p convention rather than miR/miR*.
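Names from several sources are often mixed, so it helps to know which convention a name follows. The following is a minimal sketch, and `arm_label` is a hypothetical helper. A miRNA* name cannot be converted to 5p/3p from the name alone. Which arm is the less-expressed one has to be looked up in miRBase.

```python
import re


def arm_label(mature_id):
    """Return "5p"/"3p" for arm-style names, "star" for miRNA* names, else None."""
    match = re.search(r"-([35]p)$", mature_id)
    if match:
        return match.group(1)
    if mature_id.endswith("*"):
        return "star"
    return None  # e.g. the more abundant product under the old miRNA/miRNA* scheme


for name in ["hsa-miR-103a-1-5p", "hsa-miR-103a*", "hsa-miR-103a"]:
    print(name, arm_label(name))
```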
Stem-loop miRNA nomenclature
Guidelines for miRNA nomenclature are provided by miRBase (Griffiths-Jones et al, 2006, Meyes et al, 2008 and Griffiths-Jones et al, 2008). Some of them are elaborated in this section, as they may be useful when querying targetHub.
- Paralogous miRNAs whose mature sequences differ at no more than two positions are given letter suffixes (example: hsa-miR-103a, hsa-miR-103b)
- Distinct stem-loop miRNA that give rise to identical (in every base) mature miRNA have numbered suffixes (example: hsa-miR-103a-1, hsa-miR-103a-2)
- Previously, in some cases, miRNAs located at the same genomic locus but on opposite strands were distinguished with 'S' for sense and 'AS' for antisense (example: hsa-miR-103 and hsa-miR-103-AS)
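A regular expression can capture the suffix rules above, which is useful for normalizing query strings. This parser is a hypothetical sketch, not an official miRBase grammar. It covers only the patterns listed in this section, plus the lower-case `mir` that miRBase uses for stem-loop names such as hsa-mir-103a-1.

```python
import re

# Hypothetical parser for the naming patterns described in this section.
MIR_NAME = re.compile(
    r"^(?P<species>[a-z]{3})-(?P<prefix>miR|mir)-(?P<number>\d+)"
    r"(?P<paralog>[a-z]?)"       # letter suffix: paralogs (103a, 103b)
    r"(?:-(?P<locus>\d+))?"      # numbered suffix: distinct stem-loops (103a-1, 103a-2)
    r"(?P<antisense>-AS)?"       # older antisense marker (103-AS)
    r"(?:-(?P<arm>[35]p))?"      # hairpin arm (5p / 3p)
    r"(?P<star>\*)?$"            # older miRNA* marker
)


def parse_mir(name):
    """Split a miRNA name into its components; empty parts are omitted."""
    match = MIR_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized miRNA name: {name}")
    return {part: value for part, value in match.groupdict().items() if value}


print(parse_mir("hsa-miR-103a-1-5p"))
```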
miRNA Target Prediction
This section describes the data sources and methods used to set up the miRNA-target interaction data in targetHub.
miRTarBase
miRTarBase is a curated database of experimentally validated microRNA-target interactions. The Human version of miRTarBase [Release 2.5, October 2011] was downloaded from its website. Each record in the database is associated with the type of experimental evidence and the relevant PubMed article supporting the interaction. As miRNA nomenclature is not standard, miRNA names in miRTarBase follow various conventions (described above). miRNAs that match the current version of miRBase are retained as is in targetHub, while candidates that have no matching identifier are manually curated by mapping them through previous miRBase identifiers.
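An automated first pass over the name mapping could look like the sketch below, leaving only unresolved names for manual curation. The function and both lookup tables are hypothetical. The history entry is illustrative and has not been checked against miRBase release files.

```python
def map_to_current(names, current_ids, previous_to_current):
    """Split miRNA names into (mapped, unresolved).

    Names already in the current miRBase release are kept as is; others are
    looked up in a table of previous identifiers. Leftovers need manual curation.
    """
    mapped, unresolved = {}, []
    for name in names:
        if name in current_ids:
            mapped[name] = name
        elif name in previous_to_current:
            mapped[name] = previous_to_current[name]
        else:
            unresolved.append(name)
    return mapped, unresolved


# Illustrative inputs only; a real run would load these from miRBase release files.
current = {"hsa-miR-103a-3p"}
history = {"hsa-miR-103a": "hsa-miR-103a-3p"}
print(map_to_current(["hsa-miR-103a-3p", "hsa-miR-103a", "hsa-miR-xyz"], current, history))
```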
TargetScan
The miRNA-mRNA interaction data predicted by the TargetScan algorithm (Lewis et al, 2005, Grimson et al, 2007 and Friedman et al, 2009) is obtained from their website [Version 6.1, March 2012]. TargetScan provides two metrics to assess the importance of a miR-target interaction: the Probability of conserved targeting (Pct) and the Total context score (TCS). Pct is a Bayesian estimate of the probability that a miR site in the 3' UTR of an mRNA is conserved because of miR targeting, while TCS represents the strength of the sequence features that facilitate miR-target hybridization/cleavage. TargetScan predicts targets for miR families rather than for individual miRs. targetHub specifies the miR family for which a target is identified, along with the number of conserved and non-conserved probable miRNA-mRNA interaction sites in the mRNA transcript.
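Per-site values are often combined into one number per transcript, as in the Aggregate Pct and Context Score columns of the download table. This sketch relies on our understanding of TargetScan's conventions, which should be checked against the TargetScan documentation before relying on them:

- The total context score of a transcript is the sum of its per-site context scores. More negative values mean stronger predicted repression.
- The aggregate Pct is 1 - (1 - Pct_1)(1 - Pct_2)...(1 - Pct_n), which treats the sites as independent.

The site values below are made up.

```python
from math import prod


def total_context_score(site_scores):
    """Sum of per-site context scores (more negative = stronger predicted repression)."""
    return sum(site_scores)


def aggregate_pct(site_pcts):
    """Probability that at least one site is conserved due to targeting, sites treated as independent."""
    return 1 - prod(1 - p for p in site_pcts)


# Hypothetical per-site values for one miRNA family / transcript pair.
print(round(total_context_score([-0.21, -0.08]), 2))  # -0.29
print(round(aggregate_pct([0.5, 0.4]), 2))            # 0.7
```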
PicTar
Like TargetScan, PicTar predicts miRNA-mRNA interactions by looking for matches to the miRNA seed sequence. The PicTar target predictions were computed in 2005, whereas TargetScan was updated in 2012. Because the data is relatively old, there are many discrepancies between the current miRBase naming conventions and PicTar. In case of a discrepancy, the miRNA name is manually curated and the PicTar miRNA identifiers are retained. In many cases the predictions of PicTar and TargetScan agree more closely with each other than with those of other algorithms such as miRanda. PicTar derives an overall score to assess the strength of the miR-target interaction: the maximum likelihood that a given 3'UTR sequence is targeted by a fixed set of microRNAs. The PicTar algorithm scores any 3' UTR that has at least one aligned, conserved predicted binding site for a microRNA, but then incorporates all possible binding sites into the score, even those that appear to be non-conserved.
Two levels of conservation can be chosen for the PicTar algorithm:
- Conservation among four vertebrates: human, mouse, rat, and dog [termed picTar4 in targetHub]
- Conservation among five vertebrates: human, mouse, rat, dog, and chicken [termed picTar5 in targetHub]
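The scoring rule described above can be sketched as follows: a 3'UTR is scored only if at least one site is conserved, and the score then includes every site. The sketch makes two assumptions:

- A site counts as conserved when it is present in the aligned UTRs of every species in the chosen set.
- The per-UTR score is a plain sum. This is only a placeholder, not PicTar's likelihood-based score.

Function names are hypothetical.

```python
# Species sets for the two conservation levels listed above.
PICTAR_SPECIES = {
    "picTar4": {"human", "mouse", "rat", "dog"},
    "picTar5": {"human", "mouse", "rat", "dog", "chicken"},
}


def is_conserved(site, level):
    """Assumed rule: conserved if the site is aligned in every species of the set."""
    return PICTAR_SPECIES[level] <= site["species"]


def score_utr(sites, level):
    """Score a UTR only if some site is conserved, then use ALL sites.

    The plain sum is a placeholder, not PicTar's likelihood-based score.
    """
    if not any(is_conserved(site, level) for site in sites):
        return None  # UTR is not scored at this conservation level
    return sum(site["score"] for site in sites)
```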
miRanda
miRanda was one of the first bioinformatic methods to predict the target genes of microRNAs. The algorithm uses empirical rules to increase the weight of certain significant positions in the miRNA. More recently (Betel et al, 2010), a machine learning method (mirSVR) was integrated with miRanda to predict the extent of downregulation of a specific mRNA by a given miRNA (the mirSVR score). This method is reported to be capable of finding non-canonical and non-conserved miR target sites. The miRanda code was obtained from its website and used to compute the miRNA targets for the current version (18) of miRBase, using strict mode and the default cutoff score (140).
Implementation
Database
targetHub is built using CouchDB, an open source, non-relational, document-oriented database system. CouchDB’s built-in web administration console communicates with the database using HTTP requests. The RESTful JSON API provided by CouchDB allows the database to be accessed from any environment that can make HTTP requests (see examples). The database "design document" (which serves as the equivalent of a database schema in a relational database) is available here in JSON format.
Interface
The gene-based search interface of the website is built using geneSmash, a gene-centric CouchDB database. The geneSmash web service converts any of the supported input gene identifier types to Entrez Gene identifiers. These Entrez Gene identifiers are then passed to targetHub to retrieve the relevant miRNA-target interactions. Queries with miRNA identifiers are passed directly to targetHub.
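The two-step lookup can be sketched in Python. The following names are taken from the Perl and R examples later in this document:

- the database names `tarhub` and `genesmash`
- the design document `basic`
- the views `by_symbol` and `by_geneIDmethod`

`HOSTNAME` is the same placeholder used there. The `fetch` argument is an assumption that keeps the logic testable without a server.

```python
import json
import urllib.parse

BASE = "http://HOSTNAME"  # placeholder host name, as in this document's examples


def view_url(db, view, key):
    """URL of a CouchDB view query (design document "basic") restricted to one key.

    The key is serialized as JSON and then URL-encoded; a raw '+' in a
    query string would otherwise be decoded as a space.
    """
    query = urllib.parse.urlencode({"key": json.dumps(key, separators=(",", ":"))})
    return f"{BASE}/{db}/_design/basic/_view/{view}?{query}"


def targets_for_symbol(symbol, method, fetch):
    """Gene symbol -> Entrez Gene ID (geneSmash) -> interactions (targetHub).

    `fetch(url)` must return the decoded JSON body of a CouchDB view response.
    """
    gene_rows = fetch(view_url("genesmash", "by_symbol", symbol))["rows"]
    if not gene_rows:
        return []  # symbol not known to geneSmash
    entrez_id = gene_rows[0]["id"]
    rows = fetch(view_url("tarhub", "by_geneIDmethod", [entrez_id, method]))["rows"]
    return [(row["id"], row["value"]) for row in rows]


print(view_url("tarhub", "by_geneIDmethod", ["7157", "miranda+targetscan"]))
```

In real use, `fetch` could be `lambda url: json.load(urllib.request.urlopen(url))`. Letting `urlencode` escape the JSON key replaces the manual `+` escaping done in the Perl example.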
Replication
CouchDB provides native support for database replication. You can use this facility to make (and maintain) a local copy of the targetHub database. If you want an identical interface, you should also replicate geneSmash along with targetHub. If you replicate the database, we request that you maintain links to targetHub, geneSmash and the University of Texas M.D. Anderson Cancer Center.
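With CouchDB's `/_replicate` endpoint, a local copy is requested by POSTing a JSON document that names the source and target databases. The sketch below only builds the requests. `HOSTNAME` stands for the public server and `localhost:5984` for a default local CouchDB install; both are assumptions. Recent CouchDB versions typically require admin credentials for this call, and credentials are omitted here.

```python
import json
import urllib.request

REMOTE = "http://HOSTNAME"          # placeholder for the public server
LOCAL = "http://localhost:5984"     # assumed default local CouchDB address


def replication_request(db, continuous=True):
    """Build (but do not send) a /_replicate request that pulls `db` to the local server."""
    body = {
        "source": f"{REMOTE}/{db}",
        "target": f"{LOCAL}/{db}",
        "create_target": True,      # create the local database if it does not exist
        "continuous": continuous,   # keep the local copy in sync with the source
    }
    return urllib.request.Request(
        f"{LOCAL}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# geneSmash is replicated too, for an identical search interface.
requests_to_send = [replication_request(db) for db in ("tarhub", "genesmash")]
# After adding credentials, send each one with urllib.request.urlopen(req).
```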
Web Site
Search
targetHub can be searched for miRNA-target relationships using either a miRBase miRNA identifier (stem-loop or mature) or any of the gene identifier types supported by geneSmash. Any one of the following identifiers can be used to search targetHub:
- miRBase Identifier [stem-loop miRNA]
- mature miR Identifier
- HUGO Gene Symbol
- Gene Symbol Alias
- Ensembl Identifier
- Entrez Gene Identifier
- NCBI Unigene Identifier
- Human Gene Expression Array Identifiers of:
  - Affymetrix
  - Illumina
  - Agilent
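Client code that accepts any of these identifiers might first decide where to send a query. miRNA identifiers go directly to targetHub, while gene identifiers are resolved through geneSmash. This is a heuristic sketch with hypothetical patterns. Array probe IDs and aliases fall through to the last case. let-7 names, which use `let` for both stem-loop and mature forms, are not handled.

```python
import re

# Heuristic patterns (assumptions), checked in order.
ID_PATTERNS = [
    ("stem-loop miRNA", re.compile(r"^[a-z]{3}-mir-")),  # miRBase stem-loop IDs use lower-case "mir"
    ("mature miRNA", re.compile(r"^[a-z]{3}-miR-")),
    ("Ensembl gene", re.compile(r"^ENSG\d{11}$")),        # human Ensembl gene ID format
    ("Entrez Gene", re.compile(r"^\d+$")),
]


def guess_id_type(query):
    """Classify a query string; anything unmatched is left for geneSmash to resolve."""
    for label, pattern in ID_PATTERNS:
        if pattern.search(query):
            return label
    return "gene symbol or other"


def route(query):
    """miRNA identifiers go straight to targetHub; everything else goes via geneSmash."""
    return "targetHub" if guess_id_type(query).endswith("miRNA") else "geneSmash"
```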
Download
The search results can be downloaded as a tab-delimited (TSV) file. Each record in the TSV file represents a miRNA-target interaction derived by a specific method (e.g., TargetScan). The first six columns describe a generic miRNA-target interaction in targetHub; the last four columns are specific to the method used to derive the interaction. The following table describes the last four columns for each method in targetHub.

Method | Param1 | Param2 | Param3 | Param4 |
---|---|---|---|---|
TargetScan | Context Score | Aggregate Pct | Total Conserved Sites | Representative miRNA |
miRTarBase | Experiment Type | Evidence Level | InteractionID | PubmedID |
miRanda | Score | Energy | Transcript Location | Position |
picTar (4 & 5) | Score | Position | miRNA | - |
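The table above can be used to label the columns of a downloaded file. This section does not say which of the six generic columns holds the method name, so in this hypothetical sketch the caller passes the method explicitly. The method keys and the sample row are illustrative.

```python
import csv
import io

# Method-specific column names (last four columns), per the table above.
PARAM_NAMES = {
    "TargetScan": ["Context Score", "Aggregate Pct", "Total Conserved Sites", "Representative miRNA"],
    "miRTarBase": ["Experiment Type", "Evidence Level", "InteractionID", "PubmedID"],
    "miRanda": ["Score", "Energy", "Transcript Location", "Position"],
    "picTar": ["Score", "Position", "miRNA", None],  # fourth column unused ("-")
}


def read_tsv(text):
    """Parse tab-delimited text into a list of field lists."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def method_params(fields, method):
    """Label the last four columns of one record (6 generic + 4 method-specific)."""
    if len(fields) != 10:
        raise ValueError("expected 6 generic + 4 method-specific columns")
    names = PARAM_NAMES[method]
    return {name: value for name, value in zip(names, fields[-4:]) if name is not None}
```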
Illustration
After a search by any criterion, the targets predicted by the methods described above are illustrated on the website with a Venn diagram. The predicted targets are summarized for a given stem-loop miRNA or gene. Interactions of a gene with the two different mature miRNAs (5p and 3p) of the same stem-loop miRNA are counted as different interactions. Searching with a mature miRNA does not generate an illustration, because a mature miRNA can be associated with multiple stem-loop miRNAs.
Using targetHub in other programs
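Results retrieved programmatically can be summarized the same way as the website's Venn diagram. The sketch below assumes each method's results have been reduced to a set of (stem-loop, mature, gene) tuples. Keeping the mature identifier in the tuple is what makes the 5p and 3p products count separately. `venn_counts` is a hypothetical helper, not the site's own code.

```python
from itertools import combinations


def venn_counts(method_hits):
    """Size of each exclusive region of a Venn diagram over the given methods.

    `method_hits` maps method name -> set of (stem_loop, mature, gene) tuples.
    """
    methods = sorted(method_hits)
    counts = {}
    for size in range(1, len(methods) + 1):
        for combo in combinations(methods, size):
            inside = set.intersection(*(method_hits[m] for m in combo))
            outside = set().union(*(method_hits[m] for m in methods if m not in combo))
            counts[combo] = len(inside - outside)  # found by exactly these methods
    return counts
```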
- The following code example shows how to use targetHub in the Perl programming environment to extract the miRNAs targeting a gene, as predicted by specific methods.

```perl
use LWP::Simple;   # provides get(): HTTP GET, returns the response body (undef on failure)
use JSON;          # provides decode_json(): JSON text -> Perl data structure
sub getTargetingMirna {
  my ($EntrezGeneID, $method, $couchdbView, $dataLink, $tData, %tData, @targetData, @targetList);
  $EntrezGeneID = $_[0]; $method = $_[1];
  $couchdbView = "HOSTNAME/tarhub/_design/basic/_view/by_geneIDmethod";
  $method =~ s/\+/%2B/g;   # escape '+', which would otherwise be read as a space
  $dataLink = $couchdbView . '?key=["' . $EntrezGeneID . '","' . $method . '"]';
  $tData = get $dataLink;
  $tData = decode_json $tData;
  %tData = %$tData;
  @targetData = @{$tData{"rows"}};
  for my $row (@targetData) {   # one "id<TAB>value" line per interaction
    push @targetList, $row->{"id"} . "\t" . $row->{"value"} . "\n";
  }
  return @targetList;
}
my @targetList = getTargetingMirna("7157", "miranda+targetscan");
print @targetList;   # each element already ends with a newline
```

Some notes on the code:
- The first two lines load the Perl modules LWP::Simple and JSON, which are used to make the HTTP request and to convert JSON objects into Perl data structures, respectively.
- The first two lines of the getTargetingMirna function declare and initialize variables. The next three lines use the Entrez Gene identifier and prediction method arguments to construct an appropriate URL to query targetHub. The next line makes the HTTP request to the targetHub server. The following three lines convert the JSON response into a Perl array of hashes. The final lines extract the relevant part of the response.
- This version of the code does not perform error checking on the result, so it is probably not suitable for production code. Failure can occur because the server is not available, because the gene identifier passed as an argument is not a valid Entrez Gene identifier, or because no known miRNA-target interaction is available for the gene; all three conditions should be checked.
- The following code example shows how to use targetHub in the R statistical programming environment to extract the miRNAs targeting a gene, filtered by evidence count.

```r
library(RJSONIO)
getTargetingMirna <- function(sym, evidence_count) {
  giUrl <- "HOSTNAME/genesmash/_design/basic/_view/by_symbol"
  glink <- paste(giUrl, "?key=\"", sym, "\"", sep='')
  gene_data <- fromJSON(paste(readLines(glink), collapse=''))
  target_Info <- NA   # returned unchanged if nothing is found
  if(length(gene_data[["rows"]]) != 0) {
    GeneID <- as.character(gene_data$rows[[1]]["id"])
    # Extracting miRNA-target interactions for the gene from targetHub
    diUrl <- 'HOSTNAME/tarhub/_design/basic/_view/by_geneIDcount'
    link <- paste(diUrl, '?key=["', GeneID, '",', evidence_count, ']', sep='')
    JSON_data <- paste(readLines(link), collapse='')
    target_data <- fromJSON(JSON_data)
    if(is.list(target_data) && length(target_data$rows) > 0) {
      target_data <- target_data$rows
      target_Info <- matrix(nrow = length(target_data), ncol = 2)
      colnames(target_Info) <- c("miRNA-gene_interaction", "corresponding_mature_miR")
      for(i in seq_along(target_data)) {
        target_Info[i,1] = unlist(target_data[[i]]$id)
        target_Info[i,2] = unlist(target_data[[i]]$value)
      }
    }
  }
  target_Info
}
getTargetingMirna("TP53", 2)
```

Some notes on the code:
- The first line loads an R package that converts between JSON objects and R objects.
- The first two lines of the getTargetingMirna function use the symbol argument to construct an appropriate URL. The next line makes the HTTP request to the geneSmash server and converts the JSON response into an R object; this step converts an alternate gene identifier (here, a HUGO symbol) to an Entrez Gene identifier. The Entrez Gene identifier obtained from geneSmash is then used in a similar HTTP request to retrieve the miRNA-target data from targetHub. The final lines extract the relevant part of the response.
- This version of the code does not perform error checking on the result, so it is probably not suitable for production code. Failure can occur because the server is not available, because the symbol passed as an argument is not a valid HUGO symbol, or because no known miRNA-target interaction is available for the gene; all three conditions should be checked.
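The checks recommended in the notes can be added around the same query. The Python sketch below is a hypothetical counterpart to the Perl function, not an official client, and `HOSTNAME` is the placeholder used above. The identifier check only tests the format (digits only), so a well-formed but unknown Entrez Gene ID shows up as the "no interactions" case. For testing, `fetch` can be replaced by a stub.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://HOSTNAME"  # placeholder, as in the Perl and R examples
VIEW = "/tarhub/_design/basic/_view/by_geneIDmethod"


class TargetHubError(Exception):
    """Raised when a targetHub query cannot be answered."""


def http_get_json(url):
    """Fetch `url` and decode its JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def get_targeting_mirna(entrez_id, method, fetch=http_get_json):
    """Return (id, value) pairs for one gene and method, checking the three failure cases."""
    # Invalid identifier: an Entrez Gene ID consists of digits only.
    if not str(entrez_id).isdigit():
        raise TargetHubError(f"not an Entrez Gene identifier: {entrez_id!r}")
    key = json.dumps([str(entrez_id), method], separators=(",", ":"))
    url = BASE + VIEW + "?" + urllib.parse.urlencode({"key": key})
    # Server unavailable: network/HTTP errors are OSError, bad JSON is ValueError.
    try:
        data = fetch(url)
    except (OSError, ValueError) as exc:
        raise TargetHubError(f"targetHub request failed: {exc}") from exc
    rows = data.get("rows") if isinstance(data, dict) else None
    # No known miRNA-target interaction for this gene and method.
    if not rows:
        raise TargetHubError(f"no interactions for gene {entrez_id} ({method})")
    return [(row["id"], row["value"]) for row in rows]
```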