Usage Guide

Main Functions

Select any section to view the results for the corresponding data source. Regardless of the data source you select, there are two main query modes: motif-based query (source query) and gene/target-ID-based query (target query).

A hit is a single result in which a source has a score in one of the target regions.

The data source determines what kind of hits you will be querying and which species or region of genome those hits come from.

In source query mode, you can select a source, be it a motif for MotifMap-based results or a TF for ChIP-Seq, and then find its targets across the genome.

You can select a variety of filtering parameters to narrow down your search. The details on the parameters are explained below in their corresponding section.

In most cases, you will be able to select a row from a table of motifs/TFs and conduct your search based on that motif. This table, along with all the other tables from this site, is fully interactive. You can sort, search by keyword and download results into xls format.

Once you have chosen the motif and the parameters, press the button to show target results. In some cases, after selecting a row in the output, you can use a link below the table to view extra annotations for that row. Doing so will open a new window.

In target query mode, you can select a target, be it a gene, a miRNA or a UTR region, and then query the database for all sources (motifs, etc.) that have hits within the target region.

You can again use a variety of filtering parameters to narrow down your search.

The results are presented in the same way as for motif-based queries, and you can sometimes view additional annotations in the same way.

Together, these two methods provide you with a powerful toolset for discovering potential binding relationships between regulating factors and their target regions.
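As a rough mental model (not the site's actual implementation), you can picture a hit as a record linking one source to one target with a score, and the two modes as two ways of filtering the same collection of hits. The Hit class, its field names and the example values below are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record: one source scoring in one target region."""
    source: str   # motif ID or TF name (made-up values below)
    target: str   # gene, miRNA or UTR identifier (made-up values below)
    score: float  # e.g. a z-score


HITS = [
    Hit("motif_A", "gene_1", 4.2),
    Hit("motif_A", "gene_2", 3.1),
    Hit("motif_B", "gene_1", 5.0),
]


def source_query(hits, source):
    """Source query: every hit of one source, i.e. its targets."""
    return [h for h in hits if h.source == source]


def target_query(hits, target):
    """Target query: every hit inside one target, i.e. its sources."""
    return [h for h in hits if h.target == target]


print([h.target for h in source_query(HITS, "motif_A")])
print([h.source for h in target_query(HITS, "gene_1")])
```

Both functions scan the same list; only the field being matched changes, which is why the two result tables look alike.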

MotifMap-RNA

Motif Selection in Motif Search

In the Motif Search mode, there is an interactive table containing all the motifs from MotifMap-RNA. You can select a row, see additional annotations from CISBP or RBPDB by clicking the corresponding buttons, or start a search using the specified filtering parameters. This table contains the following columns:

  1. ID is the ID used by MotifMap-RNA.

  2. CISBP-ID is the ID of the corresponding CISBP motif, if one exists.

  3. RBPDB-ID is the ID of the corresponding RBPDB motif, if one exists.

  4. Name and Original Species contain basic annotations for the selected motif.

Note that ID matches either CISBP-ID or RBPDB-ID, and whichever it matches is the source of the motif.

Additionally, when you click on a motif (row), you will see a motif LOGO if it exists. Some motifs may be very short and degenerate, and as a result the hits are not as reliable as those from a longer, more specific motif. In these cases you may want to increase the filtering thresholds, and especially take advantage of genomic distance filtering to localize potential hits. Aggregate z-scores and BBLS may also indicate more reliable hits.

If you have questions on the quality or origin of the motif, you can check the annotations from CISBP or RBPDB, which typically include the publication that the motif is derived from.

Filtering parameters:

  1. z-score is the main indicator of the strength of predicted binding. It is specific to the binding motif (in PWM form) and the target sequence of the exact site.

  2. Weighted z-score is a meta z-score whose weights decay exponentially with the local rank of the hits' z-scores. It complements the main z-score by accounting for potential local clustering of binding sites. It will always be higher than the z-score. One effective filtering strategy is to combine a relatively low z-score threshold with a high Weighted z-score threshold.

  3. BBLS is the Bayesian Branch Length Score, a measure of how conserved the binding site is across the phylogenetic tree near the targeted species. It has been shown to be an effective indicator of how well hits are conserved. However, for RBP hits it is sometimes absent even for strong hits, so it may not be the best primary filtering parameter. You can still sort the results table by it.

  4. Upstream and Downstream indicate the distances in bp from the transcription start site (TSS) of the target gene or the closest gene. In some cases this is highly effective as a filtering parameter, while in others it will be meaningless. Note: in the case of intronic data, this parameter filters for the distance relative to the intron start site.
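If you download a results table and want to reproduce the "low z-score, high Weighted z-score" strategy offline, a minimal sketch might look like the following. The column names ZScore, WeightedZ and BBLS follow the results table, but the row layout, the threshold values and the filter_hits helper are assumptions made for illustration, not recommended settings.

```python
def filter_hits(hits, min_z=2.0, min_weighted_z=6.0):
    """Keep hits passing a lenient ZScore cutoff and a strict WeightedZ cutoff.

    BBLS can be missing (None) even for strong RBP hits, so it is used
    only to order the surviving hits, never to discard them.
    """
    kept = [
        h for h in hits
        if h["ZScore"] >= min_z and h["WeightedZ"] >= min_weighted_z
    ]
    # Hits with a BBLS value come first (highest BBLS on top); hits without go last.
    return sorted(kept, key=lambda h: (h["BBLS"] is None, -(h["BBLS"] or 0.0)))


# Made-up example rows.
hits = [
    {"SeqID": "s1", "ZScore": 2.5, "WeightedZ": 7.0, "BBLS": None},
    {"SeqID": "s2", "ZScore": 4.0, "WeightedZ": 4.5, "BBLS": 2.0},
    {"SeqID": "s3", "ZScore": 3.0, "WeightedZ": 8.0, "BBLS": 1.5},
    {"SeqID": "s4", "ZScore": 1.0, "WeightedZ": 9.0, "BBLS": 3.0},
]

result_ids = [h["SeqID"] for h in filter_hits(hits)]
print(result_ids)
```

Here s2 is dropped for its low WeightedZ and s4 for its low ZScore, while s1 is kept despite having no BBLS value.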

When counting distances, the strands of the hit and of the target gene are always taken into account. However, when presenting genomic locations as genome coordinates, we never use negative-strand notation. All positions are based on the positive strand.
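The following sketch shows one way to turn positive-strand coordinates into a strand-aware offset from the TSS and then apply an Upstream/Downstream window. It assumes a single hit position and the convention that negative offsets are upstream; the function names and the example coordinates are hypothetical, and the site may compute distances differently (for example from the hit start or stop).

```python
def signed_offset_from_tss(hit_pos, tss, gene_strand):
    """Offset of a hit from the TSS in bp, in the gene's direction of transcription.

    Both positions are positive-strand coordinates. A negative result
    means upstream of the TSS, a positive result means downstream.
    """
    if gene_strand == "+":
        return hit_pos - tss
    if gene_strand == "-":
        # On the minus strand, larger coordinates lie upstream.
        return tss - hit_pos
    raise ValueError(f"unknown strand: {gene_strand!r}")


def within_window(offset, upstream, downstream):
    """True if the offset lies within `upstream` bp before or `downstream` bp after the TSS."""
    return -upstream <= offset <= downstream


# The same coordinates give opposite answers depending on the gene strand.
plus_offset = signed_offset_from_tss(9500, 10000, "+")
minus_offset = signed_offset_from_tss(9500, 10000, "-")
print(plus_offset, minus_offset)
print(within_window(plus_offset, upstream=1000, downstream=0))
print(within_window(minus_offset, upstream=1000, downstream=0))
```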

Results interpretation:

  1. SeqID is the sequence ID assigned to the hit. Usually this is generated by MotifMap-RNA, but for LNC we keep the NONCODE ID.

  2. Chr, Start and Stop show the genome coordinates of the hit, always on the positive strand. Strand indicates the actual strand of the hit. We provide both the strand of the hit (Strand) and the strand of the closest genomic annotation (ClosestGeneStrand); in many cases, if the two differ, the hit is not very relevant. Since the strand of the reference is always the same, you can sort by hit strand to easily see all hits on that strand.

  3. ZScore is the main factor indicating the strength of potential binding. Please note that for short motifs there can be many hits with the maximum ZScore.

  4. WeightedZ is an aggregate ZScore exponentially weighted by rank. It is similar to StoufferZ, which is linearly weighted by rank. These two scores indicate local clustering of strong hits and are more scalable than ZScores. If you find many hits with the same maximum ZScore, they will help you identify those with more strong hits in their vicinity.

  5. BBLS indicates how conserved the hit is; a score of 1 is significant, while 4 is very strongly conserved. Edges indicates in how many phylogenetic neighbors the hit likely exists.

  6. ClosestGene indicates the closest gene to the hit, regardless of the type of hit. The distances to CDS and TSS are presented as well.

  7. In the case of a gene search, the gene information is replaced by motif information. Additional links to CISBP or RBPDB are also provided when you click on a row.
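To illustrate how these columns can be combined when many hits share the maximum ZScore, the sketch below keeps only hits on the same strand as the closest gene and breaks ZScore ties with WeightedZ. The column names come from the results table; the row layout, the helper names and the values are assumptions, and the same-strand filter is a heuristic that will not suit every analysis.

```python
def same_strand(rows):
    """Keep hits whose strand matches the strand of the closest gene."""
    return [r for r in rows if r["Strand"] == r["ClosestGeneStrand"]]


def rank_results(rows):
    """Sort by ZScore, breaking ties with WeightedZ (both descending)."""
    return sorted(rows, key=lambda r: (r["ZScore"], r["WeightedZ"]), reverse=True)


# Made-up rows that all share the same maximum ZScore.
rows = [
    {"SeqID": "a", "Strand": "+", "ClosestGeneStrand": "+", "ZScore": 5.0, "WeightedZ": 6.0},
    {"SeqID": "b", "Strand": "-", "ClosestGeneStrand": "+", "ZScore": 5.0, "WeightedZ": 9.0},
    {"SeqID": "c", "Strand": "+", "ClosestGeneStrand": "+", "ZScore": 5.0, "WeightedZ": 8.0},
]

ranked_ids = [r["SeqID"] for r in rank_results(same_strand(rows))]
print(ranked_ids)
```

Hit b has the highest WeightedZ but is dropped because its strand differs from that of the closest gene; c then ranks above a on WeightedZ.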

MotifMap-DNA

MotifMap documentation is currently available at the MotifMap website.

ChIP-Seq

We house ChIP-Seq data derived from ENCODE or GEO public data. More information pending.