- Downloading data
- Discovering circRNAs
- Coordinate conversion
- Submit your data
There are currently three main ways of obtaining data from circBase: simple search, list search, and conditional retrieval via the table browser.
The search textbox on the main page can be used to query the database by circBase identifier (e.g. mmu_circ_0000010), RefSeq transcript ID (NM_027671), gene symbol (Pvt1), genomic coordinates (chrII:123456-7891011) or Gene Ontology term identifier. Identifiers and names used in previously published data (such as mm9_circ_001517 or CDR1as) have been added to the database as aliases and can be used in searches as well. Search is case-insensitive.
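The identifier types above can be told apart by their shape. As an illustration only (circBase's actual query parser is not described here), a search term could be classified with regular expressions like the ones below; the patterns and labels are assumptions. Terms that match none of them, including published aliases such as mm9_circ_001517 or CDR1as, fall through to a symbol/alias lookup.

```python
import re

# Hypothetical dispatch rules; the first pattern that matches wins.
QUERY_PATTERNS = [
    ("circbase_id", re.compile(r"^(hsa|mmu|cel)_circ_\d+$", re.IGNORECASE)),
    ("refseq", re.compile(r"^[NX][MR]_\d+(\.\d+)?$", re.IGNORECASE)),
    ("go_term", re.compile(r"^GO:\d{7}$", re.IGNORECASE)),
    ("position", re.compile(r"^chr\w+:[\d,]+-[\d,]+$", re.IGNORECASE)),
]


def classify_query(term: str) -> str:
    """Guess which kind of identifier a search term is (case-insensitive)."""
    term = term.strip()
    for label, pattern in QUERY_PATTERNS:
        if pattern.match(term):
            return label
    # Everything else is looked up as a gene symbol or a published alias.
    return "symbol_or_alias"
```

For example, `classify_query("NM_027671")` returns `"refseq"`, while `classify_query("CDR1as")` returns `"symbol_or_alias"`.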
The database can also be queried by DNA or RNA sequence, either through the simple search interface (exact matches of sequences longer than 6 nt, or of their reverse complements, are returned) or with BLAT. To build the BLAT reference, each circRNA was cut opposite its head-to-tail junction, so that the junction lies inside the linear reference sequence. It is therefore possible to search with sequences that span the circular junction.
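To see why cutting opposite the junction matters, consider a toy exact-match search. In this sketch (an assumption, not circBase's code), each circRNA's spliced sequence is stored 5' to 3' so that its last base is joined to its first; rotating it by half its length moves the head-to-tail junction to the middle of a linear reference, where a junction-spanning query can be found. The reverse complement is checked as well, RNA input is read as DNA, and queries shorter than 7 nt are rejected. The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def normalize(seq: str) -> str:
    """Upper-case and read RNA as DNA (U -> T) so both alphabets compare equal."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return normalize(seq).translate(COMPLEMENT)[::-1]


def linearize_opposite_junction(circ_seq: str) -> str:
    """Cut the circle opposite its head-to-tail junction.

    Assumes the spliced sequence's last base is joined to its first base;
    rotating by half the length puts that junction mid-sequence."""
    seq = normalize(circ_seq)
    half = len(seq) // 2
    return seq[half:] + seq[:half]


def exact_search(query: str, references: dict, min_len: int = 7) -> list:
    """Names of references containing the query or its reverse complement."""
    q = normalize(query)
    if len(q) < min_len:
        raise ValueError(f"query must be at least {min_len} nt long")
    rc = reverse_complement(q)
    return sorted(name for name, ref in references.items() if q in ref or rc in ref)


if __name__ == "__main__":
    circles = {"toy_circ_1": "ATGCCGTAGGCTTACA"}  # made-up sequence
    linear = {n: linearize_opposite_junction(s) for n, s in circles.items()}
    as_stored = {n: normalize(s) for n, s in circles.items()}
    query = "UACAAUGC"  # last 4 nt + first 4 nt: spans the junction
    print(exact_search(query, as_stored))  # [] (junction not in the stored string)
    print(exact_search(query, linear))     # ['toy_circ_1']
```

The trade-off of this design is that a query spanning the new cut point (the position opposite the junction) is no longer found in the linear reference.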
- Multi-search: Several identifiers can be entered at once, provided they are all of the same type and separated by spaces or tabs (e.g. myFavGene1 myFavGene2 myFavGene3 is OK, but myFavGene1 myFavCircRNA1 myFavRefseqTranscript1 is not).
- Genomic position format: Anything recognizable by the UCSC genome browser should also be recognized by circBase. In addition to the usual chr:start-end format, the colon and dash can be replaced by a space or tab, so genomic positions can be copy-pasted directly from spreadsheets.
- Search by genomic coordinates can be limited to a particular organism by using a three-letter organism identifier (hsa, mmu or cel) as the last search term (e.g. chr1:10000-10000000 hsa).
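The accepted position formats can be illustrated with a small parser. This is a sketch under assumptions, not circBase's parser: it splits on colons, dashes, spaces and tabs, strips thousands separators (which the UCSC browser also accepts), and treats a trailing hsa, mmu or cel as the organism filter. `Region` and `parse_position` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

ORGANISMS = {"hsa", "mmu", "cel"}


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    organism: Optional[str]  # None means "search all organisms"


def parse_position(text: str) -> Region:
    """Parse 'chr:start-end', or the same fields separated by spaces/tabs."""
    tokens = [t for t in re.split(r"[\s:\-]+", text.strip()) if t]
    organism = None
    # A trailing three-letter organism code limits the search.
    if tokens and tokens[-1].lower() in ORGANISMS:
        organism = tokens.pop().lower()
    if len(tokens) != 3:
        raise ValueError(f"cannot parse genomic position: {text!r}")
    chrom = tokens[0]
    start = int(tokens[1].replace(",", ""))
    end = int(tokens[2].replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return Region(chrom, start, end, organism)
```

For instance, `parse_position("chr1:10000-10000000 hsa")` and a tab-separated row pasted from a spreadsheet both yield a `Region` with the same field layout.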
After selecting the organism, users can paste or upload a list of circBase or RefSeq identifiers, gene symbols or genomic coordinates. Identifiers are expected as single-column input. Genomic coordinates may span any number of columns, but only the first three are interpreted as the genomic position (chromosome | start | end). A 12-column BED file can therefore be uploaded to obtain a list of circRNAs overlapping the submitted genomic regions.
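The overlap step can be sketched as follows, assuming the uploaded file and the circRNA coordinates use the same 0-based, half-open convention as BED. Only the first three whitespace-separated columns are read, so a 12-column BED line is handled exactly like a 3-column one. The function names and toy circRNA coordinates are hypothetical.

```python
def read_regions(lines):
    """Keep only the first three columns (chrom, start, end) of each data line."""
    regions = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and UCSC 'track'/'browser' header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlapping_circs(circs, regions):
    """Names of circRNAs overlapping any region (half-open intervals)."""
    hits = []
    for name, (chrom, start, end) in circs.items():
        if any(chrom == r_chrom and start < r_end and r_start < end
               for r_chrom, r_start, r_end in regions):
            hits.append(name)
    return sorted(hits)
```

With half-open intervals, a circRNA that starts exactly where a region ends does not count as overlapping. A linear scan is enough for a sketch; large uploads would call for an interval index.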
The table browser interface enables quick and simple conditional data retrieval. Organism and dataset must be selected; the query can then be refined by selecting one or more cell lines, excluding circRNAs that overlap repeats, defining a genomic position, and limiting the genomic or spliced sequence length.
- Multiple selections are supported in the Samples and Annotation menus (hold Ctrl while clicking). Selected terms are combined using the AND operator.
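A minimal sketch of how such AND-combined filters behave, assuming that a circRNA must be detected in every selected sample and carry every selected annotation. The record fields, sample names and annotation labels are hypothetical, and only a few of the table browser's filters are modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CircRecord:
    name: str
    samples: frozenset       # samples in which the circRNA was detected
    annotations: frozenset   # annotation labels (hypothetical)
    spliced_length: int      # length of the spliced sequence in nt


def table_query(records: Iterable[CircRecord],
                samples: Iterable[str] = (),
                annotations: Iterable[str] = (),
                min_len: Optional[int] = None,
                max_len: Optional[int] = None) -> List[str]:
    """Return names of records that satisfy every selected filter (AND)."""
    want_samples, want_annotations = set(samples), set(annotations)
    hits = []
    for rec in records:
        if not want_samples <= rec.samples:          # all selected samples present
            continue
        if not want_annotations <= rec.annotations:  # all selected annotations present
            continue
        if min_len is not None and rec.spliced_length < min_len:
            continue
        if max_len is not None and rec.spliced_length > max_len:
            continue
        hits.append(rec.name)
    return hits


# Made-up example data for illustration only.
EXAMPLE_RECORDS = [
    CircRecord("circ_a", frozenset({"HEK293", "HeLa"}), frozenset({"exonic"}), 450),
    CircRecord("circ_b", frozenset({"HEK293"}), frozenset({"exonic"}), 300),
    CircRecord("circ_c", frozenset({"HEK293", "HeLa"}), frozenset({"intronic"}), 2000),
]
```

Here `table_query(EXAMPLE_RECORDS, samples=["HEK293", "HeLa"], annotations=["exonic"])` keeps only `circ_a`: selecting a second sample narrows the result rather than widening it.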
Results are presented in an interactive table. Columns can be sorted by clicking the column names, and some cells link out to external resources. Clicking the genomic position opens the UCSC genome browser (internal version), additional information on a particular circRNA is linked from the circRNA name column, and NCBI resources (or WormBase for C. elegans) can be accessed by clicking the best transcript or gene symbol link. Clicking a cell in the dataset column opens the PubMed record of the respective publication.
3. Downloading data
Tables can be exported in .xlsx, tab-separated .txt or .bed format through the Download table menu. The genomic sequences of retrieved circRNAs can be downloaded in FASTA format.
- Windows users may run into problems because operating systems define line breaks differently. Notepad in Windows 7 cannot interpret Unix-style LF line breaks, so text files exported from circBase are displayed incorrectly. This can be avoided by using a free third-party text editor such as Notepad++.
- Genomic sequences are exported as one FASTA file per organism. The files are bundled into a tar archive and, because they can be quite large, compressed with gzip. While the tar.gz format is natively supported on Unix-like systems, some Windows users may find it hard to extract the exported data. Tools that can open tar.gz archives include 7-Zip (free) and PowerArchiver, among many others.
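Both issues can also be handled with Python's standard library on any operating system. The sketch below is an illustration, not a circBase tool: `lf_to_crlf` rewrites Unix line breaks as Windows CRLF pairs, and `iter_targz_fasta` reads FASTA records straight out of a downloaded .tar.gz without a separate extraction step. It assumes the archive members are plain-text FASTA files; all function names are hypothetical.

```python
import tarfile
from pathlib import Path


def lf_to_crlf(data: bytes) -> bytes:
    """Convert Unix (LF) line breaks to Windows (CRLF) line breaks."""
    # Collapse existing CRLF pairs first so they are not turned into CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file_for_notepad(src: str, dst: str) -> None:
    """Write a CRLF copy of an exported text file."""
    Path(dst).write_bytes(lf_to_crlf(Path(src).read_bytes()))


def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines given as bytes or str."""
    header, chunks = None, []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def iter_targz_fasta(path: str):
    """Stream (member_name, header, sequence) from every file in a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            for header, seq in parse_fasta(handle):
                yield member.name, header, seq
```

Usage, with a hypothetical file name: `for member, header, seq in iter_targz_fasta("circbase_export.tar.gz"): ...`. Streaming avoids writing large sequence files to disk when only a subset of records is needed.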
4. Discovering circRNAs
All the code needed to discover circRNAs in your own (Ribominus) RNA-seq data is available from the downloads section. Please refer to the included README file for further instructions.
5. Coordinate conversion
The genome assemblies currently used in circBase are hg19 for H. sapiens, mm9 for M. musculus and ce6 for C. elegans circRNAs. To convert data between assemblies, users are encouraged to use liftOver, which is available directly from the UCSC genome browser at doRiNA ("Convert" option in the top menu), through UCSC's web interface, as a standalone command-line tool, or as part of the Galaxy platform.
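When using the standalone command-line tool, positions copied from the results table first have to be written to a BED file. The sketch below assumes the chr:start-end strings follow the UCSC browser's 1-based, fully closed convention, so one is subtracted from the start to obtain BED's 0-based, half-open start; if you start from a .bed export instead, skip that conversion. It then calls liftOver with its positional arguments (input, chain file, output, unmapped). The helper names are hypothetical, and the chain file (e.g. UCSC's hg19ToHg38.over.chain.gz) must be downloaded separately.

```python
import shutil
import subprocess
from typing import Iterable, Tuple


def position_to_bed_line(position: str, name: str) -> str:
    """Convert 'chr1:10001-20000' (1-based, closed) to a tab-separated BED line starting at 10000."""
    chrom, span = position.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def write_bed(pairs: Iterable[Tuple[str, str]], path: str) -> None:
    """Write (name, position) pairs as a 4-column BED file."""
    with open(path, "w") as out:
        for name, position in pairs:
            out.write(position_to_bed_line(position, name) + "\n")


def run_liftover(bed_in: str, chain_file: str, bed_out: str, unmapped_out: str) -> None:
    """Call the standalone liftOver binary, which must be on PATH."""
    exe = shutil.which("liftOver")
    if exe is None:
        raise FileNotFoundError("liftOver executable not found on PATH")
    subprocess.run([exe, bed_in, chain_file, bed_out, unmapped_out], check=True)
```

Usage, with hypothetical file names: `write_bed([("example_circ", "chr1:10001-20000")], "hg19.bed")`, then `run_liftover("hg19.bed", "hg19ToHg38.over.chain.gz", "hg38.bed", "unmapped.bed")`. Regions that cannot be converted end up in the unmapped file rather than being silently dropped.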
6. Submit your data
If you would like to submit your data to circBase, or to suggest implementation of a published circRNA dataset, please contact us.
circBase is developed by the Rajewsky lab at the Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin, Germany.
First version: April 18th 2013. Last Update: Dec 15th 2015.