Circular RNAs (circRNAs) are a highly stable form of non-coding RNA with diverse biological functions. They are implicated in the modulation of gene expression, thereby affecting various cellular and disease processes. Building on existing bioinformatics approaches, we developed a comprehensive workflow called Circ-Seq to identify and report expressed circRNAs. Circ-Seq also provides informative genomic annotation along circRNA fused junctions, thus allowing prioritization of circRNA candidates. We first applied Circ-Seq to RNA sequencing data from breast cancer cell lines and validated one of the large circRNAs identified. Circ-Seq was then applied to a larger cohort of breast cancer samples (n = 885) provided by The Cancer Genome Atlas (TCGA), including tumors and normal-adjacent tissue samples. Notably, the results reveal that normal-adjacent tissues in the estrogen receptor positive (ER+) subtype have relatively higher numbers of circRNAs than tumor samples in TCGA. A similar pattern of high circRNA numbers was observed in normal breast-mammary tissues from the Genotype-Tissue Expression (GTEx) project. Finally, we observed that the number of circRNAs in normal-adjacent samples of the ER+ subtype is inversely correlated with the risk-of-relapse proliferation (ROR-P) score for proliferating genes, suggesting that circRNA frequency may be a marker for cell proliferation in breast cancer. The Circ-Seq workflow runs in both single- and multi-threaded compute environments. We believe that Circ-Seq will be a valuable tool for identifying circRNAs useful in the diagnosis and treatment of other cancers and complex diseases.
Keywords: breast cancer; Circ-Seq; circular RNA; molecular subtypes; proliferation.