t-Distributed Stochastic Neighbor Embedding (t-SNE): A tool for eco-physiological transcriptomic analysis

Matthew C Cieslak; Ann M Castelfranco; Vittoria Roncalli; Petra H Lenz; Daniel K Hartline

doi:10.1016/j.margen.2019.100723

Mar Genomics. 2020 Jun;51:100723. doi: 10.1016/j.margen.2019.100723. Epub 2019 Nov 26.

Authors

Matthew C Cieslak¹, Ann M Castelfranco¹, Vittoria Roncalli², Petra H Lenz³, Daniel K Hartline¹

Affiliations

¹ Pacific Biosciences Research Center, University of Hawai'i at Mānoa, 1993 East-West Rd., Honolulu, HI 96822, USA.
² Pacific Biosciences Research Center, University of Hawai'i at Mānoa, 1993 East-West Rd., Honolulu, HI 96822, USA; Department of Genetics, Microbiology and Statistics, Facultat de Biologia, IRBio, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain.
³ Pacific Biosciences Research Center, University of Hawai'i at Mānoa, 1993 East-West Rd., Honolulu, HI 96822, USA. Electronic address: petra@hawaii.edu.

PMID: 31784353

Abstract

High-throughput RNA sequencing (RNA-Seq) has transformed the ecophysiological assessment of individual plankton species and communities. However, the technology generates complex data consisting of millions of short-read sequences that can be difficult to analyze and interpret. New bioinformatics workflows are needed to guide experimentation and environmental sampling and to develop and test hypotheses. One complexity-reducing tool that has been used successfully in other fields is "t-distributed Stochastic Neighbor Embedding" (t-SNE). Its application to transcriptomic data from marine pelagic and benthic systems has yet to be explored. The present study demonstrates its application to RNA-Seq data from previously published, conventionally analyzed studies on the copepods Calanus finmarchicus and Neocalanus flemingeri. Gene expression profiles were compared among developmental stages in one application, among experimental conditions in a second, and among environmental samples from different locations in a third. The profile categories identified by t-SNE were validated against published results from differential gene expression and Gene Ontology (GO) analyses. The analyses demonstrate how individual samples can be evaluated for differences in global gene expression, as well as for differences in expression related to specific biological processes, such as lipid metabolism and responses to stress. As RNA-Seq data from plankton species and communities become more common, t-SNE analysis should provide a powerful tool for determining trends and classifying samples into groups with similar transcriptional physiology, independent of collection site or time.
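The abstract does not describe how t-SNE itself is computed. As an illustration only, the following is a minimal NumPy sketch of exact t-SNE applied to a samples × genes expression matrix. Gaussian affinities between samples are calibrated to a chosen perplexity, a heavy-tailed Student-t kernel defines similarities in the two-dimensional map, and gradient descent moves the map points to reduce the Kullback-Leibler divergence between the two. It is not the authors' pipeline. The function names (`conditional_affinities`, `tsne`), the perplexity, the early-exaggeration and momentum schedule, the learning-rate rule and any toy data are all assumptions.

```python
# Illustrative sketch of exact t-SNE; names and defaults are hypothetical,
# not taken from the study.
import numpy as np


def conditional_affinities(sq_dists, perplexity, tol=1e-5, max_iter=50):
    """Row-wise Gaussian affinities P(j|i); each row's bandwidth is found by
    binary search so that the row's entropy equals log(perplexity)."""
    n = sq_dists.shape[0]
    target = np.log(perplexity)  # target entropy in nats
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(sq_dists[i], i)       # squared distances to the other samples
        beta, lo, hi = 1.0, 0.0, np.inf     # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta)  # shift by the minimum for numerical stability
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target) < tol:
                break
            if entropy > target:            # too diffuse: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:                           # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P


def tsne(X, n_components=2, perplexity=5.0, n_iter=1000,
         exaggeration=4.0, learning_rate=None, seed=0):
    """Embed the rows of X (one row per sample, one column per gene)."""
    n = X.shape[0]
    if learning_rate is None:
        # Assumed heuristic: scale the step with n so early updates stay stable.
        learning_rate = n / (4.0 * exaggeration)
    sq = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    P = conditional_affinities(sq_dists, perplexity)
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)  # symmetric joint probabilities

    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        factor = exaggeration if it < 100 else 1.0  # early exaggeration phase
        momentum = 0.5 if it < 250 else 0.8
        sy = np.sum(Y ** 2, axis=1)
        # Student-t kernel (one degree of freedom) on map distances
        num = 1.0 / (1.0 + sy[:, None] + sy[None, :] - 2.0 * Y @ Y.T)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # KL gradient: 4 * sum_j (p_ij - q_ij)(y_i - y_j) / (1 + |y_i - y_j|^2),
        # with p_ij multiplied by the exaggeration factor during the early phase
        W = (factor * P - Q) * num
        grad = 4.0 * ((np.diag(W.sum(axis=1)) - W) @ Y)
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y
```

Called on a matrix `X` with one row per sample (for example, a developmental stage, experimental treatment or collection site) and one column per gene, `tsne(X)` returns one 2-D point per sample, and samples with similar expression profiles should land near each other. Exact t-SNE costs O(n²) per iteration, which is manageable for a few hundred samples; for larger sets, library implementations such as scikit-learn's `sklearn.manifold.TSNE` offer a Barnes-Hut approximation.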

Keywords: Bioinformatics; Copepod; Omics; RNA-Seq; Zooplankton.

MeSH terms

Animals
Copepoda / genetics*
Female
Gene Expression Profiling / methods*
Gene Ontology
High-Throughput Nucleotide Sequencing
Larva / genetics
RNA-Seq
Species Specificity
Stochastic Processes
Transcriptome