CD Genomics Perspective: Introduction to Virus Identification Using RNA Sequencing

Author: Dianna Gellar

RNA sequencing (RNA-Seq) is a high-throughput RNA profiling technique that runs on next-generation sequencing (NGS) platforms and provides accurate abundance estimates over a wide dynamic range. Because of this range, both highly and weakly expressed transcripts can be quantified in a single viral RNA sequencing experiment, which greatly improves analysis efficiency. As a result, RNA-Seq has played a critical role in basic research. Viral RNA-Seq is a time-saving and cost-effective way to sequence a virus and collect its transcriptional information. In particular, long-read sequencing (LRS) has become a popular method for deciphering complex transcriptome structure.

Viral RNA sequencing has contributed to understanding pathogen-host immune interactions, discovering drug resistance, quantifying gene expression changes, and monitoring disease progression. Its significance in research is widely recognized. Viral RNA-Seq using next-generation sequencing, long-read sequencing, or both has become a standard method for analyzing viral transcriptomes and genomes.

Viruses Identified Using RNA Sequencing

Researchers have used RNA-Seq strategies to test a wide range of hypotheses, owing to the diversity and utility of the transcriptomic output. For example, groups are using RNA-Seq to better understand the relationship between viruses and cancer. Viruses, whether integrated into the host genome or present as free viral RNA or genomic DNA, play a key role in various diseases, including cancer; they are thought to be responsible for 15-20 percent of cancers. High-throughput sequencing can readily detect viral genomes, and even though most oncogenic viruses are DNA viruses, RNA-Seq can detect their transcribed RNAs. Two independent studies used publicly available large-scale RNA-Seq data sets to evaluate thousands of samples across multiple human cancers. In one, Khoury et al. searched RNA-Seq data from 3775 specimens in The Cancer Genome Atlas (TCGA) for viral RNAs.

Human papillomavirus (HPV), hepatitis B virus (HBV), and Epstein–Barr virus were found in head-and-neck cancers, uterine endometrioid cancers, and lung cancers, respectively. These studies demonstrated that RNA-Seq approaches can be used to detect viral integration. In addition, several viral integration sites and oncogenes were discovered in HPV- and HBV-infected tumors. Nonetheless, more recent research using the TCGA database, covering 4433 tumors and 404 normal samples from 19 cancer types, found that most cancers do not have a viral etiology. The same research showed that the expression of 1897 host genes differed by at least 2-fold between HPV-positive and HPV-negative head and neck squamous cell carcinoma (HNSC) tumors, indicating that viral transcripts and host mRNA expression are co-adapted. These findings strongly suggest that HPV has a wide-ranging effect on the host transcriptome, which can be investigated using small RNA sequencing (smRNA-Seq) and RNA-Seq techniques.
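
To make the 2-fold criterion concrete, the following is a minimal Python sketch of a fold-change filter between two sample groups. It is not the method used in the studies cited above: the function name `fold_change_hits`, the pseudocount, and the gene names and expression values are all hypothetical and chosen only for illustration. A real analysis would also apply a statistical test, since a fold change alone says nothing about significance.

```python
import math


def fold_change_hits(pos_means, neg_means, min_fold=2.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose mean expression
    differs by at least `min_fold` (up or down) between the two groups.

    The pseudocount avoids division by zero for unexpressed genes
    (an assumption of this sketch, not taken from the cited work).
    """
    threshold = math.log2(min_fold)
    hits = {}
    for gene, pos in pos_means.items():
        if gene not in neg_means:
            continue  # skip genes not measured in both groups
        log2_fc = math.log2((pos + pseudocount) / (neg_means[gene] + pseudocount))
        if abs(log2_fc) >= threshold:
            hits[gene] = log2_fc
    return hits


# Hypothetical mean normalized expression per group (not real data).
hpv_positive = {"GENE_A": 79.0, "GENE_B": 9.0, "GENE_C": 50.0}
hpv_negative = {"GENE_A": 19.0, "GENE_B": 39.0, "GENE_C": 45.0}

hits = fold_change_hits(hpv_positive, hpv_negative)
for gene, log2_fc in sorted(hits.items()):
    print(f"{gene}\tlog2FC={log2_fc:+.2f}")
```

Working on the log2 scale treats up- and down-regulation symmetrically, so a single threshold of log2(2) = 1 captures both directions.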

Tools Used in Virus Identification Using RNA Sequencing

Because virus detection is functionally important, many tools have been developed for sequencing data, including PathSeq, ViralFusionSeq, VirusSeq, and VirusFinder. Viruses mutate rapidly, with rates of 10^-8 to 10^-6 mutations per base per generation for DNA viruses and 10^-5 to 10^-3 for RNA viruses, so a higher mismatch allowance is needed when aligning RNA-Seq reads to a viral genome. Although some of these tools are intended for exome or whole-genome sequencing data, their fundamental algorithms also apply to RNA-Seq data.
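
The effect of a mismatch allowance can be illustrated with a small Python sketch. This is a brute-force, ungapped comparison written only for illustration; it does not reproduce the algorithm of any tool named above, and production aligners use indexed references and also handle insertions and deletions. The function names, the `max_mismatches` parameter, and the toy sequences are all hypothetical.

```python
def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(read, reference, max_mismatches=2):
    """Return (start, mismatches) for every ungapped placement of `read`
    on `reference` with at most `max_mismatches` substitutions."""
    hits = []
    n = len(read)
    for start in range(len(reference) - n + 1):
        d = hamming(read, reference[start:start + n])
        if d <= max_mismatches:
            hits.append((start, d))
    return hits


# Toy sequences (hypothetical, not from a real viral genome).
reference = "ACGTTGCAAGGCTTACGATCGA"
read_exact = "AAGGCTTA"    # identical to the reference at position 7
read_mutated = "AAGGATTA"  # same locus with one substitution

print(find_hits(read_exact, reference, max_mismatches=0))
print(find_hits(read_mutated, reference, max_mismatches=0))
print(find_hits(read_mutated, reference, max_mismatches=1))
```

With no mismatches allowed, the read from the mutated strain is missed entirely; allowing a single mismatch recovers it at the same position. The trade-off is that a looser threshold also raises the chance of spurious placements, especially against a large host background.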