High-Throughput Sequencing Technologies Are Now Applied in Life Science Research
Life science research is becoming increasingly popular across biological fields. With the rapid development of high-throughput sequencing technologies, more and more researchers are using them to solve biological problems.
These technologies are applied in many fields:
De novo sequencing yields a reference sequence for a species, which provides the foundation for further research and molecular breeding. For species that already have a reference, whole genome resequencing scans the entire genome for mutation sites to uncover the molecular basis of individual differences. At the transcriptome level, whole transcriptome resequencing supports research on differentially expressed genes, alternative splicing, single nucleotide polymorphisms in coding sequences, and so on. Small RNA sequencing isolates RNA molecules of specific sizes to discover new microRNA molecules. At the level of transcriptional regulation, ChIP and MeDIP, combined with sequencing, detect DNA regions bound by specific transcription factors and methylated sites on the genome, respectively. Recently, high-throughput technologies have also been widely applied to the study of candidate genes associated with certain diseases.
Compared with Sanger sequencing, high-throughput sequencing produces far more output data and more statistical information. Currently, the most popular high-throughput sequencing method is Solexa, a sequencing method based on sequencing-by-synthesis. It relies on a single-molecule array and performs bridge PCR amplification on a flow cell. Its reversible terminator chemistry allows only one base to be incorporated per cycle. Each incorporated base carries a fluorophore, which is excited by an appropriate laser, and the emitted fluorescence is captured so that the base can be read.
The following takes reference-based transcriptome analysis as an example to illustrate the basic workflow of high-throughput data analysis.
High-throughput sequencing data are stored in FASTQ format, which records the bases of each read together with their quality scores. Once the data have been generated, the first step is to evaluate whether the amount of data meets the requirements of the analysis by checking indicators such as read length, the number of bases, and GC content. Low-quality data are then filtered out, and artifactual sequences are removed with a variety of sequence alignment software. For transcriptome data with a reference, all reads are first mapped to the reference genome. Well-matched reads are then selected, and the reads that fall within gene regions are targeted for further analysis.
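The quality-control step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used by any particular tool: the function names (read_fastq, summarize, filter_by_quality), the Phred+33 quality encoding, the mean-quality threshold of 20, and the two example reads are all assumptions made for the example.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # skip stray blank lines
        seq = handle.readline().strip()
        handle.readline()               # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual

def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def summarize(records):
    """Count reads and bases, and compute mean read length and GC fraction."""
    n_reads = n_bases = n_gc = 0
    for _, seq, _ in records:
        n_reads += 1
        n_bases += len(seq)
        n_gc += sum(seq.upper().count(base) for base in "GC")
    return {
        "reads": n_reads,
        "bases": n_bases,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "gc_fraction": n_gc / n_bases if n_bases else 0.0,
    }

def filter_by_quality(records, min_mean_q=20):
    """Keep reads whose mean Phred score reaches the (assumed) threshold."""
    return [r for r in records if r[2] and mean_quality(r[2]) >= min_mean_q]

# Two made-up reads: one high quality ('I' = Q40), one low quality ('!' = Q0).
example = "@read1\nACGTGGCC\n+\nIIIIIIII\n@read2\nATATATAT\n+\n!!!!!!!!\n"
records = list(read_fastq(io.StringIO(example)))
stats = summarize(records)
kept = filter_by_quality(records)
print(stats)
print([name for name, _, _ in kept])
```

In practice, a real data set would be read from a file rather than an in-memory string, and the threshold would be chosen to suit the sequencing platform and the downstream analysis.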
Further analysis includes gene structure analysis, gene expression analysis, and the analysis of novel genes.