Modern Transcriptome Data Processing Algorithms: a Review of Methods and Results of Approbation
Abstract
The analysis of bioinformatics data is a pressing problem in modern computational biology and applied mathematics. As biotechnology and the tools for obtaining and processing information from biological objects and systems have advanced, open questions have emerged concerning the development and application of new algorithms and software. The authors propose practical algorithms and methods for processing transcriptome data to support effective annotation, visualization, and interpretation.
About the Authors
M. V. Sprindzuk
Belarus
Candidate of Science (Technical), Senior Researcher, Laboratory of Mathematical Cybernetics
6 Surganova Str., 220012 Minsk
L. V. Mozharovskaya
Belarus
Researcher, Laboratory of Genomics Research and Bioinformatics
71 Proletarskaya Str., 246001 Gomel
A. P. Konchits
Belarus
Candidate of Science (Biological), Leading Researcher, Forest Tree Breeding and Seed Production Laboratory
71 Proletarskaya Str., 246001 Gomel
L. P. Titov
Russian Federation
Doctor of Sciences (Medical), Professor, Corresponding Member of the NAS of Belarus, Head of the Laboratory for Clinical and Experimental Microbiology
23 Filimonova Str., 220114 Minsk
Review
For citations:
Sprindzuk M.V., Mozharovskaya L.V., Konchits A.P., Titov L.P. Modern Transcriptome Data Processing Algorithms: a Review of Methods and Results of Approbation. Digital Transformation. 2021;(1):53-64. (In Russ.)