Weekly Activities

Mathematical Methods for the Understanding of the Human Genome
Speaker: Vo Sy Nam (Vingroup BigData Institute, Vingroup)

Time: 14h, Friday, August 9, 2019

Location: Room 612, Building A6

Abstract: Mathematics and genetics have a long-standing partnership, and recent advances in biotechnology open up many new opportunities and challenges for that partnership, particularly in human genetics. Since the Human Genome Project finished sequencing essentially the entire human genome, considerable effort has been dedicated to decoding the human genome at large scale. For example, the 1000 Genomes Project (1KGP) has established a detailed catalogue of genetic variation in more than 2,500 people, and The Cancer Genome Atlas (TCGA) has characterized genetic mutations in more than 11,000 patients spanning 33 cancer types. Such efforts have dramatically changed the scale of genetic data and our understanding of the human genome. Sophisticated mathematical tools are needed more than ever to extract information from this explosion of data.

Here I review several important mathematical aspects of human genome analysis and interpretation. First, I recapitulate some of the mathematics involved in the Human Genome Project. I then look at how mathematical tools such as graph theory and compression theory are used to assemble and represent human genomes. De Bruijn graphs and the Burrows-Wheeler transform (BWT, also called block-sorting compression) are two concepts that will be discussed in greater detail. The next part of the talk covers the use of statistical methods, both frequentist and Bayesian inference, for detecting genetic variation in projects such as 1KGP and TCGA. I will also talk about some applications of optimization methods to the analysis of the genetic code and gene structure. Finally, I will point out some current open problems in genome analysis that need novel mathematical tools. Recent advances in technology are generating ever larger amounts of data with diverse characteristics, which call for increasingly effective and efficient mathematical methods for data analysis and interpretation. All of the above is just a small part of an ongoing, exciting partnership between mathematics and genetics in general.
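For readers unfamiliar with the Burrows-Wheeler transform mentioned above, the following is a minimal illustrative sketch in Python, not taken from the talk. It uses the naive sorted-rotations construction, which is far too slow for real genomes (practical tools build the transform from suffix arrays), and it assumes a sentinel character `$` that does not occur in the input. The function names `bwt` and `inverse_bwt` are hypothetical, chosen only for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every cyclic rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to each row, then sort the rows again.
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

On the short DNA string `GATTACA`, `bwt` returns `ACTGA$TA`, and `inverse_bwt` recovers `GATTACA` from it. The transform tends to group identical characters together, which is what makes it useful for compression and for indexing reference genomes.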
