Large-scale biomedical data management and analysis with applications in biological sequencing and drug discovery

Ola Spjuth
Department of Pharmaceutical Biosciences
Uppsala University


High-throughput technologies, such as next-generation sequencing and automated drug screening, have transformed molecular biology into a data-intensive discipline. Bioinformaticians are now required to use high-performance computing resources and to carry out data management and analysis tasks at large scale. In this presentation I will introduce some common bioinformatics analyses in biological sequencing and drug discovery, and point out the challenges that increasing data sizes pose from both an e-infrastructure and a Data Science perspective. I will present some work we have carried out on automating analysis pipelines using scientific workflow tools, which have become increasingly valuable because data-intensive bioinformatics is file-based and batch systems such as compute clusters are available via the national e-infrastructures. I will also present some of my ongoing projects on emerging technologies and methodologies, such as cloud computing and Big Data Analytics, which aim to address some of the challenges of growing data sets and of analyzing sensitive data.