Next Generation Bioinformatics Tools: From Data Generation to Data Analysis (Ph.D. Course)

 
 

COURSE - The last few years have witnessed significant developments in the area of Bioinformatics. The explosive growth in available biological data continues to require an associated increase in the scale and sophistication of the automated systems and tools that enable researchers to take full advantage of the available databases. However, the majority of the current tools tend to be simplistic, generic, domain independent, static and purely computational. Although such tools, which we refer to as first generation tools, played an important role in the early stages of Bioinformatics, they fall far short of the tools needed to properly analyze and mine the massive data expected to become available from recently developed high throughput sequencing technology. In this course, we survey the current tools while highlighting their advantages and shortcomings. We then introduce a new model for the Next Generation Bioinformatics Tools, which is based on developing highly intelligent and dynamic software systems that incorporate biological knowledge and gain intelligence from the increasing availability of biological data. These tools attempt to gain higher degrees of sophistication through genuine integration of Information Technology algorithms and biological knowledge. We present examples of recently developed intelligent tools and expert systems that produced exciting results in the areas of aging research, genome assembly and recognition of microorganisms, results which could not have been obtained using simple first generation tools.


COURSE MATERIAL:

  1. Day 1, Lecture Slides (slides1.pdf).

  2. Day 1, Exercises (Utility.pdf) or (Utility_Mac.pdf).

  3. Day 2, Lecture Slides (slides2.pdf).

  4. Day 3, Lecture Slides (Slides3.pdf).

  5. Day 4, Lecture Slides (Slides4.pdf).

  6. Day 4, Exercises (Day4_Sequence Assembly.docx).

  7. Day 5, Lecture Slides (Slides5.pdf).



Course Overview and Objectives

The last few years have witnessed significant developments in various aspects of Biomedical Informatics, including Bioinformatics, Medical Informatics, Public Health Informatics, and Biomedical Imaging. The explosion of medical and biological data requires an associated increase in the scale and sophistication of the automated systems and intelligent tools that enable researchers to take full advantage of the available databases. This ranges from the effective storage of data and their associated data models, to the design of efficient algorithms to automate data mining procedures, to the development of advanced software systems to support data integration. With more researchers taking on Bioinformatics projects that integrate theoretical and applied concepts from both Bioscience and Computational Sciences, Biomedical Informatics is quickly emerging as the most exciting field of research in this century.

In this course, we present an overview of the state of the discipline of Biomedical Informatics, with a focus on the nature and diversity of the available data as well as data collection tools. We make a case for smarter and more advanced data integration and data analysis tools. Such tools are badly needed to connect the datasets and obtain useful information that can be used for better medical discoveries and patient care. We present examples of recently developed intelligent tools and expert systems, with a particular focus on systems biology, that produced exciting results that could not have been obtained without such innovative integration. We then focus on a case study in aging research to illustrate the proposed integration and analysis tools.

The massive size of the currently available biological and medical databases and their high rate of growth have a great influence on the types of research currently conducted, and researchers are focusing more than ever on maximizing the use of these databases. Hence, it would be of great advantage for researchers to utilize the information stored in the available databases to extract new information as well as to understand various biological and medical phenomena. The course will address these issues with a particular focus on the following objectives:

1-Provide an overview of the various emerging disciplines of Biomedical Informatics, with a focus on Bioinformatics, and with an emphasis on the interdisciplinary nature of these fields of study.

2-Introduce the main computational problems in biomedical research with a focus on problems related to data collection and analysis, then survey the currently available algorithmic tools and address the advantages and shortcomings of each tool.

3-Introduce the participants to the concept of intelligent data integration and analysis tools. Such tools are critical for leveraging heterogeneous data collected from different resources to produce useful information that can further advance biomedical research and has the potential to lead to new discoveries directly related to patient care. Use Aging Research as a case study to illustrate the advantage of the new tools.

4-Identify opportunities and challenges in implementing and hosting the Bioinformatics tools in a high performance computing environment.


Target Audience and Prior Background Knowledge

The tutorial is intended for bio-scientists and computational scientists who are interested in Bioinformatics and how to develop or use computational tools to solve various types of problems in biomedical research. Although some basic background in biomedical sciences would be helpful, it is not necessary since the tutorial will provide a basic background of the needed concepts. Similarly, some basic background in algorithms would be helpful but it is not necessary.


Topics to be covered in the Course

The course is designed for six sessions that can be classified into two parts. The first part covers the introduction, the background and an overview of key problems, algorithms and current tools in the area of Biomedical Informatics. The second part focuses on introducing the audience to new research projects with a focus on the concept of next generation data analysis and integration tools that are Intelligent, Collaborative and Dynamic (ICD). Several examples of such tools will be introduced with a focus on new systems biology concepts.

1.Brief Introduction to Bioinformatics

Background – The Bioscience aspect and the computational perspective

Review of key Bioinformatics Problems: sequence comparison, motif finding, and phylogeny.

Survey several popular Bioinformatics tools: their benefits and shortcomings.

2.Bioinformatics now – current state of the emerging discipline, opportunities and challenges, and overview of key Research problems.

3.The emergence of next generation data integration and analysis tools: Intelligent, Collaborative and Dynamic (ICD) Tools.

4.Examples of new ICD tools with a focus on data integration and analysis:

Assembly of short reads produced from high throughput sequencing facilities.

Alignment-free sequence comparison algorithms.

Recognition and classification of Biological Sequences.

Correlation Networks and the analysis of expression data.

Hands-On Exercise: Network Models to predict spread in scale-free networks.

5.Case Study: Biomedical Informatics on Aging Research:

Correlation Networks and the identification of genes associated with aging. 

Mobility profiling and predicting potential health problems.

Intelligent Integrated Medical Data System (I2MeDS)

6.High Performance Computing and Bioinformatics.

7.Next Phase in the Bioinformatics Evolution: from data storage/generation to data integration/analysis
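
To give a flavour of one topic listed above, alignment-free sequence comparison, here is a minimal sketch in Python. It assumes a simple and widely used approach (comparing k-mer count profiles with cosine similarity); it is not necessarily the method presented in the course, and the function names `kmer_counts`, `cosine_similarity` and `compare` are hypothetical names chosen for this illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles (0.0 to 1.0)."""
    # Only k-mers present in both profiles contribute to the dot product.
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A sequence shorter than k has no k-mers, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def compare(seq1, seq2, k=3):
    """Compare two sequences without computing an alignment."""
    return cosine_similarity(kmer_counts(seq1, k), kmer_counts(seq2, k))
```

No alignment is computed: each sequence is reduced to a vector of k-mer counts, so the cost grows linearly with sequence length. The choice of k is an assumption of this sketch; larger values make the profiles more specific but sparser.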


Brief Bio Sketch of the Instructor

Hesham H. Ali is a Professor of Computer Science and the Lee and Wilma Seaman Distinguished Dean of the College of Information Science and Technology (IS&T) at the University of Nebraska at Omaha (UNO). He is also the director of the UNO Bioinformatics Core Facility. He has published numerous articles in various IT areas including scheduling, distributed systems, wireless networks, and Bioinformatics. He has also published two books on scheduling and graph algorithms, and several book chapters in Bioinformatics. He is currently serving as the PI or Co-PI of several projects funded by NSF, NIH and the Nebraska Research Initiative (NRI) in the areas of wireless networks and Bioinformatics. He has been leading a Bioinformatics Research Group at UNO that focuses on developing innovative computational approaches to identify and classify biological organisms. The research group is currently developing a new graph theoretic model for assembling short reads obtained from high throughput instruments, as well as employing a novel correlation networks model for studying aging, infectious diseases, and the impact of drugs and diseases on aging. He has also been leading two funded projects for developing secure wireless infrastructure and using wireless technologies to address tracking and monitoring issues in medical environments.





References

K. Dempsey and H. Ali, “On the Discovery of Cellular Subsystems in Correlation Networks using Centrality Measures,” to appear in Current Bioinformatics, 2012.

K. Dempsey, K. Duraisamy, S. Bhowmick, and H. Ali. The Development of Parallel Adaptive Sampling Algorithms for Analyzing Biological Networks. 11th IEEE International Workshop on High Performance Computational Biology (HiCOMB 2012). May 21, 2012: Shanghai, China.

K. Dempsey, S. Bhowmick, and H. Ali. Function-preserving filters for sampling in biological networks. 2012 Int Conference on Computational Science (ICCS 2012).  June 4-6, 2012: Omaha, NE.

K. Dempsey, I. Thapa, D. Bastola and H. Ali, “Identifying Modular Function via Edge Annotation in Gene Correlation Networks using Gene Ontology Search,” Proceedings of the Second Workshop on Integrative Data Analysis in Systems Biology (IDASB), held in the 2011 IEEE International Conference on Bioinformatics & Biomedicine (BIBM 2011), Atlanta, Georgia Nov 12-15, 2011.

K. Dempsey and H. Ali, “Evaluation of Essential Genes in Correlation Networks using Measures of Centrality,” 4th Annual 2011 BIBM Workshop on Biomolecular Network Analysis, Atlanta, Georgia, November 12-15, 2011.

K. Dempsey, B. Currall, R. Hallworth and H. Ali, “A New Approach for Sequence Analysis: Illustrating an Expanded Bioinformatics View through Exploring Properties of the Prestin Protein,” a book chapter in, “Handbook of Research on Computational and Systems Biology: Interdisciplinary Applications,” IGI Global, July 2011.

K. Duraisamy, K. Dempsey, H. Ali and S. Bhowmick, “A Noise Reducing Sampling Approach for Uncovering Critical Properties in Large Scale Biological Networks,” Proceedings of the 2010 International Workshop on High Performance Computing Systems for Biomedical, Bioinformatics and Life Sciences (BILIS 2011), held in conjunction with The 2011 Int Conference on High Performance Computing & Simulation (HPCS 2011), Istanbul, Turkey, July 4-8, 2011.

K.  Dempsey, K.  Duraisamy, H. Ali, S. Bhowmick. A Parallel Graph Sampling Algorithm for Analyzing Gene Correlation Networks. International Conf on Computational Science (ICCS 2011), June 1-3, 2011. Singapore.

H. Geng, X. Deng and H Ali, “Applications of Hidden Markov Models in Microarray Gene Expression Data,” A book chapter in: Hidden Markov Models, Theory and Applications, Edited by Przemyslaw Dymarski, InTech, April 2011.

H. Geng, J. Iqbal, W. Chan, H. Ali. Virtual CGH: an integrative approach to predict genetic abnormalities from gene expression microarray data applied in lymphoma. BMC Medical Genomics, 4:32, April 2011.

K. Dempsey, S. Bonasera, D. Bastola and H. Ali, “A Novel Correlation Networks Approach for the Identification of Gene Targets,” Proceedings of the 44th Hawaii International Conference on System Sciences (HICSS-44), Kauai, January 4-7, 2011.

K. Dempsey, B. Currall, R. Hallworth and H. Ali, “An intelligent data-centric approach toward identification of conserved motifs in protein sequences,” Proceedings of the 2010 ACM International Conference on Bioinformatics and Computational Biology (BCB 2010), Niagara Falls, New York, August 2-4, 2010.

R. Sengupta, D. Bastola and H. Ali, “Classification and Identification of Fungal Sequences Using Characteristic Restriction Endonuclease Cut Order,” Journal of Bioinformatics and Computational Biology, Volume 8, Number 6, 2010.

D. Quest and H. Ali, “The Motif Tool Assessment Platform (MTAP) for Sequence-Based Transcription Factor Binding Site Prediction Tools,” a book chapter in, “Computational Biology of Transcription Factor Binding: Methods and Protocols,” Springer, 2010.

H. Zhou, H. Ali, J. Youn, Z. Zhang, “A Hybrid Wired and Wireless Network Infrastructure to Improve the Productivity and Quality Care of Critical Medical Applications”, the International Conference on Complex Medical Engineering (CME 2010), Gold Coast, Australia, July 2010.

S. Vaidya, J.  Youn, H.  Ali, N. Bahl, and D. Singh, “Real-Time Fall Detection and Activity Recognition Using Wireless Sensors,” International Conference on Networking and Information Technology (ICNIT-2010), Manila, Philippines. June 2010. 

J. Youn, H. Ali, H. Sharif, and B. Chhetri, “RFID-Based Information System for Preventing Medical Errors,” The Sixth Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, Toronto, Canada, July 2009.

N. Sharma, J. Youn, N. Shrestha and H. Ali, “Direction Finding Signage System using RFID for Healthcare Applications,” Proceedings of The International Conference on BioMedical Engineering and Informatics (BMEI2008), Sanya, Hainan, China, May 27-30, 2008.  

J. Uher, D. Sadofsky,  J. Youn, H. Ali, H. Sharif, J. Deogun, and S. Hinrichs, “I2MeDS: Intelligent Integrated Medical Data System,” Proceedings of The International Conference on BioMedical Engineering and Informatics (BMEI2008), Sanya, Hainan, China, May 27-30, 2008.  

P. Ciborowski and H. Ali, “Bioinformatics,” a book chapter in, “Proteomics for Undergraduates,” A. Kraj and J. Silberring (eds.), Wiley Inc., 2008.

X. Deng , H. Geng and H. Ali, “A Hidden Markov Model Approach to Predicting Yeast Gene Function from Sequential Gene Expression Data,” The International Journal of Bioinformatics Research and Applications, 2008:4(3):263-273.

D. Quest, K. Dempsey, M. Shafiullah, D. Bastola, and H. Ali. MTAP: A Motif Tool Assessment Pipeline for Automated Assessment of De Novo Regulatory Motif Discovery Tool. BMC Bioinformatics, August 2008.

X. Deng , H. Geng and H. Ali, “Cross-platform Analysis of Cancer Biomarkers: A Bayesian Network Approach to Incorporating Mass Spectrometry and Microarray Data,” Journal of Cancer Informatics, June 2007.

X. Huang and H. Ali, “High Sensitivity RNA Pseudoknot Prediction,” Nucleic Acids Research, 2007.

D. Quest, K. Dempsey, M. Shafiullah, D. Bastola, and H. Ali. A Parallel Architecture for Regulatory Motif Algorithm Assessment. HiCOMB 2008: Seventh IEEE International Workshop on High Performance Computational Biology, April 14th 2008.

H. Geng, H. Ali and J. Chan, “A Hidden Markov Model Approach for Prediction of Genomic Alterations from Gene Expression Profiling,” Proceedings of the fourth International Symposium on Bioinformatics Research and Applications (ISBRA), Atlanta, Georgia, May 6-9, 2008.

H. Geng, X. Deng and H. Ali, “MPC: a Knowledge-based Framework for Clustering under Biological Constraints,” Int. J. Data Mining and Bioinformatics, Volume 2, Number 2, 2007.

X. Deng, H. Geng, D. Bastola and H. Ali, “Link Test: A Statistical Method for Finding Prostate Cancer Biomarkers,” Journal of Computational Biology and Chemistry, 2006.

A. Churbanov, I. Rogozine, J. Deogun, and H. Ali, “Method of Predicting Splice Sites Based on Signal Interactions,” Biology Direct, 2006.

X. Deng, H. Geng, and H. Ali, “Joint Learning of Gene Functions: A Bayesian Network Model Approach,” Journal of Bioinformatics and Comp. Biology, Vol. 4, No. 2, pp. 217-239, 2006.

X. Deng and H. Ali, “EXAMINE: A Computational Approach to Reconstructing Gene Regulatory Networks,” Journal of BioSystems, 81:125-136, 2005.

A. Churbanov, M. Pauley, D. Quest and H. Ali, “A method of precise mRNA/DNA homology-based gene structure prediction,” BMC Bioinformatics, 6:261, 2005.

A. Mohamed, D. Kuyper, P. Iwen, H. Ali, D. Bastola and S. Hinrichs, “Computational approach for the identification of Mycobacterium species using the internal transcribed spacer-1 region,” Journal of Clinical Microbiology, Vol. 43, No. 8: 3811-3817, 2005.

 

Lecturer: Hesham H. Ali.

Credits: 6 ECTS.

Date: 10-18 Sept 2012

Exam: Written Project.

Contact: kp@it.uu.se

Assistant: Marcus Björk.

Room: Polacksbacken, Building 2, Room p2347.

Location: Polacksbacken, Uppsala SE-751 05.








Dates:

  1. 10 Sept 2012, 10.15-12.00

  2. 11 Sept 2012, 10.15-12.00

  3. 12 Sept 2012, 13.15-15.00

  4. 14 Sept 2012, 10.15-12.00

  5. 17 Sept 2012, 10.15-12.00


Some Links

  1. Ali Hesham @ Nebraska

  2. SysCon @ UU