Applications of Bayesian Network Models in studying Acute Myeloid Leukemia (AML)
My thesis aims to design a computational model that analyzes gene expression data to improve cancer diagnosis, specifically for Acute Myeloid Leukemia (AML), an aggressive type of blood cancer. As part of a team of researchers in the Oncinfo Lab, I used Bayesian networks (BNs) to model gene expression data. A BN is a probabilistic graphical model in which random variables are represented by the nodes of a Directed Acyclic Graph (DAG), and the edges of the DAG model the conditional dependencies between those variables. We applied Weighted Gene Co-Expression Network Analysis (WGCNA), an established clustering method, to group similar genes in our gene expression data. For each cluster of genes, we used principal component analysis (PCA) to compute a single summary value, called an eigengene. Eigengenes were represented by the nodes of the BN, and the dependencies among them were modeled by its edges. The rationale for using a BN in this framework is that it can model gene expression levels and their dependencies, enabling us to use probability theory to make scientific predictions. We applied our BN model to distinguish AML patients from patients with another type of hematological malignancy. I performed the classification using a cross-validation technique and tested the performance on an independent dataset: I trained the model on a training dataset of 366 samples and evaluated it on a test dataset of 74 samples. The prediction accuracies on the training and test datasets were 93.5% and 84%, respectively. Further improvements to the methodology are required to increase its accuracy and make it appropriate for clinical use.
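To make the eigengene step concrete, the following is a minimal sketch of how one cluster of genes can be summarized by its first principal component. It is an illustration under stated assumptions, not the code used in the thesis: the function name `module_eigengene`, the samples-by-genes input layout, the per-gene standardization, and the sign convention are all hypothetical choices, and WGCNA's own eigengene routine may differ in scaling and normalization.

```python
import numpy as np

def module_eigengene(expr):
    """Summarize one gene cluster by its first principal component.

    expr: array of shape (n_samples, n_genes) holding the expression
          values of the genes in a single cluster (assumed layout).
    Returns an array of shape (n_samples,), one value per sample.
    """
    expr = np.asarray(expr, dtype=float)

    # Standardize each gene so that no single gene dominates the component.
    centered = expr - expr.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by zero
    scaled = centered / std

    # PCA via SVD: the sample scores on the first principal component
    # are the first left singular vector times the first singular value.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    eigengene = u[:, 0] * s[0]

    # The sign of a principal component is arbitrary; orient it so that it
    # agrees with the average standardized expression (an assumed convention).
    if np.dot(eigengene, scaled.mean(axis=1)) < 0:
        eigengene = -eigengene
    return eigengene
```

Calling this function once per cluster would yield one eigengene vector per cluster, and those vectors would then serve as the observed values of the corresponding BN nodes.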