PhD: Unlocking patterns in limited omics data: Machine learning-based diagnostics from small and sparse cancer omics datasets
PhD Defense of Alexandra Danyi
Better cancer diagnosis is key to effective treatment. One promising approach is using machine learning (ML) models to analyze biological data from cancer patients. However, this data often has limitations that make it hard to build reliable models.
Traditionally, doctors use tissue samples (biopsies) to study tumors, but a newer sampling method, liquid biopsy, is less invasive and therefore safer for patients. However, data from liquid biopsies is often sparse and differs from tissue data, which makes it harder for ML models trained on tissue data to work well with liquid biopsy samples.
Another challenge is that data from different sources can vary widely due to differences in experimental methods or patient groups. This “shift” can reduce a model's accuracy when applied to new data. Small patient groups in clinical studies also pose challenges, especially when analyzing gene expression data, which typically have many variables. This can lead to models that overfit the limited data they were trained on. Additionally, if one type of cancer is much more common in the dataset than others, ML models may struggle to recognize the rarer types.
This research explores ways to overcome these issues. For example, it shows how ML models can be adapted to work with sparse liquid biopsy data by creating synthetic data and combining different types of genetic information. It also tests whether methods for handling data shift and class imbalance improve the classification of cancer types. In another study, an ML model was developed to predict how prostate cancer patients respond to specific treatments, even with limited data. The results suggest that combining treatment history with genetic and molecular data can help guide better treatment choices.
- Start date and time
- End date and time
- Location
- PhD candidate
- Alexandra Danyi
- Dissertation
- Unlocking patterns in limited omics data: Machine learning-based diagnostics from small and sparse cancer omics datasets
- PhD supervisor(s)
- prof. dr. ir. J. de Ridder
- Co-supervisor(s)
- dr. M. Jager