A. Werth. Prediction of breast cancer survival from gene expression data using a k-NN algorithm. Mémoire d'Ingénieur Civil Informaticien, Université libre de Bruxelles, Brussels, Belgium, 2010.

Abstract

The development of genome-wide profiling methods such as microarrays has created a new field of bioinformatics that aims to improve the understanding of diseases and thereby the lifespan and quality of life of patients. As a contribution to this field, this thesis explores a new machine learning technique to predict the clinical outcome of breast cancer patients. Breast cancer is one of the most common cancers and an important cause of death in developed countries. When this cancer is diagnosed, the first step is to remove it surgically. After the removal, a physician has to decide whether further treatment such as chemotherapy or hormone therapy should be given to avoid a relapse of the disease. To do so, physicians try to estimate the recurrence risk by looking at clinical prognostic factors such as tumour size. However, the predictive power of such prognostic factors is low. For two decades now, high-throughput technologies such as microarrays have opened another window on cancer evolution that could help physicians reach a more accurate decision about a patient's future. Microarrays measure gene expression in the patient's tumour cells and thus contain key information about the cancer's evolution. This is why they can be used as prognostic factors to help predict the disease outcome. Nevertheless, analysing microarray data has proven to be a challenging task because of the abundance of genes (several thousand in each sample) and the high levels of noise. Much effort has been put into identifying the genes involved in cancer progression and building prognostic models on them. Using biological knowledge obtained in previous studies, we propose a new model aimed at predicting the survival of breast cancer patients. It combines state-of-the-art knowledge about breast cancer with a machine learning technique that is well known but unexplored in this context: a Nearest Neighbour algorithm.
As opposed to a global model, this local approach respects molecular subtypes without having to identify them a priori. Our method was compared to several other prognostic models, including several state-of-the-art signatures. Its performance proved comparable to that of most state-of-the-art signatures. It also sheds some light on how the importance of genes depends on the cancer subtype.
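
To illustrate what a local, nearest-neighbour prediction can look like, here is a minimal sketch. It is not the model developed in the thesis: the function name `knn_risk`, the Euclidean distance, the value of `k`, the binary relapse labels, and the toy two-gene profiles are all assumptions made for illustration. The risk for a new patient is estimated as the fraction of relapses among the `k` training patients with the most similar expression profiles, so patients from different regions of expression space (for example, different subtypes) are judged against different neighbours.

```python
import math


def knn_risk(train_profiles, train_relapsed, query, k=3):
    """Estimate relapse risk for `query` as the fraction of relapses
    among its k nearest training profiles (Euclidean distance).

    Hypothetical helper for illustration only, not the thesis method.
    """
    if len(train_profiles) != len(train_relapsed):
        raise ValueError("profiles and labels must have the same length")
    if not 1 <= k <= len(train_profiles):
        raise ValueError("k must be between 1 and the number of training patients")

    # Pair each training patient's distance to the query with their outcome,
    # then sort so the closest patients come first.
    by_distance = sorted(
        (math.dist(profile, query), relapsed)
        for profile, relapsed in zip(train_profiles, train_relapsed)
    )

    # Local estimate: only the k closest patients contribute.
    nearest = by_distance[:k]
    return sum(relapsed for _, relapsed in nearest) / k


# Toy data (made up): two expression values per patient, 1 = relapse, 0 = no relapse.
profiles = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
relapsed = [0, 0, 0, 1, 1, 0]

low_risk = knn_risk(profiles, relapsed, (0.1, 0.1), k=3)
high_risk = knn_risk(profiles, relapsed, (2.0, 2.0), k=3)
print(low_risk, high_risk)
```

In this toy example the two query patients fall into different clusters, so their risk estimates come from disjoint sets of neighbours. Real microarray data would also require gene selection and normalisation before distances are meaningful, which this sketch leaves out.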


Updated: 2017-03-27