Abstract
Data mining and knowledge discovery have been applied to datasets in many industries, including biomedicine. Modelling, data mining and visualization of biomedical data address the problem of extracting knowledge from large and complex datasets. The current challenge in dealing with such data is to develop statistical and data mining methods that search and browse the underlying patterns within the data. In this paper, we apply several state-of-the-art data reduction methods to visualize genome-wide Single Nucleotide Polymorphism (SNP) datasets and thereby address the problems induced by the high dimensionality of genetic variation data. The visualization approach was selected based on the trustworthiness of the resulting visualizations. Based on the trustworthiness metric, we found that the Neighbour Retrieval Visualizer (NeRV) outperformed the other methods. This method optimizes the retrieval quality of Stochastic Neighbour Embedding. The quality measure of the NeRV visualization showed excellent results, even though the dataset was reduced from 13917 to 2 dimensions. The visualization results will assist clinicians and biomedical researchers in understanding the systems biology of patients and in comparing different groups of clusters in the visualizations.
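To illustrate how a trustworthiness score can rank competing embeddings, the following is a minimal pure-Python sketch. It assumes the commonly used definition of trustworthiness from Venna and Kaski, which penalizes points that appear among the k nearest neighbours in the low-dimensional view without being neighbours in the original space. The abstract does not state the exact formula or the neighbourhood size used in the paper, so the function name, the choice of k, and the toy data below are illustrative assumptions rather than the authors' implementation.

```python
import math


def _ranked_neighbours(points, i):
    """Return the indices of all other points, nearest to point i first."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))


def trustworthiness(high, low, k):
    """Trustworthiness of the embedding `low` of the data `high`.

    Assumed definition: 1 minus a normalized penalty summed over every
    point j that is among the k nearest neighbours of i in `low` but not
    in `high`; the penalty is (rank of j in `high`) - k.
    """
    n = len(high)
    if len(low) != n:
        raise ValueError("high and low must contain the same number of points")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    penalty = 0
    for i in range(n):
        # Rank (1 = nearest) of every other point in the original space.
        rank_high = {
            j: r for r, j in enumerate(_ranked_neighbours(high, i), start=1)
        }
        # Neighbours shown in the embedding that are not true neighbours.
        for j in _ranked_neighbours(low, i)[:k]:
            if rank_high[j] > k:
                penalty += rank_high[j] - k

    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


# Hypothetical toy data: six points on a line in 3-D.
high = [(float(x), 0.0, 0.0) for x in range(6)]
# An embedding that preserves every pairwise distance.
faithful = [(float(x), 0.0) for x in range(6)]
# An embedding that shuffles the order of the points.
scrambled = [(0.0, 0.0), (5.0, 0.0), (1.0, 0.0),
             (4.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

print(trustworthiness(high, faithful, k=2))
print(trustworthiness(high, scrambled, k=2))
```

A score of 1.0 means that every neighbourhood shown in the embedding is also a neighbourhood in the original data. The distance-preserving embedding reaches that value, while the shuffled one scores lower. Comparing such scores across reduction methods is one way to select a visualization, as the abstract describes.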
Original language | English |
---|---|
Title of host publication | AusDM 2008 : Proceedings of the 7th Australasian Data Mining Conference |
Editors | John F. Roddick, Jiuyong Li, Peter Christen, Paul J. Kennedy |
Place of Publication | Sydney, N.S.W |
Publisher | Australian Computer Society |
Pages | 111-121 |
Number of pages | 11 |
ISBN (Print) | 9781920682682 |
Publication status | Published - 2008 |
Keywords
- acute
- data mining
- information visualisation
- leukemia
- lymphocytic
- medical informatics
- single nucleotide polymorphisms