Abstract
Gene expression datasets used in biomedical data mining frequently have two characteristics: they have many thousands of attributes but relatively few sample points, and the measurements are noisy. In other words, individual expression measurements may be untrustworthy. Gene Feature Ranking (GFR) is a feature selection methodology that addresses these domain-specific characteristics by selecting features (i.e., genes) based on two criteria: (i) how well the gene can discriminate between classes of patients and (ii) the trustworthiness of the microarray data associated with the gene. An example from the pediatric cancer domain demonstrates the use of GFR and compares its performance with a feature selection method that does not explicitly address the trustworthiness of the underlying data.
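
The two selection criteria can be illustrated with a small sketch. The abstract does not say how GFR measures discrimination or trustworthiness, nor how it combines them, so everything below is an assumption made for illustration: the function name `rank_genes`, the per-measurement boolean trust flags, the mean-difference-over-spread discrimination score, and the choice to multiply that score by the fraction of trusted measurements. It should not be read as the published GFR procedure.

```python
from statistics import mean, pstdev


def rank_genes(expression, labels, trusted, min_per_class=2):
    """Rank gene indices, best first, by a toy two-criterion score.

    expression[g][s] -- expression value of gene g in sample s
    labels[s]        -- class of sample s (0 or 1)
    trusted[g][s]    -- True if that measurement is considered reliable
                        (hypothetical flag; not defined in the abstract)
    """
    scored = []
    for g, (values, flags) in enumerate(zip(expression, trusted)):
        # Criterion (ii), assumed form: share of this gene's measurements that are trusted.
        trust = sum(flags) / len(flags)

        # Use only trusted measurements when judging class separation.
        class0 = [v for v, y, ok in zip(values, labels, flags) if ok and y == 0]
        class1 = [v for v, y, ok in zip(values, labels, flags) if ok and y == 1]
        if len(class0) < min_per_class or len(class1) < min_per_class:
            scored.append((0.0, g))  # too little reliable data to judge this gene
            continue

        # Criterion (i), assumed form: gap between class means relative to within-class spread.
        spread = pstdev(class0) + pstdev(class1) + 1e-9
        discrimination = abs(mean(class0) - mean(class1)) / spread

        scored.append((discrimination * trust, g))

    # Highest combined score first; ties keep the original gene order.
    return [g for _, g in sorted(scored, key=lambda pair: -pair[0])]


# Made-up data: three genes, six samples (three per class).
labels = [0, 0, 0, 1, 1, 1]
expression = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # separates the classes, fully trusted
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # same values, two measurements untrusted
    [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],  # no class separation
]
trusted = [
    [True] * 6,
    [True, True, False, True, True, False],
    [True] * 6,
]
ranking = rank_genes(expression, labels, trusted)
```

In this toy data, the first two genes carry identical values, but the second is penalized for its untrusted measurements and so ranks below the first. A method that ignored trustworthiness would score them equally.
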
Original language | English |
---|---|
Title of host publication | Data mining for business applications |
Place of Publication | U.S |
Publisher | Springer |
Pages | 159-168 |
Number of pages | 10 |
ISBN (Print) | 9780387794204 |
Publication status | Published - 2009 |
Keywords
- data mining
- gene expression
- medical informatics
- microarray analysis