
k-link EST clustering: evaluating error introduced by chimeric sequences under different degrees of linkage

  • Lauren M. Bragg
  • Glenn Stone

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Motivation: Clustering expressed sequence tags (ESTs) is a crucial step in many sequence analysis studies that require a high level of redundancy. Chimeric sequences, although uncommon, can make an optimal EST clustering difficult to achieve. Single-linkage algorithms are particularly vulnerable to chimeras. To avoid chimera-facilitated erroneous merges, researchers using single-linkage algorithms are forced to use stringent sequence-similarity thresholds, and such thresholds reduce the sensitivity of the clustering algorithm.

Results: We introduce the concept of k-link clustering for EST data and evaluate how clustering error rates vary over a range of linkage thresholds. Using k-link, we show that Type II error decreases as the number of shared ESTs (i.e., links) required increases. We observe a base level of Type II error, likely caused by unmasked low-complexity or repetitive sequence. We find that Type I error increases gradually with increased linkage. To minimize the Type I error introduced by stricter linkage requirements, we propose an extension to k-link that adjusts the required number of links according to the sizes of the clusters being compared.
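The abstract does not spell out the exact merge rule, so the following is only a minimal sketch of the k-link idea under stated assumptions. It assumes a "link" is a pairwise similarity match between one EST in each of two clusters, and that two clusters merge once they share at least k links, with the requirement capped at the number of possible links (|A|·|B|) so that singletons can still seed clusters. With k = 1 this reduces to single linkage. The names `k_link_clusters`, `fixed_k` and `size_scaled` are hypothetical, and `size_scaled` is an illustrative stand-in for the size-dependent extension, not the authors' formula. The toy data (two three-EST genes joined by one chimeric EST) is invented for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Iterable

# A requirement maps the sizes of two candidate clusters to the number of
# cross-cluster links needed before they may be merged.
Requirement = Callable[[int, int], int]


def fixed_k(k: int) -> Requirement:
    """Plain k-link rule, capped at the number of possible links (assumption)."""
    return lambda size_a, size_b: min(k, size_a * size_b)


def size_scaled(alpha: float) -> Requirement:
    """Hypothetical size-dependent rule: ceil(alpha * smaller cluster size)."""
    return lambda size_a, size_b: min(
        size_a * size_b, max(1, math.ceil(alpha * min(size_a, size_b)))
    )


def k_link_clusters(n: int, matches: Iterable[tuple[int, int]],
                    required: Requirement) -> list[list[int]]:
    """Cluster ESTs 0..n-1 given pairwise similarity matches (the links)."""
    matches = list(matches)
    parent = list(range(n))  # union-find forest; each root is a cluster
    size = [1] * n

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merged = True
    while merged:
        merged = False
        # Count links between every pair of current clusters (keyed by root).
        links: dict[tuple[int, int], int] = defaultdict(int)
        for i, j in matches:
            ri, rj = find(i), find(j)
            if ri != rj:
                links[(min(ri, rj), max(ri, rj))] += 1
        touched: set[int] = set()
        for (a, b), count in sorted(links.items()):
            # Counts are only valid for clusters not yet merged in this pass.
            if a in touched or b in touched:
                continue
            if count >= required(size[a], size[b]):
                root, child = (a, b) if size[a] >= size[b] else (b, a)
                parent[child] = root
                size[root] += size[child]
                touched.update((a, b))
                merged = True

    groups: dict[int, list[int]] = defaultdict(list)
    for est in range(n):
        groups[find(est)].append(est)
    return sorted(groups.values())


# Toy data: ESTs 0-2 come from one gene, 3-5 from another, and EST 6 is a
# chimera that matches EST 2 (first gene) and EST 3 (second gene).
matches = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 6), (6, 3)]
print(k_link_clusters(7, matches, fixed_k(1)))  # [[0, 1, 2, 3, 4, 5, 6]]
print(k_link_clusters(7, matches, fixed_k(2)))  # [[0, 1, 2, 6], [3, 4, 5]]
```

With single linkage the chimera bridges the two genes into one cluster, while requiring two links keeps them apart. Which cluster absorbs the chimera itself depends on merge order in this sketch, and a production implementation would count links far more efficiently than rescanning all matches on every pass.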
Original language: English
Pages (from-to): 2302-2308
Number of pages: 7
Journal: Bioinformatics
Volume: 25
Issue number: 18
Publication status: Published - 2009
