Abstract
In this paper we address the problem of matching data from different databases using a third party, where the actual data cannot be disclosed. The aim is to provide a mechanism for improved matching results across databases while preserving the privacy of sensitive information in those databases. This is particularly relevant to health-related databases, where bringing together data about patients from multiple databases allows for important medical research, but the sensitive nature of the data requires that identifying information never be disclosed. The method described uses a public reference table to match people’s names in different databases without requiring identifying information to be revealed to any party outside the originating data source. An advantage of our algorithm is that it provides a mechanism for dealing with typographical or other errors in the data. The key features of our proposed approach are: (1) original private data from individual custodians are never revealed to any other party, because data comparison is performed at each custodian and only the comparison results, which consist of data in the reference table, are sent; (2) the third party performs the match based on encrypted values in the public reference table and some distance information. Experimental results show that our proposed method performs fuzzy matching (similarity join) with an accuracy comparable to that of conventional fuzzy matching algorithms, without revealing any identifying information.
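The two key features listed in the abstract can be illustrated with a minimal sketch. It is not the chapter's algorithm, only one plausible reading of the abstract under stated assumptions: each custodian maps a private name to its single nearest entry in the public reference table, Levenshtein edit distance stands in for the unspecified "distance information", and a keyed hash (HMAC-SHA256 with a key shared by the custodians but not the third party) stands in for the unspecified encryption. The function names `custodian_encode` and `third_party_match`, and the threshold parameter `max_total_distance`, are hypothetical.

```python
import hashlib
import hmac


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed metric; the abstract does not name one)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def custodian_encode(records, reference, key):
    """Runs at a custodian: private names never leave this function.

    Returns (record id, keyed hash of nearest reference entry, distance).
    """
    encoded = []
    for rec_id, name in records.items():
        # Nearest reference entry; ties broken alphabetically for determinism.
        ref = min(reference, key=lambda r: (edit_distance(name, r), r))
        token = hmac.new(key, ref.encode("utf-8"), hashlib.sha256).hexdigest()
        encoded.append((rec_id, token, edit_distance(name, ref)))
    return encoded


def third_party_match(enc_a, enc_b, max_total_distance):
    """Runs at the third party: sees only tokens and distances."""
    by_token = {}
    for rec_id, token, dist in enc_b:
        by_token.setdefault(token, []).append((rec_id, dist))
    matches = []
    for rec_a, token, dist_a in enc_a:
        for rec_b, dist_b in by_token.get(token, []):
            # Hypothetical rule: accept when the summed distances are small.
            if dist_a + dist_b <= max_total_distance:
                matches.append((rec_a, rec_b))
    return matches
```

Because edit distance satisfies the triangle inequality, the sum of the two reported distances is an upper bound on the distance between the two private names, which is why a threshold on that sum is a reasonable matching rule in this sketch. Two similar names whose nearest reference entries differ would be missed here, so a practical scheme would presumably compare against more than one reference entry.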
Original language | English |
---|---|
Title of host publication | Intelligent Patient Management |
Editors | S. I. (Sally I.) McClean, Peter H. Millard, Elia El-Darzi, Chris D. Nugent |
Place of Publication | Germany |
Publisher | Springer |
Pages | 71-89 |
Number of pages | 19 |
ISBN (Electronic) | 9783642001796 |
ISBN (Print) | 9783642001789 |
Publication status | Published - 2009 |