TY - JOUR
T1 - A multistage protocol for aggregated queries in distributed cloud databases with privacy protection
AU - Kelarev, Andrei
AU - Yi, Xun
AU - Badsha, Shahriar
AU - Yang, Xuechao
AU - Rylands, Leanne
AU - Seberry, Jennifer
PY - 2019
Y1 - 2019
N2 - This article is devoted to the novel situation, where a large distributed cloud database is a union of several separate databases belonging to individual database owners who are not allowed to transfer their data for storage in locations different from their already chosen separate cloud service providers. For example, a very large number of medical records may be stored in a distributed cloud database, which is a union of several separate databases from different hospitals, or even from different countries. The owners of the databases may need to provide answers to certain common aggregated queries using all information available without sharing or transferring all data. It is necessary to minimize the communication costs, improve efficiency, and comply with the legal requirements protecting the privacy of confidential data. In this situation, it is impossible to aggregate the whole database in one location, but effective methods for answers to the aggregated queries with privacy protection are required. To solve this important problem, the present article proposes a Multistage Separate Query Processing (MSQP) protocol employing homomorphic encryption with split keys. We show that our protocol can answer a large class of natural queries of practical significance. The running time of the MSQP protocol is O(d+[Formula presented]), where d is the number of database owners and m is the total number of records in the whole database. In practice, d is small, m can be very large, and so the running time is O(m). This means that the protocol is very efficient for large databases. It dramatically reduces the communication costs of computation and completely eliminates the need for exchange of confidential data. We define a new generalized additive homomorphic property and introduce a Multipart ElGamal Cryptosystem (MEC) with split keys, which enjoys this property. MEC is a novel modification of the ElGamal cryptosystem with split keys. This paper presents the results of extensive experiments evaluating the effectiveness of the MSQP protocol employing MEC and comparing it with MSQP employing the ElGamal cryptosystem, for a collection of publicly available medical datasets. The experiments evaluating our protocol on 11 real-life databases and a synthetic database demonstrate that the MSQP protocol employing MEC is more efficient than other options and can be recommended for practical implementations.
AB - This article is devoted to the novel situation, where a large distributed cloud database is a union of several separate databases belonging to individual database owners who are not allowed to transfer their data for storage in locations different from their already chosen separate cloud service providers. For example, a very large number of medical records may be stored in a distributed cloud database, which is a union of several separate databases from different hospitals, or even from different countries. The owners of the databases may need to provide answers to certain common aggregated queries using all information available without sharing or transferring all data. It is necessary to minimize the communication costs, improve efficiency, and comply with the legal requirements protecting the privacy of confidential data. In this situation, it is impossible to aggregate the whole database in one location, but effective methods for answers to the aggregated queries with privacy protection are required. To solve this important problem, the present article proposes a Multistage Separate Query Processing (MSQP) protocol employing homomorphic encryption with split keys. We show that our protocol can answer a large class of natural queries of practical significance. The running time of the MSQP protocol is O(d+[Formula presented]), where d is the number of database owners and m is the total number of records in the whole database. In practice, d is small, m can be very large, and so the running time is O(m). This means that the protocol is very efficient for large databases. It dramatically reduces the communication costs of computation and completely eliminates the need for exchange of confidential data. We define a new generalized additive homomorphic property and introduce a Multipart ElGamal Cryptosystem (MEC) with split keys, which enjoys this property. MEC is a novel modification of the ElGamal cryptosystem with split keys. This paper presents the results of extensive experiments evaluating the effectiveness of the MSQP protocol employing MEC and comparing it with MSQP employing the ElGamal cryptosystem, for a collection of publicly available medical datasets. The experiments evaluating our protocol on 11 real-life databases and a synthetic database demonstrate that the MSQP protocol employing MEC is more efficient than other options and can be recommended for practical implementations.
KW - cloud computing
KW - data protection
KW - online databases
KW - privacy
UR - http://handle.westernsydney.edu.au:8081/1959.7/uws:49970
U2 - 10.1016/j.future.2018.08.017
DO - 10.1016/j.future.2018.08.017
M3 - Article
SN - 0167-739X
VL - 90
SP - 368
EP - 380
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -