Linked data partitioning for RDF processing on Apache Spark

Amir Hossein Atashkar, Nasser Ghadiri, Mehdi Joodaki

Research output: Chapter in Book / Conference Paper › Conference Paper › peer-review

5 Citations (Scopus)

Abstract

RDF models are widely used in the web of data due to their flexibility and similarity to graph patterns. With the growing use of RDF, the volume and content of RDF data are increasing. Processing such massive amounts of data on a single machine is therefore inefficient, because of response time and limited hardware resources. A common approach to overcoming this limitation is cluster processing, and huge datasets can benefit from distributed cluster processing on Apache Hadoop. However, because Hadoop relies heavily on hard disks, the processing time is usually inadequate. In this paper, we propose a partitioning approach based on Apache Spark for rapid processing of RDF data models. A key feature of Apache Spark is its use of main memory instead of hard disk, which improves the speed of data processing in our method. We evaluated the proposed method by running SQL queries on RDF data partitioned across the cluster, and the results demonstrate improved performance.
Original language: English
Title of host publication: Proceedings of the 3rd International Conference on Web Research (ICWR), Tehran, Iran, 19-20 April, 2017
Publisher: IEEE
Pages: 73-77
Number of pages: 5
ISBN (Print): 9781538604205
Publication status: Published - 2017
Event: International Conference on Web Research
Duration: 19 Apr 2017 → …

Conference

Conference: International Conference on Web Research
Period: 19/04/17 → …
