A scalable pattern mining method using Apache Spark platform

Samaneh Samiei, Mehdi Joodaki, Nasser Ghadiri

Research output: Chapter in Book / Conference Paper, Conference Paper, peer-review

3 Citations (Scopus)

Abstract

The amount of data on the Internet is growing sharply. Some data, such as log files, are enormous and contain valuable hidden patterns. A log file is a set of recorded events that carry information useful for improving web server performance, stabilizing and controlling server loads, and speeding up user response operations. However, analyzing massive data takes a long time and requires powerful hardware, and the performance of sequential pattern mining methods is usually unsatisfactory on such data. This paper proposes a novel parallel method, built on the Apache Spark platform, for finding patterns in log files, such as frequent patterns (e.g., URL, IP, status code), how users accessed files, the number of errors, and the most common errors. Experimental results demonstrate that the proposed method's run time on three datasets is significantly less than that of four competing pattern mining methods.
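
As an illustration of the kinds of patterns the abstract lists (frequent URLs, IPs, and status codes, the number of errors, and the most common error), the following is a minimal single-machine sketch in plain Python. It is not the authors' method and does not use Spark. The Common Log Format regular expression, the `min_support` threshold, the rule that a status of 400 or above counts as an error, and the names `parse`, `mine`, and `SAMPLE` are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: entries follow the Common Log Format; the paper's datasets may differ.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse(line):
    """Return (ip, url, status) for a well-formed line, or None otherwise."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return m.group("ip"), m.group("url"), int(m.group("status"))


def mine(lines, min_support=2):
    """Count field values and keep those seen at least min_support times.

    Hypothetical helper: returns (frequent values per field,
    total number of error responses, most common error status).
    """
    records = [r for r in map(parse, lines) if r is not None]
    counts = {
        "ip": Counter(r[0] for r in records),
        "url": Counter(r[1] for r in records),
        "status": Counter(r[2] for r in records),
    }
    frequent = {
        field: {value: n for value, n in counter.items() if n >= min_support}
        for field, counter in counts.items()
    }
    # Assumption: any status code >= 400 is treated as an error.
    errors = Counter(r[2] for r in records if r[2] >= 400)
    return frequent, sum(errors.values()), errors.most_common(1)


# Made-up sample entries, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [19/May/2021:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [19/May/2021:10:00:01 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.1 - - [19/May/2021:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [19/May/2021:10:00:03 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.1 - - [19/May/2021:10:00:04 +0000] "POST /login HTTP/1.1" 500 0',
]

frequent, error_total, top_error = mine(SAMPLE)
print(frequent["url"], error_total, top_error)
```

Each `Counter` here is a per-key count. In a Spark setting the same step is commonly written as a map to `(key, 1)` pairs followed by `reduceByKey`, which is what allows the counting to be distributed across workers.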
Original language: English
Title of host publication: Proceedings of the 7th International Conference on Web Research (ICWR), 19-20 May 2021, Tehran, Iran
Publisher: IEEE
Pages: 114-118
Number of pages: 5
ISBN (Print): 9781665404266
Publication status: Published - 2021
Event: International Conference on Web Research
Duration: 19 May 2021 → …

Conference

Conference: International Conference on Web Research
Period: 19/05/21 → …
