Abstract
Graphs and graph databases are applicable across a wide range of domains, including text mining and web mining. Representing relationships between entities as graphs provides enriched models for emerging tasks in web search and information retrieval. Natural language processing algorithms use graphs to model the structural relationships of texts efficiently, resulting in improved performance. However, increasing the accuracy of graph construction and weight allocation remains a fundamental challenge. Existing methods for these tasks provide limited efficiency and lack scalability for large graphs. In this study, we propose a novel graph-based method for modeling text and running queries to evaluate the similarity of text segments. In this method, the graph corresponding to a text is first created by modeling words and named entities with the state-of-the-art pre-trained BERT model. Graph nodes are then weighted in two stages. In the first stage, nodes with greater generality obtain higher weights. The second stage uses the graph obtained from the query text: nodes are considered important if they are specifically related to the query text. After the important nodes in the graph are determined, the semantic similarity between the query text and the texts in the database is measured. The whole framework is implemented as a natural language processing pipeline on the scalable Apache Spark platform. The efficiency of the model was evaluated in both distributed and non-distributed configurations, along with its scalability on a Spark cluster. Accuracy evaluation using the Pearson correlation coefficient shows that the proposed method outperforms its competitors.
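The abstract does not give the construction or weighting formulas, so the following is only a minimal sketch of the overall flow it describes (build a text graph, weight nodes in two stages, then score similarity against a query). Everything concrete here is an assumption made for illustration: a word co-occurrence graph stands in for the BERT-based modeling of words and named entities, node degree stands in for "generality", a fixed `boost` factor stands in for query-based weighting, and the similarity score is a simple weighted overlap. The function names are hypothetical, and Spark is not used.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_graph(text, window=2):
    """Build an undirected co-occurrence graph (assumed stand-in for BERT).

    Nodes are words; an edge links two distinct words that appear
    within `window` tokens of each other.
    """
    tokens = tokenize(text)
    graph = {}
    for i, tok in enumerate(tokens):
        graph.setdefault(tok, set())
        for other in tokens[i + 1 : i + 1 + window]:
            if other != tok:
                graph[tok].add(other)
                graph.setdefault(other, set()).add(tok)
    return graph


def stage1_weights(graph):
    """Stage 1 (assumed proxy): better-connected nodes get higher weights."""
    max_deg = max((len(nb) for nb in graph.values()), default=0)
    return {
        node: 1.0 + (len(nb) / max_deg if max_deg else 0.0)
        for node, nb in graph.items()
    }


def stage2_weights(weights, query_graph, boost=2.0):
    """Stage 2 (assumed rule): boost nodes that also occur in the query graph."""
    return {
        node: w * boost if node in query_graph else w
        for node, w in weights.items()
    }


def similarity(text, query):
    """Weighted share of the text graph's node weight covered by the query."""
    text_graph = build_graph(text)
    query_graph = build_graph(query)
    weights = stage2_weights(stage1_weights(text_graph), query_graph)
    total = sum(weights.values())
    if total == 0:
        return 0.0
    shared = sum(w for node, w in weights.items() if node in query_graph)
    return shared / total


if __name__ == "__main__":
    doc = "graphs model relationships between entities"
    print(similarity(doc, "graphs model entities"))
    print(similarity(doc, "weather forecast tomorrow"))
```

In this sketch the score lies between 0 and 1, so a query sharing words with the text scores higher than one sharing none. Replacing the co-occurrence graph with embedding-based nodes, or distributing the per-document scoring, would change only `build_graph` and the loop over documents.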
Original language | English |
---|---|
Title of host publication | Proceedings of the 7th International Conference on Web Research (ICWR), 19-20 May 2021, Tehran, Iran |
Publisher | IEEE |
Pages | 182-190 |
Number of pages | 9 |
ISBN (Print) | 9781665404266 |
DOIs | |
Publication status | Published - 19 May 2021 |
Event | International Conference on Web Research - Duration: 19 May 2021 → … |
Conference
Conference | International Conference on Web Research |
---|---|
Period | 19/05/21 → … |
Bibliographical note
Publisher Copyright: © 2021 IEEE.