Lossless compression with Trie-based shared dictionary for omics data in edge–cloud frameworks

  • The Children's Hospital at Westmead
  • The University of Sydney
  • University of Technology Sydney

Research output: Contribution to journal › Article › peer-review

Abstract

The growing complexity and volume of genomic and omics data present critical challenges for storage, transfer, and analysis in edge–cloud platforms. Existing compression techniques often trade compression efficiency against speed, so scalable and cost-effective alternatives are needed. This paper introduces a lossless compression method that integrates Trie-based shared dictionaries within an edge–cloud architecture, and describes the software-centric research process used to design and evaluate it. By preprocessing data locally at the edge, the approach reduces redundancy before cloud transmission, improving both storage and network efficiency. A global shared dictionary is constructed using N-gram analysis to identify and prioritize sequences that repeat across multiple files. A lightweight index derived from this dictionary is then pushed to edge nodes, where Trie-based sequence replacement removes redundancy locally. The preprocessed data are subsequently transmitted to the cloud, where advanced compression algorithms such as Zstd, GZIP, Snappy, and LZ4 compress them further. Evaluation on real patient omics datasets from B-cell Acute Lymphoblastic Leukemia (B-ALL) and Chronic Lymphocytic Leukemia (CLL) demonstrates that edge preprocessing significantly improves compression ratios, reduces upload times, and enhances scalability in hybrid cloud frameworks.
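
The pipeline described in the abstract (N-gram dictionary, Trie-based replacement at the edge, then a general-purpose compressor in the cloud) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed N-gram length, the single-byte token scheme (which assumes ASCII-only input such as nucleotide text), and the use of the standard-library zlib as a stand-in for Zstd, GZIP, Snappy, or LZ4 are all assumptions made for illustration.

```python
import zlib
from collections import Counter

MAX_TOKENS = 128  # assumption: tokens are the single bytes 0x80..0xFF


def build_dictionary(files, n=8, max_entries=MAX_TOKENS):
    """Count n-grams across all files and keep the most frequent repeats."""
    if max_entries > MAX_TOKENS:
        raise ValueError("at most 128 single-byte tokens are available")
    counts = Counter()
    for data in files:
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    top = [gram for gram, c in counts.most_common(max_entries) if c > 1]
    # Map each kept n-gram to one byte that ASCII input never contains.
    return {gram: bytes([0x80 + k]) for k, gram in enumerate(top)}


def build_trie(dictionary):
    """Build a nested-dict trie; a 'token' key marks the end of an n-gram."""
    root = {}
    for gram, token in dictionary.items():
        node = root
        for byte in gram:
            node = node.setdefault(byte, {})
        node["token"] = token
    return root


def encode(data, trie):
    """Greedy longest-match replacement of dictionary n-grams by tokens."""
    if any(byte >= 0x80 for byte in data):
        raise ValueError("this sketch assumes ASCII-only input")
    out = bytearray()
    i = 0
    while i < len(data):
        node, j, match = trie, i, None
        while j < len(data) and data[j] in node:
            node = node[data[j]]
            j += 1
            if "token" in node:
                match = (node["token"], j)
        if match:
            out += match[0]
            i = match[1]
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def decode(encoded, dictionary):
    """Invert encode(): expand every token byte back to its n-gram."""
    reverse = {token[0]: gram for gram, token in dictionary.items()}
    out = bytearray()
    for byte in encoded:
        out += reverse.get(byte, bytes([byte]))
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical toy inputs standing in for omics files.
    files = [b"ACGTTGCA" * 20 + b"NNNN", b"ACGTTGCA" * 10]
    dictionary = build_dictionary(files)          # built centrally
    trie = build_trie(dictionary)                 # index pushed to the edge
    preprocessed = encode(files[0], trie)         # edge-side replacement
    compressed = zlib.compress(preprocessed)      # cloud-side compression
    restored = decode(zlib.decompress(compressed), dictionary)
    print(len(files[0]), len(preprocessed), len(compressed), restored == files[0])
```

Reserving bytes outside the ASCII range as tokens keeps the replacement trivially reversible, which is what makes this sketch lossless. A real system handling arbitrary binary input would need an escape mechanism or a different token encoding.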

Original language: English
Article number: 41
Number of pages: 14
Journal: Journal of Sensor and Actuator Networks
Volume: 14
Issue number: 2
Publication status: Published - Apr 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • genomic data compression
  • global shared dictionary
  • health data storage optimization
  • health informatics scalability
  • N-gram analysis
  • Zstd compression
