Abstract
The growing complexity and volume of genomic and omics data present critical challenges for storage, transfer, and analysis in edge–cloud platforms. Existing compression techniques often trade compression efficiency against speed, so new approaches are needed that remain scalable and cost-effective. This paper introduces a lossless compression method that integrates Trie-based shared dictionaries within an edge–cloud architecture, and it follows a software-centric research process covering the design and evaluation of that method. By enabling localized preprocessing at the edge, our approach reduces data redundancy before cloud transmission, thereby optimizing both storage and network efficiency. A global shared dictionary is constructed using N-gram analysis to identify and prioritize sequences that repeat across multiple files. A lightweight index derived from this dictionary is then pushed to edge nodes, where Trie-based sequence replacement is applied to eliminate redundancy locally. The preprocessed data are subsequently transmitted to the cloud, where compression algorithms such as Zstd, GZIP, Snappy, and LZ4 compress them further. Evaluation on real patient omics datasets from B-cell Acute Lymphoblastic Leukemia (B-ALL) and Chronic Lymphocytic Leukemia (CLL) demonstrates that edge preprocessing significantly improves compression ratios, reduces upload times, and enhances scalability in hybrid cloud frameworks.
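The pipeline described in the abstract (N-gram dictionary, Trie-based replacement at the edge, general-purpose compression in the cloud) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the fixed N-gram length, the use of Unicode private-use characters as replacement tokens, and the use of `zlib` as a stand-in for the cloud-side compressor are all assumptions made for illustration. The sketch also assumes the input never contains private-use characters, which is what keeps the round trip lossless.

```python
import zlib
from collections import Counter

TOKEN_BASE = 0xE000  # assumption: private-use code points never occur in the input


def build_dictionary(files, n=8, top_k=64):
    """Count every n-gram across all files; map the top_k most frequent to tokens."""
    counts = Counter()
    for text in files:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    ranked = counts.most_common(top_k)
    return {gram: chr(TOKEN_BASE + i) for i, (gram, _) in enumerate(ranked)}


def build_trie(dictionary):
    """Build a nested-dict trie; '$token' marks the end of a dictionary entry."""
    root = {}
    for gram, token in dictionary.items():
        node = root
        for ch in gram:
            node = node.setdefault(ch, {})
        node["$token"] = token  # multi-character key cannot collide with an edge
    return root


def encode(text, trie):
    """Greedy longest-match replacement of dictionary entries by their tokens."""
    out, i = [], 0
    while i < len(text):
        node, j, match = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$token" in node:
                match = (node["$token"], j)
        if match:
            out.append(match[0])
            i = match[1]
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decode(encoded, dictionary):
    """Invert encode() by expanding each token back to its n-gram."""
    inverse = {token: gram for gram, token in dictionary.items()}
    return "".join(inverse.get(ch, ch) for ch in encoded)


# Toy data for illustration only, not real omics reads.
files = ["ACGTACGTACGTTTGA" * 20, "GGACGTACGTACGTCC" * 20]
dictionary = build_dictionary(files, n=8, top_k=16)   # "global" dictionary
trie = build_trie(dictionary)                         # index pushed to the edge
edge_output = encode(files[0], trie)                  # edge-side preprocessing
cloud_payload = zlib.compress(edge_output.encode("utf-8"))  # stand-in for Zstd/GZIP
restored = decode(zlib.decompress(cloud_payload).decode("utf-8"), dictionary)
assert restored == files[0]
```

The greedy longest-match walk is one simple replacement policy among several; the paper's prioritization of sequences and its index format may differ from what is shown here.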
| Original language | English |
|---|---|
| Article number | 41 |
| Number of pages | 14 |
| Journal | Journal of Sensor and Actuator Networks |
| Volume | 14 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Apr 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- genomic data compression
- global shared dictionary
- health data storage optimization
- health informatics scalability
- N-gram analysis
- Zstd compression
Fingerprint
Dive into the research topics of 'Lossless compression with Trie-based shared dictionary for omics data in edge–cloud frameworks'. Together they form a unique fingerprint.