Abstract
With the proliferation of fraudulent videos driven by DeepFake technology, DeepFake detection has become a prominent research topic. Many detection methods face generalization challenges when deep feature distributions between training and testing datasets differ. To address this, we propose a novel DeepFake detection model that extracts spatial-frequency features from forged videos across multiple scales and integrates them in a balanced manner to form a reliable and generalized representation of forgery features. Specifically, we employ cascaded MBConv and Swin Transformer modules for local and global spatial feature extraction. Additionally, we analyze spectral features, including high-frequency components in spatial and channel domains, and learnable source-agnostic features. During the fusion phase, we design a learnable feature channel adaption strategy to balance contributions from different modalities and prevent feature degradation. Extensive experiments show that our method outperforms others in both intra-dataset and cross-dataset scenarios, demonstrating strong performance and generalization capability.
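As a rough illustration of the fusion idea described above, the sketch below balances two feature branches with per-channel softmax gates. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `softmax` and `fuse_channels`, the two-branch setup, the pooled per-channel inputs, and the gate values are all hypothetical, and in a trained model the gate logits would be learned parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_channels(spatial, frequency, gate_logits):
    """Fuse two per-channel feature vectors with per-channel gates.

    spatial, frequency: one float per channel (hypothetical pooled features).
    gate_logits: one (spatial_logit, frequency_logit) pair per channel;
        assumed to be learnable parameters, fixed by hand here.
    """
    if not (len(spatial) == len(frequency) == len(gate_logits)):
        raise ValueError("all inputs must have one entry per channel")
    fused = []
    for s, f, logits in zip(spatial, frequency, gate_logits):
        # The two weights sum to 1, so neither branch can be scaled up
        # without the other being scaled down on that channel.
        w_s, w_f = softmax(list(logits))
        fused.append(w_s * s + w_f * f)
    return fused


# Illustrative values only: three channels, two modalities.
spatial = [1.0, 2.0, 3.0]
frequency = [3.0, 2.0, 1.0]
gate_logits = [(0.0, 0.0), (2.0, -2.0), (-2.0, 2.0)]
fused = fuse_channels(spatial, frequency, gate_logits)
print(fused)
```

With equal logits a channel averages the two branches, while a strongly skewed pair makes that channel follow one branch almost entirely. Normalizing the weights per channel is one simple way to keep either modality from dominating the fused representation.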
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings |
| Publisher | IEEE Computer Society |
| Pages | 1558-1563 |
| Number of pages | 6 |
| ISBN (Electronic) | 9798331523794 |
| Publication status | Published - 2025 |
| Externally published | Yes |
| Event | 32nd IEEE International Conference on Image Processing, ICIP 2025 - Anchorage, United States; Duration: 14 Sept 2025 → 17 Sept 2025 |
Publication series
| Name | Proceedings - International Conference on Image Processing, ICIP |
|---|---|
| ISSN (Print) | 1522-4880 |
Conference
| Conference | 32nd IEEE International Conference on Image Processing, ICIP 2025 |
|---|---|
| Country/Territory | United States |
| City | Anchorage |
| Period | 14/09/25 → 17/09/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- Deepfake Detection
- Generalization
- Multimodal
- Multimodal Feature Fusion