MULTI-SCALE SPATIAL-FREQUENCY FEATURES REPRESENTATION AND LEARNABLE CROSS MODAL FEATURE FUSION IN DEEPFAKE DETECTION

Yuzhi Lu, Wenyi Wang, Xiaowen Chen, Fengyu Wang, Shuai Wang, Jianwen Chen

Research output: Chapter in Book / Conference Paper › Conference Paper › peer-review

Abstract

With the proliferation of fraudulent videos driven by DeepFake technology, DeepFake detection has become a prominent research topic. Many detection methods face generalization challenges when deep feature distributions between training and testing datasets differ. To address this, we propose a novel DeepFake detection model that extracts spatial-frequency features from forged videos across multiple scales and integrates them in a balanced manner to form a reliable and generalized representation of forgery features. Specifically, we employ cascaded MBConv and Swin Transformer modules for local and global spatial feature extraction. Additionally, we analyze spectral features, including high-frequency components in spatial and channel domains, and learnable source-agnostic features. During the fusion phase, we design a learnable feature channel adaption strategy to balance contributions from different modalities and prevent feature degradation. Extensive experiments show that our method outperforms others in both intra-dataset and cross-dataset scenarios, demonstrating strong performance and generalization capability.
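The abstract mentions analyzing high-frequency components in the spatial domain but does not state how they are computed. As a rough illustration only, the sketch below applies a generic FFT-based high-pass filter to a grayscale frame. The function name `high_frequency_component`, the `cutoff` parameter, and the radial mask are assumptions made for this example, not the authors' method.

```python
import numpy as np


def high_frequency_component(image, cutoff=0.1):
    """Return a high-pass filtered copy of a 2-D grayscale image.

    `cutoff` is a hypothetical parameter: frequencies whose normalized
    radius (in cycles per pixel) is at or below it are zeroed out.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape

    # Centered 2-D spectrum of the image.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Normalized frequency coordinates, shifted to match the spectrum layout.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Keep only coefficients outside the low-frequency disc.
    mask = radius > cutoff
    filtered = spectrum * mask

    # The mask is symmetric, so the inverse transform is real up to rounding.
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


if __name__ == "__main__":
    rows, cols = np.indices((8, 8))
    checker = (-1.0) ** (rows + cols)       # highest-frequency pattern
    flat = np.full((8, 8), 3.0)             # constant image, DC only
    print(np.abs(high_frequency_component(flat)).max())
    print(np.abs(high_frequency_component(checker) - checker).max())
```

A constant frame contains only the zero-frequency term and is suppressed, while a pixel-level checkerboard sits at the highest representable frequency and passes through unchanged. A detector in the spirit of the abstract would feed such residuals to a learned network rather than use them directly.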

Original language: English
Title of host publication: 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings
Publisher: IEEE Computer Society
Pages: 1558-1563
Number of pages: 6
ISBN (Electronic): 9798331523794
Publication status: Published - 2025
Externally published: Yes
Event: 32nd IEEE International Conference on Image Processing, ICIP 2025 - Anchorage, United States
Duration: 14 Sept 2025 to 17 Sept 2025

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 32nd IEEE International Conference on Image Processing, ICIP 2025
Country/Territory: United States
City: Anchorage
Period: 14/09/25 to 17/09/25

Bibliographical note

Publisher Copyright:
©2025 IEEE.

Keywords

  • Deepfake Detection
  • Generalization
  • Multimodal
  • Multimodal Feature Fusion
