TY - JOUR
T1 - AMT-Net
T2 - Adversarial Motion Transfer Network with Disentangled Shape and Pose for Realistic Image Animation
AU - Teka, Nega Asebe
AU - Chen, Jianwen
AU - Gedamu, Kumie
AU - Assefa, Maregu
AU - Akmel, Feidu
AU - Zhou, Zhenting
AU - Wu, Weijie
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Advances in computer vision enable motion transfer for animating static objects in images. However, current methods rely on manually collected motion labels and struggle to represent shape and pose accurately, particularly for human bodies, due to occlusions and background variations. We therefore propose an Adversarial Motion Transfer Network with a disentangled Shape and Pose representation for realistic image Animation (AMT-Net), built on an encoder-decoder adversarial structure. Specifically, we design a pose and shape learning module that captures independent shape and pose information by training a discriminator with adversarial losses, enhancing the generation of coherent animated frames. Furthermore, a motion estimation module generates masks for objects in consecutive frames and identifies occluded parts by creating occlusion maps from these masks and dense motion vectors. To evaluate the effectiveness of our approach, we conducted extensive experiments on four publicly available datasets: VoxCeleb, TaiChiHD, TED-Talks, and MGif. The results emphasize the importance of landmark detection for video annotation and smooth transitions, while the independent shape and pose module helps capture precise representations.
AB - Advances in computer vision enable motion transfer for animating static objects in images. However, current methods rely on manually collected motion labels and struggle to represent shape and pose accurately, particularly for human bodies, due to occlusions and background variations. We therefore propose an Adversarial Motion Transfer Network with a disentangled Shape and Pose representation for realistic image Animation (AMT-Net), built on an encoder-decoder adversarial structure. Specifically, we design a pose and shape learning module that captures independent shape and pose information by training a discriminator with adversarial losses, enhancing the generation of coherent animated frames. Furthermore, a motion estimation module generates masks for objects in consecutive frames and identifies occluded parts by creating occlusion maps from these masks and dense motion vectors. To evaluate the effectiveness of our approach, we conducted extensive experiments on four publicly available datasets: VoxCeleb, TaiChiHD, TED-Talks, and MGif. The results emphasize the importance of landmark detection for video annotation and smooth transitions, while the independent shape and pose module helps capture precise representations.
KW - Image animation
KW - Motion transfer
KW - Representation learning
UR - http://www.scopus.com/inward/record.url?scp=105005572704&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3571760
DO - 10.1109/ACCESS.2025.3571760
M3 - Article
AN - SCOPUS:105005572704
SN - 2169-3536
JO - IEEE Access
JF - IEEE Access
ER -