Research Center for Information Technology Innovation, Academia Sinica

Artificial Intelligence and Image Understanding Lab

We build secure, embodied, and agentic visual intelligence: from computer vision and deepfake detection to MLLMs, physical AI, and multimodal AI agents.

Computer Vision, AI Security, Deepfake Detection, MLLM, Physical AI, Agentic AI, Generative AI

Led by Dr. Jun-Cheng Chen, Associate Research Fellow.

Research Focus

AIIU Lab studies visual intelligence across perception, generation, robustness, biometrics, tracking, 3D reasoning, multimodal reasoning, and autonomous validation.

Lab Direction

Our work connects foundation models with real visual data, physical scenes, and reliable evaluation for practical AI systems.

Research Areas

AI Security

Watermark robustness, adversarial attacks, backdoors, model safety, and trustworthy visual generation.

Deepfake Detection

Foundation-model adaptation, explainable detection, face forgery analysis, and robust media forensics.

MLLM

Vision-language reasoning, hallucination analysis, multimodal evaluation, and reliable foundation models.

Physical AI

Object pose, 3D/4D scene understanding, stereo geometry, robotic perception, and real-world visual grounding.

Agentic AI

LLM/VLLM agents for data generation, validation, prompt optimization, and autonomous visual workflows.

Generative Vision

Controllable diffusion, image/video synthesis, restoration, concept control, and creative visual modeling.

Recent News

2026
New papers will appear at ICIP, ICML, CVPR Findings, FG, and ICLR.
Mar 2026
Four papers will appear at WACV 2026, spanning diffusion control, object pose estimation, domain-adaptive detection, and concept erasure evaluation.
Dec 2025
New work was accepted to NeurIPS 2025 and ACML 2025.
Sep 2025
A strong run of papers appeared at ICIP 2025, MMSP 2025, and AVSS 2025, including oral and spotlight presentations.
Apr-Jul 2025
AIIU-related work appeared at ICLR 2025, ICML 2025, and CVPR 2025.

Featured Works

WACV 2026

Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters

Pin-Yen Chiu, I-Sheng Fang, Jun-Cheng Chen

A lightweight framework for continuous concept control in image and video synthesis through low-rank text-encoder adaptation.

CVPR Findings 2026

Gen-n-Val: Agentic Image Data Generation and Validation

Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Yu-Lun Liu, Chih-Yu Wang, Jun-Cheng Chen

An agentic pipeline for generating and validating synthetic visual data for robust computer vision training.

WACV 2026

KMOPS: Keypoint-Driven Method for Multi-Object Pose and Metric Size Estimation from Stereo Images

Ying-Kun Wu, Yi Shen, Tzuhsuan Huang, I-Sheng Fang, Jun-Cheng Chen

A stereo-based method for estimating multi-object 6D pose and metric size through keypoint geometry.