Dongwon Kim

Postdoctoral Researcher, KAIST
kdwon@postech.ac.kr

world model / representation learning / multi-modal learning

I am a postdoctoral researcher at KAIST, working with Prof. Jeany Son. I completed my BS and PhD at POSTECH CVLab, where I worked with Prof. Suha Kwak. My research asks whether machines can learn representations with the right abstraction and hierarchy, spanning compositional representations (SelfMod), multi-modal learning (DivE, SaG, MaskGen), and world models (ongoing, CompACT).

News

Apr 2026	I gave invited talks at Google DeepMind and Yonsei University on CompACT and abstraction for world models.
Feb 2026	A paper on world models was accepted to CVPR 2026 (CompACT).
Feb 2026	I won the POSTECH CSE Best Research Award 2025, which recognizes the best research among PhD graduates.
Jul 2025	A paper on 1-dimensional tokenization and text-to-image generation was accepted to ICCV 2025 (MaskGen).
Jul 2025	I completed my defense and am now officially a Ph.D. (thesis title: “Learning Compositional Visual Representations for Vision-Language Understanding and Generation”).

Education

Sep 2019 - Aug 2025	Integrated M.S. & Ph.D. in Computer Science & Engineering · POSTECH, Pohang, South Korea · Advisor: Prof. Suha Kwak
Mar 2015 - Aug 2019	B.S. in Computer Science & Engineering · POSTECH, Pohang, South Korea

Experience

Nov 2025 - Present	Postdoctoral Researcher · KAIST, Daejeon, KR · InnoCORE-LLM, PI: Jeany Son
Jun 2024 - Nov 2024	Research Intern · Fundamental Research Team, ByteDance SEED, San Jose, US · Developed an efficient text-to-image generative model using 1D tokens (MaskGen)

Publications

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Dongwon Kim, Gawon Seo, Jinsung Lee, Minsu Cho, and Suha Kwak

In CVPR, Jun 2026

(An early version appeared at the LSRW workshop at CoRL 2025)

arXiv Code Website
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Dongwon Kim*, Ju He*, Qihang Yu*, Chenglin Yang, Xiaohui Shen, Suha Kwak, and Liang-Chieh Chen

In ICCV, Oct 2025

arXiv PDF Code
1.58-bit FLUX

Chenglin Yang, Celong Liu, Xueqing Deng, Dongwon Kim, Xing Mei, Xiaohui Shen, and Liang-Chieh Chen

In arXiv preprint, Dec 2024

arXiv
Bootstrapping Top-down Information for Self-modulating Slot Attention

Dongwon Kim, Seoyeon Kim, and Suha Kwak

In NeurIPS, Dec 2024

arXiv Code
PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery

Jicheol Park, Dongwon Kim, Boseung Jeong, and Suha Kwak

In ECCV, Oct 2024
Extending CLIP’s Image-Text Alignment to Referring Image Segmentation

Seoyeon Kim, Minguk Kang, Dongwon Kim, Jaesik Park, and Suha Kwak

In NAACL, Jun 2024

Code
Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

Dongwon Kim*, Namyup Kim*, Cuiling Lan, and Suha Kwak

In ICCV, Oct 2023

arXiv Code Website
Improving Cross-Modal Retrieval With Set of Diverse Embeddings

Dongwon Kim, Namyup Kim, and Suha Kwak

In CVPR, Jun 2023

(Highlight, 235/9155 ≈ 2.6%)
arXiv Code Website
ReSTR: Convolution-Free Referring Image Segmentation Using Transformers

Namyup Kim, Dongwon Kim, Cuiling Lan, Wenjun Zeng, and Suha Kwak

In CVPR, Jun 2022

arXiv Website
Self-Taught Metric Learning Without Labels

Sungyeon Kim, Dongwon Kim, Minsu Cho, and Suha Kwak

In CVPR, Jun 2022

arXiv Code Website
Embedding Transfer With Label Relaxation for Improved Metric Learning

Sungyeon Kim, Dongwon Kim, Minsu Cho, and Suha Kwak

In CVPR, Jun 2021

arXiv Website
Proxy Anchor Loss for Deep Metric Learning

Sungyeon Kim, Dongwon Kim, Minsu Cho, and Suha Kwak

In CVPR, Jun 2020

arXiv Code Website