I am a final-year Ph.D. student at the University of Technology Sydney, advised by Prof. Xiaojun Chang. I also work closely with Heng Wang, Linjie Yang, and Xiaojie Jin on various video-language projects at ByteDance.
Before moving to UTS, I spent two wonderful years at Monash University. Prior to my candidature, I was a visiting student at MMLab, SIAT, Chinese Academy of Sciences, where I was fortunate to work with Prof. Yu Qiao and Prof. Yali Wang.
I received my Master's degree from the University of Chinese Academy of Sciences (UCAS) and my Bachelor's degree from Nankai University (NKU) with graduate honours.
Recent Activities
- 🌟🌟 I am currently on the job market for a research position. 🌟🌟
- RoomTour3D: Automatic, scalable and cheap! Diverse video-instruction data for embodied navigation (VLN). This is an ongoing effort, with 200k instructions released so far. We achieve SOTA on SOON and REVERIE with this newly introduced data.
- Shot2Story-134K: Manual 43K + GPTV 90K. 134K multi-shot videos covering over 548k video shots, with detailed text summaries totalling over 6M words! We have released this new video description dataset. With the assistance of an LLM, our method achieves SOTA performance on zero-shot MSRVTT-QA.
- ECCV 2024 Oral LongVLM: Efficient long-video frame encoding for large video language models.
- CVPR 2024 PMV-400: Portrait-mode videos rock social media! We have developed the first video dataset dedicated to research on this emerging video format.
- ICCV 2023 HTML: One paper on referring video object segmentation (RVOS) was accepted. Substantially better performance with no additional cost during inference!
- NeurIPS 2023: One paper on efficient video segmentation was accepted.
Research interests
My research interests lie in computer vision and machine learning. Currently, I am focusing on large vision-language models and their applications in robotics. I have also worked on video-language downstream tasks related to object and event prediction in videos, such as Referring-VOS and video grounding. Previously, I worked on individual and group activity recognition, and on video object detection with full and limited supervision. For my Master's thesis, I worked on moving object detection and tracking.
Publications and preprints
Mingfei Han, Liang Ma, Kamila Zhumakhanova, Ekaterina Radionova, Jingyi Zhang, Xiaojun Chang, Xiaodan Liang and Ivan Laptev
Web-video based video-instruction training data. Automatic, scalable and cheap! 2024
project page / code / Annotations / Video Frames
Mingfei Han, Linjie Yang, Xiaojun Chang and Heng Wang
We present a new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries. 43K human annotations + 90K GPTV annotations. 2023
project page / paper / demo / code / data / video / bibtex
Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang and Bohan Zhuang
European Conference on Computer Vision (ECCV), 2024 (Oral)
code / pdf / bibtex
Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang and Heng Wang
We have developed the first dataset dedicated to portrait-mode videos, to support research on this emerging video format. CVPR 2024
project page / paper / data / bibtex
Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang and Bohan Zhuang
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
code / pdf / bibtex
Mingfei Han, Yali Wang, Zhihui Li, Lina Yao, Xiaojun Chang and Yu Qiao
International Conference on Computer Vision (ICCV), 2023
project page / pdf / poster / bibtex
Mingfei Han, David Junhao Zhang, Yali Wang, Rui Yan, Lina Yao, Xiaojun Chang and Yu Qiao
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
project page / arXiv / slides / poster / presentation / bibtex
Mingfei Han, Yali Wang, Mingjie Li, Xiaojun Chang, Yi Yang and Yu Qiao
IEEE Transactions on Image Processing (TIP), 2021
IEEE / bibtex
Mingfei Han, Yali Wang, Xiaojun Chang, Yu Qiao
European Conference on Computer Vision (ECCV), 2020
ECVA / code / bibtex
Shiyu Xuan, Shengyang Li, Mingfei Han, Xue Wan, Gui-song Xia
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2019
IEEE / code / bibtex
Xiaojun Chang, Wenhe Liu, Po-Yao Huang, Changlin Li, Fengda Zhu, Mingfei Han, et al.
First Prize in the TRECVID Activities in Extended Video (ActEV) challenge, 2019
NIST / bibtex
Talks
- China Society of Image and Graphics, Guangdong Branch (in Chinese), "CSIG-Guangdong CVPR Papers sharing - Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition", May 2022
- Jishi Live (in Chinese), with my friend Xiangtao Kong, who works on low-level vision and super-resolution, "CAS-SIAT CVPR Papers sharing - Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition", with recording here, April 2022
- ML and VL Seminar at Monash University, "Mining Inter-Video Proposal Relations for Video Object Detection", November 2020
Academic service
Reviewer for journals: TPAMI, IJCV, TIP, TCSVT, TNNLS, TMM.
Reviewer for conferences: CVPR, ICCV, ECCV, ICLR, 3DV, ACCV.