ThinkOmni is a training-free framework that enhances omni-modal LLMs (OLLMs) with the reasoning ability of large reasoning models (LRMs) via guidance decoding.
Instead of additional finetuning, ThinkOmni integrates an off-the-shelf LRM at decoding time and adaptively balances perception vs. reasoning signals for robust multi-modal reasoning.
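The sketch below illustrates the general idea of logit-level guidance decoding: fuse the omni-modal model's next-token logits (perception signal) with the LRM's logits (reasoning signal) under an adaptive mixing weight. It is a minimal, self-contained illustration only; the function names, the `base_alpha` parameter, and the agreement heuristic are assumptions made for exposition, not ThinkOmni's actual interface or weighting rule.

```python
# Conceptual sketch of guidance decoding; NOT the repository's actual implementation.
# Names (guided_next_token, base_alpha, agreement heuristic) are illustrative assumptions.
import torch
import torch.nn.functional as F

def guided_next_token(ollm_logits: torch.Tensor,
                      lrm_logits: torch.Tensor,
                      base_alpha: float = 0.5) -> int:
    """Fuse logits from the omni-modal model (perception) and the LRM (reasoning).

    The mixing weight is adapted per decoding step: when the two distributions
    agree, lean more on the LRM's reasoning; when they diverge, fall back toward
    the OLLM's perception.
    """
    p_ollm = F.softmax(ollm_logits, dim=-1)
    p_lrm = F.softmax(lrm_logits, dim=-1)
    # Agreement measured by distribution overlap in [0, 1]; purely illustrative.
    agreement = torch.minimum(p_ollm, p_lrm).sum().item()
    alpha = base_alpha * agreement  # adaptive weight on the reasoning signal
    fused = (1.0 - alpha) * ollm_logits + alpha * lrm_logits
    return int(torch.argmax(fused, dim=-1).item())

# Toy usage with random logits over a 32-token vocabulary.
if __name__ == "__main__":
    vocab = 32
    token = guided_next_token(torch.randn(vocab), torch.randn(vocab))
    print("next token id:", token)
```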
```bash
# clone
git clone https://github.com/1ranGuan/thinkomni.git
cd thinkomni

# create environment
conda create -n thinkomni python=3.10 -y
conda activate thinkomni

# install packages
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
Download the evaluation datasets and place them under `./dataset/MMAU`, `./dataset/OmniBench`, and `./dataset/Daily-Omni`, then run inference:

```bash
bash scripts/inference.sh
```
Set your API URL and key in `scripts/eval.sh`, then run:

```bash
bash scripts/eval.sh
```
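If the evaluation calls an external LLM API (e.g., an OpenAI-compatible endpoint) to score model outputs, the call would look roughly like the sketch below. The endpoint, judge model name, environment-variable names, and prompt are all assumptions for illustration; the authoritative configuration lives in `scripts/eval.sh`.

```python
# Minimal sketch of an OpenAI-compatible judge call such as the eval step might make.
# API_URL, API_KEY, the model name, and the prompt are assumptions, not repo values.
import os
import requests

API_URL = os.environ.get("API_URL", "https://api.example.com/v1/chat/completions")
API_KEY = os.environ.get("API_KEY", "sk-...")

def judge(prediction: str, reference: str) -> str:
    """Ask a judge model whether a prediction matches the reference answer."""
    payload = {
        "model": "gpt-4o",  # assumed judge model
        "messages": [
            {"role": "user",
             "content": f"Prediction: {prediction}\nReference: {reference}\n"
                        "Is the prediction correct? Answer yes or no."}
        ],
    }
    resp = requests.post(API_URL,
                         headers={"Authorization": f"Bearer {API_KEY}"},
                         json=payload,
                         timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(judge("The speaker sounds cheerful.", "cheerful"))
```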
If you find this work useful, please cite:
```bibtex
@inproceedings{guan2026thinkomni,
  title={ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding},
  author={Guan, Yiran and Tu, Sifan and Liang, Dingkang and Zhu, Linghao and Ju, Jianzhong and Luo, Zhenbo and Luan, Jian and Liu, Yuliang and Bai, Xiang},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```