
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding

ThinkOmni is a training-free framework that enhances omni-modal LLMs (OLLMs) with the reasoning ability of large reasoning models (LRMs) via guidance decoding.
Rather than requiring additional finetuning, ThinkOmni integrates an off-the-shelf LRM at decoding time and adaptively balances perception and reasoning signals for robust multi-modal reasoning.

Highlights

Installation

Set up environment

```bash
# clone
git clone https://github.com/1ranGuan/thinkomni.git
cd thinkomni

# create environment
conda create -n thinkomni python=3.10 -y
conda activate thinkomni

# install packages
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Prepare data

Quick Start

Inference

```bash
bash scripts/inference.sh
```

Evaluation

Set your API URL and key in scripts/eval.sh, then run:

```bash
bash scripts/eval.sh
```

Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{guan2026thinkomni,
  title={ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding},
  author={Guan, Yiran and Tu, Sifan and Liang, Dingkang and Zhu, Linghao and Ju, Jianzhong and Luo, Zhenbo and Luan, Jian and Liu, Yuliang and Bai, Xiang},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```