Abstract: The vision foundation model, relying on large-scale pre-training, has advanced image comprehension capabilities and excels in general scenarios. However, its performance remains suboptimal ...