DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Lang
