Georeactor BlogRSS Feed
Cartoon ML - Part 5 - IDEFICS
I saw a Tweet about multi-modal instruct model IDEFICS, which was released in August. This offered a newer and more accessible alternative to the InstructBLIP model, so I decided to give it a try. It still requires an A100 to run the 9B-param model.
HuggingFace also recently added community blog posts. For a while I thought about my first post being comments on 5-10 ML papers, then I landed on the idea to write about trying IDEFICS: https://huggingface.co/blog/monsoon-nlp/idefics-newyorker
I like the prompting setup:
prompts = [ [ "User: Describe all characters and setting of this cartoon in detail. It may be sardonic or absurdist.", image, # PIL object or URL string "<end_of_utterance>", ], ]
This could be used for few-shot multimodal prompts, chat-like prompts, etc.