Georeactor BlogRSS Feed
Cartoon ML - Part 2 - Progress on Multiple Choice
Two months ago I wrote a post about the basics of working with the New Yorker cartoon dataset.
Getting the dataset
The next steps would be seeing if I can get some preliminary results for selecting the correct caption, then seeing if improvements to the model, prompt, or visual context information improves that score.
cartoons = load_dataset("jmhessel/newyorker_caption_contest", "matching") for cartoon in cartoons['train']: print('At ' + cartoon['image_location'] + ' : ' + cartoon['image_description']) print(cartoon['caption_choices']) break
The dataset's "matching" mode gives us a location, the human-written description of the image, and five possible captions. The label is given as a letter (A, B, C, D, E) which either needs to be converted to a number or a matching sentence.
Here's what I extracted from the first cartoon:
At a bar : Two priests and a rabbi are walking into a bar, as the bartender and another patron look on. The bartender talks on the phone while looking skeptically at the incoming crew. ['Tell me about your childhood very quickly.', "Believe me . . . it's what's UNDER the ground that's most interesting.", "Stop me if you've heard this one.", 'I have trouble saying no.', 'Yes, I see the train but I think we can beat it.']
Choosing a framework
From some Googling, I found that HuggingFace Transformers and Lightning Transformers have multiple-choice examples. To my dismay, unlike other tasks which have a built-in pipeline to insert dataset, model, etc, these example codes make multiple-choice questions seem like a weird hack rather than a common task, would not run correctly on CPU, etc.
I modified the HuggingFace example to accept a sample of my dataset and start training (with a batch size of 1, to ignore an issue with mismatched length of data). The
AutoModelForMultipleChoice class does not accept GPT-style models, so I followed their example's choice of
From this experience, I found how to format this dataset but I needed an up-to-date, GPT-friendly multiple choice runner. I remembered two systems - the BIG-Bench multiple choice runner, and EleutherAI's evaluation harness, which comes with docs for how to reformat my multiple choice dataset as a task. I see that as my next step.
Getting closer to new Universal New Yorker Captions
Meanwhile, I asked Bard and ChatGPT about New Yorker captions. Bard would not engage. After I asked ChatGPT if it was familiar with the 'universal' captions, it created only specific captions, including:
"I can't believe I forgot my glasses again." "Well, that's the last time I take fashion advice from a scarecrow."
When I criticized these, and gave two 'official' universal captions as examples, ChatGPT responded with these:
"I should have read the fine print." "Well, that didn't go as planned." "I've reached the peak of my multitasking abilities." "Note to self: Avoid Mondays at all costs." "I never thought I'd miss dial-up internet until now." "Looks like my shortcut turned into a detour." "The hazards of overthinking in 3... 2... 1..." "Sometimes the best response is just a blank stare." "Who needs luck when you have duct tape?" "I never imagined my life would become a series of browser tabs."
Of these, I think the 'fine print' and 'multitasking' ones have potential for being humorous as universal captions, with 'didn't go as planned' and 'blank stare' as honorable mentions for being New Yorker-y.