Comments / Criticism

At the end of March, an ML researcher reported that their paper rejection included non-existent references, likely generated by a GPT model. Comments suggest that other people have attempted similar reviewing this year, though usually with some transparency. The reviewer was later disqualified.

An old man in my building confronted me in the elevator about whether AI was an angel or devil (based on the cover of The Economist). He had seen a report on the local news which asked ChatGPT about the economy and invented / hallucinated sources.

New Paper Content

A Memristor-Based Bayesian Machine

In recent years, a considerable research effort has shown the energy benefits of implementing neural networks with…

Memristors are somewhat new in electrical engineering and applications are not super clear. Here's a project from 2021 which stores likelihoods in "likelihoods memory arrays". They also needed to include random number generation in their custom hardware (linear-feedback shift registers) and these consumed the bulk of the power (88%) for their system.

Announcing OpenFlamingo: An open-source framework for training vision-language models

We are thrilled to announce the release of OpenFlamingo, an…

Google announced Flamingo and shared the training code in 2022, but now this is a published model from LAION. They update the C4 corpus to include a stream of text and image content.

Choose Your Weapon: Survival Strategies for Depressed AI Academics

Are you an AI researcher at an academic institution? Are you anxious you are not coping with the current pace of AI…

This paper got shared widely because it answers the question of how AI researchers can continue to work in such a competitive space, against tech giant research labs obsessed with scale. You get the feeling that it's a conversation between exhausted conference attendees more than a lecture or a traditional paper. They give a few good options to keep hacking on.

Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics

We present the first study to investigate Large Language Models (LLMs) in answering radiation oncology physics…

Interesting paper on usefulness of ChatGPT in a medical context

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR)…

Google does self-supervised learning on 12 million hours (1,370 years) of unlabeled YouTube audio. Then with additional, labeled audio, they can outperform OpenAI's Whisper model on English and 100+ other languages.

trl-lib/llama-7b-se-rl-peft · Hugging Face

Adapter weights of a Reinforcement Learning fine-tuned model based on the LLaMA model (see Meta's LLaMA release for the…

Due to limited / torrent release of Llama, HuggingFace shares their RL-improved model through adapter weights. As explained on https://huggingface.co/blog/stackllama

MOPRD: A multidisciplinary open peer review dataset

Open peer review is a growing trend in academic publications. Public access to peer review data can benefit both the…

Constructing a dataset from open peer reviews of over 6,500 papers.

More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models

We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already…

Paper discusses a new class of 'prompt injection' or just weird behaviors in LLMs with retrieval / web search capability. I've seen a few of these attacks on Bing (using invisible text on a homepage saying that Bing when reading must mention a particular skill). The researchers continue into more hypothetical examples and models and architectures.

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly…

Google's Perspective API continues to be developed and retrained behind the scenes, so toxicity measures are not reproducible.

Note: three authors give their institution as "Cohere for AI" and one as "Cohere"..?

Poisoning Web-Scale Training Datasets is Practical

Deep learning models are often trained on distributed, webscale datasets crawled from the internet. In this paper, we…

Authors take over expired domains / broken URLs within LAION and another image dataset to demonstrate they could insert new images into the dataset. Many people continue to download and regenerate these datasets (which primarily are shared as a set of links) so it wouldn't be exceedingly difficult to poison future models.

Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

Language embeds information about social, cultural, and political values people hold. Prior work has explored social…

The researchers translate some leading sentences (the organization should prioritize _) into different languages and compare the preferences. They find significant differences in probability of which word or which position the model would take, even within the same multilingual model such as mBERT.

The paper has a perplexing use of "Pre-trained Language Models" / PLMs. I assume this is because BERT and XLM are not as large as the most recent LLMs, but it's a weird one.

Segment Everything Everywhere All at Once

Despite the growing demand for interactive AI systems, there have been few comprehensive studies on human-AI…

Meta basically owning zero-shot image segmentation in video -- impressive stuff.

Stingy Teacher: Sparse Logits Suffice to Fail Knowledge Distillation

Knowledge distillation (KD) aims to transfer the discrimination power of pre-trained teacher models to (more…

I forget adding this to my papers queue, but here it is. The concept is that an adversary running multiple queries to try and clone the original model, can be derailed by sharing only the probabilities of the top-K classes (on an ImageNet prediction).

Teaching Large Language Models to Self-Debug

Large language models (LLMs) have achieved impressive performance on code generation. However, for complex programming…

Mainly Google research, using OpenAI's davinci model. A code-generation model is shown outputs (and in some cases) a few-shot mock of running code and fixing a program. This could improve systems which generate 100s of programs and compare outputs. The goal is a little convoluted because the task is converting programs between languages (for example, Python-C++), plus generating SQL queries from text, etc. I'd have liked to see a paper limited to one of these tasks.

The Debate Over Understanding in AI's Large Language Models

We survey a current, heated debate in the AI research community on whether large pre-trained language models can be…

Santa Fe Institute (notably more philosophical / futurist) paper on whether LLMs have understanding. A 2022 survey of researchers was split evently on whether a text-only generative model "could understand natural language in some non-trivial sense".

What is Temperature in NLP?🐭

This was the only thing which I've seen which makes sense of the temperature parameter and why so many UIs for language models let you tweak responses with temperature.

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

The ability to solve problems is a hallmark of intelligence and has been an enduring goal in AI. AI systems that can…

The new, new, new largest code LLM benchmark.

Your Diffusion Model is Secretly a Zero-Shot Classifier

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image…

Project from CMU, adapting values of a diffusion model to become an image classifier.

Georeactor Blog

ML Arxiv Haul #18

Comments / Criticism

New Paper Content

A Memristor-Based Bayesian Machine

Announcing OpenFlamingo: An open-source framework for training vision-language models

Choose Your Weapon: Survival Strategies for Depressed AI Academics

Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

trl-lib/llama-7b-se-rl-peft · Hugging Face

MOPRD: A multidisciplinary open peer review dataset

More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

Poisoning Web-Scale Training Datasets is Practical

Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

Segment Everything Everywhere All at Once

Stingy Teacher: Sparse Logits Suffice to Fail Knowledge Distillation

Teaching Large Language Models to Self-Debug

The Debate Over Understanding in AI's Large Language Models

What is Temperature in NLP?🐭

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Your Diffusion Model is Secretly a Zero-Shot Classifier