GPT-4 aced most of them, including reading comprehension, mathematics and coding, OpenAI reported4.
Rather than the Turing test, the ‘imitation game’ in which human judges take up the question “Can machines think?”, researchers typically use benchmarks to evaluate performance on specific capabilities, such as language ability, common-sense reasoning and mathematical capacity. Increasingly, teams are also turning to academic and professional examinations designed for people.
It’s the kind of game that researchers familiar with LLMs could probably still win, however. Chollet says he’d find it easy to detect an LLM — by taking advantage of known weaknesses of the systems. “If you put me in a situation where you asked me, ‘Am I chatting to an LLM right now?’ I would definitely be able to tell you,” says Chollet.
The key, he says, is to take the LLM outside of its comfort zone. He suggests presenting it with scenarios that are variations on ones the LLM will have seen a lot in its training data. In many cases, the LLM answers by spitting out words that are most likely to be associated with the original question in its training data, rather than by giving the correct answer to the new scenario.
The company also set GPT-4 around 30 exams, including: various subject-specific tests designed for US high-school students, known as Advanced Placement; an exam to assess the current state of US physicians’ clinical knowledge; and a standard test used in the selection process for US graduate studies, called the GRE. In the Uniform Bar Exam, which forms part of the qualification process for lawyers in many US states, GPT-4 attained a score that would place it in the top 10% of people, OpenAI reported (see ‘AI system performance — selected results’).
The world’s best artificial intelligence (AI) systems can pass tough exams, write convincingly human essays and chat so fluently that many find their output indistinguishable from people’s. What can’t they do? Solve simple visual logic puzzles.
In a test consisting of a series of brightly coloured blocks arranged on a screen, most people can spot the connecting patterns. But GPT-4, the most advanced version of the AI system behind the chatbot ChatGPT and the search engine Bing, gets barely one-third of the puzzles right in one category of patterns and as little as 3% correct in another, according to a report by researchers this May1.
The team found that among scientists who had published only one paper in English, those from countries with generally low English proficiency spent a median of 29.8% more time writing it than did native speakers; those from countries with moderate English proficiency spent a median of 50.6% more time. Similarly, the researchers found that those from countries with generally low English proficiency spend a median of 90.8% more time reading scientific articles than do native speakers. They also learnt that non-native speakers spend more time preparing to give oral presentations at international conferences, and that many avoid this type of commitment owing to language barriers.
Amano, who is Japanese, says he has always struggled to communicate in English. After many years working in the United Kingdom and Australia, his English is improving, and people might think his papers are similar to those written by a native English speaker. “But behind the scenes, I have to spend so much time to reach that level,” he says. That extra effort is exactly what he wanted to quantify in this study.
Heightened rejection
Amano and his colleagues also examined the peer-review process. Non-native English speakers reported having their papers rejected specifically because of writing issues 2.5 times as often as native speakers. That sounds familiar to Lina Pérez-Angel, a Colombian palaeoclimatologist at Brown University in Providence, Rhode Island. “I have had reviewers that explicitly said that my English puts in doubt the quality of the research, or mostly gave me feedback on my English in a harsh way that made me think it was based on my Latinx/Hispanic-sounding last name,” she says.
Conferences could consider allowing researchers to present in their native language, using a translator, and could publish abstracts in multiple languages. “Non-native English speakers constitute almost 95% of the world’s population,” Amano says. “If we don’t support those 95%, I’m sure we can’t solve many global challenges.”