These AI models reason better than their open-source peers - but still can't rival humans

Yaroslav Kushta/Getty Images

Can artificial intelligence (AI) pass cognitive puzzles designed for human IQ tests? The results were mixed.

Researchers from the USC Viterbi School of Engineering Information Sciences Institute (ISI) investigated whether multi-modal large language models (MLLMs) can solve abstract visual tests usually reserved for humans.

Presented at the Conference on Language Modeling (COLM 2024) in Philadelphia last week, the research tested "the nonverbal abstract reasoning abilities of open-source and closed-source MLLMs" by seeing if image-processing models could go a step further and demonstrate reasoning skills when presented with visual puzzles.

"For example, if you see a yellow circle turning into a blue triangle, can the model apply the same pattern in a different scenario?" explained Kian Ahrabian, a research assistant on the project, according to Neuroscience News. The task requires the model to combine visual perception with logical reasoning, much as humans do, which makes it a harder challenge than image processing alone.

The researchers tested 24 different MLLMs on puzzles developed from Raven's Progressive Matrices, a standard test of abstract reasoning -- and the AI models didn't exactly succeed.

"They were really bad. They couldn't get anything out of it," Ahrabian said. The models struggled both to understand the visuals and to interpret patterns.

However, the results varied. Overall, the study found that open-source models had more difficulty with visual reasoning puzzles than closed-source models like GPT-4V, though even those didn't rival human cognitive abilities. The researchers were able to help some models perform better using a technique called Chain of Thought prompting, which guides the model step by step through the reasoning portion of the test.
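To make the idea of Chain of Thought prompting concrete, here is a minimal Python sketch of how such a prompt might differ from a direct one. It is an illustration only: the helper name `build_prompt`, the puzzle wording, and the instruction text are assumptions made for this example, not the prompts used in the study, and no model is actually called.

```python
from typing import Sequence


def build_prompt(question: str, choices: Sequence[str], chain_of_thought: bool = False) -> str:
    """Assemble a text prompt for a multiple-choice puzzle (hypothetical helper)."""
    lines = [question, "Choices:"]
    # Label the answer options A, B, C, ...
    for i, choice in enumerate(choices):
        lines.append(f"{chr(ord('A') + i)}. {choice}")
    if chain_of_thought:
        # Chain of Thought: ask for intermediate reasoning before the final answer.
        lines.append(
            "Let's think step by step: first describe each panel, "
            "then state the rule that links them, and only then give the answer."
        )
    else:
        # Direct prompting: ask for the answer with no visible reasoning.
        lines.append("Answer with the letter only.")
    return "\n".join(lines)


if __name__ == "__main__":
    question = "A yellow circle turns into a blue triangle. What does a yellow square turn into?"
    choices = ["a blue pentagon", "a yellow circle", "a red square"]
    print(build_prompt(question, choices))
    print()
    print(build_prompt(question, choices, chain_of_thought=True))
```

The only difference between the two prompts is the final instruction, which is the whole point of the technique: the puzzle stays the same, but the model is asked to spell out its reasoning before committing to an answer.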

Closed-source models are thought to perform better in tests like these because they are specially developed, trained on bigger datasets, and backed by private companies' computing power. "Specifically, GPT-4V was relatively good at reasoning, but it's far from perfect," Ahrabian noted.

"We still have such a limited understanding of what new AI models can do, and until we understand these limitations, we can't make AI better, safer, and more useful," said Jay Pujara, research associate professor and an author of the study. "This paper helps fill in a missing piece of the story of where AI struggles."

By finding the weaknesses in AI models' ability to reason, research like this can help direct efforts to flesh out those skills down the line -- the goal being to achieve human-level logic. But don't worry: For the time being, AI models' reasoning is not comparable to human cognition.
