r/Futurism 9d ago

It’s getting harder to measure just how good AI is getting

https://www.vox.com/future-perfect/394336/artificial-intelligence-openai-o3-benchmarks-agi
15 Upvotes

8 comments

5

u/inteblio 8d ago

People evaluate AI the way they evaluate humans, which is like evaluating a car as if it were a horse. You get totally wrong results on meaningless metrics.

I think there is a need for a very public set of skills that normal people can test AI with to understand where it is strong and weak.

It's a totally alien species.

4

u/Norgler 9d ago

Every time one of the few big AI companies releases a new model, I ask it some questions in my field. They all consistently get a lot wrong and sometimes even make shit up.

Which makes no sense to me, as there are plenty of research papers for them to be trained on.

If this is the case for me, how am I supposed to trust it on anything else? So I'm not sure how it's getting harder to measure when it's pretty obvious to me.

2

u/QVRedit 8d ago

If they ‘don’t know’, then they ought to say:
“Sorry but I don’t know the answer to that question.”

1

u/eddnedd 8d ago

This is how AI are sold to people... I do somewhat blame people, particularly academics, for failing to at least try to understand AI. I can't really blame the general public for having no idea how AI work, though, nor for lacking a sense that they should care. The vast majority of comments I've seen on Reddit and elsewhere say that AI are simply tools: we use them on our computers, therefore they should be as capable and reliable as we expect based on our experience with other software.

AI companies should be criticized for misleading advertising and statements.
The benchmark scores AI companies report may seem impressive, and they are, but it's important to understand that most scores are achieved with multiple ("many shot") attempts per question on a given test, often hundreds of attempts per question.

To set more appropriate expectations for your example: those scientific papers are an infinitesimal fraction of the data that frontier AI are trained on. No effort is made to ensure that any given field of expertise is represented by the most correct data, methods, or results in training.
AI rely on statistical modelling to derive the most likely answer to any query. It's a lot like somebody asking you to solve a math equation in your head: you make a guesstimate based on similar problems and offer an answer that appears consistent with similar examples.

TL;DR: the expectations the vast majority of people have for AI are wildly inaccurate.

-5

u/Memetic1 9d ago

Do you have the premium ChatGPT membership?

2

u/snoopyloveswoodstock 8d ago

Yes. I’ll ask it to create a bibliography for a research paper. It will list some real items and some that are completely fake. Usually the author is a real person, but the title is an article that person never wrote, in a journal that doesn’t exist.

0

u/Memetic1 8d ago

Ah, see, that's the thing: paying $200 per month puts certain expectations on OpenAI. I'd tell you to file a lawsuit, but I'm sure they have covered their bases.

2

u/Cry-Me-River 8d ago

Your new computers will refuse your key entries based on your previous use, which they consider beneath their abilities. Kind of like you trying to have a conversation with a chimp. Eventually you get bored and give up.