On the AI benchmarking

category: residue [glöplog]

The real test has to be Commodore 64 optimization, when it suggests illegal opcodes and hardware hacks to bypass VIC-II limits or save a single raster interrupt, that's when i call it "intelligent".

added on the 2026-02-18 22:21:50 by rudi

Even intelligence in humans is poorly defined, so good luck with that.

added on the 2026-02-19 00:26:22 by fizzer

Oh I call it “extremely intelligent” when it fails to understand that an upside down cup is in fact a cup turned upside down.

Quote:

https://youtube.com/shorts/3fYiLXVfPa4?si=x11C4qxSSPYzbEu3

added on the 2026-02-19 06:41:39 by 4gentE

Additionally, I call myself “extremely intelligent” when I fail to mark a link as a link and mark it as a quote instead. Sorry.

https://youtube.com/shorts/3fYiLXVfPa4?si=x11C4qxSSPYzbEu3

added on the 2026-02-19 06:44:13 by 4gentE

I think what we typically recognise as intelligence in a broader sense requires things like self-determination, with adaptable agendas and motivations. Possibly even feelings. Like fizzer wrote, it's a poorly understood concept, and before we understand exactly what we mean with "intelligence", it's pointless to try and construct machines that exhibit it.

It's like programming; if you don't understand the problem you are trying to solve, your code is just going to be gibberish.

added on the 2026-02-19 06:58:45 by Radiant

Quote:

Possibly even feelings.

...and a body. With expiration date. And knowledge/understanding of it having an expiration date. Or without that knowledge/understanding if we aim for animal "no consciousness" intelligence. But yeah. "What is intelligence" is a very VERY legitimate question. Problem today is that most of the time this otherwise legitimate question is being (ab)used by techbros and AI groupies that just want to muddy the water, lower the bar for "intelligence". You know, in the way the "free speech" is being (ab)used for years and years now as a temporary means to ultimately promote hate and repression. It's the zeitgeist.

added on the 2026-02-19 07:49:00 by 4gentE

If it is trained on our conversations about illegal opcodes and hardware hacks, it will still grab the relevant stuff from the training data and give an output that resembles a deep understanding of these techniques. But I sense it's always an illusion.

I had moments I asked it some things just for test, maybe to explain me some oldschool code or write something simple for retro platforms in either C or assembly. There was output that surprised me and momentarily made me think "wow, it even mentioned that niche optimization trick I heard from other coders". But I know it just found it in training data, never discovered it from scratch.

Also, at other moments the hallucinations are so funny, z80 code, few lines will be instructions that basically don't exist, unless it got them from later version of z80. I saw a mul a,b but I think e-z80 has such a mul. But then I saw ld hl,de, add bc,bc xor de,de and I asked about it and it made excuses that "Oh no, that's just pseudo-code of certain assemblers!"

added on the 2026-02-19 10:18:28 by Optimus

You can pass for intelligent by just repeating things other people say.
or copy/pasting links that other people have already copied/pasted.

better looksmaxxxing nowadays…

added on the 2026-02-19 11:48:04 by 🎀𝓀𝒶𝓃𝑒𝑒𝓁🎀

There are multiple aspects of intelligence, and yes, AI (so far) is not good in all of them, the same way humans are not good at certain things (computations in general).

The big challenge moving forward is to reduce the amount of "shortcuts" AI is doing. It's a huge problem, as you never know how the model really arrives at the conclusion. Usual question is, is it really applying fundamental reasoning principles or just finds a frequently occuring pattern that may fail in unseen scenario. This was big problem early on when everything was trained on the "human brain output" rather than step by step reasoning (which you don't have data for available at large). But guess what...it is well known problem and it is going to improve (yay!).

added on the 2026-02-19 14:10:34 by tomkh

proof that you can be smart enough to understand that sort of crap but really bad at taking a social hint that your presence is a nuisance and your creations are mediocre.

added on the 2026-02-19 17:00:30 by 🎀𝓀𝒶𝓃𝑒𝑒𝓁🎀

kaneel: what you said is not particularly nice, is it. Maybe you should care more about your own creations.

added on the 2026-02-19 17:25:33 by tomkh

my own mediocrity does not make want to work for the commodification of everything that I like doing, I still wake up, train for things I like to do, I still write my own music, I still play instruments, I still create… it's hard, but I work on it. You thought work for the people who are commodifying the shit out of everything we love, you're working with the people who will send us back to the fields for the sake of productivity, you're human trash to me.

And I don't need to be nice, not to things like you.

added on the 2026-02-19 18:00:51 by 🎀𝓀𝒶𝓃𝑒𝑒𝓁🎀

Yup, so we have it. If you are remotely considering using AI for anything or maybe even doing research on AI related topics you are automatically "bad person", enemy of the people etc... I guess, I'm not surprised here.

This is like the next-level of polarization - left vs right, AI haters vs AI enthusiasts..we are all gonna kill each other one day, while the true enemy (people who truly benefit all of this) quietly stay behind the scenes doing completely immoral things (rape islands and such). Just great!

added on the 2026-02-19 18:13:01 by tomkh

Don't be a snowflake about it though, it's getting old fast.

added on the 2026-02-19 18:24:52 by 🎀𝓀𝒶𝓃𝑒𝑒𝓁🎀

"you're human trash to me" "Don't be a snowflake about it"

Whatever, dude.

added on the 2026-02-19 18:26:23 by tomkh

It feels like this has devolved into a "with us" or "against us" thing, but it should be possible to hold two thoughts at once. I think we can acknowledge it for what it is, a massive pattern-matching engine, without necessarily surrendering our "soul" or craft into it. To Optimus' point: sure, it might just be spitting back illegal opcodes it found in an old coodebase, but that doesn't mean it can't be a useful rubber duck for brainstorming.

It's like the difference between a tool and the intent behind. You can be a purist who loves the frit of manual assembly and hardware tricks, while also using AI to scan for tedious boiler plate. Using a tool doesn't mean you support the commodification of everything, it just means you're using the available tech to get to the fun part faster.

added on the 2026-02-19 22:11:29 by rudi

rudi: thanks for trying, but it's over. I doubt there is any hope for civilized discussion here anymore.

added on the 2026-02-19 22:29:17 by tomkh

AI for duck debugging is good for me, I was never against it and might do sometimes.
Also asking about how some freakin API or language feature works.
And things that when I normal google them I can't find shit. Seriously, search engines are horrible these days.

added on the 2026-02-20 10:43:59 by Optimus

@tomkh:

Quote:

The big challenge moving forward is to reduce the amount of "shortcuts" AI is doing. It's a huge problem, as you never know how the model really arrives at the conclusion.

The shortcut problem you're describing is essentially the battle against computational irreducibility, which mathematically can never be fully solved. It's rooted in the Halting Problem, proven in 1936 by Alan Turing himself, showing that the behavior of Turing-complete systems is fundamentally undecidable. We can't predict the outcome without running the full computation.

That is the scary part for AI. Because we can't mathematically prove the model's logic beforehand, the best we can hope for is to find pockets of computational reducilibity within the networks. But in the broader scenario, observation is the only truth we have, and everyone ultimately has to rely on empirical "faith" that the black box will output the right answer.

added on the 2026-02-20 13:42:45 by rudi

Quote:

The shortcut problem you're describing is essentially the battle against computational irreducibility

Yes, I'm very much aware of computational irreducibility.

This is exactly why you cannot train the model on the reasoning outcomes alone and you need step by step reasoning examples. When you "reason" you simulate the process, but you cannot infer reasoning patterns in general just by looking at the outcomes.

This is a limitations of all intelligent beings. Even SGI won't be able to solve all mathematical problems in "one shot". In fact, in many cases raw computational power and classical algorithms is the only option there is.

added on the 2026-02-20 13:56:35 by tomkh

@tomkh: you hit the nail on the head there. If a problem is decidable, an SGI might find a brilliant reduced shortcut, but for formally undecidable problems, no amount of superintelligence can do it. For example Conway showed generalized version of 3x+1 problem are Turing complete because they are capable of universal computation, and therefore undecidable and equivalent to the Halting problem.

If the SGI is ever able to tell us which problems are decidable or undecidable, we can use classic rigid logic to verify the result.

added on the 2026-02-20 14:50:28 by rudi

rudi: I would be careful with mixing decidability here. The undecidable problems cannot be solved by any computation process in general.

Also even if generalized 3x+1 is undecidable, that doesn't mean 3x+1 is (I think the status is open for this one).

Those are also areas that humans struggle the most, so AGI would probably struggle here as well.

What is super interesting to me is finding proofs automatically. And this is related to halting problem in various ways. For example, there may exist a "simple" proof that Collatz (3x+1) halts for every input, but 1) we don't know how long is the proof, 2) we don't know how long we have to wait in each case (when it reaches 1).

So what humans actually do in this case? They actually just shot in the dark, until they will find a sequence of logical steps that is considered a proof. Now, AI can be used here various ways: 1) to mine quickly through prior knowledge, 2) to "intelligently" develop so called automated proof heuristic and try them out at much larger speed/scale than human-alone can.

added on the 2026-02-20 17:47:47 by tomkh

Quote:

my own mediocrity does not make want to work for the commodification of everything that I like doing, I still wake up, train for things I like to do, I still write my own music, I still play instruments, I still create… it's hard, but I work on it. You thought work for the people who are commodifying the shit out of everything we love, you're working with the people who will send us back to the fields for the sake of productivity, you're human trash to me.

And I don't need to be nice, not to things like you.

Is it possible to do those things, like wake up, train for things, write own code, do own physical art, and still see AI as nothing more than a glorified search engine like Google but bolted on image/audio generation which recycles content? And have no real emotional attachment one way or the other on the whole subject?

Or is there only 2 possible takes, and only one take is good. And the 3rd option, being completely ambivalent, is bad (because apparently being indifferent is allowing the current trends to happen, thus an pro- stance)?

Absolutely serious question. I am absolutely curious if such takes are even allowed in the Scene.

added on the 2026-03-03 20:11:30 by mudlord

ML!: I think it's quite complex and nuanced topic.

On the political side: unfortunately, dialogue is very polarized, so it's it seems you can be either fully against AI or otherwise, you will be immediately considered a supporter or even a promoter (as we can see this happening on pouet rn). This is good for nothing, because most people will just back off and prefer not to say anything at all.

Now, if AI can be seen as glorified search engine - I think not. The legal debate revolves around question whether or not model training on unlicensed data is fair use. There is obviously a lot of politics and money involved - so it should be taken with a grain of salt.

However, IMHO the truth is somewhere in-between. Those models indeed do not store exact images. They, in a very simplified view, store latent embeddings of small image patches (texture, shape, etc..), which encodes their visual properties and meaning, plus relationships between them (via attention mechanism for example in multi-modal models). So the model becomes a universal formula for image generation, rather than a database. That's why you can generate images (or music) based on your hand-made references and it will produce an absolutely novel piece out of it.

The moral issue is in order to train universal formula that gives good results, you need as many examples as you can get, so you scrape everything you can. You can also generate synthetic examples, and they do (mostly for augmentation), but it will never learn to make photorealistic images this way.

Finally, what do you allow any experimentation with AI in demoscene context is really up to you really. Personally, I don't care about opinions of so called "demoscene elites", but I can see many people do care. After all everyone wants to be accepted by their peers (to some extent).

added on the 2026-03-05 11:58:59 by tomkh

Quote:

Personally, I don't care about opinions of so called "demoscene elites", but I can see many people do care. After all everyone wants to be accepted by their peers (to some extent).

The reality is they still have influence. A lot of influence. And its shown.

Its one of the reasons why I stopped doing demos for 5+ years. I felt if they didn't like them, there is absolutely zero point to doing any form of content, since it has to measure up to their standards. You already have that with the up/down votes on many places, which to me is not what art is about. And it controls how your art is seen here. (less votes == less exposure)

Its, if you are savvy enough, easy enough to farm votes/glops, but to me, is that then really what I want when I just want to express myself? I'd rather just take myself elsewhere if the demoscene is really only about competition/dickwaving/popularity.

I want honestly to just create content that I myself enjoy, and that if other people enjoy it, is just a bonus. Its just sad that in The Scene, art is commodified. I just want to express myself with no real bounds of what I do.

added on the 2026-03-06 18:25:33 by mudlord

pouët.net

On the AI benchmarking

login