GPT-5.5: Mythos-Like Hacking, Open to All

JellyYelly · 2026-04-24T00:17:28 1776989848

They say it's Mythos-like without actually comparing it to Mythos (fair enough, it's not public), but the bar for a model to be Mythos-like has to be that it can produce as many novel, high-severity security vulns as those outlined in the Mythos red-team blog. I haven't seen any other lab produce a report like that yet. The proof is in the pudding.

WhiteDawn · 2026-04-23T23:18:09 1776986289

First you need to get through the safety net. I’ve had many productive GPT-5.4 sessions hit a roadblock of “ethicality” and pollute the context with multiple rounds of trying to convince it to continue.

nsingh2 · 2026-04-23T21:51:16 1776981076

These plots are terrible. Why is categorical data connected across categories with lines? Why not just use bar plots?

For example, in the "Web Vulns in OSS" plot, white-box data for Opus 4.7 is not available, but the absurd linear interpolation across categories implies it should be near 60.

scottyah · 2026-04-23T22:14:53 1776982493

It's just an ad thinly disguised as useful data.

wmf · 2026-04-23T22:31:15 1776983475

I think the x-axis is meant to be time, but they screwed it up.

mertcikla · 2026-04-23T22:34:58 1776983698

Why does this read like an OpenAI ad?

kibibu · 2026-04-24T01:55:12 1776995712

> GPT-5.5 doesn’t just improve — it pulls away

I think it's also self-aggrandizing.

strange_quark · 2026-04-23T22:15:40 1776982540

Wasn't it already confirmed that small open-weight models were able to detect most of the same headline vulns as Mythos? How is this any different?

stanfordkid · 2026-04-23T22:40:33 1776984033

No, they are able to detect errors when pointed at them, but they produce a lot of false positives, which makes them functionally useless on a large unknown codebase. They also can't build and run an exploit after identifying a bug. Mythos can (purportedly) find vulnerabilities and actually validate them by building and running exploits. This makes it functional and usable for hacking.

dlahoda · 2026-04-24T01:54:43 1776995683

I casually asked Gemini and Codex (the $200 subscriptions) to find and verify bugs for weeks. They wrote tests, injected mutations, and verified fixes. Just prompts.

I also had to proxy the remote mainnet through localhost to force them to do penetration and DoS testing.

Mythos is nothing new.

nardons · 2026-04-23T22:38:15 1776983895

Do you have a source for this? Not doubting it, but I would like to have something concrete the next time the Mythos horse manure is cited.

skirmish · 2026-04-23T23:59:03 1776988743

Probably this: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...

WalterGR · 2026-04-24T06:03:33 1777010613

Discussion:

https://news.ycombinator.com/item?id=47732020

“Small models also found the vulnerabilities that Mythos found” (aisle.com)

1,283 points | 12 days ago | 360 comments