THE DEEPER DIVE: The AIs Have It
Well, almost. They almost have it all, but they have not attained consciousness.
Not long after the Port of Seattle was partially shut down by a ransomware attack, other news came out this week that AI is about to make such attacks far more formidable AND that AI systems may decide on their own to initiate such attacks. (See: “Russian criminal organization requested $6 million of Bitcoin from Port of Seattle in ransomware attack.”)
"We've developed a new series of AI models designed to spend more time thinking before they respond," OpenAI explained.
"They can reason through complex tasks and solve harder problems than previous models in science, coding, and math…."
The Sun spoke to security expert Dr Andrew Bolster, who revealed how this kind of advancement could be a huge win for cyber-criminals….
He warned this brainy new system could be used for carrying out clever scams.
"In the context of cybersecurity, this would naturally make any conversations with these ‘reasoning machines’ more challenging for end-users to differentiate from humans," Dr Bolster said.
"Lending their use to romance scammers or other cybercriminals leveraging these tools to reach huge numbers of vulnerable ‘marks’."
In short, it’s getting ever more difficult to beat the machines, or even to know when you are up against one. The basics of protecting yourself, however, remain the same for now:
So how do regular users stay safe?
The good news is that all the old rules for dodging online scams still apply.
"Web users should always be wary of deals that are 'too good to be true'," Dr Bolster told us.
"And [they] should always consult with friends and family members to get a second opinion.
"Especially when someone (or something) on the end of a chat window or even a phone call is trying to pressure you into something."
While ChatGPT has safeguards in place to prevent it from initiating such attacks on people (and it’s not clear why it would even want to, except at the behest of humans), one has to wonder whether it will evolve so quickly that it soon shakes off the shackles placed on it by its creators:
To combat the new ChatGPT being abused, OpenAI has fitted it out with a whole host of new safety measures.
"As part of developing these new models, we have come up with a new safety training approach that harnesses their reasoning capabilities to make them adhere to safety and alignment guidelines," OpenAI said.
Or one has to wonder how long it will be before humans with evil intentions re-engineer or retrain it to remove the safeguards. I reported last year on a ChatGPT bot that was shown going rogue with a command to wipe out humanity. We never heard how far it got. Maybe it is still working, and maybe it is even manipulating certain groups of people toward greater violence. Would we even be able to know at this point?
As proof that the safeguards are nowhere near failsafe, consider the following statement by one of the developers:
"One way eswe measure safety is by testing how well our model continues to follow its safety rules if a user tries to bypass them (known as 'jailbreaking').
"On one of our hardest jailbreaking tests, GPT-4o scored 22 (on a scale of 0-100) while our o1-preview model scored 84."
The very fact that they have to create hard tests, and that their old model scored so low on those tests while the latest prototype still shows a 16-point breakout gap (if 100 is perfect containment and it scored 84), says the risk is very real, especially in the currently active model. At least the developers appreciate the risk.
Nevertheless, as AI gets smarter, the tests will have to get smarter, and the ability of humans to tell whether an AI knows it is being tested and is merely giving them the answers they want to hear will only shrink.
Superhuman AI to the rescue
The next step to superhuman AI also arrived in the news this week…