Lugh@futurology.today (moderator) to Futurology@futurology.today · English · 6 days ago (edited)

More powerful AI models can train weaker AI to be more persuasive, and this may mean weaker AI can be trained for misuse, too.

arxiv.org

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, this report presents a comprehensive assessment of their frontier risks. Drawing on the E-T-C analysis (deployment environment, threat source, enabling capability) from the Frontier AI Risk Management Framework (v1.0) (SafeWork-F1-Framework), we identify critical risks in seven areas: cyber offense, biological and chemical risks, persuasion and manipulation, uncontrolled autonomous AI R&D, strategic deception and scheming, self-replication, and collusion. Guided by the "AI-45° Law," we evaluate these risks using "red lines" (intolerable thresholds) and "yellow lines" (early warning indicators) to define risk zones: green (manageable risk for routine deployment and continuous monitoring), yellow (requiring strengthened mitigations and controlled deployment), and red (necessitating suspension of development and/or deployment). Experimental results show that all recent frontier AI models reside in green and yellow zones, without crossing red lines. Specifically, no evaluated models cross the yellow line for cyber offense or uncontrolled AI R&D risks. For self-replication, and strategic deception and scheming, most models remain in the green zone, except for certain reasoning models in the yellow zone. In persuasion and manipulation, most models are in the yellow zone due to their effective influence on humans. For biological and chemical risks, we are unable to rule out the possibility of most models residing in the yellow zone, although detailed threat modeling and in-depth assessment are required to make further claims. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.