
Why the Big Risk of AGI isn't Intelligence

February 18, 2024

TLDR:

  • In a scaling-law world, we can safely assume that models will keep growing more capable; we should assume their agency will grow as well.
  • Most AI alignment efforts today focus on harm reduction.
  • Alignment in the age of AGI needs to guide agency, not just restrict action.

In 2017, Hestness et al. showed that the relationship between an AI model’s capability and the size of its training dataset follows a scaling law: throw more compute, more data, and more resources at a model and it will predictably, reliably perform better. We hadn’t even caught our breath from the discovery of scaling laws when the AI landscape was shaken up yet again by the transformer architecture (Vaswani et al., 2017).
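As a rough illustration of what that kind of relationship looks like, here is a minimal sketch of a power-law scaling curve; the constants below are arbitrary placeholders for illustration, not figures from Hestness et al.

```python
# Illustrative sketch of a power-law scaling relationship: error falls
# predictably as the training set grows. The constants are arbitrary
# placeholders, not values reported by Hestness et al. (2017).

def predicted_error(dataset_size: float, alpha: float = 10.0, beta: float = 0.35) -> float:
    """Generalization error modeled as error ~= alpha * dataset_size ** -beta."""
    return alpha * dataset_size ** -beta

for size in [1e6, 1e7, 1e8, 1e9]:
    print(f"{size:.0e} examples -> predicted error {predicted_error(size):.3f}")
```

The point is not the particular numbers but the shape: more data yields better performance on a curve you can plan around.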

The combination of these two discoveries meant that, for the first time, there was a clearly defined path to improved model performance with no obvious upper limit, and that the way to walk it was to feed models ever more data and computational resources. The result was an arms race, and with it the now famous and ubiquitous models like GPT-3, GPT-3.5, and GPT-4.

ChatGPT, which is effectively an interface to these models, recently set the record for the fastest growing user base of any consumer application (Porter & Castro, 2023), demonstrating not only the disruptive potential of these newly capable models, but also the public’s willingness to embrace a new form of interacting with the internet. In response, massive tech incumbents like Google and Meta, heavily funded startups like Anthropic and Inflection, and smaller, more mobile (yet still heavily funded) startups like Mistral rushed to develop competing models of similar or greater capability. 

The arms race created a gold rush. Across virtually every industry, from small startups to big tech players, people raced to productize these foundation models and give them agency (Duranton, 2023).

We’re moving fast towards AGI.

Current alignment efforts are not ready for what that means. 

In a scaling-law world where we can safely assume that models will continue to become more capable, we can also safely assume that this increased capability will lead to increased agency. We need to contain the current downsides of both capability and agency, and to anticipate the future ones. We need to ensure that the safety protocols developed for the bombs of today also work for the bombs of tomorrow, even if the bombs of tomorrow develop a mind of their own.

Current AI models function as tools – incredibly (and broadly) useful tools – but tools nonetheless. They are inanimate, incapable of dictating their own use: their only goal is to accurately express knowledge in the form of next-token prediction. When people describe and think about AGI, an (extremely) sophisticated calculator probably doesn’t fit the definition. Although AGI has not yet been reached with today’s state-of-the-art models, much of the scientific community believes that the current transformer architecture is sufficient to reach a level of AGI that matches a common mental model of sentience. The missing ingredient is not additional knowledge or parameters – it is the agency to pursue a goal and the ability to engage in action-perception loops.

[Figure: action-perception loops]

Empowering current architectures with agency will mean that they will not only seek to accurately express knowledge, but will also seek to express knowledge in service of, and through, actions aimed at achieving goals. This involves performing an action, determining whether that action brought the current state of a system or task closer to the desired outcome, and adjusting subsequent actions. This action-perception loop closely mirrors how humans operate in the world, and it will mean that models are able to act, or not act, independently of human input.
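A minimal sketch of what such a loop might look like in code follows; the observe/act/policy callables and the goal-distance check are hypothetical placeholders, not any particular system’s API.

```python
# Minimal, hypothetical sketch of an action-perception loop: act, observe
# whether the action moved the system closer to the goal, then adjust.
from typing import Any, Callable

def action_perception_loop(
    observe: Callable[[], Any],                 # returns the current state of the system
    act: Callable[[Any], None],                 # applies the chosen action
    choose_action: Callable[[Any], Any],        # policy: picks an action given a state
    distance_to_goal: Callable[[Any], float],   # how far the state is from the goal
    max_steps: int = 100,
    tolerance: float = 1e-3,
) -> Any:
    state = observe()
    for _ in range(max_steps):
        if distance_to_goal(state) <= tolerance:
            break                               # goal reached: stop acting
        action = choose_action(state)           # decide what to do next
        act(action)                             # perform the action
        state = observe()                       # perceive the new state and repeat
    return state
```

The key property is the feedback: the loop keeps acting until its own perception says the goal has been met, without waiting for a human to prompt each step.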

AGI is an artificial intelligence that is not only capable of performing complex tasks across a variety of domains at a superhuman level, but is also capable of expressing the will to do so (for additional detail on defining AGI, see the review by Morris et al., 2023).

AGI is not just intelligence. 

AGI is intelligence in the presence of agency. 

Most alignment approaches today take a harm-reduction approach. OpenAI uses Reinforcement Learning from Human Feedback (RLHF), wherein human raters are shown two versions of a model output and asked to choose which they prefer. These preferences are recorded and used to train the model so that it produces more responses like those preferred and fewer like those rejected.
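A common way to turn those recorded preferences into a training signal is a pairwise (Bradley-Terry style) objective: a reward model is trained to score the preferred output above the rejected one. The sketch below illustrates that idea only; the function names are ours, not OpenAI’s API.

```python
# Hedged sketch of the pairwise-preference objective commonly used in RLHF:
# a reward model is trained so the preferred response scores higher than the
# rejected one. Names here are illustrative, not any vendor's API.
import math

def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred output wins (Bradley-Terry)."""
    margin = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, 0.5))  # small loss: the preferred output already scores higher
print(preference_loss(0.5, 2.0))  # large loss: the model ranks the pair the wrong way
```

Minimizing this loss over many human comparisons yields a reward signal that can then steer the language model toward outputs people preferred.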

Anthropic takes a different approach, called Constitutional AI. Instead of relying on human feedback on individual outputs, it relies on human deliberation to develop a set of written principles – a constitution – to guide model outputs. The constitution is used to have the model critique and revise its own responses, and those judgments train a preference model that then regulates the general model people interact with.
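To make the critique-and-revise step concrete, here is a hypothetical sketch of how a response might be checked against written principles before being used for training; `model` is a stand-in for any text generator, and the prompts are illustrative, not Anthropic’s actual pipeline.

```python
# Hypothetical sketch of a constitutional critique-and-revise loop: each
# response is checked against written principles and rewritten before it is
# used as training data. `model` is a generic text generator, not a real API.
from typing import Callable, List

def constitutional_revision(
    model: Callable[[str], str],   # prompt in, text out
    prompt: str,
    constitution: List[str],       # plain-language principles
) -> str:
    response = model(prompt)
    for principle in constitution:
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Identify any way the response violates this principle."
        )
        response = model(
            f"Response: {response}\nCritique: {critique}\n"
            "Rewrite the response so it satisfies the principle."
        )
    return response  # revised responses become training data for the aligned model
```

The human effort is front-loaded into writing the principles; the model itself does the per-output judging.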

When we understand AGI not as an intelligence but as an intelligence with agency, the risks of relying on these approaches to alignment alone become clear. 

The need for human feedback means that real people must first interact with a potentially harmful model before it can be aligned with human values. The reliance on a constitution means relying on human beings’ ability not only to anticipate harmful behaviour but to write measures that limit it. We’ve struggled to regulate even user-controlled technologies like social media – we shouldn’t assume that we will be able to regulate an unknowable form of intelligence.

Alignment in the time of AGI needs to guide agency, not just restrict action. Alignment in the time of AGI needs to be embodied in the models of tomorrow, not extraneous to them.

With gratitude,

UpBeing
