Perspective on Risk - August 3, 2024 (AI & LLM Stuff)
Recent Quotes; Overcoming Algorithm Aversion; LLMs May Have A ‘Theory Of Mind’; Transcendence; How Close Are We To AGI?; Economics Of AI
TL;DR Summary
LLMs are scaling every 6 months, nowhere near the point of diminishing marginal returns, and becoming vastly more efficient.
Nevertheless, the amount of power these models need is doubling every 3 months.
Use of Generative AI seems to be overcoming ‘algorithm aversion.’
LLMs are exhibiting the human ability to reason about multiple mental and emotional states in a recursive manner, appear to surpass the abilities of the experts generating their data, and can reflect moral judgments with high accuracy.
A properly constructed AI model is way better at math than you: Google’s can achieve silver-medal-level performance at the International Mathematical Olympiad.1
AI Companions Reduce Loneliness2
ChatGPT is now as funny as The Onion3
GPT-4o’s advice is rated as more moral, trustworthy, thoughtful, and correct than that of The New York Times’ popular advice column, The Ethicist.4
As predicted by Kurzweil in his 1999 book "The Age of Spiritual Machines," we are very close to achieving artificial general intelligence.
There is some question about the economic effects; the macroeconomic effects appear nontrivial but modest—no more than a 0.66% increase in total factor productivity (TFP) over 10 years.
What will you do when the marginal cost of intelligence approaches zero?
Recent Quotes About Artificial Intelligence
From the Microsoft Build conference:
You could say Moore's Law was probably, you know, more stable in the sense that it was scaling at maybe 15 months, 18 months. We now have these things that are scaling every six months or doubling every six months. (Satya Nadella)
… we are nowhere near the point of diminishing marginal returns on how powerful we can make AI models as we increase the scale of compute. (Kevin Scott, CTO)
So I wanted to also just remind folks like this efficiency point is real. So while we're off … building bigger supercomputers to get the next big models out … we're also grinding away on making the current generation of models much, much more efficient. So between the launch of GPT 4, which is not quite a year and a half ago, it's 12 times cheaper to make a call to GPT 4o than the original … GPT 4 model and it's also six times faster… (Kevin Scott, CTO)
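A quick bit of compounding arithmetic makes these quotes concrete. The sketch below is illustrative only; the doubling periods are the rough characterizations quoted above, not precise measurements.

```python
# Back-of-the-envelope compounding of the doubling periods quoted above.
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiple of capability (or cost-efficiency) after `years`,
    given a fixed doubling period."""
    return 2 ** (years * 12 / doubling_months)

print(growth_factor(1, 18))   # Moore's-Law-style pace: ~1.6x per year
print(growth_factor(1, 6))    # doubling every six months: 4x per year
print(growth_factor(1.5, 6))  # over the ~18 months since GPT-4 launched: 8x,
                              # the same order as the quoted 12x cost reduction
```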
Overcoming Algorithm Aversion
A thought that has stuck with me for many years was best expressed by Peter Hancock when discussing models. Paraphrasing:
If you don’t have an explicit model, you are relying on an implicit model in your mind.
An Anatomy of Algorithm Aversion (Sunstein, Gaff)
People are said to show "algorithm aversion" when (1) they prefer human forecasters or decision-makers to algorithms even though (2) algorithms generally outperform people (in forecasting accuracy and/or optimal decision-making in furtherance of a specified goal).
Algorithm aversion also has "softer" forms, as when people prefer human forecasters or decision-makers to algorithms in the abstract, without having clear evidence about comparative performance.
Algorithm aversion is a product of diverse mechanisms, including (1) a desire for agency; (2) a negative moral or emotional reaction to judgment by algorithms; (3) a belief that certain human experts have unique knowledge, unlikely to be held or used by algorithms; (4) ignorance about why algorithms perform well; and (5) asymmetrical forgiveness, or a larger negative reaction to algorithmic error than to human error.
LLMs May Have A ‘Theory Of Mind’
LLMs achieve adult human performance on higher-order theory of mind tasks
This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM): the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). …
We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. …
Transcendence
The “Wisdom Of Crowds” Of LLMs.
Generative Artificial Intelligence and Evaluating Strategic Decisions
… We investigate the potential role of generative AI in evaluating strategic alternatives. Using a sample of 60 business models, we examine the extent to which business model rankings made by large language models (LLMs) agree with those of human experts. …
In pairwise comparisons of business models, we find that generative AI often produces evaluations that are inconsistent and biased. However, when aggregating all evaluations, AI rankings tend to resemble those of human experts.
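To see how inconsistent individual judgments can still aggregate into an expert-like ranking, here is a minimal toy simulation. The setup (10 items, Gaussian judgment noise, win-count aggregation) is hypothetical and not the paper’s actual methodology.

```python
# Toy "wisdom of crowds" aggregation: each single pairwise judgment is very
# noisy, yet the aggregate ranking tends to recover the true ordering.
import numpy as np

rng = np.random.default_rng(0)
true_quality = np.arange(10.0)           # 10 "business models"; item 9 is best
n_items = len(true_quality)
wins = np.zeros(n_items)

for _ in range(2000):                    # many noisy pairwise evaluations
    i, j = rng.choice(n_items, size=2, replace=False)
    noise = rng.normal(0, 5.0, size=2)   # noise dwarfs most quality gaps
    winner = i if true_quality[i] + noise[0] > true_quality[j] + noise[1] else j
    wins[winner] += 1

print(np.argsort(-wins))                 # tends toward 9, 8, 7, ... despite noise
```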
Models Can Outperform The Experts That Train Them
Transcendence: Generative Models Can Outperform The Experts That Train Them
Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives.
In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data. We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset. …
This seems related to the ‘wisdom of crowds’ of models above, but maybe there is something more. So I did what I do these days and asked ChatGPT 4o, Claude, and Gemini to compare the papers.5
Both papers focus on generative AI, but Zhang et al. (2024) explore a broader range of generative models, while Doshi et al. (2024) specifically focus on LLMs. (Gemini)
Both studies highlight the importance of input configurations. Zhang et al. use low-temperature sampling to achieve transcendence, whereas Doshi et al. employ chain-of-thought prompting to enhance AI evaluations. (ChatGPT 4o)
The chess paper provides a theoretical framework for why transcendence occurs, while the business paper is more empirical in its approach. … The chess paper is more focused on understanding the phenomenon, while the business paper suggests practical applications for strategic decision-making. (Claude)
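The low-temperature detail that ChatGPT 4o flags is, as I read the chess paper, the heart of the mechanism: a model trained to imitate many imperfect experts learns something like their average, and sampling at low temperature concentrates probability on the moves they agree on, washing out each expert’s idiosyncratic blunders. A toy numerical sketch (hypothetical numbers, not from the paper):

```python
# Low-temperature sampling as implicit majority voting across experts.
import numpy as np

# Move probabilities of three imperfect experts over 4 candidate moves;
# move 0 is objectively best, but each expert has a different pet blunder.
experts = np.array([
    [0.60, 0.30, 0.05, 0.05],
    [0.60, 0.05, 0.30, 0.05],
    [0.60, 0.05, 0.05, 0.30],
])
model = experts.mean(axis=0)   # imitation learning ~ average of the experts

def sharpen(p, temperature):
    """Re-sharpen a distribution: p^(1/T), renormalized."""
    q = p ** (1 / temperature)
    return q / q.sum()

print(model)                   # [0.6, 0.133, 0.133, 0.133] -- like each expert
print(sharpen(model, 0.2))     # ~[0.998, ...] -- near-certain on the best move
```

Each individual expert finds the best move only 60% of the time; the low-temperature imitator finds it almost always: transcendence by majority vote.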
Getting AI To Use ‘System 2’ Thinking
Distilling System 2 into System 1
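The paper’s recipe, as I understand it: use an expensive “System 2” inference procedure (e.g., chain-of-thought with self-consistency) to label inputs, then fine-tune the model to produce those answers directly in a single fast “System 1” pass, with no intermediate reasoning in the targets. A minimal sketch; `llm.generate`, `.final_answer()`, and the agreement threshold are hypothetical stand-ins for whatever model API you use.

```python
# Sketch of distilling System 2 (slow, multi-sample reasoning) into
# System 1 (one direct forward pass), under the assumptions noted above.
from collections import Counter

def system2_label(llm, prompt: str, n_samples: int = 8) -> str | None:
    """Sample several chain-of-thought answers; keep the majority answer."""
    answers = [llm.generate(prompt + "\nThink step by step.").final_answer()
               for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    # Self-consistency filter: discard prompts the model can't agree on.
    return answer if count >= n_samples // 2 else None

def build_distillation_set(llm, prompts):
    """(prompt, answer) pairs with no reasoning traces in the targets."""
    return [(p, a) for p in prompts
            if (a := system2_label(llm, p)) is not None]

# fine_tune(llm, build_distillation_set(llm, unlabeled_prompts))
# The fine-tuned model then answers in one pass at System 2 quality.
```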
How Close Are We To AGI?
Answer: Pretty damn close
We have machines now that we can basically talk to like humans. It’s a remarkable testament to the human capacity to adjust that this seems normal, that we’ve become inured to the pace of progress.
AGI, or artificial general intelligence, is the holy grail of the field. The futurist Ray Kurzweil has predicted we will develop AGI by 2029. This paper, From GPT-4 to AGI: Counting the OOMs (Situational Awareness), is one of the best at reviewing the recent history and predicting the near future.
AGI by 2027 is strikingly plausible. GPT-2 to GPT-4 took us from ~preschooler to ~smart high-schooler abilities in 4 years. Tracing trendlines in compute (~0.5 orders of magnitude or OOMs/year), algorithmic efficiencies (~0.5 OOMs/year), and “unhobbling” gains (from chatbot to agent), we should expect another preschooler-to-high-schooler-sized qualitative jump by 2027.
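The essay’s headline arithmetic is easy to restate. An “OOM” is an order of magnitude, i.e., a factor of 10; the per-year rates below are the essay’s rough estimates, not my own.

```python
# Restating the "counting the OOMs" arithmetic from the quoted passage.
compute_ooms_per_year = 0.5   # scaling up training compute
algo_ooms_per_year = 0.5      # algorithmic efficiency gains
years = 4                     # roughly GPT-4 (2023) to 2027

effective_ooms = (compute_ooms_per_year + algo_ooms_per_year) * years
print(effective_ooms)         # 4.0 OOMs of effective compute
print(10 ** effective_ooms)   # a ~10,000x effective-compute jump, comparable
# to GPT-2 -> GPT-4, before counting any "unhobbling" gains on top
```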
Thousands of AI Authors on the Future of AI
In the largest survey of its kind, we surveyed 2,778 researchers who had published in top-tier artificial intelligence (AI) venues, asking for their predictions on the pace of AI progress and the nature and impacts of advanced AI systems.
The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating a song indistinguishable from a new song by a popular musician, and autonomously downloading and fine-tuning a large language model.
If science continues undisrupted, the chance of unaided machines outperforming humans in every possible task was estimated at 10% by 2027, and 50% by 2047. The latter estimate is 13 years earlier than that reached in a similar survey we conducted only one year earlier [Grace et al., 2022].
Economics Of AI
The Simple Macroeconomics of AI (hat tip to Richard O’Brien for forwarding)
This paper evaluates claims about large macroeconomic implications of new advances in AI. …
So long as AI’s microeconomic effects are driven by cost savings/productivity improvements at the task level, its macroeconomic consequences will be given by a version of Hulten’s theorem: GDP and aggregate productivity gains can be estimated by what fraction of tasks are impacted and average task-level cost savings. Using existing estimates on exposure to AI and productivity improvements at the task level, these macroeconomic effects appear nontrivial but modest—no more than a 0.66% increase in total factor productivity (TFP) over 10 years.
… predicted TFP gains over the next 10 years are even more modest and are predicted to be less than 0.53%. …
Empirically, I find that AI advances are unlikely to increase inequality as much as previous automation technologies because their impact is more equally distributed across demographic groups, but there is also no evidence that AI will reduce labor income inequality. Instead, AI is predicted to widen the gap between capital and labor income.
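For intuition, the Hulten’s-theorem logic quoted above reduces to a couple of multiplications: aggregate TFP gains are roughly the share of tasks affected times the average cost savings on those tasks. The inputs below are approximate magnitudes I recall from the paper, chosen to reproduce its headline 0.66% figure; see the paper for the exact derivation.

```python
# Hulten's theorem, as applied in the paper, in one line of arithmetic.
# Inputs are approximate and quoted from memory -- illustrative only.
tasks_exposed = 0.20         # share of tasks exposed to AI
profitably_automated = 0.23  # fraction of exposed tasks worth automating
avg_cost_savings = 0.144     # average cost savings on those tasks

tfp_gain = tasks_exposed * profitably_automated * avg_cost_savings
print(f"{tfp_gain:.2%}")     # ~0.66% TFP gain over 10 years
```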
A layman’s article in the FT, “Daron Acemoglu is not having all this AI hype,” describes the findings as follows:
A year ago Goldman Sachs economists estimated that AI would increase annual global GDP by 7 per cent over 10 years — or almost $7tn in dollar terms. Since then Goldman’s forecast has become almost sober, with even the IMF predicting that AI “has the potential to reshape the global economy”. FTAV’s personal favourite is ARK’s forecast that AI will help global GDP growth accelerate to 7 per cent a year.
Professor Acemoglu — a probable future Nobel Memorial laureate — is taking the other side.
… my calculations suggest that the GDP boost within the next 10 years should also be modest, in the range of 0.93%–1.16% over 10 years in total, provided that the investment increase resulting from AI is modest, and in the range of 1.4%–1.56% in total, if there is a large investment boom.
All Depends On Your Definition Of “Soon”
AI in Finance
Research Review | 18 July 2024 | Artificial Intelligence and Finance
Six papers on AI and LLMs in finance applications. Highlights:
… ChatGPT-4o’s performance is generally comparable to traditional statistical software like Stata, though some errors and discrepancies arise due to differences in implementation.
… integrating ML into portfolio construction is not just an upgrade, but a significant innovation in asset management. This offers precision and efficiency beyond the capabilities of traditional methods, thereby increasing portfolio returns, reducing risk, and improving efficiency.
… integrating generative artificial intelligence into financial market forecasts can not only improve the accuracy of forecasts, but also provide powerful data support for financial decision-making.
Siri Is Your Spouse
How funny is ChatGPT? A comparison of human- and A.I.-produced jokes (PLOS One)
To provide a systematic test, we asked ChatGPT 3.5 and laypeople to respond to the same humor prompts (Study 1). We also asked ChatGPT 3.5 to generate humorous satirical headlines in the style of The Onion and compared them to published headlines of the satirical magazine, written by professional comedy writers (Study 2). In both studies, human participants rated the funniness of the human and A.I.-produced responses without being aware of their source. ChatGPT 3.5-produced jokes were rated as equally funny or funnier than human-produced jokes regardless of the comedic task and the expertise of the human comedy writer.
Here is the prompt I used:
I recently found these two papers on artificial intelligence. They both, in different ways, speak to the ability of models to equal or outperform experts. They also both seem related to the "Wisdom of Crowds" effect. Please read both papers and tell me how they relate or differ.