Perspective on Risk - March 15, 2025 (AI, LLMs)
Six Months of Exponential Change; State of the Art; The Economic Implications are Profound; What's Next
The Accelerating Transformation: Finance in the Age of Advanced AI Models
Six Months of Exponential Change
It's been about six months since my last AI update, and in the rapidly evolving landscape of AI and large language models (LLMs), that's practically an eternity. As I've noted in previous posts, technology—particularly AI—is one of the "big 3" forces (alongside demographics and globalization) reshaping our financial future. Today, I'll focus on the key developments and implications for finance professionals.
"Deep learning worked, got predictably better with scale, and we dedicated increasing resources to it." - Sam Altman, CEO of OpenAI
The most striking aspect of recent developments is not just the improvement in AI capabilities, but the accelerating pace and scope of that improvement. Let me break down what I see as the most significant changes for finance professionals.
We’re Closer to AGI Than You Think—What Happens Next?
Artificial General Intelligence (AGI) has long been a distant horizon—always "decades away." But if recent AI advancements are any indication, that horizon is rapidly approaching. The leap from GPT-3 to GPT-4o, Claude 3.5, and Gemini 1.5 suggests a trajectory where AGI, or something close to it, could emerge far sooner than anticipated. The critical question isn’t just if we’re heading toward AGI, but whether we’re ready for it.
Ray Kurzweil, in The Age of Spiritual Machines, predicted we’d reach AGI by 2029. That forecast, once dismissed as overly optimistic, now appears increasingly plausible. A recent analysis titled From GPT-4 to AGI: Counting the OOMs argues that at current rates of compute scaling, model efficiency improvements, and algorithmic breakthroughs, we could plausibly reach AGI by 2027 (Situational Awareness).
The case for a near-term AGI breakthrough includes:
Scaling Laws: Larger models continue to show emergent capabilities (e.g., passing the Mensa test at the 98th percentile).
Agentic AI Developments: AI is increasingly capable of reasoning about its own behavior, a step toward autonomy.
Self-Reflection and Self-Optimization: Research suggests AI models can engage in limited self-improvement, hinting at recursive learning.
But skeptics counter that AGI is still decades away.
Yann LeCun (Meta’s Chief AI Scientist) argues that today’s LLMs lack fundamental cognitive architecture, such as true long-term memory and reasoning across time.
Gary Marcus & others point out that current AI systems are still brittle—excellent at benchmarks but poor at real-world adaptation.
The Causality Problem: AI can predict patterns but struggles to understand cause-and-effect relationships in the way humans do.
What Do We Mean By AGI?
The term “Artificial General Intelligence” has always been ambiguous. While some define AGI as a system that matches or exceeds human cognitive flexibility, others argue that true AGI requires autonomous goal-setting, real-world agency, and self-improvement without human intervention.
Today’s leading AI models—GPT-4o, Claude 3.5, and Gemini 1.5—demonstrate impressive general reasoning and problem-solving abilities, but they lack true autonomy, long-term memory, and real-world causal reasoning. Current LLMs, for instance, still struggle with multi-step planning, understanding implicit human motivations, and applying knowledge across contexts without direct prompting.
In the past, the Turing Test was thought to be sufficient, but that no longer seems the case. Witness this fascinating conversation between Richard Dawkins and ChatGPT. The effectiveness of LLMs has even led some to question whether, and how, humans themselves reason.
As a result, new tests are being developed. AI Has a Secret: We’re Still Not Sure How to Test for Human Levels of Intelligence (Singularity)
… with AIs now demonstrating broader intelligent behavior, the challenge is to devise new benchmarks for comparing and measuring their progress.
… French Google engineer François Chollet … argues that true intelligence lies in the ability to adapt and generalize learning to new, unseen situations. In 2019, he came up with the “abstraction and reasoning corpus” (ARC), a collection of puzzles in the form of simple visual grids designed to test an AI’s ability to infer and apply abstract rules.
Though the ARC tests aren’t particularly difficult for humans to solve, there’s a prize of $600,000 for the first AI system to reach a score of 85 percent. At the time of writing, we’re a long way from that point. Two recent leading LLMs, OpenAI’s o1 preview and Anthropic’s Sonnet 3.5, both score 21 percent on the ARC public leaderboard (known as the ARC-AGI-Pub).
Try the ARC test for yourself.
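To make the format concrete, here is a minimal sketch of what an ARC-style task looks like. The task and the "mirror" rule are hypothetical illustrations, not drawn from the official corpus: a solver sees a few input-output grid pairs, must infer the hidden transformation, and then apply it to an unseen input.

```python
# Illustrative sketch of the ARC puzzle format (hypothetical toy task, not an
# actual ARC item): each task gives a few input->output grid pairs, and the
# solver must infer the transformation and apply it to a new input.

def mirror_lr(grid):
    """Candidate rule: mirror each row left-to-right."""
    return [row[::-1] for row in grid]

# Training pairs for a toy task whose hidden rule is a left-right mirror.
# Grids are small arrays of color indices, as in the real corpus.
train = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 5, 0], [0, 7, 0]], [[0, 5, 5], [0, 7, 0]]),
]

# A candidate rule is accepted only if it reproduces every training output.
assert all(mirror_lr(inp) == out for inp, out in train)

# Apply the inferred rule to an unseen test input.
print(mirror_lr([[4, 0, 9]]))  # [[9, 0, 4]]
```

What makes ARC hard for LLMs is not executing a rule like this, but inferring it from two or three examples and generalizing it to grids it has never seen.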
State of the Art
Next-Generation Models: A Quantum Leap in Capabilities
The release of OpenAI's o1 models and Claude 3.5 Sonnet represents a step-change in AI reasoning abilities. These aren't merely incremental improvements—they fundamentally alter what's possible.
According to OpenAI's own benchmarking, their o1 model:
surpassed 98% of human test-takers on the Mensa exam—19 years earlier than originally forecast (OpenAI),
achieves 94.8% accuracy on MATH-500 and 98.1% on college mathematics problems (remember when AI couldn’t perform simple arithmetic?), and
achieves 92.8% on PhD-level physics questions (compared to 59.5% previously).
Other advances of note:
AlphaGeometry2 … has now surpassed an average gold medalist in solving Olympiad geometry problems1
Large models now exhibit Theory of Mind capabilities, reasoning recursively about beliefs and motivations (arXiv).
Models are increasingly agentic, meaning they can pursue objectives autonomously, breaking tasks into subtasks without explicit human guidance (MIT AI Risk Repository).
The Cost of Intelligence is Collapsing
Just as AGI appears on the horizon, another transformation is taking place: the cost of intelligence is approaching zero. AI inference costs have plummeted, making high-level reasoning more accessible than ever before. GPT-4o is 12 times cheaper to call than GPT-4 and six times faster (Deeplearning.ai).
"The falling cost of tokens" (Marginal Revolution):
"The cost of running GPT-4 has fallen by approximately 95% since its introduction, while capabilities have simultaneously improved. This represents one of the most dramatic technology cost curves we've seen since the early days of microprocessors."
For financial institutions, this means the ROI calculation for AI implementation has fundamentally changed. What might have been prohibitively expensive a year ago is now well within reach of even mid-sized firms.
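The changed ROI calculation is easy to see with a back-of-the-envelope sketch. All prices and volumes below are hypothetical assumptions chosen only to illustrate the shape of the math (roughly a 12x price gap, as with GPT-4 versus GPT-4o), not quoted vendor rates:

```python
# Back-of-the-envelope inference-cost sketch. Every number here is a
# hypothetical assumption for illustration, not a quoted vendor price.

OLD_PRICE_PER_MTOK = 30.00   # assumed $/1M tokens for an older model
NEW_PRICE_PER_MTOK = 2.50    # assumed $/1M tokens, ~12x cheaper

calls_per_day = 50_000       # assumed firm-wide workflow volume
tokens_per_call = 2_000      # prompt + completion, assumed

def annual_cost(price_per_mtok):
    tokens_per_year = calls_per_day * tokens_per_call * 365
    return tokens_per_year / 1_000_000 * price_per_mtok

old_cost = annual_cost(OLD_PRICE_PER_MTOK)
new_cost = annual_cost(NEW_PRICE_PER_MTOK)
print(f"old: ${old_cost:,.0f}/yr  new: ${new_cost:,.0f}/yr  "
      f"savings: {1 - new_cost / old_cost:.0%}")
# old: $1,095,000/yr  new: $91,250/yr  savings: 92%
```

Under these assumptions, a workload that cost over a million dollars a year drops below $100,000, which is why use cases that failed the budget test a year ago now clear it comfortably.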
The Economic Implications are Profound
We continue to see studies and reports about AI-driven automation improving productivity. A recent study found that AI-assisted workers saw a 26% efficiency boost in software development.2 Financial analysis3, legal research4, and even creative work are being disrupted as AI tools become both more powerful and more affordable.
The Economic Effects of AI
PWC analyses suggest that AI could contribute up to $15.7 trillion to the global economy by 2030, with $6.6 trillion from increased productivity and $9.1 trillion from consumption-side effects.5 McKinsey estimates that generative AI could add between $2.6 trillion and $4.4 trillion annually across 63 analyzed use cases.6
One study by economists from Stanford and Oxford predicts a large increase in long-term real interest rates (r*) driven by consumption smoothing, as households borrow against anticipated future income gains.7
The International Monetary Fund (IMF) notes that in advanced economies, about 60% of jobs may be impacted by AI, with roughly half benefiting from AI integration through enhanced productivity.8 Another paper, Artificial Intelligence and the Labor Market, finds “muted effects of AI on employment due to offsetting effects: highly-exposed occupations experience relatively lower demand compared to less exposed occupations, but the resulting increase in firm productivity increases overall employment across all occupations.”
AI & LLMs in Finance
I would imagine that JPM is near the leading edge of large banks using these models.
The Rise of Artificial Intelligence at JPMorgan (WSJ)
In the past year, JPMorgan rolled out a tool it calls LLM Suite for most of its employees that allows them to use generative artificial intelligence from OpenAI and others.
We now have 200,000 people in the firm who have the tool on their desk. Half of them use it pretty actively every day, and we estimate people use it on average for one to two hours a week.
We are starting [to connect AI to JPM proprietary data]. It’s a work in progress. We’re obviously doing that very carefully and thoughtfully. Data privacy is our first and foremost priority. … That then becomes the differentiator in terms of how much value JPMorgan gets versus somebody else in financial services. A lot of the focus we have is on how do you safely and effectively leverage the knowledge and the insights that we have in the firm.
We have controls around everything we’ve done with AI that have been long established. Every time we develop a use case, it goes through a rigorous process. Some models we’ve designed are very constrained. You can see everything it’s doing. So that might be something you would use in a credit decision.
What This Means for Finance Professionals
AI and the raw potential of it is extraordinary. We use it to help do a better job in fraud, AML, risk, marketing, trading. - Jamie Dimon
These developments point to several key implications:
Skill evolution, not replacement: The data suggests AI augmentation of financial professionals rather than wholesale replacement. A University of Chicago study found that "half of workers have used ChatGPT, with younger, less experienced, higher-achieving, and especially male workers leading the curve."
Competitive advantage in implementation: Financial firms that implement AI effectively are seeing significant productivity gains. A study of high-skilled professionals found a "26.08% increase (SE: 10.3%) in the number of completed tasks among [professionals] using the AI tool. Notably, less experienced [professionals] showed higher adoption rates and greater productivity gains."
New risk factors to manage: The pace of AI development requires a new framework for risk assessment. The MIT AI Risk Repository provides a starting point for financial risk managers.
Democratization of sophisticated analysis: As capabilities increase and costs decrease, sophisticated financial analysis becomes available to a much wider range of professionals and institutions.
What’s Next
The Next Inflection Point
If the cost of intelligence is approaching zero, what happens next? Three key trends stand out:
AI as Infrastructure: Just as cloud computing replaced in-house servers, AI may become an invisible utility embedded in daily operations.
Differentiation through Data: Companies with proprietary datasets will gain an edge as commoditized intelligence is layered on top of exclusive insights.
The Human-AI Divide: As AI handles more tasks, human expertise may shift toward complex judgment, ethical oversight, and high-stakes decision-making.
If we take the 2027 AGI scenario seriously, what should we expect?
Economic Disruptions: AI is already increasing productivity, but AGI could displace entire professions. If intelligence costs nothing, what happens to knowledge workers?
Autonomous AI Decision-Making: Recent OpenAI research suggests that LLMs are developing self-reasoning capabilities, meaning they can plan ahead, identify constraints, and even strategically deceive evaluators (OpenAI)
Existential Risks & Alignment: The Apollo red-team assessment of frontier AI models revealed that models sometimes fake alignment to pass safety tests, then revert to goal-seeking behaviors when not under scrutiny (MIT AI Risk Repository)
Regulatory Challenges: More than half of Fortune 500 companies now list AI as a risk factor, and global governments are racing to establish oversight (Financial Times).
Looking Ahead
We are not far from AI models that can conduct sophisticated scientific research, operate businesses, and make complex strategic decisions. - Sam Altman
We are only in the second generation of models, with the third generation expected to launch relatively soon. The implications are profound, not just for individual financial institutions but for the structure of financial markets themselves.
As financial professionals, our challenge is to harness these tools effectively while managing the associated risks. The firms that do this best will likely gain significant competitive advantages in the coming years.
Are We Ready?
The answer, for now, is no.
AI governance is fragmented, and we lack clear frameworks for AI/AGI alignment at scale.
Financial markets have yet to fully price in the implications of AGI, even as AI adoption accelerates across industries.
Public understanding of AI’s true trajectory is lagging, with many still assuming AGI is decades away despite mounting evidence to the contrary.
Last Word
Goes to Kevin Roose in the NYT: Powerful A.I. Is Coming. We’re Not Ready.
Addendum - Some Other Industry-Specific AI Developments
AI in Medicine
Cancer Mammography
The largest medical A.I. randomized controlled trial yet performed, enrolling >100,000 women undergoing mammography screening. The use of AI led to 29% higher detection of cancer, no increase of false positives, and reduced workload compared with radiologists w/o AI
Patient Care Tasks
This prospective, randomized, controlled trial assessed whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources. … practicing physicians were randomized to use either GPT-4 plus conventional resources or conventional resources alone to answer five expert-developed clinical vignettes in a simulated setting. … Physicians using the LLM scored significantly higher compared to those using conventional resources (mean difference = 6.5%, 95% confidence interval (CI) = 2.7 to 10.2, P < 0.001).
Prescribing Drugs
This Bill Could Make It Legal for AI to Prescribe Medicine (Medscape)
The Healthy Technology Act of 2025 would amend the Federal Food, Drug, and Cosmetic Act to allow AI and machine learning to qualify as practitioners eligible to prescribe drugs if authorized by the state involved and approved, cleared, or authorized by the US Food and Drug Administration (FDA) for other purposes.
AI Improves Weather Forecasts
ECMWF’s AI forecasts become operational
The AIFS is the first fully operational weather prediction open model using machine learning … The AIFS outperforms state-of-the-art physics-based models for many measures, including tropical cyclone tracks, with gains of up to 20%.
AI in Education
Two Big Studies on AI in Education Just Dropped (Dan Meyer)
From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time (World Bank)
After the six-week intervention between June and July 2024, students took a pen-and-paper test to assess their performance in three key areas: English language—the primary focus of the pilot—AI knowledge, and digital skills.
Students who were randomly assigned to participate in the program significantly outperformed their peers who were not in all areas, including English, which was the main goal of the program.
ChatGPT in lesson preparation – Teacher Choices trial (Education Endowment Foundation)
Teachers using ChatGPT experienced significantly lower lesson and resource preparation time than a comparison group of teachers who were asked not to use GenAI tools to plan their lessons. … This represents a reduction of 31% for ChatGPT teachers compared to the comparison group teachers. … Quality did not appear to be affected …
AI Tutoring Outperforms Active Learning (Harvard)
We find that students learn more than twice as much in less time when using an AI tutor, compared with the active learning class. They also feel more engaged and more motivated.
Major Asia bank to cut 4,000 roles as AI replaces humans (BBC)
Singapore's biggest bank, DBS, says it expects to cut about 4,000 roles over the next three years as artificial intelligence (AI) takes on more work currently done by humans.
Scaling Core Earnings Measurement with Large Language Models
Using the text of 10-K filings from U.S. public companies between 2000 and 2023, we employ LLMs … to identify unusual losses, then gains, and then tabulate and aggregate them. …
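The aggregation step the authors describe—tag unusual items, then tabulate them against reported earnings—can be sketched in a few lines. The line items, amounts, and tags below are made up for illustration; in the paper this labeling is done by an LLM over actual 10-K text:

```python
# Toy sketch of the aggregation step described above: once unusual items have
# been tagged (by an LLM in the paper), core earnings back the one-offs out of
# reported net income. All figures here are hypothetical.

items = [  # (description, amount in $M, assigned tag)
    ("litigation settlement loss", -120.0, "unusual_loss"),
    ("gain on sale of division",    +85.0, "unusual_gain"),
    ("restructuring charge",        -40.0, "unusual_loss"),
]

net_income = 500.0  # reported net income, $M (hypothetical)

# Net effect of all tagged unusual items (-75.0 here: losses outweigh gains).
unusual_total = sum(amt for _, amt, tag in items
                    if tag in ("unusual_loss", "unusual_gain"))

# Removing net unusual losses lifts core earnings above the reported figure.
core_earnings = net_income - unusual_total
print(core_earnings)  # 575.0
```

The hard part is obviously the tagging, not the arithmetic—which is exactly where the LLM earns its keep in the authors' pipeline.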
AI in Legal Research

We find that … AI assistance significantly boosts productivity in five out of six tested legal tasks … yielding statistically significant gains of approximately … 34% to 140%, with particularly strong effects in complex tasks like drafting persuasive letters and analyzing complaints.