Rethinking Work: Lessons from CPP Investments’ AI-First Experiments

Like many organizations, CPP Investments has approached AI as a productivity tool—fewer clicks, faster drafts, and quicker analysis. However, this approach captures only incremental value.

The transformational opportunity of AI rests in redesigning how work is organized, who does it, and what structures support it. In time, we expect AI will reshape not only workflows and processes, but roles, jobs, and organizations themselves.

Consider portfolio construction. Today, analysts recommend, portfolio managers make decisions, and risk teams validate—a structure born of scarce expertise and functional silos. Within the next 12 to 18 months, we expect AI to synthesize across asset classes, evaluate risk, and stress-test scenarios simultaneously, eliminating many current boundaries.

We set out to test this through an autonomous agent experiment. Unlike traditional AI that responds to prompts, these agents act like digital workers that break down problems, coordinate with other AI agents, and verify their work, replicating entire team workflows. We ran head-to-head trials across three real investment tasks—position reconciliation, performance attribution and a forward-looking tariff scenario—comparing simple prompting with a self-organizing, multi-agent setup using verification gates.

What we learned

Instead of merely accelerating workflows, AI made them obsolete. Today’s teams, specialists and committees exist because humans have limits. We can’t process everything at once or be experts in everything. When AI removes these limits, organizations can rebuild around continuous intelligence flowing through work, rather than through people.

Learning 1: Match AI to the work

Not all work is the same, so don’t force one “AI approach” on everything. We used three patterns:

  • AI Excels for clear, rules-based tasks;
  • AI Executes for heavy math requiring human oversight;
  • AI Explores for ambiguous problems where judgment drives value.

Every capability—the ‘what’ we do—fit one of these patterns and determined how we set up the system. Position reconciliation—verifying that internal portfolio records match those of custodians and counterparties after trades—and post-trade checks fall under AI Excels; performance attribution under AI Executes; and scenario analysis under AI Explores. Each requires a different setup to achieve reliable results.

Learning 2: Rails before horsepower: specification and verification

More agents didn’t magically improve results. What worked was strict governance: precise prompts, structured data checks, consistent currency conventions, and requiring all calculations to match within five basis points. We forced the AI to document every assumption between steps—like requiring students to show their work. Without this, the AI might assume USD in step one but mysteriously switch to EUR by step three. These controls consistently beat fancy multi-agent orchestration.
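To make those rails concrete, here is a minimal sketch of two of them: a five-basis-point tolerance check between independent calculations, and an explicit assumption record carried between steps. The names, values, and data shapes are illustrative assumptions, not our production tooling.

```python
BPS_TOLERANCE = 5   # five basis points, i.e. 0.0005 in decimal return terms

def within_tolerance(value_a: float, value_b: float, bps: float = BPS_TOLERANCE) -> bool:
    """True if two return figures agree within the stated basis-point band."""
    return abs(value_a - value_b) * 10_000 <= bps

# Assumptions are written down once and carried between steps, so a currency
# can't silently drift from USD to EUR mid-run.
step_assumptions = {
    "base_currency": "USD",
    "pricing_source": "custodian feed",
    "return_convention": "geometric, daily",
}

model_return, recomputed_return = 0.01231, 0.01234   # 0.3 bps apart: passes
if not within_tolerance(model_return, recomputed_return):
    raise ValueError(f"Calculations diverge beyond {BPS_TOLERANCE} bps; "
                     f"assumptions in force: {step_assumptions}")
```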

We turned these lessons into a five-gate system that imposed discipline at each stage:

  • Frame: specify the problem and context;
  • Input: clean and validate the numbers;
  • Model: develop and/or run an analysis and flag uncertainties;
  • Validate: cross-check results;
  • Narrate: document every decision.

The AI can’t skip ahead without clearing each gate.
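The sketch below shows one way such a gate sequence could be wired up: each gate must mark itself passed before the next may run, and every gate appends to a shared assumption ledger. The stage implementations here are placeholders assumed for illustration, not our actual system.

```python
from typing import Callable

GATES = ["frame", "input", "model", "validate", "narrate"]

def run_gated_pipeline(stages: dict[str, Callable[[dict], dict]]) -> dict:
    """Run the five gates strictly in order; a failed gate halts everything after it."""
    state: dict = {"assumption_ledger": []}
    for gate in GATES:
        state = stages[gate](state)                  # each gate reads and extends the state
        if not state.get(f"{gate}_passed", False):
            raise RuntimeError(f"Gate '{gate}' failed; downstream gates never run")
    return state

def demo_stage(gate: str) -> Callable[[dict], dict]:
    """Placeholder gate that records an assumption and marks itself passed."""
    def stage(state: dict) -> dict:
        state["assumption_ledger"].append(f"{gate}: assumptions documented here")
        state[f"{gate}_passed"] = True
        return state
    return stage

# All five demo gates pass, so the pipeline completes and the ledger is auditable.
result = run_gated_pipeline({g: demo_stage(g) for g in GATES})
print(result["assumption_ledger"])
```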

Where did humans make the biggest difference? At gates one and three: framing the problem and selecting or defining the analytical approach. Without human intervention at these two points, the agents tended to drift.

Example 1: Position reconciliation (AI Excels)

We gave AI a $522M portfolio puzzle with hidden errors—wrong prices, missed corporate actions, calculation mistakes. Direct prompting failed badly: models disagreed and missed critical errors (although the models tried to make up for it with fancy formatting, which of course didn’t help!).

The breakthrough came when we required AI to rate its confidence in each finding. Anything below 70% triggered human review. This self-assessment, plus data checks, boosted accuracy by 45%. At best, the AI caught 83% of our planted errors (and an even higher share by dollar materiality).

Takeaway: AI performed best as first-pass reviewer with human backup. Two AI models cross-checked each other, and humans reviewed uncertain cases. Enforcing basics (verifying sources, normalizing identifiers, tracing every figure) proved more effective than complexity.
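As a rough illustration, the routing logic amounts to something like the following; the data shapes and field names are invented for this sketch, not our actual schema.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.70   # findings self-rated below this go to a human

@dataclass
class Finding:
    description: str
    confidence: float                  # model's self-reported confidence, 0 to 1
    confirmed_by_second_model: bool    # did the cross-checking model agree?

def route(finding: Finding) -> str:
    if finding.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"          # low self-confidence: escalate
    if not finding.confirmed_by_second_model:
        return "human_review"          # the two models disagree: escalate
    return "auto_accept"

print(route(Finding("stale price on bond X", 0.62, True)))       # human_review
print(route(Finding("missed dividend on stock Y", 0.91, True)))  # auto_accept
```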

Example 2: Performance & risk attribution (AI Executes)

Here, the question wasn’t ‘can AI do math?’ but ‘will the math be right and auditable?’

Given a clear problem, clean data, and explicit methodology, AI generated code, ran attribution, and explained results. The main risk was method mis-specification: the AI choosing the wrong analytical approach, like using a monthly calculation when daily was needed. Processing time dropped dramatically and approached real-time once connected to live data feeds.

Takeaway: Once again, quality control beat quantity of agents. AI handled execution—running calculations, checking factor logic, and validating statistical confidence—while humans framed the problem and verified the results.
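One simple guard against the mis-specification risk, sketched here with an assumed pandas data shape, is to check the observed data frequency against the specified methodology before anything runs:

```python
import pandas as pd

def check_frequency(returns: pd.Series, required: str) -> None:
    """Fail fast if the return series does not match the specified frequency."""
    observed = pd.infer_freq(returns.index)    # e.g. 'D' for daily data
    if observed is None or not observed.startswith(required):
        raise ValueError(f"Methodology requires '{required}' data, got '{observed}'")

# A daily series passes a daily-methodology check; a monthly one would raise.
daily = pd.Series(
    [0.001, -0.002, 0.0005],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
check_frequency(daily, required="D")
```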

Example 3: Scenario impact (AI Explores)

We asked AI to model how semiconductor export restrictions would impact a globally diversified portfolio like ours. Without clear human framing, the AI produced nonsense, sometimes showing gains when there should be losses.

But with careful hybrid framing, led by a combination of AI and human expertise, it produced credible results: a 1.82% portfolio decline that aligned with analyst expectations and historical priors.

Takeaway: Speed without accuracy is worthless. “Autonomous AI” still needs human guardrails to avoid dangerously wrong conclusions.

Learning 3: Design for memory and state

Many AI ‘hallucinations’ aren’t AI making things up—they’re AI memory loss or gaps. When USD in step 1 somehow becomes EUR by step 3, it’s not creativity; it’s the model forgetting, or making a reasonable guess where the actual input was never specified. Getting the framing and analytical approach right, then forcing the AI to preserve and pass forward every assumption through the five gates, cut errors sharply.

Most wrong answers stem from forgotten context. It’s essential to build traceability into every step.
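A minimal sketch of the ‘preserve and pass forward’ idea, with illustrative names: fix the assumptions once in an immutable context object and make every downstream step read from it rather than guess.

```python
from dataclasses import dataclass

@dataclass(frozen=True)        # frozen: no downstream step can silently rewrite it
class RunContext:
    base_currency: str
    as_of_date: str
    pricing_source: str

def attribute_returns(positions: list, ctx: RunContext) -> str:
    # The step never infers the currency; it reads it from the traced context.
    return f"attribution run in {ctx.base_currency} as of {ctx.as_of_date}"

ctx = RunContext(base_currency="USD", as_of_date="2025-06-30",
                 pricing_source="custodian feed")
print(attribute_returns([], ctx))   # USD stays USD at every step
```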

Learning 4: The operating model must evolve

Traditional teams—risk, research, portfolio management—reflect how humans divide labour today. AI changes that logic. Future org charts will be built around how humans and AI naturally divide work.

We see three patterns emerging:

  • AI Excels: AI does the work, humans handle exceptions (e.g. automated trading).
  • AI Executes: AI analyzes, humans decide (e.g. investment recommendations).
  • AI Explores: AI finds patterns, humans interpret and refine them (e.g. market research).

This shift points to smaller teams organized around outcomes, not departments, and three reimagined human roles:

  • Facilitators—manage AI workflows.
  • Architects—frame problems.
  • Leaders—make judgment calls.

The strategic shift

Economic activity has always been about crystallizing information into products. A pencil embodies the crystallized knowledge of forestry, mining, chemistry, and manufacturing—specialization that compensates for limited human bandwidth.

Investment management is similar: crystallized computation producing portfolios and decisions. Its roles exist to work around the same limits: an equity analyst exists because one person can’t simultaneously process credit risks, equity valuations, and macro trends. Our entire industry structure is built on human cognitive constraints.

Every institution now faces a choice: use AI to go 20% faster, or rebuild to be 10x different.

Our experiments suggest three immediate actions:

  1. Stop optimizing, start reimagining. Identify workflows that only exist because humans can’t process everything simultaneously and test whether AI can collapse sequential steps into continuous processes.
  2. Invest in governance, not complexity. Verification infrastructure—like our five gates and confidence thresholds—matters more than sophisticated orchestration.
  3. Keep humans in the driver’s seat. Run small experiments where AI leads and humans verify, or AI analyzes and humans decide, or AI explores and humans direct. Your future operating model will emerge from these patterns.

The transformation isn’t coming—it’s already here. The leaders will be those who accept that today’s organizational constraints are already obsolete.
