TL;DR
Matching and recommendations are about to get an agent layer. Most companies are not ready. Modern recommendation and matching systems have moved from multistage to generative. With the emergence of open source agentic protocols like A2A, the next probable agentic wave will be agent-based recommendation and matching across e-commerce, insurance, dating, and hiring. As of today (May 11, 2026), agents are hitting 38.4% Pass@1 on Mercor's APEX-Agents benchmark and are not ready for production recommendations. However, as foundation models improve, companies need to prepare for a possible agent-to-agent ecosystem in production. The companies that arrive ready will have the advantage in the agent-first landscape. To get there, they will need mature agentic continuous learning infrastructure that can be tested against modern recommendation and matching systems.
Why now?
Three things changed in the last 18 months.
First, generative recommendation systems have crossed the line from research to production at scale. Meta's HSTU, Kuaishou's OneRec, and Spotify's unified search and recommendation mark the first time generative recommendations have been put into production at this scale. [1]
Second, agent protocols are maturing. Google's A2A protocol moved to the Linux Foundation in June 2025. Anthropic's MCP joined the new Agentic AI Foundation in December 2025. Visa shipped its Trusted Agent Protocol with Cloudflare in October 2025. [2]
Third, the agents themselves remain unreliable. Mercor's APEX-Agents benchmark, spanning tasks in law, consulting, and banking, shows frontier models completing only 38.4% of real white-collar tasks on the first attempt, and that's with a ReAct harness. This unreliability translates into a lack of consumer trust, which leads to abandonment of agent-first platforms: OpenAI quietly retired in-ChatGPT Instant Checkout in March 2026 after only a few Shopify merchants went live; conversion was 3 times worse than redirecting to the merchant. Microsoft's Magentic Marketplace simulations show severe first-proposal bias.
Even with the protocols ready, the agents are not. Companies betting on consumer agentic matching today are betting too early. Companies that don't build the infrastructure to be ready will arrive late. The window for preparing the agentic foundation is now.
Three architectures for Recommendations & Matching.
The first two architectures, multistage and generative recommendation, are deployed at scale today. The third, the agentic recommender, will sit at the protocol layer and is the architectural pattern designed for the future.
1. Multistage retrieval and ranking (a.k.a. the retrieval funnel)
Almost every production matching system in 2026 runs a multistage cascade.
- Candidates are retrieved cheaply by combining keyword search with semantic embeddings.
- A more expensive reranker scores the survivors.
- A final ranking stage balances relevance against diversity, business goals, fairness, and whatever the company is looking for.
- An online learning loop updates the system from user feedback.
LinkedIn runs this with their JUDE pipeline [4]. Eightfold's talent matching layers their model on top of reference profiles from successful past hires. Amazon's product search, TikTok's Monolith (updating in seconds), and every serious e-commerce vendor (Algolia, Bloomreach, Constructor, Coveo) run variations of the multistage architecture.
It works. It scales. Every layer has a vendor you can buy from [5]. The trade-offs worth naming: each stage is its own model with its own training data, cold start requires bolt-on solutions [6], and end-to-end optimization is tricky to get right. This system was designed for retrieving candidates from some pool; it wasn't designed for a world where users show up with their own agents.
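A toy, self-contained sketch of the funnel above. The catalog, scoring functions, and feedback rule are trivial stand-ins for real keyword/embedding retrieval, a learned reranker, and the online learning loop; names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    text: str
    popularity: float

CATALOG = [Item("a", "running shoes", 0.9), Item("b", "trail shoes", 0.7),
           Item("c", "dress shoes", 0.4), Item("d", "hiking boots", 0.6)]

def retrieve(query: str, k: int = 3) -> list[Item]:
    """Stage 1: cheap candidate generation (keyword overlap as a stand-in)."""
    overlap = lambda i: len(set(query.split()) & set(i.text.split()))
    return sorted(CATALOG, key=lambda i: -overlap(i))[:k]

def rerank(query: str, candidates: list[Item], k: int = 2) -> list[Item]:
    """Stage 2: a 'more expensive' scorer over the survivors only."""
    score = lambda i: len(set(query.split()) & set(i.text.split())) + i.popularity
    return sorted(candidates, key=lambda i: -score(i))[:k]

def final_rank(candidates: list[Item]) -> list[Item]:
    """Stage 3: blend relevance with business goals / diversity (popularity boost here)."""
    return sorted(candidates, key=lambda i: -i.popularity)

def record_feedback(item: Item, clicked: bool) -> None:
    """Stage 4: online loop; a click nudges the item's score for the next request."""
    item.popularity += 0.05 if clicked else -0.05

results = final_rank(rerank("trail shoes", retrieve("trail shoes")))
record_feedback(results[0], clicked=True)
```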
2. Generative Recommendation Systems
The multistage system is being displaced by the more modern generative recommendation system. Items get quantized into short tuples of semantic codes: meaningful tokens that put similar items in similar neighborhoods, replacing the old multistage retrieval of candidates. For Spotify songs, for instance, each song becomes a short tuple of IDs, e.g. (13, 5, 69, 42), and a user's listening history becomes a sequence of these tokens. A model is trained to autoregressively generate the next item the user will engage with, and beam search produces the top-k recommendations, i.e. the top songs the user might like. This approach collapses retrieve-then-rank into a single inference model: the predictions become the candidate recommendations and the reactions become the learning signal. For Spotify, a like or dislike updates the model directly.
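A toy sketch of that loop. The semantic IDs, the listening histories, and the count-based next-item "model" are all illustrative stand-ins: a real system trains an autoregressive transformer over the code sequences, and top-k selection here stands in for beam search.

```python
from collections import Counter, defaultdict

SEMANTIC_ID = {          # hypothetical codes: similar songs share prefixes
    "techno_1": (13, 5, 69), "techno_2": (13, 5, 42),
    "country_1": (7, 31, 2), "country_2": (7, 31, 9),
}

HISTORIES = [            # each row is one user's listening sequence
    ["techno_1", "techno_2", "techno_1"],
    ["country_1", "country_2", "country_1"],
    ["techno_2", "techno_1", "techno_2"],
]

# "Train": count item -> next-item transitions. The real system instead trains
# an autoregressive model over the SEMANTIC_ID code sequences.
transitions = defaultdict(Counter)
for history in HISTORIES:
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1

def recommend(last_item: str, k: int = 2) -> list[str]:
    """Generate the top-k next items; the predictions ARE the candidate set."""
    return [item for item, _ in transitions[last_item].most_common(k)]

def feedback(prev_item: str, item: str, liked: bool) -> None:
    """A like or dislike immediately updates the model, closing the loop."""
    transitions[prev_item][item] += 1 if liked else -1

print(recommend("techno_1"))   # neighbors in semantic-ID space dominate
```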
- Meta proved this scales. HSTU at 1.5 trillion parameters delivers 12.4% lift in A/B tests against their own production grade recommenders that hundreds of engineers built over years. They open sourced the code.
- Kuaishou's OneRec is an end-to-end production example, replacing retrieval, ranking, and reranking with a single encoder-decoder optimized with direct preference optimization (DPO).
- Netflix runs multiple generative approaches in parallel with the existing multistage, refusing to commit to one architecture.
The convergence story is the strategic one. The techniques that built and aligned modern language models are now flowing into recommendation. Matching is becoming language modeling. The trade-offs: harder to interpret and more compute to train, though cold start improves because semantic codes generalize to new items. Most companies will run hybrids before going fully generative; that's the Netflix path. The OneRec path is the more radical bet, but it matches the shift to end-to-end autonomous systems and will likely win in the long run.
3. Agentic recommendations, matching and archetypes
Now, the architecture being designed for the agent era.
Google's A2A protocol gives agents from different vendors a way to discover each other, exchange tasks, and coordinate. It sits alongside Anthropic's MCP, payment protocols from Visa, Mastercard, and Coinbase, commerce protocols from OpenAI, and AP2 from Google. As of April 2026, more than 150 organizations support A2A natively, including AWS, Microsoft, Salesforce, and SAP. The protocol layer is genuinely maturing, evidenced by adoption under the Linux Foundation and steady improvement of these open source protocols.
What's missing is the architecture for matching on top of it. Today's recommenders match a platform's catalog to a platform's users. Agentic matching changes the shape: the user shows up with their own agent, the platform exposes its own agent, and the two coordinate. This requires a piece nobody has built well at scale: continuously improving user representative agents that actually encode a user's preferences, constraints, and history over time.
We think the architecture will look like RQ-VAE applied to agents instead of items.
Recall how generative recommendation works: items get represented as short tuples of codes from a learned codebook. Items with similar content land in similar neighborhoods: for Spotify, electronic music tends to separate from country; for e-commerce, pillows stay separated from food; for hiring, candidates from similar backgrounds tend to be coded similarly.
Now apply the same intuition to users. A user gets assigned to one of many learned agent archetypes. The archetype captures the vertical-specific dimensions that matter most: in dating, values and relationship goals; in hiring, candidate skills, working style, and personality; in insurance, risk tolerance and life stage; in e-commerce, brand affinity and price sensitivity. A pool of archetypes handles cold start completely. As the user interacts, their archetype fine-tunes toward their specific patterns, eventually becoming closer to a personal agent than a shared archetype, while simultaneously updating the shared archetype the same way RQ-VAE updates its codebook.
For instance, suppose that when Alice signs up for a dating app the system determines from her answered questions that she is most similar to (agent:values_caring, agent:likes_reading, agent:attachment_secure, agent:Alice). Bob signs up and gets (agent:values_caring, agent:likes_traveling, agent:attachment_secure, agent:Bob). If Alice and Bob go on a successful date, their agent representations are updated to reflect the success, and so are the shared archetypes that compose them, meaning the alignment worked. The opposite holds for a failed date.
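A toy sketch of how that assignment and update might work, assuming users are represented as vectors and archetypes form a two-level codebook. The archetype and trait names, the vector dimension, and the learning rate are all illustrative, not a fixed design.

```python
import numpy as np

rng = np.random.default_rng(0)
ARCHETYPES = {                       # level-1 codebook: broad preference clusters
    "values_caring": rng.normal(size=8),
    "values_adventurous": rng.normal(size=8),
}
TRAITS = {                           # level-2 codebook: finer-grained traits
    "likes_reading": rng.normal(size=8),
    "likes_traveling": rng.normal(size=8),
}

def assign_codes(user_vec: np.ndarray) -> tuple[str, str]:
    """Greedy residual assignment: nearest archetype first, then the nearest
    trait for whatever the archetype fails to explain (the RQ-VAE intuition)."""
    arch = min(ARCHETYPES, key=lambda a: np.linalg.norm(user_vec - ARCHETYPES[a]))
    residual = user_vec - ARCHETYPES[arch]
    trait = min(TRAITS, key=lambda t: np.linalg.norm(residual - TRAITS[t]))
    return arch, trait

def record_match(user_vec: np.ndarray, other_vec: np.ndarray,
                 success: bool, lr: float = 0.1) -> None:
    """A successful date pulls each user's shared archetype toward the partner;
    a failed one pushes it away, so the whole neighborhood benefits."""
    for vec, other in ((user_vec, other_vec), (other_vec, user_vec)):
        arch, _ = assign_codes(vec)
        ARCHETYPES[arch] += (lr if success else -lr) * (other - ARCHETYPES[arch])

alice, bob = rng.normal(size=8), rng.normal(size=8)
print(assign_codes(alice), assign_codes(bob))
record_match(alice, bob, success=True)
```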
The architecture is speculative as this space evolves rapidly, but it's the landscape we've started preparing for at Interpret AI. The reason it isn't more widespread isn't technical: the results are largely untested, the scalability of such an ecosystem of agents running on A2A is mostly unexplored, and most companies don't have the infrastructure to continuously improve agents.
What the Agentic recommendation & matching architecture requires and the trade offs
A user’s representative agent that doesn't update is just a static persona. The thing that turns archetypes from a clever idea into useful infrastructure is the same thing every production recommender already needs: a continuous improvement loop. The difference is that the loop has to update the agent itself, not just an embedding or a ranker.
Consider what this looks like concretely. A dating platform with archetypes generates a match. The match fails: the date is awkward, the relationship ends in three weeks, the user marks "do not show again." That failure is the reward signal. The user's archetype needs to update so the next match avoids the failure mode. The archetypes themselves need to update so users in the same neighborhood benefit. The matching policy needs to update so the failure case is less likely to recur. None of this happens automatically. It requires infrastructure to capture failure behavior, generate a reward, and update the pool of agents.
We have written about how to build this foundation at InterpretAI; the short version is that the same techniques that keep production recommenders fresh (observability, failure detection, evals, candidate training and redeployment) apply to agents, but the artifacts being updated and the failure modes look different.
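A skeleton of that capture-reward-update path, to make the loop concrete. The outcome signals, reward values, and the single-table "pool" are illustrative placeholders; the point is only that every observed outcome flows through the same pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class MatchOutcome:
    user_id: str
    match_id: str
    signal: str                     # e.g. "do_not_show_again", "second_date"

# Hypothetical mapping from observed outcomes to reward values.
REWARDS = {"second_date": 1.0, "ended_in_3_weeks": -0.5, "do_not_show_again": -1.0}

@dataclass
class ArchetypePool:
    scores: dict = field(default_factory=dict)

    def update(self, user_id: str, match_id: str, reward: float) -> None:
        # 1) update the user's own archetype score for this pairing
        key = (user_id, match_id)
        self.scores[key] = self.scores.get(key, 0.0) + reward
        # 2) in a real system: propagate to the shared archetype, retrain the
        #    matching policy offline, run evals, and redeploy behind a gate

def process(outcome: MatchOutcome, pool: ArchetypePool) -> None:
    """Failure detection -> reward -> update, the loop described above."""
    reward = REWARDS.get(outcome.signal, 0.0)
    pool.update(outcome.user_id, outcome.match_id, reward)

pool = ArchetypePool()
process(MatchOutcome("alice", "bob", "do_not_show_again"), pool)
```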
The honest trade offs:
- Latency and cost. An agent-mediated match is slower and more expensive per decision than a multistage lookup. For high stakes matches (hiring, insurance underwriting) this is acceptable. For low stakes feed ranking it isn't; a hybrid that uses the existing cascade to narrow the candidate pool and reserves agent-to-agent conversations for the final ranking could keep the cost tractable.
- Alignment drift. An archetype that fine-tunes on noisy user feedback drifts. With an improperly designed reward signal, a user who right-swipes everything for a week corrupts their own archetype. Robust update rules and counterfactual checks matter more here than in traditional recommendation (see the sketch after this list).
- Authority and trust. When a user's agent and a platform's agent disagree on what the user wants, who wins? The chargeback and dispute frameworks from card networks (Visa TAP, Mastercard Agent Pay, Google AP2) are starting to answer this for commerce. They don't yet exist for dating and hiring.
- Regulatory exposure. Archetypes are inferences about users at the most intimate level. For regulated verticals (hiring under NYC LL 144 and the EU AI Act's high-risk obligations, which go live in August 2026; insurance under existing actuarial fairness regimes), the bias-audit burden is real and growing.
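The sketch referenced in the alignment-drift bullet: one way a guarded update rule might look, assuming vector archetypes. The clip threshold, learning rate, and daily interaction cap are arbitrary illustrative values.

```python
import numpy as np

def guarded_update(archetype: np.ndarray, feedback_vec: np.ndarray,
                   interactions_today: int, lr: float = 0.05,
                   max_step: float = 0.2, daily_cap: int = 20) -> np.ndarray:
    """EMA-style update with a step-size clip and a daily interaction cap, so a
    week of indiscriminate right-swipes cannot drag the archetype far."""
    if interactions_today > daily_cap:
        return archetype                          # rate-limit bulk feedback
    step = lr * (feedback_vec - archetype)
    norm = np.linalg.norm(step)
    if norm > max_step:
        step *= max_step / norm                   # clip outsized updates
    return archetype + step
```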
The verticals where archetypes land first will be the ones where the regulatory ground is clear, the failure signal is observable, and the stakes per match justify the cost. E-commerce (Amazon Rufus) and B2B procurement are closest. Dating will need to find a better way to use agents: datemyagent.ai uses agents to match humans, Snack abandoned bot-on-bot matching because "it didn't have flirtation," and Match Group has lost more than 80% of its value since 2021, leaving it unclear how humans will engage with humans through these platforms. Hiring (LinkedIn, Mercor, Final Round AI) is the highest stakes vertical with the deepest regulatory pressure. Insurance (Gradient AI) is the most underrated: quote-to-risk-pool matching is structurally a two-sided matching problem, and underwriters are starting to look at it this way.
The architecture doesn't ship without the foundation. The foundation is continuous improvement.
What to do now
If you run a matching or recommendation product, four moves are worth making before the agentic transition arrives in your vertical.
- Establish your baseline. If you're not on multistage recommendations with a real reranker, add one this quarter. If you have severe cold start pain or a long tail catalog, pilot generative retrieval for that surface.
- Build the continuous improvement loop for agents now. This is the highest leverage investment for the next 24 months. Agents will penetrate every vertical; it is a question of when, not if. Getting ready for that shift with continuous improvement infrastructure prepares you for agentic adoption with or without agentic recommendation systems. The companies that arrive ready for agentic matching will be the ones who already know how to update their models nightly from user outcomes. (InterpretAI: Continuously Improving Agents)
- Prepare for A2A without committing to it. If agents are disrupting your vertical, start testing agentic recommendation systems. Stand up MCP servers for your internal tools if the agent needs them. Run an A2A pilot in a low stakes internal workflow, possibly as a hybrid on top of your existing multistage cascade; Microsoft's Magentic Marketplace is a free simulator. Watch the buyer-side agent layer (Apple Intelligence, Google Gemini, OpenAI's agent). When that lands at scale and agent reliability starts beating your internal benchmarks as foundation models improve, the window has opened. Until then, A2A is plumbing without traffic.
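For the "stand up MCP servers" step, a minimal sketch assuming the official Python MCP SDK's FastMCP interface; the server name, tool, and its placeholder return values are illustrative, and in practice the tool body would call your existing retrieval service.

```python
from mcp.server.fastmcp import FastMCP

# Expose an internal matching tool over MCP so an agent can call it.
mcp = FastMCP("matching-tools")

@mcp.tool()
def retrieve_candidates(query: str, k: int = 10) -> list[str]:
    """Return the top-k candidate IDs from the existing multistage retriever."""
    # placeholder: call your production retrieval service here
    return [f"candidate_{i}" for i in range(k)]

if __name__ == "__main__":
    mcp.run()   # stdio transport by default
```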
Conclusion
The architectural transitions in matching will play out over years, not quarters. The feedback loop infrastructure you build during the cascade and generative eras is what makes the agentic era possible. The companies that arrive ready will not be the ones who picked the right architecture. They will be the ones whose continuous improvement infrastructure was mature when the architecture improved enough to be usable.