Contextual agent playbooks and tools: How LinkedIn gave AI coding agents organizational context
In early 2025, we set out to solve a challenge facing most engineering organizations exploring AI-assisted development: AI coding agents are incredibly capable, but they struggle to understand your company. They don’t inherently know your services, frameworks, tribal knowledge, data systems, infrastructure, or code patterns. And as any engineer knows, context is everything.
That challenge led us to a simple question: How can we help AI coding agents understand LinkedIn well enough to help our engineers?
The answer became our Contextual Agent Playbooks & Tools (CAPT), a unified framework that brings together deep organizational knowledge, safe access to internal systems, and executable engineering workflows.
Today, CAPT powers AI-assisted development for more than 1,000 LinkedIn engineers and has fundamentally changed how we build, debug, analyze, and operate software at scale. In this blog, we'll share how we built CAPT, the engineering challenges we solved, the real-world impact it's having across teams, and the lessons we learned that can help other organizations bring AI agents into their development workflows.
From vision to reality: Giving AI coding agents the context they need
Modern AI coding assistants are remarkable. They can write code, explain architecture, generate tests, and even perform complex refactorings. But they all share a fundamental limitation. Without organizational context, AI coding agents hit a ceiling.
For LinkedIn engineers, that organizational context is gained through lived experience and developing a comprehensive understanding of our tech stacks and developer environment, which includes:
Thousands of microservices built over two decades
Large-scale internal frameworks and libraries
Petabyte-scale data infrastructure and analytics platforms
Complex configuration management and multi-language service patterns
Specialized observability, logging, and deployment systems
New engineers take months to reach consistent levels of productivity in this environment, and even tenured engineers struggle outside their own domains.
As we started experimenting with AI coding assistants, we noticed the same pattern. Engineers would use tools like GitHub Copilot for generic coding tasks (writing boilerplate, refactoring functions, generating tests), but the moment they needed to interact with LinkedIn-specific systems, the assistants struggled or couldn't help. They didn't know how to query our data platforms, navigate our experimentation framework, or follow our service creation patterns. More fundamentally, these agents lacked context beyond the current repository. They couldn't see code patterns from related services or understand how different parts of our system connected. Combined with their limited ability to interact with internal tools, many engineers remained hesitant to rely on agent support for real coding tasks, viewing them as helpful but not essential.
This led us to a deliberate choice. Rather than build yet another coding assistant from scratch, we would augment existing AI agents. What they lacked wasn't intelligence. It was organizational context, access to internal tools, and structured guidance for LinkedIn-specific workflows.
That insight became CAPT: a way to give off-the-shelf agents a deep understanding of LinkedIn.
The core idea: Combine tools with executable knowledge
A unified integration layer via MCP
We built CAPT on the Model Context Protocol (MCP), an open standard for connecting AI agents to tools. Through MCP, CAPT provides agents with access to both LinkedIn's internal systems - like code search, data platforms, and observability tools - and third-party services for documents and tickets. Beyond individual tools, CAPT also exposes workflows implemented as playbooks that orchestrate multi-step tasks across these systems.
This immediately unlocked compatibility with any MCP-aware coding agent, standardized interfaces across hundreds of tools, let us leverage community-built MCP tools, and made integrations easier to maintain as agents evolve.
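As a rough illustration of the integration pattern (a minimal sketch, not CAPT's actual code), a single internal capability can be exposed through the open-source MCP Python SDK; the code_search tool and its backend URL here are assumptions:

```python
# Minimal sketch: exposing one internal tool over MCP with the open-source
# Python SDK (FastMCP). The code-search endpoint is a hypothetical placeholder.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("capt")  # server name is illustrative


@mcp.tool()
def code_search(query: str, repo: str | None = None) -> str:
    """Search internal code and return matching snippets."""
    # A real deployment would call the organization's code-search service
    # and handle authentication, pagination, and errors.
    resp = httpx.get(
        "https://codesearch.internal.example/api/search",
        params={"q": query, "repo": repo or ""},
    )
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so MCP-aware IDEs can connect
```

Any MCP-aware agent that connects to a server like this sees code_search in its tool list and can call it without knowing anything about the backing service.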
Playbooks: Turning institutional knowledge into executable workflows
MCP gave CAPT a stable foundation, but it was our playbooks that made CAPT powerful. Documentation tells you what to do. Playbooks tell AI agents how to do it - step-by-step.
A playbook defines its purpose, inputs, file references, and a sequence of instructions written using Jinja2 templates. CAPT then exposes that playbook to agents as a tool rather than just a prompt template, so MCP-based agents can dynamically decide when to invoke it based on the problem at hand, chain it together with other tools, and reuse it compositionally across workflows, instead of relying on a single, static prompt.
That is where things start to feel different. Tools are no longer just thin wrappers over APIs. They are encoded workflows.
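To make this concrete, here is a minimal sketch of the idea, assuming a playbook is essentially structured data whose instructions are a Jinja2 template; the field names and the add_grpc_endpoint example are illustrative assumptions, not CAPT's actual schema:

```python
# Minimal sketch: a playbook as structured data with Jinja2-templated
# instructions. Field names and the example are illustrative assumptions.
from dataclasses import dataclass, field

from jinja2 import Template


@dataclass
class Playbook:
    name: str
    purpose: str
    inputs: list[str]                       # parameters the agent must supply
    file_references: list[str] = field(default_factory=list)
    instructions: str = ""                  # Jinja2 template of step-by-step guidance

    def render(self, **params: str) -> str:
        """Fill in the instruction template with agent-supplied inputs."""
        return Template(self.instructions).render(**params)


add_grpc_endpoint = Playbook(
    name="add_grpc_endpoint",
    purpose="Add a new endpoint to an existing gRPC service following internal conventions",
    inputs=["service_name", "endpoint_name"],
    file_references=["proto/", "src/main/"],
    instructions=(
        "1. Locate the proto definition for {{ service_name }} and add {{ endpoint_name }}.\n"
        "2. Regenerate stubs and implement the handler following existing patterns.\n"
        "3. Add unit tests mirroring the service's current test layout.\n"
        "4. Update the service's routing and config so the endpoint is exposed."
    ),
)

# Because CAPT exposes this to agents as a tool, an MCP-aware agent can call
# it with concrete inputs and receive ordered, executable guidance back.
print(add_grpc_endpoint.render(service_name="profile-service", endpoint_name="GetBadges"))
```

The agent can then chain a playbook like this with other tools, such as code search, exactly as it would chain ordinary tool calls.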
Example: Experiment cleanup
LinkedIn runs thousands of A/B tests to optimize product features and user experiences across the platform. Once an experiment concludes and we've determined the winning variant, engineers need to clean up the experimental code: reviewing the experiment context and results, identifying all code paths tied to the experiment, removing the losing variants and stale flags, and validating that the winning behavior is now the default across all services. That process used to be manual and error-prone.
This cleanup process was not only time-consuming but also required deep knowledge of LinkedIn's experimentation infrastructure. Engineers unfamiliar with the system often struggled or needed help from a small group of experts, creating bottlenecks.
We captured that entire workflow in a single playbook that combines internal experimentation APIs, code search, and structured cleanup instructions. One engineer authored it. Suddenly, any engineer across any team could perform safe, consistent cleanup, even if they had never touched the experimentation stack before.
The same pattern spread quickly. Need to create a new gRPC service? Add a new endpoint? Debug a crash? Run a deep, domain-specific data analysis? There’s a playbook for that.
Playbooks became reusable building blocks. Agents can call on them like functions. Knowledge that once lived only in senior engineers’ heads is now encoded, validated, and executable.
Engineering challenges and how we solved them
Zero-friction distribution
Even the most powerful developer tools fail if they're difficult to adopt. Complex installation processes, manual configuration steps, and dependency management issues create friction that prevents widespread use. We knew that for CAPT to succeed at LinkedIn's scale, engineers needed to be able to start using it immediately without wrestling with setup.
We built CAPT as a Python package that works both as:
A command-line interface for configuration and authentication
A local MCP server that IDEs connect to automatically
Using LinkedIn’s internal developer tool distribution, CAPT ships to every laptop and updates silently in the background.
Setup is as simple as opening your IDE: CAPT is already on the machine, and the editor connects to the local MCP server automatically.
For external systems, engineers can either authenticate proactively with a one-time command or let CAPT handle it lazily. When a tool that requires those systems runs and authentication is missing, CAPT kicks off the OAuth flow and then stores the resulting credentials securely in the OS keychain.
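A simplified sketch of that lazy-auth pattern, assuming the standard keyring library for OS keychain storage; run_oauth_flow is a hypothetical stand-in for the provider-specific OAuth dance:

```python
# Sketch of lazy authentication: reuse a stored credential, or run OAuth on
# first use and persist the token in the OS keychain via the keyring library.
import keyring

SERVICE_NAME = "capt"  # keychain service name, illustrative


def run_oauth_flow(provider: str) -> str:
    """Hypothetical placeholder: open a browser, complete OAuth, return a token."""
    raise NotImplementedError


def get_token(provider: str) -> str:
    token = keyring.get_password(SERVICE_NAME, provider)
    if token is None:
        # A tool that needs this provider just ran and no credential exists:
        # kick off the OAuth flow now, then store the result securely.
        token = run_oauth_flow(provider)
        keyring.set_password(SERVICE_NAME, provider, token)
    return token
```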
No JSON editing, no config wrangling, no dependency issues. Adoption grew quickly because onboarding required essentially zero effort.
Scalable playbook management: Central + local
As CAPT adoption grew, we faced a tension: some workflows apply across multiple repositories within the company, while others are specific to individual teams or services. A purely centralized approach would either force every team's playbook into a global repository (creating bloat and maintenance burden) or leave teams unable to encode their specialized workflows at all.
CAPT supports two complementary categories of playbooks:
Central playbooks are cross-cutting workflows that apply broadly across LinkedIn: experimentation cleanup, common debugging patterns, data analysis flows, PR review helpers, and observability workflows. These act as safe, validated defaults for the entire company.
Local playbooks live directly in each repository and capture team- or service-specific workflows without bloating the global ecosystem.
For example, a team might keep playbooks for its own service's migration, release, or debugging workflows alongside the code they apply to, while still relying on the shared central playbooks for everything else.
When CAPT starts, it discovers both central and local playbooks and presents a unified surface to the agent. This model gave us a distributed contribution system. Any team can encode its workflows without waiting on a central platform team, while still benefiting from shared global playbooks. The result was less strain on central platform resources and faster developer velocity, as teams could ship new playbooks specific to their needs without going through lengthy review or integration processes.
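A rough sketch of that discovery step, assuming local playbooks are YAML files under a .capt/playbooks/ directory in each repository (the path, file format, and helper names are assumptions):

```python
# Sketch of merging central and repo-local playbooks into one catalog.
# The .capt/playbooks/ location and YAML format are assumptions.
from pathlib import Path

import yaml  # PyYAML


def load_playbooks(directory: Path) -> dict[str, dict]:
    """Load every *.yaml playbook in a directory into a name -> spec map."""
    playbooks: dict[str, dict] = {}
    if not directory.is_dir():
        return playbooks
    for path in sorted(directory.glob("*.yaml")):
        spec = yaml.safe_load(path.read_text())
        playbooks[spec["name"]] = spec
    return playbooks


def discover(central_dir: Path, repo_root: Path) -> dict[str, dict]:
    catalog = load_playbooks(central_dir)  # shared, validated defaults
    # Repo-local playbooks are layered on top without touching the central set.
    catalog.update(load_playbooks(repo_root / ".capt" / "playbooks"))
    return catalog  # one unified surface presented to the agent
```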
Scaling MCP to hundreds of tools
As CAPT grew, exposing every internal service and playbook as an MCP tool quickly hit a practical limit. Most MCP clients work best with only a few dozen tools at a time.
Our first approach was to group related tools into namespaces organized by workflow or use case. For example, the "data" namespace contained tools for data analysis, while the "oncall" namespace bundled together logs, metrics, and deployment tools for incident response. Engineers would load only the namespaces relevant to their current task, keeping the tool list manageable. While this helped initially, it still pushed complexity onto engineers, who had to know which namespaces to enable. Additionally, it broke down when workflows naturally crossed domains. An engineer working on a data analysis task might suddenly need to debug a pipeline failure, requiring them to manually switch to the oncall namespace mid-workflow.
To fix this, we flipped the model. Instead of exposing every tool directly, the CAPT MCP server now exposes a very small set of meta-tools that sit in front of thousands of underlying tools. These meta-tools let the LLM (not the user) discover tools by tag, inspect their schemas, and execute them (get_tools_for_tags, get_tool_info, exec_tool). Each underlying tool is tagged by function (e.g., experimentation, logs, metrics, deployments) and the LLM uses those tags to pick the right tool for a given prompt. Looking ahead, this design aligns well with emerging patterns like skills and advanced tool-calling from Anthropic, where agents search over large libraries of capabilities, load only the relevant ones on demand, and orchestrate them via code. As those ecosystems mature, we can swap or augment the implementation behind our meta-tools (for example, with richer tool search or programmatic tool calling) without changing how engineers write or use playbooks today.
This design trades a few extra seconds of tool discovery for simplicity and scale. The main LLM no longer sees a giant list of tools on every request, which reduces context bloat and improves accuracy. Engineers no longer have to manage namespaces just to keep their agent usable, and we can grow to hundreds of tools without changing the MCP surface.
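Condensed into code, the pattern looks roughly like this; the three meta-tool names match the ones above, but the registry layout and dispatch logic are assumptions, not the production implementation:

```python
# Sketch of the meta-tool pattern: a handful of MCP tools in front of a large
# tagged registry. The registry structure and dispatch are assumptions.
import json

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("capt")

# tag -> {tool_name: {"description": str, "schema": dict, "fn": callable}}
REGISTRY: dict[str, dict] = {}


@mcp.tool()
def get_tools_for_tags(tags: list[str]) -> dict[str, str]:
    """List tool names and descriptions for the given tags (e.g. logs, metrics)."""
    return {
        name: entry["description"]
        for tag in tags
        for name, entry in REGISTRY.get(tag, {}).items()
    }


@mcp.tool()
def get_tool_info(name: str) -> str:
    """Return the input schema for one tool so the LLM can call it correctly."""
    for tools in REGISTRY.values():
        if name in tools:
            return json.dumps(tools[name]["schema"])
    raise ValueError(f"unknown tool: {name}")


@mcp.tool()
def exec_tool(name: str, arguments: dict) -> str:
    """Execute the named underlying tool with arguments chosen by the LLM."""
    for tools in REGISTRY.values():
        if name in tools:
            return str(tools[name]["fn"](**arguments))
    raise ValueError(f"unknown tool: {name}")
```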
Measuring impact through deep instrumentation
Building CAPT was only half the challenge. Without concrete data on how engineers were actually using the platform, we would have no way to prioritize which playbooks to improve, identify workflows that weren't landing, or demonstrate value to leadership. Developer tools often struggle to justify their existence because impact is hard to quantify.
We needed a way to move beyond anecdotes and gut feelings. So from day one, we instrumented every CAPT tool and playbook invocation. For each call we log when it happened, which repository it ran in, whether it succeeded, and which tool or playbook was used.
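A simplified sketch of what one such event could contain; the fields mirror the signals just listed, while the dataclass and the print-based sink are illustrative stand-ins for the real pipeline:

```python
# Sketch of a per-invocation telemetry event. Fields mirror the signals
# described above; the dataclass and print-based sink are illustrative only.
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class InvocationEvent:
    tool_or_playbook: str  # which tool or playbook was used
    repository: str        # which repository it ran in
    succeeded: bool        # whether it succeeded
    timestamp: str         # when it happened (ISO 8601)


def emit(event: InvocationEvent) -> None:
    # In production this would feed an internal metrics/analytics pipeline.
    print(json.dumps(asdict(event)))


emit(InvocationEvent(
    tool_or_playbook="experiment_cleanup",
    repository="feed-mid-tier",      # illustrative repository name
    succeeded=True,
    timestamp=datetime.now(timezone.utc).isoformat(),
))
```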
Those signals power internal dashboards that answer simple but critical questions:
Which playbooks are actually being used?
Which teams rely on CAPT the most?
Where does usage drop off?
Which workflows are failing frequently and need better guardrails?
That data shaped our roadmap. It also made it much easier to gain leadership support and secure continued investment in the platform, because we could show concrete impact rather than anecdotes.
Real-world impact
CAPT is not a research prototype. It runs in the middle of LinkedIn’s engineering workflows, and each new integration unlocked a new class of use cases. Bringing code search into Copilot meant engineers could stay in the agent and still see the right code without jumping to other tools. Adding Trino turned natural language into real data analysis. Integrations with third-party services and wiki pages then let agents read rich context and write back findings, so workflows started to feel truly end-to-end. As the surface area grew, coding agents became markedly more useful, and adoption spread from a handful of early adopters to a much broader set of engineers.
Data analysis for everyone
Before CAPT, complex data analysis often required help from a data scientist or someone deeply familiar with our internal query engines and tooling. With CAPT, engineers, PMs, and EMs can start with a natural language question, have an agent translate it into the right queries against our data platforms, iterate based on results, and then turn those queries into reusable dashboards or lightweight data apps.
A typical workflow now looks like a conversation: ask a question, inspect the results, refine the query, repeat. When the analysis stabilizes, the same agent can help convert it into a shareable artifact such as a dashboard or a simple app hosted on our internal data science platforms.
The result is that analysis that used to take days of back-and-forth can now be done in hours. In many cases, teams report roughly 3× faster time from question to usable insight, and they can get there themselves without needing deep expertise in our data tooling for each iteration.
Automated customer issue debugging
Customer issues used to require manually jumping between the issue tracker, logs, metrics, past incidents, and code. With CAPT, one of our playbooks orchestrates that workflow end to end.
Given a ticket, the playbook reads the description, pulls relevant logs, classifies the type of issue, searches for similar incidents, identifies likely root causes, and points the engineer to the most relevant code paths. It then summarizes its findings back into the ticket so the next person who looks at it has immediate context.
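Loosely sketched, the step sequence of such a playbook might be encoded along these lines; the tags and wording are illustrative assumptions rather than the actual playbook:

```python
# Illustrative only: an ordered step list for a customer-issue triage playbook,
# where each step names the tool tag it relies on.
TRIAGE_STEPS = [
    {"tool_tag": "tickets", "instruction": "Read ticket {{ ticket_id }} and summarize the reported symptom."},
    {"tool_tag": "logs", "instruction": "Pull logs for the affected service around the reported time window."},
    {"tool_tag": "incidents", "instruction": "Search past incidents for similar symptoms and note close matches."},
    {"tool_tag": "code_search", "instruction": "Locate the code paths most likely involved in the failure."},
    {"tool_tag": "tickets", "instruction": "Write the findings and likely root causes back to ticket {{ ticket_id }}."},
]
```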
This doesn’t replace human judgment, but it gives engineers a “first pass” investigation without having to manually jump between multiple systems. In practice, we’ve seen initial triage time drop by around 70%.
On-call incident response
On-call incident response is stressful and time-sensitive. Engineers need to quickly gather information from multiple sources (e.g. metrics, logs, deployment history, past incidents) while under pressure to restore service. The cognitive load of remembering which dashboards to check, which queries to run, and how to correlate signals across systems makes incidents even more challenging.
CAPT helps streamline this process by automating the initial investigation. On-call engineers now regularly paste an alert link into an agent and ask it to debug what’s going on.
Behind the scenes, the agent first selects an appropriate playbook for debugging the alert, and that playbook guides it through the investigation. Using CAPT’s tools, the agent queries metrics, logs, deployment history, and incident records, looks for recent rollouts or related failures, and then surfaces a narrative: what changed, what is breaking, and where to look first.
Instead of manually stitching together multiple dashboards and consoles, the on-call engineer gets a coherent starting point and a set of concrete next steps.
AI-enhanced code review workflows
CAPT also changed how code review works. Before code is even sent for human review, engineers can ask an agent to take a first pass. It checks for obvious correctness issues, validates patterns against internal best practices, suggests missing tests, and flags incomplete documentation or edge cases. That pre-review step catches many issues that would otherwise generate multiple review rounds.
After review, another set of playbooks helps with the “last mile” of feedback resolution. Given a pull request with comments, an agent can propose concrete code changes to address the feedback, apply them, and push an updated commit, while still leaving the final approval to a human reviewer.
Together, these patterns have led to higher-quality PRs and shorter review cycles, with engineers reporting they get back several hours a week that used to be spent on mechanical fixes and context setup.
Automated debugging for data pipelines & ML training
Data pipelines and ML training jobs are particularly painful to debug: failures can be intermittent, logs are often spread across systems, and the underlying infrastructure is complex.
CAPT integrates with LinkedIn’s compute stack so that, when a Spark job fails or an ML training run stalls, an agent can inspect logs, cluster metrics, job configurations, and historical runs in a coordinated way. It then suggests likely causes such as skew, bad input data, or resource constraints and points to the relevant configuration or code.
For many teams, this has cut the time spent debugging failed jobs by more than half. Engineers who are not experts in our compute stack can still reason about issues and move forward, instead of waiting for a small set of specialists to become available.
What we learned
Building CAPT taught us lessons that, while implemented against LinkedIn's infrastructure, we believe apply broadly to any organization integrating AI agents into its development workflows. They can help teams avoid common pitfalls and focus on what actually drives adoption and impact.
Open standards matter. MCP gave us a common way to integrate with multiple agents and made it easy to adopt improvements in the broader ecosystem, without betting on a single client or vendor.
Integrations compound. Every time we plugged CAPT into a core system - first code search, then data platforms, then collaboration tools - usage jumped and new workflows emerged, without changing the core architecture.
Context is more valuable than raw intelligence. CAPT works not because it uses exotic models, but because it grounds mainstream models in the right tools, systems, and playbooks.
Composability is a force multiplier. Small, focused playbooks combine into surprisingly sophisticated workflows when agents can chain them together like building blocks.
Decentralization unlocks expert-driven playbooks. Domain experts turn their best practices into playbooks that anyone else can safely reuse.
Starting with high-overhead, high-value workflows builds credibility. We focused early CAPT development on debugging, on-call support, and analysis - areas everyone knew were painful and time-consuming - so improvements there had outsized leverage and quickly created internal champions who pulled the platform into more and more areas.
The future
Even with its current impact, CAPT is still in the early stages of what it can become.
One area we are actively exploring is richer, more dynamic tool selection. The meta-tool design opened the door to thousands of tools; the next step is making tool discovery even more adaptive to the engineer’s context - the repository, the file being edited, the incident being worked on - so that the agent feels less like a toolbox and more like a collaborator who understands where you are and what you’re trying to do.
We are also investing in automated playbook generation and maintenance. There is a clear opportunity to learn from how engineers use CAPT today - which tools tend to be called together, which sequences repeat across teams - and to propose new playbooks or updates automatically. In the long run, we want CAPT to help maintain its own library of workflows instead of relying purely on manual curation.
We also see an opportunity to reuse CAPT for background agents, not just interactive coding sessions. The same tools and playbooks that power an engineer’s request in GitHub Copilot can also drive non-interactive workflows - batch jobs, recurring checks, or maintenance tasks - using the exact same building blocks.
The bottom line
Six months after launch, CAPT’s impact is clear.
More than 1,000 engineers use it.
Issue triage time has dropped by about 70% in many areas.
Data analysis is roughly three times faster for common workflows.
Over 500 playbooks have been authored across the company, and developer satisfaction with the platform is consistently high.
By wiring agents into the code, data, and document systems people already rely on, CAPT has also become a major driver of coding-agent adoption across engineering teams.
The more important change, though, is cultural. CAPT has started to shift how engineers at LinkedIn work. Instead of teams reinventing the same workflows, they encode them once and make them available to everyone. Instead of AI assistants operating in a vacuum, they are grounded in our systems, patterns, and practices.
By building on open standards like MCP and treating playbooks as executable, shareable knowledge, CAPT has become more than a collection of tools. It is a blueprint for how engineering organizations can bring AI into the heart of their development workflows safely, pragmatically, and at scale.