GitHub Actions Hit Our Limits. An AI Agent Handled the Rest.

We ran everything through GitHub Actions for two years. Build pipelines, test suites, deployments, linting, security scans. At peak, we had 47 workflow files across 12 repositories, and the system mostly worked. Kenji maintained the workflows, which meant he spent about four hours a week writing YAML, debugging runner issues, and explaining to the rest of the team why their builds failed.
The trouble started when we needed Actions to do things that weren't "run this command when this event happens."
Where Actions Works
I want to be clear: GitHub Actions is good at what it does. Push code, trigger a build. Open a PR, run the tests. Merge to main, deploy to staging. These are event-driven, deterministic workflows, and Actions handles them well. The marketplace has thousands of prebuilt actions. The runner infrastructure is solid. The pricing is reasonable for most teams.
Kenji set up our CI pipeline in about two days, and it ran reliably for 18 months with minimal maintenance. Build, test, lint, deploy. Four jobs, well-defined steps, predictable outcomes. For this kind of work, I wouldn't swap Actions for anything else.
The YAML syntax is verbose, and anyone who's written a complex matrix build knows the pain of debugging indentation errors at 11 PM. But that's a usability complaint, not a capability gap. Actions does what it promises.
Where We Hit the Wall
The first problem showed up when Anya asked for a workflow that would update configuration files across all 12 repositories whenever we changed our shared ESLint rules. Sounds simple. Update a JSON file, commit it, open a PR. Actions can do that with a script step.
Except the config files weren't identical across repos. Some used the base config. Some extended it with project-specific overrides. Two repos had legacy settings that couldn't be touched without breaking their build. One repo had the config nested inside a monorepo structure. "Update the ESLint config" wasn't a file copy. It was a context-dependent operation that required reading each repo's existing config, understanding the structure, deciding what to change and what to preserve, and generating the right diff.
Kenji wrote a bash script that handled six of the twelve repos. The other six had enough variation that he'd have needed a separate script for each one, plus maintenance overhead whenever the config structure changed. He estimated 15 hours to build, and the scripts would break the first time someone reorganized their repo's config layout.
The second problem was cross-repo auditing. Rafael wanted a weekly report: which repos had outdated dependencies, which had failing CI, which hadn't been touched in 90 days, which were missing required files like CODEOWNERS or LICENSE. Actions can check this for a single repo using a scheduled workflow. But aggregating results across 12 repos, correlating the data, and generating a readable summary? That's not a workflow. That's analysis.
We tried using a "meta" repo with a workflow that cloned all 12 repos and ran checks. The workflow took 25 minutes to run, the script was 400 lines of bash, and it broke every time someone added or archived a repository. The first time it failed silently, we went three weeks without an audit report, and Marcus then spent half a day fixing it.
The third problem was PR descriptions. We wanted every PR to include a summary of what changed and why, a checklist of testing steps, and links to related issues. Actions can enforce that a PR description isn't empty. It cannot read the diff, understand the intent, and generate the description. That requires reasoning about code, not executing commands.
The Reasoning Gap
All three problems share the same characteristic: they need the system to read context, make a judgment, and produce output that varies based on what it finds. Actions is built around a different model. You define the trigger, list the steps, and the runner executes them identically every time. There's no "read this file, figure out what it means, and decide what to do" step in the Actions vocabulary.
You can approximate reasoning with enough bash scripting and conditional logic. Kenji proved that with the ESLint updater -- he got it working for half the repos. But the scripts become the bottleneck. They're brittle, hard to test, and opaque to anyone who didn't write them. When Kenji went on vacation for two weeks, nobody touched the workflows because nobody understood the scripts well enough to debug them.
This isn't an Actions-specific problem. Jenkins, CircleCI, GitLab CI -- they all work the same way. Define steps, execute steps. The reasoning layer is always missing because these tools were built for automation, not judgment.
Adding the Agent Layer
We didn't replace Actions. We added agents on top of it for the tasks that required context and reasoning.
The config file updater agent solved the ESLint problem in a way scripts never could. When we update our shared config, the agent clones each repo, reads the existing config file, understands how it extends or overrides the base, generates the appropriate update that preserves local customizations, and opens a PR with a clear description of what changed and why. Kenji reviewed the first batch of PRs it generated and approved 10 of 12 without changes. The two that needed tweaks were edge cases involving deprecated config keys that the agent flagged but wasn't sure how to handle. It asked in the PR description rather than guessing.
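To make the "preserve local customizations" step concrete, here is a minimal sketch of the merge decision in Python. It is not the agent's actual implementation. It assumes each repo's rules have already been read into a flat dictionary, and the function name plan_config_update, its parameters, and the example rule values are all hypothetical. The idea: a rule is updated only if the repo was still following the old shared value, local overrides are left alone, and deprecated keys become questions for the reviewer instead of guesses.

```python
from typing import Any


def plan_config_update(
    old_base: dict[str, Any],
    new_base: dict[str, Any],
    repo_rules: dict[str, Any],
    deprecated: set[str],
) -> tuple[dict[str, Any], list[str]]:
    """Return (updated rules, questions to put in the PR description)."""
    updated = dict(repo_rules)
    questions: list[str] = []

    for rule, new_value in new_base.items():
        current = repo_rules.get(rule)
        if rule not in repo_rules or current == old_base.get(rule):
            # The repo never set this rule, or still matches the old shared
            # value, so it is safe to move it to the new shared value.
            updated[rule] = new_value
        # Otherwise the repo overrides the rule locally: leave it untouched.

    for rule in repo_rules:
        if rule in deprecated:
            # Do not guess. Surface the key so a human can decide.
            questions.append(f"'{rule}' is deprecated. Remove or replace it?")

    return updated, questions


# Illustrative values only.
old_base = {"semi": "error", "quotes": "warn"}
new_base = {"semi": "error", "quotes": "error", "eqeqeq": "error"}
repo_rules = {"semi": "off", "quotes": "warn", "no-native-reassign": "error"}

updated, questions = plan_config_update(
    old_base, new_base, repo_rules, deprecated={"no-native-reassign"}
)
```

In this example the repo's local "semi" override survives, "quotes" follows the shared change, and the deprecated key produces one question rather than a silent edit. The hard part the agent handles, reading nested or extended configs into this shape in the first place, is outside the sketch.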
For the cross-repo audit, we set up an agent that runs weekly. It reads each repo's dependency manifests, CI status, recent commit history, and file structure. Then it produces a summary organized by urgency: repos with security vulnerabilities at the top, repos with stale dependencies in the middle, repos that are healthy at the bottom. The summary takes about 90 seconds to generate instead of 25 minutes, and it hasn't broken once in four months because the agent adapts to repo structure changes rather than relying on hardcoded paths.
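The urgency ordering is the one deterministic piece of that report, so it is easy to sketch. The Python below assumes the per-repo facts have already been collected. The RepoHealth fields, the tier labels, and the choice to put failing CI in the middle tier are our assumptions for illustration, not a description of how the agent stores its data.

```python
from dataclasses import dataclass


@dataclass
class RepoHealth:
    name: str
    vulnerabilities: int     # count of known security advisories
    stale_dependencies: int  # count of dependencies behind their latest release
    ci_passing: bool


def urgency(repo: RepoHealth) -> int:
    """Lower sorts first: 0 = security, 1 = needs attention, 2 = healthy."""
    if repo.vulnerabilities > 0:
        return 0
    if repo.stale_dependencies > 0 or not repo.ci_passing:
        return 1
    return 2


def build_summary(repos: list[RepoHealth]) -> list[str]:
    """Return one line per repo, most urgent first, ties broken by name."""
    labels = {0: "SECURITY", 1: "NEEDS ATTENTION", 2: "HEALTHY"}
    ordered = sorted(repos, key=lambda r: (urgency(r), r.name))
    return [f"{labels[urgency(r)]}: {r.name}" for r in ordered]


# Hypothetical repos for illustration.
report = build_summary([
    RepoHealth("web", vulnerabilities=0, stale_dependencies=0, ci_passing=True),
    RepoHealth("api", vulnerabilities=2, stale_dependencies=1, ci_passing=True),
    RepoHealth("docs", vulnerabilities=0, stale_dependencies=3, ci_passing=True),
])
```

Sorting is the trivial part. What the old bash script could not do was fill in those fields reliably when a repo moved its manifests around.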
The PR description problem was the easiest to solve. An agent reads the diff when a PR is opened, generates a description that summarizes the changes, suggests testing steps based on which files were modified, and links related issues if the commit messages reference any. The author reviews and edits the generated description. It's not perfect every time, but it gives the author a draft that's roughly 70% of the way there in seconds instead of a blank text box.
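Summarizing the diff needs a model, but the scaffolding around it is plain code. As a rough sketch, assuming the changed file paths and commit messages are already in hand, the Python below pulls issue references out of commit messages and picks testing steps from file extensions. The TEST_HINTS mapping, the draft_description function, and the output layout are hypothetical, and the summary line is left as a placeholder for the agent to fill in.

```python
import re

# Matches references such as "#42" in a commit message.
ISSUE_REF = re.compile(r"#(\d+)")

# Hypothetical mapping from file suffix to a suggested testing step.
TEST_HINTS = {
    ".py": "Run the unit tests",
    ".sql": "Apply the migration on a staging database",
    ".css": "Check the affected pages in a browser",
}


def draft_description(changed_files: list[str], commit_messages: list[str]) -> str:
    """Build the skeleton of a PR description from paths and commit messages."""
    # Deduplicate issue numbers across all commit messages.
    issues = sorted(
        {int(n) for msg in commit_messages for n in ISSUE_REF.findall(msg)}
    )
    # Keep only the hints whose suffix appears among the changed files.
    steps = [
        hint
        for suffix, hint in TEST_HINTS.items()
        if any(path.endswith(suffix) for path in changed_files)
    ]

    lines = ["Summary:", "(drafted by the agent from the diff)", "", "Testing:"]
    lines += [f"- [ ] {step}" for step in steps] or ["- [ ] (no suggestion)"]
    lines += ["", "Related issues:"]
    lines += [f"- #{n}" for n in issues] or ["- none referenced"]
    return "\n".join(lines)


# Illustrative input only.
text = draft_description(
    ["app/models.py", "db/001_init.sql"],
    ["Fix login bug (#42)", "Refs #7 and #42"],
)
```

The "- [ ]" lines render as a task list on GitHub, so the author can tick off each step during review.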
What Stayed in Actions
We didn't tear up our CI/CD pipeline. Actions still runs our builds, tests, lints, and deployments. Those are deterministic workflows that don't need judgment. Push code, run tests, report results. Actions is the right tool for that.
What changed is the boundary. Actions handles the mechanical work. Agents handle the work that requires reading, reasoning, and adapting. The two layers don't compete; they cover different parts of the automation spectrum.
Kenji's maintenance load dropped from four hours a week to about one. Most of those four hours had gone to the scripted logic -- the bash that tried to be smart about file contents or cross-repo state. That logic now lives in the agent layer, where it's more readable, more maintainable, and more capable.
The Comparison Nobody Makes
When teams look for GitHub Actions alternatives, they usually compare it to other CI/CD platforms: GitLab CI, CircleCI, Jenkins, Buildkite, Dagger. Those comparisons make sense if your problem is with Actions' runner pricing, YAML syntax, or platform lock-in.
But if your problem is that Actions can't reason about your code, switching to another CI/CD platform won't help. They all share the same limitation. The alternative isn't a different pipeline tool. It's adding a layer that can do what pipelines can't: read context, make decisions, and produce variable output based on what it finds.
Our 47 workflows are still running. We just stopped asking them to be smart.
Try These Agents
- GitHub Config File Updater -- Update configuration files across multiple repos with context-aware diffs that preserve local customizations
- GitHub Multi-Repo Audit Agent -- Weekly cross-repo health reports covering dependencies, CI status, and compliance
- GitHub PR Template Generator -- Auto-generate PR descriptions from diffs with testing checklists and issue links