We Stopped Triaging GitHub Issues by Hand. An Agent Does It Now.

Ibby Syed, Founder, Cotera
8 min read · March 8, 2026

Every Monday morning, Rafael opened GitHub and stared at the issue backlog. Between Friday afternoon and Monday morning, anywhere from 15 to 30 new issues would appear across our four public repos. Bug reports from users. Feature requests. Questions that should have been discussions. Duplicates of things we'd already fixed. And occasionally, a genuine emergency buried three pages deep because nobody had labeled it yet.

Rafael's triage routine took about 45 minutes. Read each issue. Decide if it's a bug, feature request, question, or duplicate. Assign a priority. Figure out who should own it based on which part of the codebase it touches. Add labels. Close duplicates with a link to the original. Sometimes leave a comment asking for more information.

Forty-five minutes on Monday. Another 20 minutes on Wednesday to catch the mid-week arrivals. Fifteen more on Friday for anything that trickled in. That's roughly three hours a week of a senior engineer's time spent reading, classifying, and routing issues. Not fixing them. Just sorting them into the right piles.

The Backlog Problem

The real cost wasn't Rafael's time, though three hours a week of senior engineering time isn't nothing. The real cost was what happened when Rafael was busy.

During our Q3 launch, Rafael skipped triage for two weeks. He was heads-down on a production incident, then on the launch itself. When he came back to the backlog, there were 112 unprocessed issues. Some had been sitting for 14 days with no response. Three were duplicates of the production incident he'd just spent a week fixing -- if he'd seen them earlier, the user reports might have helped him diagnose the problem faster.

Tomás tried to help with triage during that stretch, but he didn't have Rafael's institutional knowledge. He assigned a payment-related bug to the frontend team because the reporter mentioned a UI error message. The actual bug was in the billing API. It sat with the wrong team for four days before someone noticed and reassigned it.

The triage process required someone who understood the codebase deeply enough to route issues correctly. That person was Rafael. And Rafael had other things to do.

What We Tried First

GitHub has built-in issue automation. You can create issue templates with pre-filled labels, set up auto-labeling based on file paths in pull requests, and use GitHub Actions to run workflows when issues are created.

We tried all of it.

Issue templates helped a little. We created separate templates for bug reports, feature requests, and questions. Each template had a different default label. This handled classification for about 60% of issues -- the ones where the reporter actually chose the right template. The other 40% either picked the wrong template (filing a bug as a feature request because they didn't read the options) or used the blank issue option, which had no template at all.
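For context, GitHub's issue forms let you attach a default label directly in the template file. A minimal sketch of the kind of template we used (the label name and field wording here are illustrative, not our actual files):

```yaml
# .github/ISSUE_TEMPLATE/bug_report.yml
name: Bug report
description: Something isn't working
labels: ["bug"]          # applied automatically when this template is used
body:
  - type: textarea
    id: what-happened
    attributes:
      label: What happened?
      description: Also tell us what you expected to happen.
    validations:
      required: true
```

The catch, as noted above: the label is only right if the reporter picks the right template in the first place.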

Auto-labeling based on file paths doesn't apply to issues. It works for PRs, where there's a diff to analyze. Issues don't reference specific files unless the reporter happens to include a file path in their description. Most don't.

We wrote a GitHub Action that ran on issue creation and tried to classify issues using keyword matching. If the body contained "crash" or "error" or "broken," label it as a bug. If it contained "would be nice" or "feature" or "request," label it as a feature request. This worked embarrassingly poorly. Users describe bugs in creative ways that keyword matching can't handle. "The export button doesn't seem to do anything" is clearly a bug report, but none of our trigger words appear in it.
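The core of that Action amounted to a few lines of string matching. A minimal sketch (the trigger words match the ones described above; everything else is illustrative):

```python
# Naive keyword classifier, in the spirit of our old GitHub Action.
BUG_WORDS = {"crash", "error", "broken"}
FEATURE_WORDS = {"would be nice", "feature", "request"}

def classify(body: str) -> str:
    """Return a label guess based on trigger words in the issue body."""
    text = body.lower()
    if any(word in text for word in BUG_WORDS):
        return "bug"
    if any(word in text for word in FEATURE_WORDS):
        return "feature"
    return "unlabeled"

# The failure mode from the article: a clear bug report, zero trigger words.
print(classify("The export button doesn't seem to do anything"))  # "unlabeled"
```

The example shows exactly why this approach fails: the classifier keys on wording, not meaning.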

The Agent Approach

In November, we set up a multi-repo audit agent that, among other things, processes new issues across all four of our repos.

When a new issue comes in, the agent reads the full issue text -- title, body, and any comments. Then it does something our keyword-matching Action never could: it checks recent commits and open PRs to see if anyone is already working on something related. It reads the repo's file structure to understand which component the issue likely affects. It looks at the issue author's history to see if they've filed issues before and whether those were typically bugs or feature requests.

Based on all of that context, the agent assigns a priority label (P0 through P3), a category label (bug, feature, question, duplicate), and an owner. For duplicates, it links the original issue and adds a comment explaining why it thinks the issues are related. For questions that should be discussions, it adds a comment suggesting the reporter use the Discussions tab and explains why.
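The last step of that pipeline, applying labels, goes through GitHub's standard REST endpoint for adding labels to an issue. A sketch of building that call (the repo name, issue number, and labels here are hypothetical; the endpoint shape follows GitHub's documented API):

```python
import json

API = "https://api.github.com"

def label_request(repo: str, issue: int, labels: list[str]) -> tuple[str, bytes]:
    """Build the URL and JSON body for GitHub's add-labels-to-issue call.

    Actually sending it requires an Authorization header with a token,
    omitted here.
    """
    url = f"{API}/repos/{repo}/issues/{issue}/labels"
    body = json.dumps({"labels": labels}).encode()
    return url, body

url, body = label_request("example-org/example-repo", 42, ["bug", "P1"])
```

Everything interesting happens before this step; the API call itself is the boring, mechanical end of the agent's reasoning.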

The first week, Rafael reviewed every decision the agent made. Out of 53 issues processed, the agent got the classification right on 47. Four of the six misclassifications were borderline cases -- issues that could reasonably be categorized either way. Two were genuine mistakes; in one, the agent classified a bug report as a feature request because the reporter phrased it as "it would be great if the export button actually worked." Rafael corrected those and the agent's accuracy improved.

By the third week, Rafael was only spot-checking. He'd scan the issue list on Monday morning, confirm the labels looked right, and move on. His triage time dropped from three hours a week to about 20 minutes.

What the Agent Actually Catches

The most useful thing the agent does isn't classification. It's duplicate detection.

Before the agent, duplicates would sit open for days until someone happened to remember a similar issue from three months ago. We had one bug that was reported seven times over a six-month period. Each report had slightly different wording, and none of them referenced each other. The seventh reporter was understandably frustrated that the bug had been "known for six months" without a fix. In reality, each report had been treated as a new issue, triaged independently, and deprioritized because each one looked like a single user complaint.

The agent caught a duplicate within its first week that would have slipped past manual triage. A user reported that search results were returning stale data. Two days later, another user reported that "filters don't seem to update." The agent flagged these as likely duplicates because both issues referenced the same caching layer, even though the symptoms described by the users were completely different. Rafael confirmed they were the same underlying bug. Without the agent, those two issues would have been assigned to different engineers who would have independently discovered they were debugging the same problem.
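The surface wording of those two reports barely overlaps, which is why text matching alone would never have linked them. A quick illustration using token overlap (Jaccard similarity) as a hypothetical stand-in for naive duplicate detection:

```python
def tokens(text: str) -> set[str]:
    return set(text.lower().split())

def jaccard(a: str, b: str) -> float:
    """Fraction of shared words between two texts (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

report_a = "search results are returning stale data"
report_b = "filters don't seem to update"

# The two reports share no words at all, so surface similarity is 0.0.
# Linking them requires a deeper signal -- here, that both traced back
# to the same caching layer.
print(jaccard(report_a, report_b))  # 0.0
```

Duplicate detection that works on symptoms-as-written misses these; detection that reasons about which component the symptoms implicate does not.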

Priority assignment is the other area where the agent outperforms manual triage. Rafael tended to assign priority based on how urgent the issue felt in the moment. Monday morning, everything looks medium priority. Friday afternoon after a long week, everything looks low priority. The agent is boringly consistent. It checks how many users are affected (based on reaction counts and duplicate reports), whether the issue impacts a revenue-generating feature, and whether there's a workaround mentioned in the thread. Same criteria every time, no fatigue.
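The value of that consistency is that the rubric is just a deterministic function of the same three inputs every time. A sketch of what such a rubric could look like (the thresholds and weights here are invented for illustration, not the agent's actual values):

```python
def priority(affected_users: int, revenue_feature: bool, has_workaround: bool) -> str:
    """Map the three criteria from the article to a P0-P3 label.

    Same inputs, same output, every time -- no Monday-morning or
    Friday-afternoon drift.
    """
    score = 0
    if affected_users >= 10:      # reactions plus duplicate reports
        score += 2
    elif affected_users >= 3:
        score += 1
    if revenue_feature:           # impacts a revenue-generating feature
        score += 2
    if not has_workaround:        # no workaround mentioned in the thread
        score += 1
    # Higher score means more urgent, which means a lower P number.
    if score >= 4:
        return "P0"
    if score == 3:
        return "P1"
    if score == 2:
        return "P2"
    return "P3"
```

A human applying this rubric would drift; a function cannot.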

The Numbers

Before the agent: 3 hours per week of Rafael's time on triage. Average time from issue creation to first label: 18 hours (longer on weekends). Average time from issue creation to assignment: 26 hours. Duplicate issues that lasted more than a week before detection: about 4 per month.

After the agent: 20 minutes per week of Rafael's time on spot-checks. Average time from issue creation to first label: 4 minutes. Average time from issue creation to assignment: 4 minutes. Duplicate issues lasting more than a week: zero in the past three months.

The response time improvement had an unexpected side effect. Contributors started filing better issues. When they saw that their issue was labeled and assigned within minutes, they felt heard. Several mentioned it in the repo's discussion forum. One contributor said, "I've never seen an open-source project respond this fast." We didn't respond faster. We just stopped letting issues sit unlabeled for a day while they waited for a human to read them.

What Still Needs a Human

The agent doesn't respond to issues. It triages them. The actual human response -- asking clarifying questions, proposing solutions, closing issues with explanations -- still comes from the team. We deliberately didn't automate that part because users deserve to interact with a person when they take the time to file a report.

The agent also doesn't handle issue escalation well. If a P2 issue gets ten angry reactions and five "me too" comments over the course of a day, the agent doesn't notice the momentum and re-prioritize. That's still Rafael's judgment call, and he's better at reading the room than any automated system.

But the sorting? The labeling? The duplicate detection? The routing to the right person? That's mechanical work that happens to require reading comprehension. And reading comprehension is exactly the part that keyword matching can't do and an agent can.

Rafael now spends his Monday mornings on the code he was supposed to be writing all along.
