
AI Cold Email Agents Are Real. We've Run One for 6 Months.

Ibby Syed, Founder, Cotera
10 min read · March 7, 2026


When I tell people we run an AI agent on our cold email operation, they assume one of two things. Either the agent writes our cold emails (it doesn't), or the agent is some chatbot gimmick that generates generic copy (it really doesn't).

The agent does something far less glamorous and far more useful. It monitors 22 active campaigns, cleans incoming lead lists, flags deliverability problems before they cause damage, pauses underperforming campaigns, and generates a weekly analytics report that tells us what's actually working across our outbound operation.

We started running it in September. It's March. Six months is enough time to share real numbers and honest observations about what works, what doesn't, and what an AI cold email agent actually looks like in practice.

What the Agent Actually Does

I want to strip away the marketing language around AI agents and describe exactly what ours does, task by task.

It monitors campaign health. Every campaign in SmartLead has metrics: open rate, reply rate, bounce rate, click rate, unsubscribe rate. The agent reads these metrics continuously and compares them against thresholds we set. If a campaign's bounce rate crosses 3%, the agent pauses it and sends a Slack notification to Marcus with the campaign name, current bounce rate, and the trend over the past 72 hours. If an open rate drops more than 15 percentage points within 48 hours, the agent flags it for review.
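The threshold rules above can be sketched as a simple check function. This is a minimal illustration, not SmartLead's actual API: the metric dictionaries and field names are hypothetical stand-ins for whatever the real integration returns.

```python
# Sketch of the health checks described above. The metric dicts are
# illustrative; a real agent would pull them from the SmartLead API.

BOUNCE_PAUSE_THRESHOLD = 0.03   # pause a campaign above 3% bounce rate
OPEN_DROP_THRESHOLD = 0.15      # flag a >15-point open-rate drop in 48h

def check_campaign(metrics, metrics_48h_ago):
    """Return the actions to take given a campaign's current metrics."""
    actions = []
    if metrics["bounce_rate"] > BOUNCE_PAUSE_THRESHOLD:
        actions.append(("pause", f"bounce rate {metrics['bounce_rate']:.1%}"))
    open_drop = metrics_48h_ago["open_rate"] - metrics["open_rate"]
    if open_drop > OPEN_DROP_THRESHOLD:
        actions.append(("flag", f"open rate down {open_drop:.0%} in 48h"))
    return actions
```

The point of keeping the rules this dumb is that they run every few minutes without supervision, which is exactly what a human checking dashboards cannot do.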

Before the agent, Marcus checked these metrics manually every morning. The gap between a problem occurring and Marcus noticing ranged from hours (if he was on top of it) to days (weekends, vacations, busy Tuesdays). The agent has no gaps. When a campaign's bounce rate spiked to 4.7% at 2 AM on a Saturday in November, the agent paused it at 2:03 AM. Marcus saw the Slack message Sunday morning and dealt with it Monday. Without the agent, that campaign would have run all weekend with a bad list, potentially damaging the sending account's reputation.

It cleans lead lists. Every new lead list goes through a validation pipeline before entering any campaign. The agent checks email format validity, removes role-based addresses (info@, hello@, team@), deduplicates against all active campaigns, and cross-references against our Salesforce instance to make sure we're not emailing existing customers.
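The four validation steps amount to a filter over incoming addresses. Here is a rough sketch under stated assumptions: the role-prefix list and email regex are illustrative, and the Salesforce cross-check is reduced to a set lookup.

```python
import re

# Sketch of the lead-cleaning pipeline described above. ROLE_PREFIXES and
# the regex are illustrative; the customer check is a simple set lookup
# standing in for a real Salesforce query.

ROLE_PREFIXES = {"info", "hello", "team", "sales", "support", "admin"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def clean_leads(leads, active_emails, customer_emails):
    """Filter a raw lead list down to addresses safe to enter a campaign."""
    kept, seen = [], set()
    for email in (e.strip().lower() for e in leads):
        if not EMAIL_RE.match(email):
            continue                                  # malformed address
        if email.split("@")[0] in ROLE_PREFIXES:
            continue                                  # role-based inbox
        if email in seen or email in active_emails:
            continue                                  # duplicate
        if email in customer_emails:
            continue                                  # existing customer
        seen.add(email)
        kept.append(email)
    return kept
```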

In six months, the agent has removed 1,847 leads from incoming lists before they entered campaigns. That's 1,847 potential bounces, duplicate contacts, or awkward "why are you cold emailing your own customer" moments that never happened.

It generates a campaign performance report every Monday. This is the part Marcus cares about most. The report aggregates data across all 22 campaigns and breaks it down by persona (VP Engineering vs. VP Marketing vs. Director of Ops), by sequence length (three-step vs. four-step), by lead source (Apollo with "recently funded" filter vs. Apollo with "hiring" filter), and by sending account.
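Mechanically, the report is a set of grouped aggregations over per-campaign totals. A minimal sketch, assuming campaign records shaped like the dicts below (the field names are mine, not SmartLead's export format):

```python
from collections import defaultdict

# Sketch of the Monday-report aggregation described above: reply rate
# grouped by one dimension (persona, sequence length, lead source, or
# sending account). Campaign dicts and field names are illustrative.

def reply_rate_by(campaigns, dimension):
    """Aggregate reply rate across campaigns, grouped by `dimension`."""
    sent = defaultdict(int)
    replies = defaultdict(int)
    for c in campaigns:
        key = c[dimension]
        sent[key] += c["sent"]
        replies[key] += c["replies"]
    return {k: replies[k] / sent[k] for k in sent}
```

Running the same function once per dimension gives the four breakdowns; the hard part is not the math but doing it every week without fail.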

Before the agent, this analysis happened quarterly, maybe. Someone would export CSVs from SmartLead, combine them in a spreadsheet, build some pivot tables, and present findings at a team meeting. It took hours and the data was weeks old by the time anyone looked at it. Now it arrives every Monday at 7 AM, covering the prior week's data.

It manages campaign activation states. Based on the performance data, the agent can pause campaigns that have been underperforming for more than five consecutive days and send recommendations about whether to refresh the lead list, test a new sequence, or retire the campaign entirely. It can also flag campaigns that are performing above threshold as scaling candidates, meaning Marcus should consider adding more leads.
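The pause-or-scale decision reduces to counting consecutive bad days and checking the latest rate against a ceiling. The specific rate thresholds below are my own illustrative values, not the ones we actually run:

```python
# Sketch of the activation logic described above. The reply-rate floor and
# ceiling are illustrative assumptions; only the 5-day rule comes from the
# article.

PAUSE_AFTER_DAYS = 5        # consecutive underperforming days before pausing
UNDERPERFORM_RATE = 0.02    # assumed reply-rate floor
SCALE_RATE = 0.04           # assumed reply-rate ceiling for scaling flags

def recommend(daily_reply_rates):
    """Recommend an action from a campaign's daily reply rates (oldest first)."""
    streak = 0
    for rate in daily_reply_rates:
        streak = streak + 1 if rate < UNDERPERFORM_RATE else 0
    if streak >= PAUSE_AFTER_DAYS:
        return "pause"
    if daily_reply_rates and daily_reply_rates[-1] > SCALE_RATE:
        return "scale-candidate"
    return "keep-running"
```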

Month-by-Month Results

I'm going to share actual performance data because abstract claims about AI agents are worthless without numbers.

September (Month 1): We started with 14 active campaigns. The agent's first contribution was catching three campaigns with stale lead lists (average lead age over 90 days). Marcus hadn't noticed because the campaigns were still sending, they just weren't getting replies. We refreshed the lists and reply rates on those three campaigns went from a combined 1.1% to 3.6%.

October (Month 2): The lead cleaning pipeline removed 287 invalid or duplicate leads from new uploads. Bounce rate across all campaigns dropped from 3.4% to 2.1%. The weekly analytics report surfaced an insight that changed our strategy: campaigns targeting Director-level prospects were outperforming VP-level campaigns by 1.9 percentage points on reply rate. We reallocated budget to launch two new Director-targeted campaigns.

November (Month 3): The Saturday bounce rate incident I mentioned earlier. The agent paused the campaign automatically. What it saved us: two days of sending to a bad list, and potential reputation damage to one of our three primary sending accounts. Elena, who handles account health, said this single catch was worth the entire investment in setting up the agent.

December (Month 4): Holiday sending. The agent automatically reduced flagging sensitivity during the December 20-January 3 window because open rates naturally drop during holidays (people aren't checking work email). Without this adjustment, the agent would have paused half our campaigns for low open rates that were actually seasonal, not problematic. We configured this rule in advance based on November's data.
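The seasonal rule is just a date-window check that relaxes the thresholds. A sketch, with an assumed multiplier (the 0.6 value is illustrative, not our actual setting):

```python
from datetime import date

# Sketch of the seasonal rule described above: inside the Dec 20 - Jan 3
# window, lower the open-rate floor so holiday dips aren't flagged as
# problems. The multiplier is an illustrative assumption.

HOLIDAY_MULTIPLIER = 0.6

def open_rate_floor(day, base_floor=0.30):
    """Return the open-rate floor to enforce on a given day."""
    in_window = (day.month == 12 and day.day >= 20) or \
                (day.month == 1 and day.day <= 3)
    return base_floor * HOLIDAY_MULTIPLIER if in_window else base_floor
```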

January (Month 5): Scaling month. We went from 14 campaigns to 22. Without the agent, Marcus said he wouldn't have been able to manage more than 15 manually. The agent handled the monitoring load without any configuration changes. The weekly report adapted automatically to include the new campaigns.

February (Month 6): Best month. Average reply rate across all campaigns: 4.2%. Average bounce rate: 1.4%. Marcus spent about 2 hours per week on campaign management, down from 11+ hours before the agent. The Director-level targeting insight from October had fully matured into our highest-performing campaign segment.

What the Agent Cannot Do

Honesty about limitations matters more than enthusiasm about capabilities.

The agent cannot write good cold email copy. We tried. The emails it generated were technically competent and completely forgettable. Cold email copy that gets replies needs a specific point of view, a genuine understanding of the prospect's problems, and a voice that sounds like a human with opinions. The agent doesn't have opinions. Marcus writes all our sequences.

The agent cannot handle replies. When a prospect responds, a human needs to read the reply, understand the context, and craft an appropriate response. Is this a "sounds interesting, tell me more," a "we already use a competitor," or an "I'm not the right person, try Sarah in procurement"? Each requires a different approach, and the nuance is the difference between booking a meeting and losing the prospect.

The agent cannot make strategic decisions. It can tell you that campaigns targeting healthcare companies underperform campaigns targeting fintech companies. It cannot tell you whether to abandon healthcare as a vertical or try a different messaging angle. That decision depends on your TAM analysis, competitive positioning, and revenue goals. The agent provides data. Humans provide strategy.

The agent cannot fix a bad product-market fit. If nobody wants what you're selling, the world's most sophisticated cold email agent will just help you discover that faster. Our best campaigns work because the product solves a real problem for the people we're emailing. The agent optimizes the delivery of that message. It doesn't create the message or the product behind it.

What We'd Do Differently

If we were starting over, three things.

First, we'd set up the lead cleaning pipeline before anything else. Our first month's biggest win was catching the stale lead lists, but we could have prevented the problem entirely if every lead had been validated on entry from day one. Dirty data is the root cause of most cold email problems, and cleaning it at the source prevents issues downstream.

Second, we'd involve Marcus in defining the analytics report structure earlier. The first version included metrics he didn't use and excluded metrics he cared about. It took three weeks of iteration to get the report right. Starting with Marcus's wish list would have saved those weeks.

Third, we'd start with fewer campaigns. We launched the agent across all 14 simultaneously, which made it hard to isolate what was working. Starting with three or four, validating, then expanding would have been smarter.

Is It Worth Setting Up?

Our setup took about two days. Marcus and Tomás spent one day defining thresholds (what bounce rate triggers a pause, what open rate drop triggers a flag, what constitutes a "stale" lead list). The second day was testing against historical data to validate the rules.
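Concretely, the output of that first day was a handful of named numbers. Something like the following, where the bounce, open-drop, and stale-lead values come from earlier in this article and the rest is illustrative:

```python
# Illustrative threshold configuration of the kind the setup day produced.
# Bounce, open-drop, and stale-lead values appear earlier in the article;
# the structure and remaining values are assumptions.

THRESHOLDS = {
    "bounce_rate_pause": 0.03,         # pause above 3% bounce rate
    "open_rate_drop_flag": 0.15,       # flag a 15-point drop
    "open_rate_drop_window_hours": 48, # ...within this window
    "stale_lead_age_days": 90,         # average lead age that counts as stale
    "underperform_days_pause": 5,      # consecutive days before auto-pause
}
```

The second day's validation run amounts to replaying these rules over historical campaign data and checking that they would have fired where a human actually intervened, and nowhere else.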

Ongoing maintenance is minimal. We've adjusted thresholds twice in six months. The holiday sending rule in December was the only new configuration we added.

Time saved: roughly 9 hours per week of manual campaign management. Over six months, that's around 234 hours. For Marcus, that's 234 hours redirected from checking dashboards and uploading CSVs to writing better sequences, analyzing what's working, and having conversations with prospects.

The reply rate improvement is harder to attribute solely to the agent, because we also made strategic changes (like the Director-level targeting shift) that the agent enabled but didn't execute. My best estimate: the agent's direct contributions (cleaner lists, faster problem detection, more consistent campaign health) account for about a 1.2 percentage point improvement in reply rate. The strategic insights from the weekly report account for another 0.8-1.0 points. Combined, we went from 2.3% to 4.2%.

An AI cold email agent is not magic. It's a monitoring and decision-support system that handles the boring, repetitive work humans are bad at (checking 22 dashboards daily) and frees humans for the work they're good at (writing compelling copy and making strategic calls). Six months in, I can't imagine going back.

