Our Instantly Open Rates Hit 68%. An AI Agent Got Us There.

In October, our best Instantly campaign had a 34% open rate and a 1.8% reply rate. By February, the same campaign structure was hitting 68% open rates and 5.2% reply rates. That didn't happen because we got lucky with a subject line. It happened because we changed how we approach optimization.
The short version: we stopped testing one thing per week by hand and started letting an AI agent analyze step-level performance data daily, surface winning patterns, and recommend changes faster than any human could.
Here's the longer version.
Month One: The Manual Grind
Tomás ran our Instantly optimization the old-fashioned way for the first month. He had a process. Every Monday, he'd review the previous week's campaign data. He'd pick the lowest-performing element, usually a subject line or opening sentence, create an A/B variant, and let it run for a week. The following Monday, he'd check results and repeat.
This process is fine. It's what most outbound teams do. The problem is speed.
One test per week means four tests per month. With a four-step email sequence across six active campaigns, Tomás had 24 elements he could optimize: subject lines, opening sentences, CTAs, sending times, sequence spacing, and personalization variables. At one test per week, covering all 24 would take roughly six months. And by the time he circled back to the first element, the market had changed, the ICP had evolved, and the original test results were stale.
He was also optimizing blind. Instantly shows you campaign-level open and reply rates. It shows A/B variant performance. But it doesn't connect patterns across campaigns. A subject line style that works in Campaign 3 might also work in Campaign 7, but Instantly won't tell you that. Tomás had to notice it himself, and with six campaigns running simultaneously, he missed patterns constantly.
After month one, open rates were 37%. Reply rates were 2.1%. Marginal improvement. Tomás was frustrated. "I'm spending five hours a week on optimization and moving the needle by fractions of a percent. Something's wrong with the process."
Month Two: Adding the Agent
We connected a campaign performance tracker agent to our Instantly account. The agent pulls step-level analytics from every active campaign daily and analyzes performance across multiple dimensions.
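For a concrete picture of what a daily pull like this involves, here's a minimal sketch in Python. The endpoint path, auth header, response shape, and the INSTANTLY_API_KEY variable are all assumptions for illustration, not Instantly's documented API; check the current API docs before wiring anything to it.

```python
# Daily step-level pull -- a sketch only. The endpoint path, auth scheme, and
# response shape below are assumptions for illustration, not Instantly's
# documented API.
import datetime as dt
import os

import requests

API_KEY = os.environ["INSTANTLY_API_KEY"]  # assumed env var name
BASE = "https://api.instantly.ai/api/v2"   # hypothetical base URL

def pull_step_analytics(campaign_id: str) -> list[dict]:
    """Fetch per-step metrics for one campaign (hypothetical route)."""
    resp = requests.get(
        f"{BASE}/campaigns/{campaign_id}/analytics/steps",  # assumed route
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed shape: [{"step": 1, "sent": 400, "opened": 148, "replied": 9}, ...]
    return resp.json()["steps"]

def daily_snapshot(campaign_ids: list[str]) -> list[dict]:
    """One timestamped row per (campaign, step), appended to history each day."""
    today = dt.date.today().isoformat()
    return [
        {"date": today, "campaign_id": cid, **step}
        for cid in campaign_ids
        for step in pull_step_analytics(cid)
    ]
```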
The first thing it found was embarrassing. Our step 3 email, the second follow-up, had a 12% open rate across all six campaigns. Twelve percent. Tomás had been focused on optimizing step 1 subject lines because that's where the volume was. He'd never isolated step 3 performance because Instantly's dashboard shows blended campaign metrics by default. You have to dig into individual campaign analytics to see step-level data, and even then, comparing across campaigns requires manual work.
The agent flagged step 3 immediately. Its analysis: "Step 3 open rate is 67% lower than step 2 across all campaigns. Step 3 subject line patterns use 'Following up' or 'Quick follow-up' in 5 of 6 campaigns. Steps 1 and 2 use question-format subject lines. Recommend testing question-format subject lines on step 3."
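The detection behind that recommendation doesn't require anything exotic once the daily snapshots land in one table. A sketch of the flagging logic, using made-up counts in the spirit of the numbers above:

```python
# Flag steps whose open rate craters relative to the step before them.
# Assumes (campaign_id, step, sent, opened) rows like the snapshot sketch above.
import pandas as pd

df = pd.DataFrame([
    # toy numbers: step 3 opens at roughly a third of step 2's rate
    {"campaign_id": "c1", "step": 1, "sent": 400, "opened": 148},
    {"campaign_id": "c1", "step": 2, "sent": 320, "opened": 115},
    {"campaign_id": "c1", "step": 3, "sent": 290, "opened": 35},
])

df["open_rate"] = df["opened"] / df["sent"]
df = df.sort_values(["campaign_id", "step"])
df["prev_rate"] = df.groupby("campaign_id")["open_rate"].shift(1)
df["drop_vs_prev"] = 1 - df["open_rate"] / df["prev_rate"]

# Flag any step opening at less than half the rate of the step before it.
flagged = df[df["drop_vs_prev"] > 0.5]
print(flagged[["campaign_id", "step", "open_rate", "drop_vs_prev"]])
# step 3: open_rate ~0.12, drop_vs_prev ~0.66 -- the "67% lower" pattern
```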
We made the change across all six campaigns. Step 3 open rates went from 12% to 31% in the first week. Not because the suggestion was genius, but because the problem was so obvious that any analysis would have caught it. The agent caught it on day one. Tomás would have eventually noticed it on week... honestly, he might never have gotten to step 3. He was still working on step 1.
Month Three: Pattern Recognition
This is where the agent started paying for itself in ways we didn't anticipate.
With 60 days of daily analytics across six campaigns, the agent had enough history to identify patterns that no human was going to find in a spreadsheet.
Pattern one: subject lines with numbers outperformed subject lines without numbers by 23% in open rate. "3 ideas for [Company]'s outbound" beat "Ideas for [Company]'s outbound." Consistently. Across every campaign. This isn't a new insight in cold email, but we hadn't systematically tested it because we were testing one thing at a time.
Pattern two: emails sent between 7:00 and 8:00 AM in the recipient's local timezone had a 41% higher reply rate than emails sent between 9:00 and 10:00 AM. Our sending schedule had been 9:00 AM because "that's when people check email." Turns out, that's also when everyone else sends their cold email. The 7:00-8:00 window caught people during their first inbox scan of the day, before the flood.
Pattern three, and this one surprised everyone: shorter sequences outperformed longer ones. Our four-step sequences had a 3.1% cumulative reply rate. When we tested three-step sequences, reply rates went up to 3.9%. Steps 3 and 4 in the four-step sequence weren't just underperforming. They were actively hurting overall campaign results. The agent's hypothesis: by step 4, prospects who hadn't replied were annoyed, and some were marking us as spam. Cutting the sequence from four to three steps reduced spam complaints by 44%.
Elena was skeptical at first. "You're telling me sending fewer emails gets more replies?" Yes. Because sending the right emails to the right people at the right time matters more than sending more emails to the same people.
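All three patterns reduce to the same kind of query: group sends by one feature and compare rates between groups. A sketch against a toy send-level log; the column names here (subject, local_hour, sequence_steps) are assumptions for illustration, not Instantly export fields.

```python
# Pattern checks as one-feature groupbys over a hypothetical send log.
import pandas as pd

log = pd.DataFrame([
    {"subject": "3 ideas for Acme's outbound", "local_hour": 7,
     "sequence_steps": 3, "opened": 1, "replied": 1},
    {"subject": "Ideas for Acme's outbound", "local_hour": 9,
     "sequence_steps": 4, "opened": 1, "replied": 0},
    {"subject": "Quick follow-up", "local_hour": 10,
     "sequence_steps": 4, "opened": 0, "replied": 0},
])

# Pattern 1: do subject lines containing a number open better?
log["has_number"] = log["subject"].str.contains(r"\d", regex=True)
print(log.groupby("has_number")["opened"].mean())

# Pattern 2: reply rate by recipient-local send hour.
print(log.groupby("local_hour")["replied"].mean())

# Pattern 3: cumulative reply rate by sequence length.
print(log.groupby("sequence_steps")["replied"].mean())
```

None of these queries are hard. The point is that someone, or something, has to run them every day across every campaign and notice when a split becomes meaningful.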
Month Four: Compounding Gains
By month four, we were running what amounted to a continuous optimization loop. The agent analyzed daily. We implemented changes two to three times per week instead of once. Each change was informed by cross-campaign data rather than single-campaign intuition.
The results compounded. Open rates climbed from 37% (end of month one) to 48% (end of month two) to 57% (end of month three) to 68% (end of month four). Reply rates went from 2.1% to 3.1% to 4.4% to 5.2%.
These aren't magic numbers. They're the result of making many small, data-informed changes quickly. Subject line optimization. Send time optimization. Sequence length optimization. Step-level copy changes. Personalization variable testing. Each change moved the needle by a fraction. But dozens of fractional improvements compounded into a 2x improvement in open rates and a nearly 3x improvement in reply rates.
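To make the compounding concrete: doubling an open rate over 16 weeks implies an average lift of only about 4.4% per week, which is exactly the many-small-changes regime described above. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: small weekly gains compounding over four months.
# Solve for r in 0.34 * (1 + r)**16 = 0.68 (16 weekly improvement cycles).
weekly_lift = (0.68 / 0.34) ** (1 / 16) - 1
print(f"implied average weekly lift: {weekly_lift:.1%}")  # ~4.4%

rate = 0.34
for _ in range(16):
    rate *= 1 + weekly_lift
print(f"open rate after 16 weeks: {rate:.0%}")  # 68%
```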
The speed difference between manual and agent-assisted optimization is the entire story. Manual testing at one change per week: 16 tests in four months. Agent-recommended testing at three changes per week: roughly 48 tests in four months. Three times as many iterations, each informed by better data.
Tomás, who was originally spending five hours per week on manual optimization, now spends about two hours per week reviewing agent recommendations and implementing the ones that make sense. Not all recommendations are worth implementing. The agent once suggested testing emoji in subject lines based on a pattern from one campaign. We ignored that. The agent also suggested a specific personalization approach for VP-level prospects that increased reply rates by 1.3 percentage points for that segment. We implemented that immediately.
What the Agent Can't Do
The agent doesn't write copy. It identifies what's working and what isn't, and it recommends structural changes. "Question-format subject lines outperform statement-format by 23%" is a recommendation. Writing the actual question is still a human job.
The agent also can't fix bad targeting. If your lead list is wrong, no amount of subject line optimization will save you. We tested this accidentally when a campaign targeting early-stage startups underperformed despite having our best-performing subject line templates. The agent flagged the low performance but its optimization recommendations didn't help because the problem wasn't the emails. It was the audience. Priya caught it during a review, swapped the ICP to mid-market SaaS, and reply rates jumped from 1.1% to 4.8% with the same email copy.
The agent is an optimization layer, not a strategy layer. Strategy is still human work: who are we targeting, what's the value prop, what's the offer. Optimization is the execution layer: given this strategy, what's the best way to execute it? That's where the agent excels.
The 68% Question
People ask if 68% open rates are real. They are. But context matters.
Our campaigns are highly targeted. Small lead lists, 200-400 leads per campaign. Personalized first lines. Warmed domains with clean sender reputations. We're not blasting 10,000 generic emails and hoping for the best. The 68% number is achievable because the foundation is solid.
If you're sending unpersonalized emails to unverified lists from fresh domains, an optimization agent won't get you to 68%. It'll get you from bad to less bad. Fix the foundation first: clean data, warmed domains, relevant targeting. Then let the agent optimize execution on top of that foundation.
The agent took us from 34% to 68%. That's a 2x improvement. But the 34% starting point was already above average because we'd done the foundational work. If our starting point had been 15%, the agent might have gotten us to 30%. Still a 2x improvement. Still worth it. But 30% isn't 68%.
Start with the basics. Then let the agent compound small gains over time. That's the formula.
Try These Agents
- Instantly Campaign Performance Tracker -- Daily step-level analytics across all campaigns with pattern detection and optimization recommendations
- Instantly Daily Campaign Digest -- Morning summary of campaign health, top performers, and metrics needing attention
- Instantly Apollo Cold Outreach -- Smart lead sourcing from Apollo with filters informed by campaign performance data
- Instantly Lead Quality Auditor -- Pre-screen leads to ensure clean data before they enter optimized campaigns