
Pipedrive Contact Management: How AI Agents Fixed Our CRM Data Nightmare

Ibby Syed, Founder, Cotera
8 min read · March 6, 2026



Tomás pinged me on Slack last August with a screenshot that perfectly captured our CRM's state of decay. He'd searched for a prospect named "Sarah Chen" in Pipedrive and gotten four results. Sarah Chen, VP of Operations, Meridian Health. Sara Chen, VP Ops, Meridian Health Systems. S. Chen, VP Operations, Meridian. And my personal favorite: "Sarah Chen (2)", which someone had created when the first duplicate didn't show up fast enough in the search bar.

Four records. Same human being. Different data in each one. The first had her direct email but the wrong phone number. The second had the right phone number but no email. The third had notes from a discovery call that didn't appear on any other record. The fourth was completely empty except for the name and company.

I wish I could say this was an isolated incident. It was not. When I asked Priya to run a deduplication audit, she came back two days later looking like she'd seen something she couldn't unsee. "We have 11,400 contacts in Pipedrive," she told me. "About 2,300 of them are duplicates. Some are triplicates. A few are worse."

Twenty percent of our contact database was duplicates. One in five. And the duplicates weren't the worst part — they were just the most visible symptom of a deeper problem: nobody owned contact data quality. Reps created contacts on the fly during deal creation. Marketing imported lists from events without checking for existing records. Our web form integration created new contacts every time someone submitted a form, even if they were already in the system with a slightly different email address. The CRM was a landfill, and we'd been tossing garbage into it for two years.

Why Contact Data Rots (And Why You Should Care)

CRM contact data doesn't just degrade — it degrades in specific, predictable ways that compound over time.

The first is duplication, which I've already described. But duplication is just the entry point. Each duplicate creates a fragmented record. When you have three versions of the same contact, activity history splits across them. Emails land on one record, calls on another, notes on a third. No single record tells the full story. Reps make decisions based on incomplete context because the context is scattered across records they don't know exist.

The second is decay. People change jobs, get promoted, move companies. B2B contact data decays at roughly 30% per year. A third of your CRM becomes inaccurate annually through natural churn alone.

The third is incompleteness. Our lead scoring model weighted title heavily. Thirty-eight percent of our contacts had no title entered. Not because reps didn't know the title. Because typing it felt like paperwork when they were mid-deal. Diana told me point blank: "I'm a salesperson, not a data entry clerk." Fair. And also the root cause of our data quality problem. The system was maintained by people whose primary job is not maintaining it. That's a design flaw, not a people flaw.

The Import Problem

Before we could fix the existing data, we needed to stop the bleeding. New contacts were entering Pipedrive from five different sources: manual rep entry, web forms, event imports, marketing list purchases, and LinkedIn connections. Each source had its own format, its own level of completeness, and its own tendency to create duplicates.

Event imports were the worst offender. After every trade show, we'd get a CSV of badge scans — sometimes 200-400 contacts. Priya would do a quick cleanup, remove obviously bad entries, and import them into Pipedrive. But "quick cleanup" didn't include deduplication against existing records, because manually checking 300 names against 11,000 contacts is an afternoon nobody has. So every event added 15-25% duplicates to our database. Three events per quarter. You can do the math.

We set up a lead import agent that sits between our import sources and Pipedrive. Every contact that enters the system — whether from a form, a CSV import, or a manual entry — goes through the agent first.

The agent does three things before a contact touches Pipedrive. First, it checks for duplicates using fuzzy matching. Not just exact name matches — it catches "Sarah Chen" versus "Sara Chen," "Meridian Health" versus "Meridian Health Systems," different email formats from the same domain. Second, it enriches the record. Given a name and email, the agent pulls company data, title verification, LinkedIn profile, phone number, company size, and industry from public sources. Third, it normalizes the data. Job titles get standardized ("VP of Ops" and "Vice President, Operations" become the same canonical title), company names get cleaned up, phone numbers get formatted consistently.
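The matching and normalization steps can be sketched in a few lines of Python. This is an illustrative sketch, not the production agent: `difflib`'s built-in fuzzy ratio stands in for whatever matching model the agent actually uses, and the field names, thresholds, and the tiny title-alias table are all assumptions.

```python
from difflib import SequenceMatcher

# Canonical-title table: a tiny illustrative subset, not a real mapping.
TITLE_ALIASES = {
    "vp of ops": "Vice President, Operations",
    "vp ops": "Vice President, Operations",
    "vp operations": "Vice President, Operations",
}

def similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy ratio between two strings (0.0-1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def normalize_title(title: str) -> str:
    """Collapse common title variants into one canonical form."""
    return TITLE_ALIASES.get(title.lower().strip(), title.strip())

def is_likely_duplicate(new: dict, existing: dict) -> bool:
    """Catch pairs like 'Sara Chen' / 'Sarah Chen' at near-identical companies."""
    same_email = bool(new.get("email")) and \
        new["email"].lower() == str(existing.get("email", "")).lower()
    name_score = similarity(new.get("name", ""), existing.get("name", ""))
    company_score = similarity(new.get("company", ""), existing.get("company", ""))
    return same_email or (name_score > 0.85 and company_score > 0.70)

print(is_likely_duplicate(
    {"name": "Sara Chen", "company": "Meridian Health Systems"},
    {"name": "Sarah Chen", "company": "Meridian Health"},
))  # → True
```

Thresholds this loose will occasionally flag two genuinely different people, which is exactly why a human-review backstop matters.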

The difference was immediate. Our first post-implementation event import — a fintech conference where we collected 287 badge scans — resulted in 241 new contacts and 46 merged duplicates. Every new contact entered Pipedrive with at least 12 populated fields, compared to the typical 3-4 fields from raw badge scan data. Kenji, who was assigned to follow up on those leads, told me it was the first time he'd received an event list where he could actually call people without first spending an hour looking up phone numbers.

The Deduplication Project

Stopping new duplicates was step one. The existing 2,300-odd duplicates were step two, and considerably more painful.

Automated deduplication sounds simple: find records that match, merge them. In practice, it's a minefield of edge cases. Which record is the "master" when you merge? The one with the most recent activity? The most complete data? The one attached to an active deal? What happens when two duplicate records have conflicting information — different phone numbers, different titles, different notes?

We learned this the hard way. The first automated merge pass, which I authorized with too much confidence and too little caution, merged 180 duplicate pairs. Most were fine. Twelve were disasters. The worst: two contacts with the same name at the same company who were actually different people. Father and son, both named Robert Vasquez, both at Vasquez Engineering. The agent merged them. We lost call notes, deal history, and a scheduled demo for the son's deal. Vivek, whose deal it was, had to call and explain why we'd sent a follow-up email referencing a conversation that happened with his prospect's father.

After the Robert Vasquez incident, we added a confidence threshold. The agent only auto-merges when it's 95% or more confident the records are genuine duplicates. Below that threshold, it flags the pair for human review and shows exactly what data exists on each record and what will be preserved or discarded during merge.
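A minimal sketch of that policy, assuming a confidence score already computed upstream. The 0.95 threshold is the one described above; the field-preservation logic (non-empty values win, record A as master) is an illustrative assumption, not the agent's actual merge rules.

```python
AUTO_MERGE_THRESHOLD = 0.95  # confidence floor from the post-incident policy

def merge_decision(confidence: float) -> str:
    """Auto-merge only at high confidence; everything else goes to a human."""
    if confidence >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    return "human_review"

def review_payload(record_a: dict, record_b: dict) -> dict:
    """Show the reviewer what each record holds and what a merge would
    keep or discard (non-empty values win; record A is the master)."""
    merged, discarded = {}, {}
    for field in set(record_a) | set(record_b):
        a, b = record_a.get(field), record_b.get(field)
        merged[field] = a or b
        if a and b and a != b:
            discarded[field] = b  # conflicting value that the merge would drop
    return {"merged": merged, "discarded": discarded}

print(merge_decision(0.97))  # → auto_merge
print(merge_decision(0.80))  # → human_review
```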

The human review queue took Priya about three weeks to work through, doing 30-50 reviews per day between other work. Tedious? Yes. But she caught seven more cases of same-name-different-person that would have been problematic if auto-merged. In a database of 11,000 contacts, having 7 near-collisions in 2,300 duplicate candidates is a 0.3% false positive rate. That's actually quite good. But 0.3% of ruined deals is still ruined deals, so the review step stays.

The Enrichment Layer

Deduplication cleaned the database. Enrichment filled it in.

Before enrichment, our average contact had 5.2 populated fields out of the 18 we track. After running the enrichment agent across our existing database, that jumped to 13.8 fields. The biggest gains were in phone numbers (went from 44% coverage to 81%), LinkedIn profiles (from 23% to 89%), and company firmographics like employee count and industry (from 31% to 92%).

Anya immediately noticed the difference. "I used to spend the first five minutes of every call prep looking up the person on LinkedIn to figure out their background," she said. "Now it's already in Pipedrive." Five minutes doesn't sound like much. Multiply it by 15 calls per day across 6 reps and that's 7.5 hours of daily research time that evaporated.

The enrichment isn't just one-time, either. The agent runs a refresh cycle every 90 days, checking for job changes, company updates, and data decay. Last quarter's refresh found that 14% of contacts had experienced some form of change — new title, new company, updated phone number. Without the refresh, that 14% would have silently gone stale. By Q4, we'd be making calls to wrong numbers and addressing people by outdated titles. Not a trust-builder.
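The refresh pass reduces to a date filter over the database. A sketch, assuming each record carries a `last_enriched` date; the field name is hypothetical, though the 90-day cadence is the one described above.

```python
from datetime import date, timedelta

REFRESH_INTERVAL = timedelta(days=90)  # refresh cadence from the article

def due_for_refresh(contacts: list[dict], today: date) -> list[dict]:
    """Select contacts whose last enrichment is 90+ days old."""
    return [c for c in contacts if today - c["last_enriched"] >= REFRESH_INTERVAL]

stale = due_for_refresh(
    [{"name": "A", "last_enriched": date(2026, 1, 1)},
     {"name": "B", "last_enriched": date(2026, 5, 1)}],
    today=date(2026, 6, 1),
)
print([c["name"] for c in stale])  # → ['A']
```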

One thing I wish we'd done earlier: enrichment-based segmentation. "All VP-and-above contacts at companies with 200-500 employees in fintech who we haven't contacted in 90+ days" — that's an actionable list we couldn't build before because the data was too sparse. Now Sonia runs segments like that weekly for the SDR team, and outreach quality has noticeably improved.
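That quoted segment can be expressed as a simple filter once the fields exist. A hedged sketch: the field names (`employee_count`, `last_contacted`, etc.) are assumptions, and real seniority matching needs more care than substring checks.

```python
from datetime import date

def vp_fintech_segment(contacts: list[dict], today: date) -> list[dict]:
    """'VP-and-above at 200-500-employee fintech companies, not contacted
    in 90+ days' -- only buildable once titles, employee counts, and
    industries are actually populated."""
    senior = ("vp", "vice president", "svp", "evp", "chief")
    return [
        c for c in contacts
        if any(t in c.get("title", "").lower() for t in senior)
        and 200 <= c.get("employee_count", 0) <= 500
        and c.get("industry", "").lower() == "fintech"
        and (today - c["last_contacted"]).days >= 90
    ]
```

Before enrichment, a filter like this silently drops every contact with a blank title or unknown company size, which is why sparse data made segments like this useless.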

What Didn't Work: The Over-Automation Trap

I want to be direct about something that didn't work, because I see other companies making the same mistake.

We tried automating contact lifecycle management — moving contacts through stages (lead, prospect, customer, churned) based on activity signals. If a contact hasn't been engaged in 180 days, downgrade them. If they respond to outreach, upgrade them.

The execution was a mess. Contacts associated with multiple deals got pulled in conflicting directions. Rafael called it "contact whiplash." Worse, Diana had a contact she hadn't emailed in four months because they'd agreed to reconnect after a merger. The automation saw silence and archived the contact. Diana didn't notice for three weeks.

We pulled lifecycle automation entirely. The AI is excellent at surfacing information — "this contact hasn't been engaged in 143 days, deal closed-lost in June" — but mediocre at judging what silence means for a relationship.
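What we kept is essentially a flag-only pass: surface the silence, change nothing. A sketch under the same assumptions about field names; the 180-day window is the one the old automation acted on.

```python
from datetime import date

def staleness_flags(contacts: list[dict], today: date) -> list[str]:
    """Report long-silent contacts; never touch their lifecycle stage."""
    flags = []
    for c in contacts:
        idle = (today - c["last_engaged"]).days
        if idle >= 180:  # the window the old automation used to act on
            # Surface the fact; leave the judgment call to the rep.
            flags.append(f"{c['name']}: no engagement in {idle} days")
    return flags

print(staleness_flags(
    [{"name": "Example lead", "last_engaged": date(2025, 6, 1)}],
    today=date(2026, 1, 1),
))  # → ['Example lead: no engagement in 214 days']
```

The difference is small in code and large in practice: a flag lands in front of a human who knows the merger is still closing; an auto-archive does not.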

The Scoring Impact

Clean, enriched contact data transformed our lead scoring. Before the cleanup, our scoring model was essentially guessing. When 38% of contacts lack a title, 56% lack a phone number, and 77% lack LinkedIn data, any model you build on that data is going to produce garbage scores. Garbage in, garbage out. We all know the phrase. We were living it.

After deduplication and enrichment, the same scoring model — without changing any weights or logic — improved its accuracy by about 27%. The model hadn't gotten smarter. It just had better inputs. Sonia compared the pre-cleanup scores against actual outcomes and found that the post-cleanup scores correlated with conversion at nearly twice the rate.

That finding changed how I think about CRM investment. We'd been shopping for better scoring tools when what we actually needed was better data. The scoring tool was fine. Our database was the bottleneck. I suspect this is true for more companies than would admit it.

The Numbers

Six months after starting the contact management overhaul, here's where we stand.

Duplicate rate: down from 20% to under 2%. New duplicates still occur occasionally — the fuzzy matching isn't perfect, and truly unusual name collisions will always be edge cases. But the days of finding four Sarah Chens are over.

Field completeness: average fields populated per contact went from 5.2 to 13.8 out of 18. That remaining gap is mostly fields that genuinely don't apply (not every contact has a direct phone line) or that require human context (like relationship notes).

Time spent on data maintenance: Priya used to spend roughly 12 hours per week on CRM cleanup. That's down to about 3 hours — mostly reviewing the deduplication queue and handling edge cases the automation surfaces for human judgment.

Rep research time per call: from an estimated 5-7 minutes to under 2 minutes. Reps open the contact record and the information is there. No LinkedIn tab-switching. No "let me just check something real quick" during the first minute of a call.

The number I care about most: outbound connect rate improved from 3.1% to 4.8%. Better phone numbers plus better targeting equals more conversations. Multiply that 1.7 point improvement by 400 weekly outbound attempts and it's roughly 7 additional conversations per week. At our conversion rate, that's meaningful pipeline.

The Ongoing Reality

Contact management is not a project with an end date. It's a discipline. The AI agents handle the heavy lifting, but Priya still reviews weekly. Edge cases still surface. Company acquisitions create naming conflicts. Common names remain tricky.

But data quality went from a background problem nobody owned to a system that maintains itself with human oversight. The mess doesn't accumulate anymore because the agents catch it at the point of entry.

If your Pipedrive CRM has been accumulating contacts for more than a year without systematic cleanup, the data is worse than you think. It's solvable — if you stop asking humans to do a machine's job.


Try These Agents

  • Lead Import Agent -- Clean, deduplicate, and enrich contacts before they enter your Pipedrive CRM
  • Contact Enrichment Agent -- Enrich existing contacts with firmographic data, LinkedIn profiles, and phone numbers
  • Activity Lead Scoring -- Score leads based on complete, clean contact data and engagement signals

For people who think busywork is boring

Build your first agent in minutes: no complex engineering, just typed-out instructions.