LinkedIn Scraper: Why We Stopped Scraping and Started Using APIs

Ibby Syed, Founder, Cotera
9 min read · March 6, 2026

Elena wrote the first version on a Friday afternoon. A Python script, 340 lines, BeautifulSoup and Selenium. It logged into LinkedIn with a burner account, navigated to search results pages, and pulled name, title, company, location, and profile URL for every result. She ran it against "VP of Marketing at SaaS companies" and got 2,100 profiles in about four hours.

The sales team was thrilled. Two weeks later, we had 11,000 profiles in a spreadsheet. Three weeks after that, our company LinkedIn page got restricted. Not the burner account. The company page. LinkedIn traced the scraping activity back to our domain somehow. We lost the ability to post from our company page for 22 days.

That was 14 months ago. We don't scrape LinkedIn anymore. But I understand why people still do. The data is sitting right there, behind a login screen, and LinkedIn charges a premium to access it through legitimate channels. The temptation is real.

How LinkedIn Scrapers Actually Work

There are two approaches. The dumb way and the slightly less dumb way.

The dumb way is raw HTTP scraping. Send requests to LinkedIn URLs, parse the HTML response, extract data from the DOM. This hasn't worked reliably since about 2021. LinkedIn renders most of their pages client-side now, so the HTML you get from a raw HTTP request is mostly empty JavaScript templates. You need something that executes JavaScript.

The slightly less dumb way is browser automation. Selenium, Puppeteer, Playwright. You launch a headless browser, log into LinkedIn, navigate pages, and extract data from the rendered DOM. This is what most LinkedIn scrapers do today, including commercial ones like Phantombuster's LinkedIn scraper phantom and various open-source tools on GitHub.

Elena's script used Selenium with a Chrome driver. It worked well. She added random delays between 4 and 12 seconds between page loads. She used a residential proxy to avoid IP-based blocking. She ran it during business hours EST to look like normal usage. She did everything the scraping tutorials recommend.
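The pacing logic those tutorials recommend can be sketched in a few lines. This is an illustrative reconstruction, not Elena's actual script: the 4-12 second jitter and the roughly 500-profiles-per-session figure come from the description above, and `fetch_profile` is a hypothetical placeholder for the Selenium navigation and DOM-extraction step.

```python
import random
import time

def next_delay(low: float = 4.0, high: float = 12.0) -> float:
    """Random pause between page loads, per the standard tutorial advice."""
    return random.uniform(low, high)

def crawl(profile_urls, session_cap=500, sleep=time.sleep):
    """Visit up to `session_cap` profiles with jittered delays.

    `sleep` is injectable so the pacing logic can be tested without
    actually waiting. The real script would call something like
    fetch_profile(driver, url) where the append is.
    """
    visited = []
    for url in profile_urls[:session_cap]:
        visited.append(url)      # placeholder for the Selenium fetch + parse
        sleep(next_delay())      # randomized 4-12s gap between page loads
    return visited
```

Note what this sketch cannot hide: uniform jitter between requests still produces a statistically regular pattern over hundreds of page loads, which is exactly what behavioral analysis looks for.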

LinkedIn still caught it. Here's why.

Why LinkedIn Catches Scrapers

LinkedIn employs a layered detection system. Any single layer might miss you. The combination rarely does.

Rate limiting is the first layer. LinkedIn tracks how many pages you view per hour, per day, per week. Normal users view maybe 30-50 profiles on a busy day. Scrapers viewing 200+ profiles trigger this easily. Elena's script hit about 500 profiles per session, which in retrospect was aggressive.

Browser fingerprinting is the second layer. Headless browsers have detectable characteristics. Screen resolution of 0x0. Missing WebGL renderer. Specific navigator properties that differ from real Chrome. LinkedIn runs JavaScript checks that flag these. You can spoof most of them, but LinkedIn updates their checks regularly and there's an arms race.
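The shape of those JavaScript checks can be approximated as a scoring function over fingerprint signals. The signal names, weights, and threshold below are illustrative only, not LinkedIn's actual detection logic; they just encode the three tells mentioned above.

```python
def headless_score(fp: dict) -> int:
    """Count suspicious fingerprint signals.

    Thresholds and fields are made up for illustration; real detection
    systems combine many more checks and update them constantly.
    """
    score = 0
    if fp.get("screen") == (0, 0):        # headless browsers often report 0x0
        score += 1
    if not fp.get("webgl_renderer"):      # missing WebGL renderer string
        score += 1
    if fp.get("navigator_webdriver"):     # navigator.webdriver is true under automation
        score += 1
    return score

# A default headless Chrome session trips all three signals;
# a real user's browser trips none of them.
bot = {"screen": (0, 0), "webgl_renderer": "", "navigator_webdriver": True}
human = {"screen": (1440, 900), "webgl_renderer": "ANGLE (Apple M1)",
         "navigator_webdriver": False}
```

You can spoof each signal individually, which is why spoofing is table stakes rather than a solution: the arms race moves to whichever signal you haven't patched yet.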

Behavioral analysis is the third layer and the hardest to beat. Real humans don't view profiles in alphabetical order. They don't spend exactly 6.3 seconds on each page. They don't navigate in a perfect linear sequence from search results to profile to search results to profile. Even with randomized delays, the overall pattern of a scraper looks nothing like a human browsing LinkedIn.

Account clustering is the fourth layer, and this is what got us. LinkedIn tracks relationships between accounts. The burner account Elena used had listed our company as an employer during setup (mistake). When that account got flagged for scraping, LinkedIn looked at what company it was associated with. Our company page got swept up in the enforcement.

What Data You Can Actually Get Through APIs

LinkedIn's official API is surprisingly limited for most use cases. The Marketing API gives you campaign data. The consumer API gives you basic profile info for authenticated users. Neither gives you what scrapers are really after: bulk profile data for people you're not connected with.

But there's a whole ecosystem of legitimate data providers that sit between scraping and LinkedIn's API. They've done the legal and compliance work so you don't have to.

Apollo has LinkedIn profile data for about 275 million contacts. You can search by title, company, location, company size, and get back name, email, LinkedIn URL, and current role. Is every record perfectly accurate and up to date? No. But the data is sourced through partnerships and public records, not by scraping LinkedIn directly.
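A typical provider query looks something like this. The endpoint, header, and field names below follow Apollo's documented people-search API as of this writing, but treat them as assumptions and verify against the current docs before relying on them; the sketch only sends the request if an API key is actually configured.

```python
import json
import os
import urllib.request

# Assumption: endpoint path per Apollo's public docs; confirm before use.
APOLLO_SEARCH_URL = "https://api.apollo.io/v1/mixed_people/search"

def build_search_payload(titles, locations, employee_ranges, page=1):
    """Build the JSON body for a people search.

    Field names follow Apollo's published docs but may change;
    e.g. "51,200" is their range syntax for 51-200 employees.
    """
    return {
        "person_titles": titles,
        "person_locations": locations,
        "organization_num_employees_ranges": employee_ranges,
        "page": page,
        "per_page": 25,
    }

def search(payload, api_key=None):
    """POST the search; returns None when no key is configured,
    so the sketch runs without touching the network."""
    api_key = api_key or os.environ.get("APOLLO_API_KEY")
    if not api_key:
        return None
    req = urllib.request.Request(
        APOLLO_SEARCH_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_search_payload(
    ["VP of Marketing"], ["United States"], ["51,200"]
)
```

The point of the sketch is the contrast with scraping: one authenticated POST replaces a headless browser, proxies, delays, and the risk that goes with them.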

Clearbit, now part of HubSpot, offers similar enrichment. Give it a company domain, get back employee data including LinkedIn profiles. ZoomInfo, RocketReach, Lusha. They all operate in this space. Pricing varies from $50/month for individual plans to $15,000+/year for enterprise access.

The third approach is AI agents that work with public data and APIs. A LinkedIn Company Research agent can pull together company information, employee data, recent news, and hiring patterns without ever logging into LinkedIn or scraping a single page. It cross-references multiple data sources to build a picture that's often more complete than what you'd get from scraping profiles directly.

The Real Tradeoffs: Scraping vs. APIs vs. AI

I'll be honest about this. Each approach has genuine strengths.

Scraping is free (or nearly free, just server costs). You get exactly the data you see on LinkedIn, which is current and first-party. You control the query parameters completely. If you need "people who changed jobs in the last 90 days at companies that posted about data engineering," you can build that search. Try getting that from an API provider.

The cost is risk. LinkedIn restricts accounts, sends cease-and-desist letters to companies, and has sued multiple scraping operations. hiQ Labs won an appeals-court ruling in 2022 that scraping publicly visible LinkedIn data doesn't violate the Computer Fraud and Abuse Act, but the case ultimately ended in a consent judgment barring hiQ from scraping, and the legal situation remains murky. For a startup, one restricted company page can tank your employer brand and recruiting pipeline simultaneously.

API-based data providers cost money. Apollo's free tier gives you 10,000 credits. After that, $49-119/month per user. For a sales team of five doing heavy prospecting, you're looking at $3,000-7,000/year. The data isn't always fresh. Someone changed jobs two weeks ago and Apollo still shows their old title. But the data comes without legal risk and without the constant maintenance of keeping a scraper running.

AI agents cost somewhere in between and take a different approach entirely. Instead of giving you a database dump, they answer specific questions. "Who at Acme Corp would care about our product?" "What has this person been posting about lately?" "Which companies in our ICP just raised funding?" You don't get a CSV of 10,000 profiles. You get actionable intelligence about the 50 people you should actually contact.

What Elena Does Now

Elena didn't stop wanting LinkedIn data. She stopped wanting to maintain a fragile scraping infrastructure that could get the company in trouble at any moment.

Her current setup uses Apollo for bulk prospecting data. When the sales team needs a list of "senior data engineers at fintech companies in North America," they pull it from Apollo. About 4,200 records came back from that exact query last month. Maybe 85% of those are accurate and current. Good enough for top-of-funnel.

For deeper research on specific accounts, she uses AI agents. A LinkedIn Person Finder identifies the right contacts at target companies and pulls together what they've been sharing, what topics they engage with, and what the company has been up to recently. This used to take an SDR 20-30 minutes per account to do manually by browsing LinkedIn. The agent does it in seconds and catches things the SDR would miss, like a prospect commenting on a competitor's post three days ago.

For competitive intelligence, she runs a LinkedIn Engagement Analyzer weekly to see what's working in their space. Who's getting engagement. What topics are resonating. Which competitor's thought leadership is gaining traction. You don't need to scrape LinkedIn to get this. You need to read it systematically.

She also set up a LinkedIn Content Tracker that monitors specific companies and people. When a target account's CEO starts posting about a problem their product solves, the sales team knows about it that day. Not because they scraped LinkedIn. Because an AI agent noticed it.

The Scraping Arms Race Is Over

I say this as someone who scraped LinkedIn for months and thought we were clever about it. The arms race between scrapers and LinkedIn's detection is over, and LinkedIn won. Not because their detection is perfect. It isn't. But because the cost of getting caught has gone up while the alternatives have gotten good enough.

In 2020, getting your LinkedIn account restricted meant creating a new one. Now, LinkedIn ties restrictions to your company, your IP range, and your domain. The blast radius is bigger. And if you're a B2B company, your LinkedIn presence IS your top-of-funnel. Losing access to it for three weeks costs more than a year of Apollo licenses.

Tomás ran a scraping operation at his last company that pulled 50,000 LinkedIn profiles per month. He built an entire prospecting engine around that data. In March 2025, LinkedIn restricted both the scraping account and his personal account. The personal account restriction meant he couldn't post thought leadership content for 18 days. During those 18 days, his company's inbound pipeline dropped 31%. The free data was never actually free.

The math has changed. The tools have changed. LinkedIn's enforcement has changed. The scrapers that worked in 2022 are liabilities in 2026.

Build your prospecting stack on data sources that won't disappear overnight. Use APIs for bulk data. Use AI agents for intelligence. Use LinkedIn the way LinkedIn intends, as a platform where you show up as a real person and have real conversations. The companies doing this are outperforming the ones still running headless Chrome in a Docker container and praying they don't get caught.
