Replace OpenInsider scraping with the EdgarKit API

Plenty of indie projects start by scraping OpenInsider's HTML for Form 4 data. It works for a weekend, then breaks. Here's how to swap the scraper for a proper API in under an hour.


The problem

OpenInsider is great as a website. It is not great as a backend. Its HTML structure shifts periodically, the site has no SLA, and aggressive scrapers get rate-limited without warning. If your product depends on scraped OpenInsider data, it rests on a fragile dependency chain.

Worse: even when your scraper is working, OpenInsider only covers Form 4. You can't get Form 3, Form 5, 13F, 13D/G, or 8-K data from it. Anything beyond insider trades requires a second pipeline.

The EdgarKit API replaces all of this with a single integration.

The approach

We'll show three migration paths, in increasing order of completeness:

  1. Drop-in replacement for the most common OpenInsider scrape (recent insider buys).
  2. Add filters you couldn't easily do via scraping (officer-only, dollar thresholds, transaction-code filters).
  3. Webhook subscription to replace polling entirely.

Step 1: Drop-in replacement for recent insider buys

The most common OpenInsider scrape pulls the "latest insider trading" table. The equivalent EdgarKit call:

curl "https://api.edgarkit.com/v1/filings?form_type=4&transaction_code=P&limit=50" \
  -H "Authorization: Bearer YOUR_API_KEY"

That returns the 50 most recent open-market purchase (transaction code P) filings as JSON. Feed the response into whatever part of your pipeline used to consume the scraped Form 4 data.

In Python, the shape becomes:

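import requests

resp = requests.get(
    "https://api.edgarkit.com/v1/filings",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={"form_type": 4, "transaction_code": "P", "limit": 50},
    timeout=30,
)
resp.raise_for_status()  # fail loudly on auth or rate-limit errors instead of parsing an error body
filings = resp.json()["data"]

# One line per filing: ticker, insider name, total dollar value
for f in filings:
    print(f"{f['issuer_ticker']:6} {f['reporter_name']:30} ${float(f['total_value']):>12,.0f}")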

That replaces a typical 100-line scraper with about a dozen lines.

Step 2: Add filters you couldn't do via scraping

OpenInsider's filters are limited to what its UI exposes. With the API you can combine conditions precisely:

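import requests
from datetime import date, timedelta

# Officer-only purchases of $250k or more from the last 7 days
since = (date.today() - timedelta(days=7)).isoformat()  # YYYY-MM-DD

resp = requests.get(
    "https://api.edgarkit.com/v1/filings",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "form_type": 4,
        "transaction_code": "P",
        "min_value": 250000,
        "reporter_is_officer": "true",
        "since": since,
        "limit": 100,
    },
    timeout=30,
)
resp.raise_for_status()
filings = resp.json()["data"]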

This kind of multi-condition filter would require either complex screen-scraping logic against OpenInsider's UI or post-filtering of a much larger dataset.

The filterable fields on /v1/filings:

  • form_type: 3, 4, 5, 13F-HR, 13D, 13G, 8-K, 10-K, 10-Q, S-1, DEF 14A, etc.
  • transaction_code: P, S, A, M, F, etc. (Form 4-specific)
  • ticker: a single ticker symbol
  • issuer_cik: a single issuer CIK
  • reporter_cik: the filer's CIK (Form 4)
  • min_value / max_value: dollar value bounds
  • reporter_is_officer, reporter_is_director, reporter_is_ten_percent_owner: Form 4 role filters
  • since / until: date range (YYYY-MM-DD)
  • limit: number of results per request (pagination)
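If you build these queries programmatically, a small helper can catch typos in filter names before they reach the API. The sketch below is hypothetical: `build_filing_params` and `ALLOWED_FILTERS` are our own names, the allowed set simply mirrors the list above, and sending booleans as lowercase strings follows the `"true"` used in the Step 2 example rather than any documented rule.

```python
from datetime import date

# Filter names copied from the /v1/filings list above
ALLOWED_FILTERS = {
    "form_type", "transaction_code", "ticker", "issuer_cik", "reporter_cik",
    "min_value", "max_value", "reporter_is_officer", "reporter_is_director",
    "reporter_is_ten_percent_owner", "since", "until", "limit",
}


def build_filing_params(**filters):
    """Reject unknown filter names and normalize values for the query string."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"Unknown filter(s): {sorted(unknown)}")
    params = {}
    for key, value in filters.items():
        if value is None:
            continue  # leave unset filters out instead of sending empty values
        if isinstance(value, bool):
            # ASSUMPTION: booleans sent as lowercase strings, mirroring the Step 2 example
            params[key] = "true" if value else "false"
        elif isinstance(value, date):
            params[key] = value.isoformat()  # dates as YYYY-MM-DD
        else:
            params[key] = value
    return params
```

The result plugs straight into the `params=` argument of `requests.get`, for example `build_filing_params(form_type=4, transaction_code="P", reporter_is_officer=True, since=date(2026, 6, 12))`. A misspelled name such as `tickr` raises immediately rather than silently returning unfiltered results.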

Step 3: Switch from polling to webhooks

If your project polls OpenInsider on a schedule, you can replace the entire poll loop with a webhook subscription. EdgarKit pushes new filings to your endpoint within ~30 seconds of SEC acceptance.

# Register a webhook for Form 4 open-market purchases (code P) of $100k or more
curl -X POST "https://api.edgarkit.com/v1/webhooks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-server.com/edgarkit-events",
    "filters": {
      "form_types": ["4"],
      "transaction_codes": ["P"],
      "min_value": 100000
    }
  }'

Your server now receives a POST for every qualifying filing, with the parsed filing as the JSON body. Verify the X-EdgarKit-Signature header on each request; our webhook setup guide has a complete example.
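The exact signing scheme is defined in the webhook setup guide, not here. Purely as an illustration, the sketch below assumes the header carries a hex-encoded HMAC-SHA256 of the raw request body, keyed with a per-webhook secret, and that the body is a single filing object with the same fields as `/v1/filings` results. `WEBHOOK_SECRET`, `verify_signature`, and `EdgarKitHandler` are hypothetical names; check both assumptions against the guide before relying on this.

```python
import hashlib
import hmac
import json
from http.server import BaseHTTPRequestHandler

WEBHOOK_SECRET = "YOUR_WEBHOOK_SECRET"  # hypothetical per-webhook secret


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # ASSUMPTION: signature = hex HMAC-SHA256 of the raw body, keyed by the secret
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so timing doesn't leak partial matches
    return hmac.compare_digest(expected.encode(), signature_header.strip().encode())


class EdgarKitHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Verify the exact bytes received; never re-serialize parsed JSON for the check
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        signature = self.headers.get("X-EdgarKit-Signature", "")
        if not verify_signature(raw, signature, WEBHOOK_SECRET):
            self.send_response(401)
            self.end_headers()
            return
        filing = json.loads(raw)  # ASSUMPTION: body is one filing object
        print(filing.get("issuer_ticker"), filing.get("reporter_name"))
        self.send_response(200)
        self.end_headers()
```

To try it locally, `HTTPServer(("", 8000), EdgarKitHandler).serve_forever()` (also from `http.server`) serves the handler on port 8000. In production the same `verify_signature` check would typically sit inside your existing web framework's request handler.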

Putting it together

Before cutting over, run a minimal script that compares your existing OpenInsider output with the EdgarKit equivalent to verify parity:

import json

import requests

# Old scraping result (read from your existing pipeline's output file)
with open("openinsider_yesterday.json") as fh:
    old = json.load(fh)

# New EdgarKit result: same date range, same filter
resp = requests.get(
    "https://api.edgarkit.com/v1/filings",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "form_type": 4,
        "transaction_code": "P",
        "since": "2026-06-18",
        "until": "2026-06-19",
        "limit": 200,
    },
    timeout=30,
)
resp.raise_for_status()
new = resp.json()["data"]

# Compare the sets of tickers each source reported (a coarse, ticker-level check)
old_tickers = {f["ticker"] for f in old}
new_tickers = {f["issuer_ticker"] for f in new}

only_old = old_tickers - new_tickers
only_new = new_tickers - old_tickers
both = old_tickers & new_tickers

print(f"In both: {len(both)}")
print(f"Only in OpenInsider scrape: {only_old}")
print(f"Only in EdgarKit: {only_new}")

If the comparison shows full overlap, or EdgarKit returns a superset (typical, because we cover smaller filings that OpenInsider sometimes drops), you can cut over with confidence.
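To turn the printed sets into a go/no-go answer, a helper like the hypothetical `parity_verdict` below applies the rule from this paragraph: identical sets or an EdgarKit superset pass, and anything present only in the old scrape needs a closer look. It reasons only about whatever keys you pass it, which are tickers in the script above.

```python
def parity_verdict(old_keys: set, new_keys: set) -> str:
    """Classify the EdgarKit result against the old scrape using the cut-over rule."""
    if old_keys == new_keys:
        return "full overlap: safe to cut over"
    if old_keys <= new_keys:
        # Every scraped item is present; EdgarKit just has extras
        return f"superset (+{len(new_keys - old_keys)}): safe to cut over"
    missing = old_keys - new_keys
    return f"investigate: {len(missing)} item(s) only in the OpenInsider scrape"
```

Appending `print(parity_verdict(old_tickers, new_tickers))` to the comparison script gives a one-line summary. Because ticker sets collapse multiple filings per issuer, a stricter check would pass finer-grained keys, such as tuples built from fields both sources share.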

What you gain by switching

  • Stability. API contract is versioned. You won't wake up to a broken scraper because OpenInsider rearranged a table.
  • Broader coverage. EdgarKit covers all SEC filing types, not just Form 4.
  • Real-time delivery. Webhooks fire within ~30 seconds of SEC acceptance.
  • Structured data. No HTML parsing. JSON with CUSIPs already mapped to tickers.
  • Footnote text. Form 4 footnotes (10b5-1 references, indirect ownership explanations) come through in the payload.

See the side-by-side breakdown at EdgarKit vs OpenInsider.

FAQ

Is the data really the same?

The source is identical: SEC Form 4 filings, fetched from EDGAR. Both OpenInsider and EdgarKit are downstream consumers of the same upstream data. EdgarKit additionally covers form types OpenInsider doesn't index.

How long does the migration take?

For most projects, under an hour. Swap the HTTP call, adjust field names (ticker → issuer_ticker, etc.), and you're done. The hardest part is usually deciding which filters to add now that you have proper ones available.

Will my downstream code break?

Field names differ. EdgarKit uses snake_case consistently (issuer_ticker, reporter_name, transaction_date). If your existing pipeline expected OpenInsider's column names, you'll need a thin adapter.
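The adapter can be a single renaming function. In the sketch below, `FIELD_MAP` and `to_legacy_shape` are hypothetical: the EdgarKit field names come from this guide, but of the legacy names only `ticker` appears here (in the comparison script), so treat `insider_name`, `trade_date`, and `value` as placeholders for whatever your pipeline actually expects.

```python
# EdgarKit field -> legacy field name your downstream code expects.
# Only "ticker" comes from this guide; the other legacy names are placeholders.
FIELD_MAP = {
    "issuer_ticker": "ticker",
    "reporter_name": "insider_name",
    "transaction_date": "trade_date",
    "total_value": "value",
}


def to_legacy_shape(filing: dict) -> dict:
    """Rename mapped fields; unmapped fields are dropped, missing ones become None."""
    return {legacy: filing.get(edgarkit) for edgarkit, legacy in FIELD_MAP.items()}
```

Applied as `rows = [to_legacy_shape(f) for f in resp.json()["data"]]`, the rest of the pipeline keeps seeing the column names it was written against, and the mapping lives in one place when you later migrate downstream code to the new names.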

What about historical backfill?

EdgarKit indexes Form 4 history back to 2003 (the post-Sarbanes-Oxley regime). You can pull historical date ranges via the since and until parameters.
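For a long backfill it is usually easier to walk the range in smaller windows than to request two decades at once. The hypothetical `month_windows` helper below yields one `(since, until)` pair per calendar month. This guide doesn't say whether `until` is inclusive; the sketch assumes it is, so shift each end date by a day if that turns out to be wrong.

```python
from datetime import date, timedelta


def month_windows(start: date, end: date):
    """Yield (since, until) ISO-date pairs covering start..end, one calendar month at a time.

    ASSUMPTION: `until` is inclusive, so consecutive windows don't overlap.
    """
    current = start
    while current <= end:
        # Day 1 plus 32 days always lands in the next month; snap back to its first day
        next_month = (current.replace(day=1) + timedelta(days=32)).replace(day=1)
        window_end = min(next_month - timedelta(days=1), end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)
```

Each pair goes into the request as `params={"form_type": 4, "since": since, "until": until, ...}`. A busy month can still contain more filings than one request's `limit` returns, so check the result count per window before moving on.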

Is the free tier enough for migration?

For prototyping and verification, yes. The free tier covers 1,000 requests per month. For production workloads with frequent polling or many filters, the Basic ($19/mo) or Pro ($79/mo) tiers are worth pricing out.
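A quick way to check whether a polling schedule fits the free tier is to multiply it out. The helper below is a back-of-the-envelope sketch: it assumes a 30-day month and one request per poll unless you pass `calls_per_poll` (for example, one call per filter set you poll). It doesn't model how webhook deliveries are metered, which this FAQ doesn't cover.

```python
FREE_TIER_MONTHLY_REQUESTS = 1_000  # from the FAQ answer above


def monthly_requests(poll_interval_minutes: int, calls_per_poll: int = 1, days: int = 30) -> int:
    """Requests a fixed-interval poller makes over a month of `days` days."""
    polls = (days * 24 * 60) // poll_interval_minutes
    return polls * calls_per_poll


for interval in (60, 30, 15):
    used = monthly_requests(interval)
    verdict = "fits" if used <= FREE_TIER_MONTHLY_REQUESTS else "exceeds"
    print(f"every {interval} min: {used} requests/month ({verdict} the free tier)")
```

Hourly polling with a single query comes to 720 requests a month, inside the free tier; polling every 30 minutes (1,440) or polling two filter sets hourly (also 1,440) already exceeds it, which is where the paid tiers or a webhook subscription become the better fit.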