How to detect insider cluster buying

Cluster buying is when multiple insiders at the same company independently buy stock within a short window. It is one of the most studied insider-trading signals. Here's how to build a cluster detector with the EdgarKit API.

The problem

Single insider purchases are noisy. CEOs buy for many reasons, and not all of them reflect private information. But when three or more *different* insiders at the same company all buy on the open market within the same 7 to 14 day window, academic studies have found that the pattern has predictive value for subsequent returns. Building a detector means pulling Form 4 filings, filtering to open-market purchases, grouping by issuer, and counting distinct buyers.

The approach

We'll pull all Form 4 filings with transaction code P (open-market purchase) above a meaningful dollar threshold, group them by issuer CIK, count distinct reporter CIKs within a 14-day lookback window, and surface any issuer with 3+ distinct buyers. The full pipeline is:

  1. Pull recent Form 4 filings filtered to code P and a minimum value.
  2. Group by issuer CIK.
  3. For each issuer, count distinct reporter CIKs in the last 14 days.
  4. Surface clusters of 3+.

Step 1: Pull the candidate filings

Hit the EdgarKit API for recent Form 4 P-code purchases. Use a 14-day lookback and a $50,000 minimum to filter out noise:

curl "https://api.edgarkit.com/v1/filings?form_type=4&transaction_code=P&min_value=50000&since=2026-06-05&limit=500" \
  -H "Authorization: Bearer YOUR_API_KEY"

The response is a JSON object whose data field holds an array of filings. Each entry has issuer_cik, issuer_ticker, issuer_name, reporter_cik, reporter_name, transaction_date, shares, price_per_share, and total_value.
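
For orientation, here is a minimal sketch of one record and a helper that parses its fields into Python types. The sample values are invented for illustration, normalize is a hypothetical helper name, and two things are assumptions: that numeric fields may arrive as strings (the float() calls later in this article suggest it) and that transaction_date is an ISO YYYY-MM-DD string. Check a real response before relying on either.

```python
from datetime import date

# Hypothetical sample record: field names follow the list above,
# values are made up for illustration.
sample = {
    "issuer_cik": "0000000001",
    "issuer_ticker": "ACME",
    "issuer_name": "Acme Corp",
    "reporter_cik": "0000000101",
    "reporter_name": "Doe Jane",
    "transaction_date": "2026-06-10",
    "shares": "2000",
    "price_per_share": "30.00",
    "total_value": "60000.00",
}

def normalize(record):
    """Return a copy of the record with date and numeric fields parsed."""
    out = dict(record)
    # Assumption: dates are ISO "YYYY-MM-DD" strings.
    out["transaction_date"] = date.fromisoformat(record["transaction_date"])
    # float() accepts both numeric strings and plain numbers.
    for key in ("shares", "price_per_share", "total_value"):
        out[key] = float(record[key])
    return out

row = normalize(sample)
print(row["transaction_date"], row["total_value"])
```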

Step 2: Group by issuer

In Python:

import requests
from collections import defaultdict
from datetime import date, timedelta

API_KEY = "YOUR_API_KEY"

end = date.today()
start = end - timedelta(days=14)

resp = requests.get(
    "https://api.edgarkit.com/v1/filings",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={
        "form_type": 4,
        "transaction_code": "P",
        "min_value": 50000,
        "since": start.isoformat(),
        "limit": 500,
    },
    timeout=30,  # fail instead of hanging on a stalled connection
)
resp.raise_for_status()
filings = resp.json()["data"]

by_issuer = defaultdict(list)
for f in filings:
    by_issuer[f["issuer_cik"]].append(f)

Step 3: Count distinct buyers per issuer

A cluster requires three or more *different* people buying, so count distinct reporter_cik values rather than filings (one person filing three times is not a cluster):

clusters = []
for cik, entries in by_issuer.items():
    distinct_buyers = {e["reporter_cik"] for e in entries}
    if len(distinct_buyers) >= 3:
        total = sum(float(e["total_value"]) for e in entries)
        clusters.append({
            "issuer_cik": cik,
            "ticker": entries[0]["issuer_ticker"],
            "issuer_name": entries[0]["issuer_name"],
            "buyer_count": len(distinct_buyers),
            "filing_count": len(entries),
            "total_dollars": total,
            "buyers": sorted({e["reporter_name"] for e in entries}),
        })

# Sort by total dollar value, biggest clusters first
clusters.sort(key=lambda c: c["total_dollars"], reverse=True)
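
The loop above counts every buyer in the pulled data, which works because the pull itself is limited to 14 days. If you pull a longer history (for a backtest, say), you need an actual rolling window. Here is a minimal sketch, assuming transaction_date is an ISO YYYY-MM-DD string; max_buyers_in_window is a hypothetical helper name, and the sample entries are made up.

```python
from datetime import date, timedelta

def max_buyers_in_window(entries, window_days=14):
    """Largest number of distinct reporter_cik values whose purchases fall
    inside any span of `window_days` consecutive days."""
    # Assumption: transaction_date is an ISO "YYYY-MM-DD" string.
    dated = sorted(
        (date.fromisoformat(e["transaction_date"]), e["reporter_cik"])
        for e in entries
    )
    best = 0
    for i, (start, _) in enumerate(dated):
        # Window covers `start` plus the following window_days - 1 days.
        end = start + timedelta(days=window_days - 1)
        buyers = {cik for day, cik in dated[i:] if day <= end}
        best = max(best, len(buyers))
    return best

# Made-up entries: two buyers close together, a third much later.
entries = [
    {"reporter_cik": "A", "transaction_date": "2026-06-01"},
    {"reporter_cik": "B", "transaction_date": "2026-06-05"},
    {"reporter_cik": "C", "transaction_date": "2026-06-20"},
]
print(max_buyers_in_window(entries))      # 2
print(max_buyers_in_window(entries, 30))  # 3
```

An issuer qualifies as a cluster when the returned count reaches your buyer threshold. The scan is quadratic in the number of entries per issuer, which is fine at Form 4 volumes.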

Step 4: Filter, format, surface

You can tighten further by requiring at least one buyer to be an officer (Form 4 includes the relationship flags):

# Filter to clusters that include at least one officer
officer_clusters = []
for c in clusters:
    issuer_entries = by_issuer[c["issuer_cik"]]
    has_officer = any(e.get("reporter_is_officer") for e in issuer_entries)
    if has_officer:
        officer_clusters.append(c)

Print the results:

for c in officer_clusters:
    print(f"{c['ticker']:6} {c['buyer_count']} buyers, ${c['total_dollars']:>12,.0f}")
    print(f"       {c['issuer_name']}")
    for b in c['buyers'][:5]:
        print(f"         {b}")

Putting it together

A complete daily-cluster detector script you can run on cron:

#!/usr/bin/env python3
"""Daily insider cluster buy detector. Run on cron at 6pm ET after the
SEC filing day winds down."""

import os
import sys
import requests
from collections import defaultdict
from datetime import date, timedelta

API_KEY = os.environ["EDGARKIT_API_KEY"]
WINDOW_DAYS = 14
MIN_VALUE = 50_000
MIN_BUYERS = 3

end = date.today()
start = end - timedelta(days=WINDOW_DAYS)

resp = requests.get(
    "https://api.edgarkit.com/v1/filings",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={
        "form_type": 4,
        "transaction_code": "P",
        "min_value": MIN_VALUE,
        "since": start.isoformat(),
        "limit": 1000,
    },
    timeout=30,
)
resp.raise_for_status()
filings = resp.json()["data"]

by_issuer = defaultdict(list)
for f in filings:
    by_issuer[f["issuer_cik"]].append(f)

clusters = []
for cik, entries in by_issuer.items():
    buyers = {e["reporter_cik"] for e in entries}
    if len(buyers) >= MIN_BUYERS:
        clusters.append({
            "ticker": entries[0]["issuer_ticker"],
            "name": entries[0]["issuer_name"],
            "buyers": len(buyers),
            "filings": len(entries),
            "dollars": sum(float(e["total_value"]) for e in entries),
        })

clusters.sort(key=lambda c: c["dollars"], reverse=True)

if not clusters:
    print("No clusters today.")
    sys.exit(0)

print(f"Insider cluster buys, last {WINDOW_DAYS} days, min {MIN_BUYERS} buyers:")
print()
for c in clusters[:20]:
    print(f"  {c['ticker']:6} {c['buyers']} buyers   ${c['dollars']:>12,.0f}   {c['name']}")

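Schedule with cron and pipe the output to Slack, email, or a dashboard. On a server whose clock is UTC, 0 22 * * 1-5 fires at 6pm ET on weekdays only while daylight saving time is in effect; during standard time 6pm ET is 23:00 UTC, so adjust the hour when the clocks change or use a time-zone-aware scheduler.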
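
One way to keep the run at 6pm New York time all year is to schedule cron at both candidate hours (0 22,23 * * 1-5) and have the script exit early unless it is the right local hour. A minimal sketch, assuming a UTC server clock and Python 3.9+ with time zone data installed; is_run_hour is a hypothetical helper name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")

def is_run_hour(now_utc, target_hour=18):
    """True when `now_utc` falls in the target hour in New York local time."""
    return now_utc.astimezone(EASTERN).hour == target_hour

# 22:00 UTC is 6pm in New York during daylight time (July) ...
print(is_run_hour(datetime(2026, 7, 1, 22, 0, tzinfo=timezone.utc)))  # True
# ... but 5pm during standard time (January), when 23:00 UTC is 6pm.
print(is_run_hour(datetime(2026, 1, 5, 22, 0, tzinfo=timezone.utc)))  # False
print(is_run_hour(datetime(2026, 1, 5, 23, 0, tzinfo=timezone.utc)))  # True
```

In the detector script you would call is_run_hour(datetime.now(timezone.utc)) at the top and sys.exit(0) when it returns False.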

Tuning the detector

  • Tighten the window. A 7-day cluster is rarer and stronger than a 14-day cluster.
  • Raise the minimum value. $250k per filing surfaces the highest-conviction names; $50k catches more candidates.
  • Officer-only filter. Restrict to filings where the reporter is an officer. CEO/CFO clusters are the strongest variant.
  • Exclude 10b5-1 buys. Pre-arranged plans should be removed if you want pure discretionary cluster signal. EdgarKit returns a references_10b5_1 flag on Form 4 payloads.
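
The value, officer, and 10b5-1 options can be applied client-side before grouping. Here is a minimal sketch; apply_filters is a hypothetical helper name, the sample filings are made up, and the flags are read with .get() on the assumption that they may be absent from some payloads. The window itself is tightened by changing WINDOW_DAYS in the script above.

```python
def apply_filters(filings, min_value=50_000, officers_only=False,
                  exclude_10b5_1=False):
    """Return the filings that pass the selected tuning options."""
    kept = []
    for f in filings:
        if float(f["total_value"]) < min_value:
            continue
        # A missing flag is treated as False.
        if officers_only and not f.get("reporter_is_officer"):
            continue
        if exclude_10b5_1 and f.get("references_10b5_1"):
            continue
        kept.append(f)
    return kept

# Made-up filings for illustration.
filings = [
    {"total_value": "300000", "reporter_is_officer": True, "references_10b5_1": False},
    {"total_value": "60000", "reporter_is_officer": False, "references_10b5_1": False},
    {"total_value": "400000", "reporter_is_officer": True, "references_10b5_1": True},
]

high_conviction = apply_filters(filings, min_value=250_000)
discretionary_officers = apply_filters(
    filings, officers_only=True, exclude_10b5_1=True
)
print(len(high_conviction), len(discretionary_officers))
```

Run the filter first, then feed the result into the grouping step so the buyer count reflects only the filings you kept.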

FAQ

What's the empirical evidence for cluster buying?

Academic studies, including Cohen, Malloy, and Pomorski (2012), found that trades by opportunistic insiders (those without a regular, calendar-driven trading pattern) predict future returns, while routine insider trades carry little information. Cluster filters recur in factor research for a similar reason: they try to isolate discretionary buying from the noise.

How is a cluster different from a single big buy?

A single buy reflects one person's view. A cluster reflects multiple insiders, independently, choosing to commit personal capital in the same window. The multi-person confirmation is what makes the pattern statistically interesting.

Should I include sales in the cluster logic?

Probably not. Cluster sells are noisier than cluster buys because of 10b5-1 plans and tax timing. If you do include them, restrict to discretionary sales not referencing a 10b5-1 plan.

Does EdgarKit return the officer flag directly?

Yes. Form 4 responses include reporter_is_officer, reporter_is_director, and reporter_is_ten_percent_owner booleans so you can filter without re-parsing the relationship field.

Can I detect clusters across a whole industry or sector?

That's a harder problem. You'd group by industry or sector rather than by issuer CIK and look for cross-issuer clustering. This is useful for thematic signals, but it requires sector classifications that EdgarKit doesn't ship in the response by default.
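
A minimal sketch of the grouping, assuming you bring your own classification. SECTOR_BY_CIK is a hypothetical mapping, sector_clusters is a hypothetical helper name, and the CIKs and sector labels are made up.

```python
from collections import defaultdict

# Hypothetical mapping from issuer CIK to a sector label. Build this from
# your own classification source; it is not part of the EdgarKit response.
SECTOR_BY_CIK = {
    "0000000001": "Banks",
    "0000000002": "Banks",
    "0000000003": "Energy",
}

def sector_clusters(filings, sector_by_cik, min_issuers=2):
    """Map each sector to the issuers with purchases in it, keeping only
    sectors where at least `min_issuers` distinct issuers appear."""
    issuers = defaultdict(set)
    for f in filings:
        sector = sector_by_cik.get(f["issuer_cik"])
        if sector is not None:  # skip issuers with no known sector
            issuers[sector].add(f["issuer_cik"])
    return {s: sorted(c) for s, c in issuers.items() if len(c) >= min_issuers}

# Made-up filings for illustration.
filings = [
    {"issuer_cik": "0000000001"},
    {"issuer_cik": "0000000002"},
    {"issuer_cik": "0000000003"},
    {"issuer_cik": "0000000009"},  # not in the mapping, ignored
]
result = sector_clusters(filings, SECTOR_BY_CIK)
print(result)
```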