Executive Summary
The average PE firm sees only 16.5% of deals relevant to its investment criteria, because commercial databases surface identical targets to every subscriber from the same underlying data.
The private equity industry has a sourcing problem it does not discuss openly. Despite record levels of dry powder, despite the proliferation of deal sourcing platforms, and despite the professionalization of business development functions across the industry, the vast majority of PE firms are filtering the same commercial databases, attending the same conferences, and calling on the same intermediaries. The result is a market in which competitive intensity concentrates on a narrow band of visible targets while the majority of potential acquisitions remain entirely outside buyers' field of vision.
Bain & Company's 2024 Global Private Equity Report found that the average PE firm sees only 16.5% of the deals relevant to its stated investment criteria. That figure should be alarming. It means that for every deal a firm evaluates, there are roughly five comparable targets it never contacted and never knew existed. The implications for portfolio construction, entry multiples, and long-term returns are significant.
This article examines why the problem persists, what the primary data sources that generate off-market deal flow actually look like, and how AI-driven systems can build target universes that commercial databases have not indexed.
Why Does Every PE Firm See the Same Deals?
Every firm sees the same deals because commercial databases all scrape the same public sources, producing functionally identical target lists for any given search.
The answer is structural, not behavioral. The commercial databases that dominate PE deal sourcing are built on the same underlying data infrastructure. They scrape the same public filings, license the same third-party datasets, and apply similar classification algorithms to organize companies into searchable taxonomies. The result is a high degree of overlap in the companies each platform surfaces for any given search.
When a PE firm's business development team logs into a commercial database and filters for HVAC services companies in the Southeast with $10 million to $50 million in revenue, they retrieve a list that is functionally identical to the list their competitors retrieve using the same parameters. The platform's value proposition is convenience, not exclusivity. It organizes publicly available information into a searchable format. It does not generate proprietary intelligence.
Some newer platforms have made meaningful advances in using natural language processing to classify companies by what they actually do, rather than relying solely on SIC and NAICS codes. This is a genuine improvement. But the underlying data sources remain largely the same: company websites, LinkedIn profiles, news articles, and the same government filings that every other platform also ingests. The classification may be more accurate, but the universe of companies being classified has not expanded materially.
The consequence is that PE deal sourcing has become a competition over the same finite set of visible targets. Firms differentiate on speed of outreach, quality of relationships, and willingness to pay premium multiples, but not on the breadth of the opportunity set they evaluate. This is a structural inefficiency that the industry has largely accepted as a given.
It is not a given. It is a function of where firms look for targets, and most firms look in the same places.
What Primary-Source Data Actually Means
Primary-source data is information obtained directly from government agencies, regulatory bodies, or companies themselves, bypassing the coverage gaps and classification errors of aggregated databases.
Primary-source data, in the context of deal origination, refers to information obtained directly from the entity that created it, rather than from an aggregator or intermediary that has compiled, cleaned, and resold it. The distinction matters because aggregators introduce three forms of information loss: coverage gaps, classification errors, and temporal lag.
Coverage gaps arise because commercial databases are built to serve the broadest possible customer base. They prioritize companies that are most likely to be searched for, which means they systematically underrepresent businesses that operate in niche verticals, use non-standard business descriptions, or have minimal digital footprints. A propane distribution company operating under a DBA that does not mention propane anywhere on its website will not appear in a keyword-based search. A specialty chemical distributor that files its business license under a holding company name will not be associated with the chemical distribution vertical in any commercial database.
Classification errors compound the coverage problem. SIC and NAICS codes are self-reported and rarely updated. A company that was classified as a general contractor in 2008 may now derive 90% of its revenue from environmental remediation services, but its classification has not changed. Commercial databases that rely on these codes as a primary taxonomy will misclassify the company, and buyers searching for environmental services targets will never see it.
Temporal lag means that commercial databases reflect the state of a company at the time it was last indexed, which may be months or years in the past. A company that has grown from $5 million to $25 million in revenue since its last data refresh will not appear in searches filtered for companies above $15 million. A company that has been acquired will continue to appear as an independent target until the database is updated.
Primary-source data bypasses all three problems. Government filings, such as state business registrations, professional license databases, environmental permits, and healthcare facility certifications, are created by the companies themselves or by the regulatory bodies that oversee them. They are updated on regulatory timelines, not commercial ones. They cover every entity that is required to file, not just the entities that a commercial platform has chosen to index.
Industry association directories, certification body registries, and professional organization membership lists provide another layer of primary-source data. These sources identify companies by what they actually do, as verified by the industry itself, rather than by what a classification algorithm infers from their website copy.
The challenge with primary-source data is that it is fragmented, inconsistent in format, and distributed across thousands of individual sources. No single government database covers all industries or all geographies. Extracting, normalizing, and matching records across these sources requires significant technical infrastructure. This is precisely why commercial databases do not do it comprehensively. It is expensive, it is difficult, and it does not scale in the way that web scraping does.
How AI-Driven Deal Origination Works
AI-driven deal origination builds bespoke target universes from primary-source data through four stages: thesis decomposition, source ingestion, entity resolution, and scoring.
AI-driven deal origination, as practiced by Praxis Rock Advisors, begins with the construction of a bespoke target universe for each engagement. This is not a filtered list from a commercial database. It is a purpose-built dataset assembled from primary sources specific to the client's investment thesis.
The process operates in four stages.
Stage 1: Thesis Decomposition. The client's investment thesis is broken down into its constituent attributes: industry vertical, service lines, geographic footprint, regulatory requirements, customer types, and operational characteristics. These attributes define the search parameters, but they are expressed in terms that map to primary-source data, not to the taxonomies of commercial databases.
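Thesis decomposition can be pictured as turning a prose thesis into structured, machine-queryable attributes. A minimal sketch follows; all field names and the example values are hypothetical, not the actual schema any firm uses.

```python
# Illustrative sketch: a decomposed investment thesis as structured
# search attributes that map to primary sources (licenses, permits)
# rather than to commercial-database taxonomies.
from dataclasses import dataclass, field


@dataclass
class ThesisAttributes:
    vertical: str                                   # e.g. "propane distribution"
    service_lines: list[str] = field(default_factory=list)
    states: list[str] = field(default_factory=list)
    required_licenses: list[str] = field(default_factory=list)
    revenue_range: tuple[float, float] = (0.0, float("inf"))


# Hypothetical Midwest propane-distribution thesis from the article's example.
thesis = ThesisAttributes(
    vertical="propane distribution",
    service_lines=["residential delivery", "tank installation"],
    states=["OH", "IN", "MI"],
    required_licenses=["state propane license", "DOT hazmat carrier"],
    revenue_range=(10e6, 50e6),
)
```

Expressing the thesis this way makes each attribute a concrete lookup key: `required_licenses` points at specific regulatory registries rather than at a NAICS code.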
Stage 2: Source Identification and Ingestion. For each attribute, the relevant primary sources are identified and ingested. If the thesis targets propane distribution companies in the Midwest, the relevant sources include state propane licensing databases, DOT hazmat carrier registrations, state fire marshal permit records, and propane industry association membership directories. These sources are accessed, extracted, and normalized into a unified dataset.
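The normalization step above can be sketched as mapping each source's raw export into one unified schema. The raw field names below (`licensee_name`, `phy_state`, and so on) are invented for illustration; real registry exports differ by state and agency.

```python
# Illustrative sketch: normalizing records from two hypothetical primary
# sources (a state propane license export and a DOT hazmat registration)
# into a single unified schema for downstream matching.
def normalize_license_record(rec: dict) -> dict:
    # Hypothetical state licensing export, keyed by licensee name and city.
    return {
        "name": rec["licensee_name"].strip().upper(),
        "state": rec["state"],
        "city": rec["city"].strip().upper(),
        "source": "state_license",
    }


def normalize_hazmat_record(rec: dict) -> dict:
    # Hypothetical DOT hazmat carrier registration, keyed by legal name.
    return {
        "name": rec["legal_name"].strip().upper(),
        "state": rec["phy_state"],
        "city": rec["phy_city"].strip().upper(),
        "source": "dot_hazmat",
    }


unified = [
    normalize_license_record(
        {"licensee_name": "Acme Propane LLC", "state": "OH", "city": "Dayton"}
    ),
    normalize_hazmat_record(
        {"legal_name": "ACME PROPANE LLC", "phy_state": "OH", "phy_city": "Dayton"}
    ),
]
```

Note that the same company appears twice under slightly different spellings, which is exactly the condition the next stage, entity resolution, exists to handle.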
Stage 3: Entity Resolution and Enrichment. The raw records from primary sources are matched, deduplicated, and enriched. A single company may appear in multiple databases under different names, addresses, or entity structures. AI-driven entity resolution links these records to build a comprehensive profile of each target, including its operating locations, license types, regulatory history, and corporate structure.
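A minimal, rule-based sketch of the matching step is below, using only the Python standard library. The suffix list, threshold, and same-state blocking rule are assumptions for illustration; production entity resolution uses far richer signals (addresses, officers, registration numbers).

```python
# Illustrative sketch: linking records that refer to the same company
# despite different legal names, via canonicalization + fuzzy matching.
import re
from difflib import SequenceMatcher

# Common legal suffixes to strip before comparing names (assumed list).
SUFFIXES = re.compile(r"\b(LLC|INC|CORP|CO|LTD|LP)\.?\b", re.IGNORECASE)


def canonical(name: str) -> str:
    # "Acme Propane, LLC" and "ACME PROPANE INC" both become "ACME PROPANE".
    name = SUFFIXES.sub("", name.upper())
    return re.sub(r"[^A-Z0-9 ]", "", name).strip()


def same_entity(a: dict, b: dict, threshold: float = 0.9) -> bool:
    # Block on state first, then fuzzy-match the canonicalized names.
    if a["state"] != b["state"]:
        return False
    score = SequenceMatcher(None, canonical(a["name"]), canonical(b["name"])).ratio()
    return score >= threshold


rec_a = {"name": "Acme Propane, LLC", "state": "OH"}
rec_b = {"name": "ACME PROPANE INC", "state": "OH"}
matched = same_entity(rec_a, rec_b)  # True: same company, different legal names
```

The design choice worth noting is blocking (requiring the same state before fuzzy matching), which keeps pairwise comparison tractable across millions of records.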
Stage 4: Scoring and Prioritization. The unified target universe is scored against the client's specific criteria, including estimated revenue ranges derived from operational proxies, geographic fit, service line alignment, and indicators of acquisition readiness such as ownership age, succession patterns, and regulatory compliance history. The output is a ranked list of targets, many of which have never appeared in any commercial database search.
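The scoring stage can be sketched as a weighted combination of fit signals. The weights, the owner-tenure succession proxy, and the specific criteria below are hypothetical assumptions, not the actual model described in the text.

```python
# Illustrative sketch: scoring resolved targets against thesis criteria
# and ranking them. Weights and signals are invented for illustration.
def score_target(t: dict, weights: dict) -> float:
    target_states = {"OH", "IN", "MI"}
    target_services = {"residential delivery", "tank installation"}
    signals = {
        "geo_fit": 1.0 if t["state"] in target_states else 0.0,
        "service_fit": len(t["service_lines"] & target_services) / len(target_services),
        # Crude acquisition-readiness proxy: long owner tenure suggests
        # a possible succession event (an assumption, not a rule).
        "succession": 1.0 if t["owner_tenure_years"] >= 20 else 0.0,
    }
    return sum(weights[k] * v for k, v in signals.items())


weights = {"geo_fit": 0.4, "service_fit": 0.4, "succession": 0.2}
targets = [
    {"name": "A", "state": "OH", "service_lines": {"residential delivery"},
     "owner_tenure_years": 25},
    {"name": "B", "state": "TX", "service_lines": {"tank installation"},
     "owner_tenure_years": 5},
]
ranked = sorted(targets, key=lambda t: score_target(t, weights), reverse=True)
# Target A scores 0.4 + 0.2 + 0.2 = 0.8; target B scores 0.2.
```

Keeping the model this transparent is deliberate: each signal can be audited, and outreach feedback (which targets respond) can be fed back by adjusting the weights.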
This process is repeated and refined as the engagement progresses. New sources are added as the thesis evolves. Targets that respond to outreach provide feedback that sharpens the scoring model. The system learns from each engagement, but the target universe it builds is always proprietary to the client.
The 16.5% Problem
The 83.5% of deals that PE firms never see represent the actual opportunity for differentiation in entry multiples, proprietary deal flow, and platform building.
Bain & Company's finding that the average PE firm sees only 16.5% of relevant deals is not a reflection of laziness or incompetence. It is a reflection of the structural limitations of the tools the industry relies on. When every firm sources from the same databases, the visible market is defined by the coverage of those databases. Everything outside that coverage is invisible.
The 16.5% figure has specific implications for different aspects of the investment process.
Entry multiples. When multiple buyers compete for the same visible targets, entry multiples are bid up. The targets that every firm can find are, by definition, the targets that face the most competitive pressure. Off-market targets, those outside the 16.5%, face less competition and transact at lower multiples on average.
Proprietary deal flow. The term "proprietary deal flow" has been diluted to the point of meaninglessness in most PE contexts. A deal is not proprietary because a firm heard about it from a single intermediary. It is proprietary because no other buyer has identified the target as a potential acquisition. Achieving genuine proprietary deal flow requires looking where other buyers are not looking, which requires data sources other buyers are not using.
Thesis execution. Many PE firms articulate differentiated investment theses, targeting specific verticals, geographies, or operational profiles. But they execute those theses using the same undifferentiated tools. The result is that firms with genuinely distinct strategies end up competing for the same targets as firms with entirely different strategies, because the databases they all use surface the same companies regardless of the thesis applied.
Platform building. For firms executing buy-and-build strategies, the 16.5% problem is particularly acute. The add-on targets that would be most accretive to a platform are often the smallest, most niche, and least visible companies in a given vertical. These are precisely the companies that commercial databases are least likely to cover.
The 83.5% of deals that the average firm never sees represent the actual opportunity for differentiation in private equity. Accessing that opportunity requires a fundamentally different approach to deal origination.
What This Means for Your Firm
Firms of every size face an artificial ceiling on their opportunity set, and closing the gap requires primary-source data infrastructure, built internally or through a partner.
The implications of the 16.5% problem vary by firm size, strategy, and stage of development, but the underlying dynamic is the same for all buyers: the tools the industry relies on for deal sourcing have created an artificial ceiling on the opportunity set that most firms evaluate.
For large-cap firms with dedicated business development teams, the issue is efficiency. These firms have the resources to conduct proprietary research, but the cost of building and maintaining primary-source data infrastructure internally is significant. The question is whether that investment is better made in-house or through a specialized partner.
For middle-market firms, the issue is coverage. These firms are often pursuing theses in fragmented verticals where the majority of potential targets are small, private, and invisible to commercial databases. The gap between the visible market and the actual market is widest in precisely the segments where middle-market firms operate.
For independent sponsors, the issue is even more acute. Without institutional infrastructure, these buyers are entirely dependent on the tools and relationships available to them. Commercial databases provide a starting point, but they provide the same starting point to every other buyer. Differentiation requires a different approach.
Praxis Rock Advisors exists to close this gap. Our deal origination platform builds bespoke target universes from primary-source data for each engagement, surfaces companies that commercial platforms have not indexed, and runs the complete outreach operation on behalf of our clients. The result is a pipeline of acquisition targets that no other buyer has identified.
The 83.5% of the market that most firms never see is not inaccessible. It is simply unseen by the tools the industry has chosen to rely on. Seeing it requires looking somewhere else.