Executive Summary
The average PE firm sees only 16.5% of deals relevant to its investment criteria, because commercial databases surface identical targets to every subscriber from the same underlying data.
The private equity industry has a sourcing problem it does not discuss openly. Despite record levels of dry powder, despite the proliferation of deal sourcing platforms, and despite the professionalization of business development functions across the industry, the vast majority of PE firms are filtering the same commercial databases, attending the same conferences, and calling on the same intermediaries. The result is a market in which competitive intensity concentrates on a narrow band of visible targets while the majority of potential acquisitions remain entirely outside buyers' field of vision.
Bain & Company's 2024 Global Private Equity Report found that the average PE firm sees only 16.5% of the deals relevant to its stated investment criteria. That figure should be alarming. It means that for every deal a firm evaluates, there are roughly five comparable targets it never contacted and never knew existed. The implications for portfolio construction, entry multiples, and long-term returns are significant.
This article examines why the problem persists, what the primary data sources that generate off-market deal flow actually look like, and how AI-driven systems can build target universes that commercial databases have not indexed.
Why Does Every PE Firm See the Same Deals?
Every firm sees the same deals because commercial databases all scrape the same public sources, producing functionally identical target lists for any given search.
The answer is structural, not behavioral. The commercial databases that dominate PE deal sourcing are built on the same underlying data infrastructure. They scrape the same public filings, license the same third-party datasets, and apply similar classification algorithms to organize companies into searchable taxonomies. The result is a high degree of overlap in the companies each platform surfaces for any given search.
When a PE firm's business development team logs into a commercial database and filters for HVAC services companies in the Southeast with $10 million to $50 million in revenue, they retrieve a list that is functionally identical to the list their competitors retrieve using the same parameters. The platform's value proposition is convenience, not exclusivity. It organizes publicly available information into a searchable format. It does not generate proprietary intelligence.
Some newer platforms have made meaningful advances in using natural language processing to classify companies by what they actually do, rather than relying solely on SIC and NAICS codes. This is a genuine improvement. But the underlying data sources remain largely the same: company websites, LinkedIn profiles, news articles, and the same government filings that every other platform also ingests. The classification may be more accurate, but the universe of companies being classified has not expanded materially.
The consequence is that PE deal sourcing has become a competition over the same finite set of visible targets. Firms differentiate on speed of outreach, quality of relationships, and willingness to pay premium multiples, but not on the breadth of the opportunity set they evaluate. This is a structural inefficiency that the industry has largely accepted as a given.
It is not a given. It is a function of where firms look for targets, and most firms look in the same places.
What Primary-Source Data Actually Means
Primary-source data is information obtained directly from government agencies, regulatory bodies, or companies themselves, bypassing the coverage gaps and classification errors of aggregated databases.
Primary-source data, in the context of deal origination, refers to information obtained directly from the entity that created it, rather than from an aggregator or intermediary that has compiled, cleaned, and resold it. The distinction matters because aggregators introduce three forms of information loss: coverage gaps, classification errors, and temporal lag.
Coverage gaps arise because commercial databases are built to serve the broadest possible customer base. They prioritize companies that are most likely to be searched for, which means they systematically underrepresent businesses that operate in niche verticals, use non-standard business descriptions, or have minimal digital footprints. A propane distribution company operating under a DBA that does not mention propane anywhere on its website will not appear in a keyword-based search. A specialty chemical distributor that files its business license under a holding company name will not be associated with the chemical distribution vertical in any commercial database.
Classification errors compound the coverage problem. SIC and NAICS codes are self-reported and rarely updated. A company that was classified as a general contractor in 2008 may now derive 90% of its revenue from environmental remediation services, but its classification has not changed. Commercial databases that rely on these codes as a primary taxonomy will misclassify the company, and buyers searching for environmental services targets will never see it.
Temporal lag means that commercial databases reflect the state of a company at the time it was last indexed, which may be months or years in the past. A company that has grown from $5 million to $25 million in revenue since its last data refresh will not appear in searches filtered for companies above $15 million. A company that has been acquired will continue to appear as an independent target until the database is updated.
Primary-source data bypasses all three problems. Government filings, such as state business registrations, professional license databases, environmental permits, and healthcare facility certifications, are created by the companies themselves or by the regulatory bodies that oversee them. They are updated on regulatory timelines, not commercial ones. They cover every entity that is required to file, not just the entities that a commercial platform has chosen to index.
Industry association directories, certification body registries, and professional organization membership lists provide another layer of primary-source data. These sources identify companies by what they actually do, as verified by the industry itself, rather than by what a classification algorithm infers from their website copy.
The challenge with primary-source data is that it is fragmented, inconsistent in format, and distributed across thousands of individual sources. No single government database covers all industries or all geographies. Extracting, normalizing, and matching records across these sources requires significant technical infrastructure. This is precisely why commercial databases do not do it comprehensively. It is expensive, it is difficult, and it does not scale in the way that web scraping does.
How AI-Driven Deal Origination Works
AI-driven deal origination builds bespoke target universes from primary-source data through four stages: thesis decomposition, source ingestion, entity resolution, and scoring.
AI-driven deal origination, as practiced by Praxis Rock Advisors, begins with the construction of a bespoke target universe for each engagement. This is not a filtered list from a commercial database. It is a purpose-built dataset assembled from primary sources specific to the client's investment thesis.
The process operates in four stages.
Stage 1: Thesis Decomposition. The client's investment thesis is broken down into its constituent attributes: industry vertical, service lines, geographic footprint, regulatory requirements, customer types, and operational characteristics. These attributes define the search parameters, but they are expressed in terms that map to primary-source data, not to the taxonomies of commercial databases.
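Thesis decomposition can be pictured as turning a prose thesis into structured, machine-queryable attributes. A minimal sketch follows; all field names and the example values are hypothetical, not the actual schema any firm uses.

```python
# Illustrative sketch: a decomposed investment thesis as structured
# search attributes that map to primary sources (licenses, permits)
# rather than to commercial-database taxonomies.
from dataclasses import dataclass, field


@dataclass
class ThesisAttributes:
    vertical: str                                   # e.g. "propane distribution"
    service_lines: list[str] = field(default_factory=list)
    states: list[str] = field(default_factory=list)
    required_licenses: list[str] = field(default_factory=list)
    revenue_range: tuple[float, float] = (0.0, float("inf"))


# Hypothetical Midwest propane-distribution thesis from the article's example.
thesis = ThesisAttributes(
    vertical="propane distribution",
    service_lines=["residential delivery", "tank installation"],
    states=["OH", "IN", "MI"],
    required_licenses=["state propane license", "DOT hazmat carrier"],
    revenue_range=(10e6, 50e6),
)
```

Expressing the thesis this way makes each attribute a concrete lookup key: `required_licenses` points at specific regulatory registries rather than at a NAICS code.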
Stage 2: Source Identification and Ingestion. For each attribute, the relevant primary sources are identified and ingested. If the thesis targets propane distribution companies in the Midwest, the relevant sources include state propane licensing databases, DOT hazmat carrier registrations, state fire marshal permit records, and propane industry association membership directories. These sources are accessed, extracted, and normalized into a unified dataset.
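The normalization step above can be sketched as mapping each source's raw export into one unified schema. The raw field names below (`licensee_name`, `phy_state`, and so on) are invented for illustration; real registry exports differ by state and agency.

```python
# Illustrative sketch: normalizing records from two hypothetical primary
# sources (a state propane license export and a DOT hazmat registration)
# into a single unified schema for downstream matching.
def normalize_license_record(rec: dict) -> dict:
    # Hypothetical state licensing export, keyed by licensee name and city.
    return {
        "name": rec["licensee_name"].strip().upper(),
        "state": rec["state"],
        "city": rec["city"].strip().upper(),
        "source": "state_license",
    }


def normalize_hazmat_record(rec: dict) -> dict:
    # Hypothetical DOT hazmat carrier registration, keyed by legal name.
    return {
        "name": rec["legal_name"].strip().upper(),
        "state": rec["phy_state"],
        "city": rec["phy_city"].strip().upper(),
        "source": "dot_hazmat",
    }


unified = [
    normalize_license_record(
        {"licensee_name": "Acme Propane LLC", "state": "OH", "city": "Dayton"}
    ),
    normalize_hazmat_record(
        {"legal_name": "ACME PROPANE LLC", "phy_state": "OH", "phy_city": "Dayton"}
    ),
]
```

Note that the same company appears twice under slightly different spellings, which is exactly the condition the next stage, entity resolution, exists to handle.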
Stage 3: Entity Resolution and Enrichment. The raw records from primary sources are matched, deduplicated, and enriched. A single company may appear in multiple databases under different names, addresses, or entity structures. AI-driven entity resolution links these records to build a comprehensive profile of each target, including its operating locations, license types, regulatory history, and corporate structure.
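A minimal, rule-based sketch of the matching step is below, using only the Python standard library. The suffix list, threshold, and same-state blocking rule are assumptions for illustration; production entity resolution uses far richer signals (addresses, officers, registration numbers).

```python
# Illustrative sketch: linking records that refer to the same company
# despite different legal names, via canonicalization + fuzzy matching.
import re
from difflib import SequenceMatcher

# Common legal suffixes to strip before comparing names (assumed list).
SUFFIXES = re.compile(r"\b(LLC|INC|CORP|CO|LTD|LP)\.?\b", re.IGNORECASE)


def canonical(name: str) -> str:
    # "Acme Propane, LLC" and "ACME PROPANE INC" both become "ACME PROPANE".
    name = SUFFIXES.sub("", name.upper())
    return re.sub(r"[^A-Z0-9 ]", "", name).strip()


def same_entity(a: dict, b: dict, threshold: float = 0.9) -> bool:
    # Block on state first, then fuzzy-match the canonicalized names.
    if a["state"] != b["state"]:
        return False
    score = SequenceMatcher(None, canonical(a["name"]), canonical(b["name"])).ratio()
    return score >= threshold


rec_a = {"name": "Acme Propane, LLC", "state": "OH"}
rec_b = {"name": "ACME PROPANE INC", "state": "OH"}
matched = same_entity(rec_a, rec_b)  # True: same company, different legal names
```

The design choice worth noting is blocking (requiring the same state before fuzzy matching), which keeps pairwise comparison tractable across millions of records.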
Stage 4: Scoring and Prioritization. The unified target universe is scored against the client's specific criteria, including estimated revenue ranges derived from operational proxies, geographic fit, service line alignment, and indicators of acquisition readiness such as ownership age, succession patterns, and regulatory compliance history. The output is a ranked list of targets, many of which have never appeared in any commercial database search.
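The scoring stage can be sketched as a weighted combination of fit signals. The weights, the owner-tenure succession proxy, and the specific criteria below are hypothetical assumptions, not the actual model described in the text.

```python
# Illustrative sketch: scoring resolved targets against thesis criteria
# and ranking them. Weights and signals are invented for illustration.
def score_target(t: dict, weights: dict) -> float:
    target_states = {"OH", "IN", "MI"}
    target_services = {"residential delivery", "tank installation"}
    signals = {
        "geo_fit": 1.0 if t["state"] in target_states else 0.0,
        "service_fit": len(t["service_lines"] & target_services) / len(target_services),
        # Crude acquisition-readiness proxy: long owner tenure suggests
        # a possible succession event (an assumption, not a rule).
        "succession": 1.0 if t["owner_tenure_years"] >= 20 else 0.0,
    }
    return sum(weights[k] * v for k, v in signals.items())


weights = {"geo_fit": 0.4, "service_fit": 0.4, "succession": 0.2}
targets = [
    {"name": "A", "state": "OH", "service_lines": {"residential delivery"},
     "owner_tenure_years": 25},
    {"name": "B", "state": "TX", "service_lines": {"tank installation"},
     "owner_tenure_years": 5},
]
ranked = sorted(targets, key=lambda t: score_target(t, weights), reverse=True)
# Target A scores 0.4 + 0.2 + 0.2 = 0.8; target B scores 0.2.
```

Keeping the model this transparent is deliberate: each signal can be audited, and outreach feedback (which targets respond) can be fed back by adjusting the weights.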
This process is repeated and refined as the engagement progresses. New sources are added as the thesis evolves. Targets that respond to outreach provide feedback that sharpens the scoring model. The system learns from each engagement, but the target universe it builds is always proprietary to the client.
The 16.5% Problem
The 83.5% of deals that PE firms never see represent the actual opportunity for differentiation in entry multiples, proprietary deal flow, and platform building.
Bain & Company's finding that the average PE firm sees only 16.5% of relevant deals is not a reflection of laziness or incompetence. It is a reflection of the structural limitations of the tools the industry relies on. When every firm sources from the same databases, the visible market is defined by the coverage of those databases. Everything outside that coverage is invisible.
The 16.5% figure has specific implications for different aspects of the investment process.
Entry multiples. When multiple buyers compete for the same visible targets, entry multiples are bid up. The targets that every firm can find are, by definition, the targets that face the most competitive pressure. Off-market targets, those outside the 16.5%, face less competition and transact at lower multiples on average.
Proprietary deal flow. The term "proprietary deal flow" has been diluted to the point of meaninglessness in most PE contexts. A deal is not proprietary because a firm heard about it from a single intermediary. It is proprietary because no other buyer has identified the target as a potential acquisition. Achieving genuine proprietary deal flow requires looking where other buyers are not looking, which requires data sources other buyers are not using.
Thesis execution. Many PE firms articulate differentiated investment theses, targeting specific verticals, geographies, or operational profiles. But they execute those theses using the same undifferentiated tools. The result is that firms with genuinely distinct strategies end up competing for the same targets as firms with entirely different strategies, because the databases they all use surface the same companies regardless of the thesis applied.
Platform building. For firms executing buy-and-build strategies, the 16.5% problem is particularly acute. The add-on targets that would be most accretive to a platform are often the smallest, most niche, and least visible companies in a given vertical. These are precisely the companies that commercial databases are least likely to cover.
The 83.5% of deals that the average firm never sees represent the actual opportunity for differentiation in private equity. Accessing that opportunity requires a fundamentally different approach to deal origination.
What This Means for Your Firm
Firms of every size face an artificial ceiling on their opportunity set, and closing the gap requires primary-source data infrastructure, built internally or through a partner.
The implications of the 16.5% problem vary by firm size, strategy, and stage of development, but the underlying dynamic is the same for all buyers: the tools the industry relies on for deal sourcing have created an artificial ceiling on the opportunity set that most firms evaluate.
For large-cap firms with dedicated business development teams, the issue is efficiency. These firms have the resources to conduct proprietary research, but the cost of building and maintaining primary-source data infrastructure internally is significant. The question is whether that investment is better made in-house or through a specialized partner.
For middle-market firms, the issue is coverage. These firms are often pursuing theses in fragmented verticals where the majority of potential targets are small, private, and invisible to commercial databases. The gap between the visible market and the actual market is widest in precisely the segments where middle-market firms operate.
For independent sponsors, the issue is even more acute. Without institutional infrastructure, these buyers are entirely dependent on the tools and relationships available to them. Commercial databases provide a starting point, but they provide the same starting point to every other buyer. Differentiation requires a different approach.
Praxis Rock Advisors exists to close this gap. Our deal origination platform builds bespoke target universes from primary-source data for each engagement, surfaces companies that commercial platforms have not indexed, and runs the complete outreach operation on behalf of our clients. The result is a pipeline of acquisition targets that no other buyer has identified.
The 83.5% of the market that most firms never see is not inaccessible. It is simply unseen by the tools the industry has chosen to rely on. Seeing it requires looking somewhere else.