The Procurement Data Gap Explained

Antoine Simon · 2026-03-26 · 11 min read · v1.0.0

If you asked ten procurement professionals how large the European public procurement market is, you would get a confident answer: roughly 2.7 trillion EUR annually, representing 14-16% of EU GDP. That number is well-established, widely cited, and based on credible macroeconomic analysis from sources including the OECD.

Now ask a different question: how much of that market is captured in structured, queryable data that you can use for strategic decision-making? The answer is far less comfortable. The gap between total procurement spending and the data that suppliers, analysts, and policymakers can actually access on TED and national platforms is enormous, and it has real consequences for every company that sells to governments.

Duke has spent years building one of the most comprehensive procurement databases in Europe: 61M+ procedures drawn from 300+ sources. That effort has given us an unusually clear view of where the data is, where it is not, and what the gaps cost. This article maps the procurement data landscape and explains why most market intelligence is working with an incomplete picture.

Gap 1: The TED-only illusion

TED (Tenders Electronic Daily) is the default data source for European procurement intelligence. It is free, standardized, and covers all 27 EU member states. For many intelligence providers and suppliers, TED is the entire dataset.

This is a problem, because TED covers only above-threshold procurement.

What TED captures

EU Directives require member states to publish contract notices and award notices on TED for procurements above specific value thresholds:

Central government supplies and services: ~143,000 EUR
Sub-central government supplies and services: ~221,000 EUR
Works contracts: ~5,538,000 EUR
Utilities sector: ~443,000 EUR

These thresholds mean that TED captures the largest, most visible contracts. For above-threshold procurement, TED provides excellent coverage — eForms standardization is improving data quality, and mandatory publication means minimal gaps.
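As a rough illustration of how the thresholds listed above split the market, the sketch below classifies a procedure as above or below threshold. The figures are the approximate values from the list; the names `BuyerType` and `is_above_threshold` are hypothetical, and applying a single works threshold to every buyer type is a simplifying assumption.

```python
from enum import Enum


class BuyerType(Enum):
    CENTRAL = "central"
    SUB_CENTRAL = "sub_central"
    UTILITIES = "utilities"


# Approximate thresholds in EUR, copied from the list above.
SUPPLY_SERVICE_THRESHOLDS = {
    BuyerType.CENTRAL: 143_000,
    BuyerType.SUB_CENTRAL: 221_000,
    BuyerType.UTILITIES: 443_000,
}
# Assumption: one works threshold for all buyer types.
WORKS_THRESHOLD = 5_538_000


def is_above_threshold(value_eur: float, buyer: BuyerType, is_works: bool) -> bool:
    """Return True if a procedure of this value would be expected on TED."""
    limit = WORKS_THRESHOLD if is_works else SUPPLY_SERVICE_THRESHOLDS[buyer]
    return value_eur >= limit


# A 200,000 EUR contract lands on different sides of the line
# depending on who buys and what is bought.
central_services = is_above_threshold(200_000, BuyerType.CENTRAL, is_works=False)
municipal_services = is_above_threshold(200_000, BuyerType.SUB_CENTRAL, is_works=False)
municipal_works = is_above_threshold(200_000, BuyerType.SUB_CENTRAL, is_works=True)
```

The same contract value can therefore be visible on TED for one buyer and invisible for another, which is why threshold status has to be computed per procedure rather than assumed per sector.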

What TED misses

Everything below those thresholds. And that is most of the market.

Consider Germany, Europe's largest procurement market. Duke tracks approximately 782,000 German procurement procedures. Of these, roughly 474,000 are above-threshold tenders captured by TED. The remaining 308,000 — 39% of tracked procedures — are below-threshold notices from Germany's 14 national platforms, most notably the DOE aggregator (270,000+ procedures).

These below-threshold procedures are not trivial. A works contract worth 200,000 EUR falls far below the works threshold but still represents substantial business for most companies. In aggregate, below-threshold procurement accounts for an estimated 50-70% of all procedures by count in most European countries.

A market intelligence provider that uses only TED data is giving you a map of the mountain peaks while ignoring the valleys where most of the opportunity actually lies.

The national platform fragmentation

Below-threshold procurement is published — when it is published at all — on national platforms. Each country has its own system:

France: DECP (Données Essentielles de la Commande Publique) captures 192,000+ procedures, plus BOAMP, AJI, Atexo, and 14 other sources
Norway: Doffin publishes 5,000+ national procedures annually
Finland: Hilma captures Finnish tenders at all threshold levels
Netherlands: TenderNed provides national coverage
Baltics: Estonia, Latvia, and Lithuania each operate independent national platforms

These platforms use different data formats, different classification systems, different identifier schemes, and different publication rules. Aggregating them into a coherent dataset requires significant engineering investment — which is why most intelligence providers default to TED-only coverage.

Gap 2: Document-level intelligence

Procurement notices — the structured data published on TED and national platforms — contain a summary of the opportunity. The actual intelligence is in the documents: specifications, qualification requirements, contract terms, evaluation criteria, draft agreements, and annexes.

The document problem

A typical above-threshold procurement notice on TED contains:

Buyer name and contact
Subject description (usually 1-3 sentences)
CPV codes
Estimated value (often omitted or given only as a range)
Submission deadline
Procedure type
Award criteria (weights, if published)

What it does not contain:

Detailed technical specifications
Qualification requirements (turnover, references, certifications)
Specific evaluation methodology
Contract duration and renewal options
Subcontracting requirements
Security clearance or location requirements
Penalty and liability terms

These details are in the tender documents, which are typically PDFs hosted on the contracting authority's platform. Accessing them often requires registration, navigation through platform-specific interfaces, and manual download.

Why document extraction matters

For any serious bid/no-bid assessment, you need document-level detail. The notice tells you that an IT services contract exists. The documents tell you whether you can actually win it — whether the qualification requirements match your profile, whether the evaluation criteria align with your strengths, and whether the contract terms are commercially acceptable.

Without systematic document extraction and indexing, procurement intelligence is surface-level. You know what exists but not whether it is worth pursuing.

The scale challenge

Across 300+ procurement sources, documents are hosted on different platforms, in different formats (PDF, Word, HTML, XML), with different access mechanisms (direct download, registration-gated, portal-specific viewers). A single tender may have 5-20 associated documents.

Extracting, processing, and indexing documents at the scale of European procurement is an engineering problem that most intelligence providers have not solved. The result is a gap between the structured notice data (what the opportunity is) and the document data (whether it is worth pursuing).

Gap 3: Entity resolution — the identity problem

Procurement data is only as useful as your ability to connect it. Who is buying? Who is winning? Are these two buyer records the same organization? Is this supplier the same entity that won a contract in another country?

Entity resolution — the process of linking different records to the same real-world entity — is arguably the most important and most neglected problem in procurement data.

Why entity resolution is hard in procurement

Government buyers and suppliers appear in procurement data under multiple names, identifiers, and formats:

Buyer identity fragmentation:

A German municipality might appear as "Stadt München", "Landeshauptstadt München", "City of Munich", and several variations with different punctuation
Different platforms may use different national identifier schemes
Organizational restructuring creates new entities that are operationally continuous with old ones

Supplier identity fragmentation:

A multinational corporation appears under different legal entity names in different countries
Subsidiaries, divisions, and joint ventures may bid under separate identities
VAT numbers, trade register numbers, and platform-specific IDs do not always cross-reference cleanly
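To make the fragmentation above concrete, here is a minimal sketch of a matching key that prefers a national identifier and falls back to a normalized name. It is an illustration under stated assumptions, not a description of any production system: the record layout, the `LEGAL_FORMS` list, and the identifier values are all hypothetical.

```python
import re
import unicodedata

# Hypothetical, deliberately short list of legal-form suffixes to ignore.
LEGAL_FORMS = {"gmbh", "ag", "sa", "sas", "sarl", "bv", "ltd", "oy", "as"}


def normalize_name(name: str) -> str:
    """Strip diacritics, case, punctuation and legal-form suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    tokens = [tok for tok in text.split() if tok not in LEGAL_FORMS]
    return " ".join(tokens)


def resolution_key(record: dict) -> tuple:
    """Prefer a national identifier; fall back to the normalized name."""
    national_id = record.get("national_id")
    if national_id:
        return ("id", re.sub(r"\W", "", national_id).upper())
    return ("name", normalize_name(record["name"]))


# Spelling variants collapse to one key ...
same_spelling = normalize_name("Stadt München") == normalize_name("STADT MUNCHEN.")
# ... but genuinely different names do not, so an identifier is needed.
different_names = normalize_name("Stadt München") == normalize_name("Landeshauptstadt München")
same_id = resolution_key({"name": "Stadt München", "national_id": "de-123 456"}) == \
    resolution_key({"name": "Landeshauptstadt München", "national_id": "DE123456"})
```

Name normalization alone handles punctuation and spelling variants; linking "Stadt München" to "Landeshauptstadt München" needs a shared identifier or a curated alias table, which is where most of the engineering effort goes.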

The consequences of poor entity resolution

Without reliable entity resolution, critical procurement intelligence questions cannot be answered accurately:

"How much does this buyer purchase in our sector?" If the same buyer appears as three separate entities in your data, you undercount their spending and may miss them as a target account.

"How often does this competitor win?" If a competitor's subsidiaries appear as unrelated entities, their true market presence is invisible. You may think you face 5 competitors when you actually face 3 — or one large one.

"What is our win rate with this buyer?" If your historical bids and wins are not consistently linked to the correct buyer entity, your win/loss analysis is unreliable.

"Who are the biggest suppliers in this sector?" Market share analysis that cannot resolve entities will fragment large players into multiple small entries and generate misleading market structure insights.

Duke addresses entity resolution through a dedicated identity resolution system that maps buyer and supplier records across sources using national identifiers, corporate hierarchies, and name normalization. The result is a procurement dataset where "Stadt München" and "Landeshauptstadt München" correctly resolve to the same entity, a seemingly simple problem that requires significant engineering at scale.
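The effect of resolution on the market-share question above can be sketched with a few lines of aggregation. The supplier names, award values, and the alias table are invented for illustration; a real alias table would come from identifier matching and corporate hierarchy data.

```python
from collections import defaultdict


def market_share(awards, alias_to_entity=None):
    """Sum award values per supplier and return each supplier's share."""
    alias_to_entity = alias_to_entity or {}
    totals = defaultdict(float)
    for supplier, value in awards:
        # Unmapped names are treated as their own entity.
        totals[alias_to_entity.get(supplier, supplier)] += value
    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}


# Hypothetical award records: (supplier name as published, value in EUR).
awards = [
    ("Acme Services GmbH", 400_000),
    ("Acme France SAS", 300_000),
    ("Acme Nordic AS", 300_000),
    ("Borealis IT Oy", 500_000),
    ("Cobalt Systems BV", 500_000),
]

# Hypothetical alias table linking subsidiaries to one parent.
aliases = {
    "Acme Services GmbH": "Acme Group",
    "Acme France SAS": "Acme Group",
    "Acme Nordic AS": "Acme Group",
}

unresolved = market_share(awards)
resolved = market_share(awards, aliases)
```

Without the alias table the data shows five suppliers, none above a quarter of the market; with it, three suppliers remain and one of them holds half of the awarded value.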

Gap 4: Award data and outcome intelligence

The most strategically valuable procurement data is not about opportunities published — it is about outcomes: who won, at what price, and how the evaluation unfolded. This award data is systematically thinner than notice data.

Publication compliance varies

EU Directives require publication of contract award notices for above-threshold procurements within 30 days of award. In practice:

Compliance rates vary by country from 60% to 95%
Award notices are often published months after the actual award decision
Below-threshold award data is published inconsistently or not at all
Award values are sometimes omitted, published as ranges, or reported in inconsistent currency/VAT treatment
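One way to measure these publication problems on your own data is to compare award dates with award-notice dates. The sketch below assumes each record is a pair of dates, with `None` where no award notice was found; the 30-day limit is the one stated above, and the sample dates are invented.

```python
from datetime import date

PUBLICATION_LIMIT_DAYS = 30  # limit cited above for above-threshold awards


def summarize_award_publication(records):
    """Count awards published on time, published late, or never published."""
    summary = {"on_time": 0, "late": 0, "unpublished": 0}
    for award_date, notice_date in records:
        if notice_date is None:
            summary["unpublished"] += 1
        elif (notice_date - award_date).days <= PUBLICATION_LIMIT_DAYS:
            summary["on_time"] += 1
        else:
            summary["late"] += 1
    return summary


# Hypothetical records: (award decision date, award notice date or None).
records = [
    (date(2025, 1, 10), date(2025, 1, 30)),  # 20 days after award
    (date(2025, 2, 1), date(2025, 6, 15)),   # months after award
    (date(2025, 3, 5), None),                # no award notice found
]
summary = summarize_award_publication(records)
```

Tracking the three buckets per country or per source gives a direct view of where award data can be trusted for benchmarking and where it cannot.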

What is lost

Without reliable award data:

Price benchmarking becomes impossible. You cannot estimate competitive pricing without knowing what similar contracts were awarded at.
Win/loss analysis is incomplete. Your bid debriefs tell you about your bids, but industry-wide win patterns require comprehensive award data across all competitors.
Buyer spending analysis is unreliable. Tracking a buyer's total spend in your sector requires complete award records, not just the notices they published.
Competitive landscape is fuzzy. Market share analysis based on partial award data misrepresents the actual competitive dynamics.

The historical depth problem

Even where award data is published, historical depth varies. TED's structured data is most reliable from 2016 forward. Older data exists but with declining completeness and consistency. National platforms vary even more — some retain only 2-3 years of historical data.

For trend analysis, competitive benchmarking, and buyer pattern identification, historical depth matters. A company that appeared as a new market entrant in 2024 may have been active in the sector since 2018 through national platforms that your data does not cover.

Gap 5: Data quality and consistency

Even where data exists, its quality varies in ways that undermine analysis.

Classification inconsistency

CPV codes provide a universal classification system, but their application is inconsistent. Different contracting authorities assign different CPV codes to substantively similar procurements. An IT infrastructure contract might be classified as:

72000000 — IT services: consulting, software development, Internet, and support
72200000 — Software programming and consultancy services
48000000 — Software package and information systems
30200000 — Computer equipment and supplies

This inconsistency means that monitoring a single CPV code branch will miss relevant opportunities classified under adjacent or overlapping codes. Cross-classification, that is, mapping CPV to other schemes such as NAICS (North American) or NACRES (French), adds another layer of complexity.
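Because CPV codes are hierarchical, a watchlist of code prefixes is a simple way to cover adjacent branches. The sketch below is a minimal illustration using the four codes listed above; the function name and the choice of prefixes are assumptions, and any check digit after the hyphen is simply ignored.

```python
def cpv_matches(code: str, watched_prefixes) -> bool:
    """Return True if the CPV code falls under any watched prefix."""
    digits = code.split("-")[0].strip()  # drop an optional check digit
    return any(digits.startswith(prefix) for prefix in watched_prefixes)


# The four classifications from the example above.
codes = ["72000000-5", "72200000", "48000000-8", "30200000"]

# Watching one narrow branch finds only one of the four notices.
narrow = [c for c in codes if cpv_matches(c, {"722"})]

# Watching several related branches finds all of them.
broad = [c for c in codes if cpv_matches(c, {"72", "48", "302"})]
```

A broader watchlist raises recall at the cost of more noise, so in practice it is usually combined with keyword or buyer filters.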

Temporal inconsistency

Publication dates, deadline dates, and award dates are recorded with varying precision and accuracy across sources. A notice with a publication date that actually reflects the platform's processing date rather than the buyer's publication date can create timeline discrepancies of 1-5 days — significant when response windows are 15-30 days.
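The impact of such a discrepancy on the response window is simple date arithmetic. The sketch below uses invented dates and a hypothetical function name to show how many working days of preparation are effectively lost.

```python
from datetime import date


def response_windows(buyer_published: date, platform_recorded: date, deadline: date):
    """Return (nominal days, effective days, days lost to the discrepancy)."""
    nominal = (deadline - buyer_published).days
    effective = (deadline - platform_recorded).days
    return nominal, effective, nominal - effective


# Hypothetical notice: published by the buyer on 3 March,
# recorded by the platform on 7 March, deadline on 24 March.
nominal, effective, lost = response_windows(
    date(2025, 3, 3), date(2025, 3, 7), date(2025, 3, 24)
)
share_lost = lost / nominal
```

In this example a four-day discrepancy removes roughly a fifth of a 21-day response window.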

Value reporting

Estimated and awarded values suffer from multiple inconsistencies:

VAT inclusion varies by country and source
Currency conversion for multi-country analysis introduces exchange rate timing questions
Lot-level values may not sum to total contract value
Framework agreement values represent maximum potential, not guaranteed spend
Multi-year contracts may report annual or total value inconsistently

These issues sound technical, but they compound. A market sizing analysis that mixes VAT-inclusive and VAT-exclusive values across countries will be wrong by 15-25%.
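A small worked example shows how the VAT issue distorts a total. The contract values are invented, and the VAT rates are illustrative standard rates that should be checked against current national rules before use.

```python
# Illustrative standard VAT rates; verify before relying on them.
VAT_RATES = {"DE": 0.19, "FR": 0.20, "HU": 0.27}


def net_value(value: float, country: str, includes_vat: bool) -> float:
    """Convert a reported value to a VAT-exclusive value."""
    if not includes_vat:
        return value
    return value / (1 + VAT_RATES[country])


# Hypothetical records: (reported value in EUR, country, value includes VAT?).
contracts = [
    (119_000, "DE", True),
    (100_000, "FR", False),
    (127_000, "HU", True),
]

naive_total = sum(value for value, _, _ in contracts)
net_total = sum(net_value(v, c, inc) for v, c, inc in contracts)
overstatement = naive_total / net_total - 1
```

Here three contracts that are each worth 100,000 EUR net add up to 346,000 EUR when summed as reported, an overstatement of about 15%.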

Closing the data gap: the Duke approach

Duke's approach to the procurement data gap is systematic:

Multi-source aggregation

Rather than relying on TED alone, Duke ingests data from 300+ procurement sources across Europe and beyond, including the national platforms and specialized portals that carry the below-threshold majority of the market.

Data normalization

Every source is normalized to a consistent data model: standardized CPV classification, ISO geographic codes, consistent value formats, and unified date handling. This normalization enables cross-source analysis that fragmented data cannot support.
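What normalization to a consistent data model can look like is sketched below with two made-up source layouts. This is not Duke's actual schema: the `Procedure` fields, the adapter functions, and the raw field names are all hypothetical and chosen only to show the pattern of one adapter per source feeding one shared model.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

ALPHA3_TO_ALPHA2 = {"DEU": "DE", "FRA": "FR"}  # tiny illustrative subset


@dataclass(frozen=True)
class Procedure:
    source: str
    title: str
    country: str                # ISO 3166-1 alpha-2
    cpv: str                    # 8-digit CPV without check digit
    value_eur: Optional[float]  # None when the source omits the value
    published: date


def clean_cpv(code: str) -> str:
    return code.split("-")[0].strip()


def from_source_a(raw: dict) -> Procedure:
    """Hypothetical source A: alpha-3 countries, ISO dates, values as text."""
    return Procedure(
        source="source_a",
        title=raw["title"].strip(),
        country=ALPHA3_TO_ALPHA2[raw["country"]],
        cpv=clean_cpv(raw["cpv"]),
        value_eur=float(raw["value"]) if raw.get("value") else None,
        published=date.fromisoformat(raw["pub_date"]),
    )


def from_source_b(raw: dict) -> Procedure:
    """Hypothetical source B: alpha-2 countries, day/month/year dates."""
    return Procedure(
        source="source_b",
        title=raw["subject"].strip(),
        country=raw["country_code"].upper(),
        cpv=clean_cpv(raw["cpv_code"]),
        value_eur=raw.get("amount_eur"),
        published=datetime.strptime(raw["published_on"], "%d/%m/%Y").date(),
    )


a = from_source_a({"title": " IT support ", "country": "DEU",
                   "cpv": "72000000-5", "value": "250000.00",
                   "pub_date": "2025-03-03"})
b = from_source_b({"subject": "Software maintenance", "country_code": "fr",
                   "cpv_code": "72200000", "amount_eur": None,
                   "published_on": "03/03/2025"})
```

Once every source passes through an adapter like these, downstream queries can filter on `country`, `cpv`, and `published` without knowing where a record came from.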

Entity resolution

Duke's identity resolution system maps buyer and supplier records across sources, using national identifiers, corporate hierarchy data, and algorithmic name matching to ensure that analytical questions about buyers, suppliers, and competitors receive accurate answers.

Document intelligence

Systematic extraction and indexing of tender documents adds the detail layer that notice-level data lacks — making bid/no-bid assessments possible without manual portal-by-portal document hunting.

Continuous quality improvement

Data quality is not a destination — it is an ongoing process. Every new source, every format change, every regulatory shift in publication requirements creates data quality challenges that require engineering attention. Duke's 61M+ procedure database reflects years of continuous improvement across hundreds of sources.

What this means for your strategy

The procurement data gap has direct strategic implications:

If your market intelligence is TED-only, you are making strategic decisions based on 30-40% of the market. Your market sizing underestimates total opportunity, your competitive analysis misses below-threshold players, and your buyer profiling is incomplete.

If your entity resolution is weak, your competitive intelligence is unreliable. You may overestimate market fragmentation, miss the true scale of competitor operations, and fail to identify buyer concentration in your target sectors.

If your document intelligence is limited, your bid/no-bid decisions are based on insufficient information. You are committing resources to opportunities without knowing whether you can meet the qualification requirements.

If your historical data is shallow, your trend analysis is a snapshot, not a movie. You cannot identify procurement patterns, seasonal cycles, or buyer behavior shifts that inform forward-looking strategy.

The procurement data gap is not an abstract problem. It is the gap between the strategic decisions you are making and the decisions you would make with complete information. Closing that gap — through comprehensive, resolved, quality-controlled procurement data — is the foundation of genuine procurement intelligence.

Conclusion

The European procurement market is rich in data — far richer than most other B2B markets. The challenge is not the absence of data but its fragmentation, inconsistency, and incompleteness. These gaps are not uniformly distributed: they are concentrated in exactly the areas where strategic intelligence matters most — below-threshold opportunities, entity identification, document-level detail, and award outcomes.

Closing the procurement data gap requires more than better search tools. It requires systematic multi-source aggregation, rigorous data normalization, intelligent entity resolution, and continuous quality improvement across hundreds of sources and millions of records.

The suppliers and organizations that build their strategies on complete data will make better decisions than those working with fragments. In a market of 61M+ procedures and counting, the quality of your data is the quality of your strategy.

Close your procurement data gap. Duke aggregates 300+ sources, resolves entities across platforms, and delivers complete market intelligence from 61M+ procedures. Start your free trial today.