Hard-Won Lessons – Delivering on the Promises of the Semantic Web

The Semantic Web – the original “Web 3.0”, as proposed by Sir Tim Berners-Lee in 1999 – was meant to make information about organizations, people, places and things machine readable. Sir Tim predicted where we are today: we want to use data in an ever-increasing number of ways (apps, devices, assistants, services) to do an ever-increasing number of things. That requires the machines we use to understand the data online.

Based on the romantic idea that “the Semantic Web will allow machines to understand everything on the Internet,” the original vision was that all of our devices would be able to understand an organization and its offerings based solely on its website.

This would enable a range of scenarios for both B2C and B2B users. For example, entering a URL in a vehicle to navigate to the nearest store, dialing a URL and being able to select the most relevant phone number for a request, or accounting software that could automatically retrieve suppliers' tax data using their URLs.

But the truth is that – to this day – the Semantic Web doesn’t allow us to do any of these things. In 2021, the Semantic Web allows us to share fancy links on Facebook.

This is because most websites still do not use the structured data formats on which the Semantic Web is based (microdata, JSON-LD, RDFa). Of those that have adopted structured data formats, the overwhelming majority have done so only to the extent necessary for users of Facebook, WhatsApp, Twitter, LinkedIn, etc. to share fancy links to web pages.
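To make "structured data" concrete, here is a minimal sketch of how a program might read an organization's details from JSON-LD embedded in a web page. The page, the company "Example Ltd" and its phone number are invented for illustration; the property names follow the public schema.org vocabulary, and the helper names (`JsonLdExtractor`, `extract_organizations`) are hypothetical rather than part of any standard library.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical page for an invented company (example.com is a reserved domain).
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Ltd",
  "url": "https://www.example.com",
  "telephone": "+44 20 7946 0000"
}
</script>
</head><body>Welcome</body></html>
"""


def extract_organizations(html):
    """Return every top-level JSON-LD object typed as an Organization."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        doc for doc in parser.documents
        if isinstance(doc, dict) and doc.get("@type") == "Organization"
    ]


orgs = extract_organizations(SAMPLE_PAGE)
print(orgs[0]["name"], orgs[0]["telephone"])
```

A navigation system or dialer could, in principle, act on fields like `telephone` directly, but only if the site publishes them, which is exactly the adoption gap described above.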

Many of them do not use true Semantic Web standards, opting instead for simpler and massively stripped-down standards created by Facebook (Open Graph), Twitter (Twitter Cards) and others.
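For contrast, the sketch below reads Open Graph tags from an invented page. The four properties shown (`og:title`, `og:type`, `og:image`, `og:url`) are the basic ones the Open Graph protocol defines; the class name and sample values are hypothetical. The point is how little a link preview needs: there is no phone number, address or company number for a device to act on.

```python
from html.parser import HTMLParser


class OpenGraphExtractor(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop] = attributes["content"]


# Hypothetical markup for an invented company page.
SAMPLE_HEAD = """
<head>
<meta property="og:title" content="Example Ltd" />
<meta property="og:type" content="website" />
<meta property="og:image" content="https://www.example.com/logo.png" />
<meta property="og:url" content="https://www.example.com/" />
</head>
"""

og_parser = OpenGraphExtractor()
og_parser.feed(SAMPLE_HEAD)
og_parser.close()
print(sorted(og_parser.properties))
```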

How was this opportunity missed? Partly because of complexity issues and partly because of the chicken-and-egg situation of innovative technologies.

Outside of universities and research organizations, the Semantic Web failed because it was too complex for most companies to adopt. Ultimately, not enough companies published machine-readable data to make it generally and reliably useful, so applications, devices, assistants and services could not be built on it. In other words, it failed because the complexity created the chicken-and-egg problem.

Only now, as consumer and business demand for what can be done with machine-readable data has grown so much, has the business opportunity outweighed the problem of complexity. The problem now is that the companies that have stepped in to unlock this data hold an incredible monopoly.

Gatekeepers of the web

Our ever-growing need for machine-readable data combined with the failure of the Semantic Web has led to the rise of centralized APIs offered by the web giants. Companies like Alphabet have built an empire by crawling the web, indexing its content, and storing it in their Knowledge Graph. They’ve built devices, applications, services, operating systems, and entire ecosystems on top of that data.

Developers looking to build applications that use machine-readable data on organizations face two options:

  • Crawl the web like Alphabet did, store this data and try to keep it up to date
  • Use a paid, restricted and rate-limited API

Option 1 is out of the realm of possibility for most developers and would only serve to further fragment Internet data.

Option 2 has rate limits and usage restrictions designed to limit competition.

The result is the stifled creativity of millions of developers and users who have no choice but to use privacy-compromising apps offered by the web giants. The way the gatekeepers of the web make data available to developers through APIs only strengthens their position. This must change.

Adopt machine-readable data

In the same way that organizations can independently deliver human-readable websites directly to customers using the open standards of the web, organizations need to be able to deliver machine-readable data directly to the devices, applications and services used by their customers.

We know it can be done. We created NUM – a DNS-based alternative to the Semantic Web, providing the kind of data that was previously only available through APIs offered by the web giants.

Unlike APIs, access to NUM data is free, unlimited and unrestricted for developers. We launched in the UK, pre-populating data for millions of domains. Any domain owner can override their pre-populated data by adding NUM records to their own DNS, or claim and update their pre-populated records using a simple user interface.

Almost all online organizations have a domain name – it’s their unique identifier and their own small piece of the internet. The World Wide Web and email are two of the most successful standards ever created, and both are built on the Domain Name System (DNS). NUM is also built on top of DNS but, most importantly, doesn’t suffer from the same chicken-and-egg problem that killed the Semantic Web, because we pre-populated the DNS with data.

To find this data, we crawled 18 million domains for UK businesses and found around five million active business websites with useful public data. From these we extracted contact data, logos, company numbers, VAT numbers and more. We cross-referenced and combined this data with other open public data sources like Companies House and published all of it to DNS as NUM records – nearly 10 million of them.

This data is available in DNS because DNS is one of the most efficient ways to store and serve small packets of data, in part due to its cached and distributed nature. Because the data is stored and served using DNS, access to it cannot be tracked, limited or restricted. Developers can use it today and build apps with open source libraries.

As an example of how this data and these standards can solve real problems for real users, we have developed CompanyDirectory.UK – a directory of some of the UK’s biggest companies, with all their department- and service-specific contact information provided in a simple, searchable list.

But the list of apps is endless – and with unlimited free data, developers can experiment and innovate far beyond the confines of today’s Semantic Web.

Elliott Brown, Founder, NUM