6 October 2017

Using Fame's new Word Search feature to unearth hidden gems in annual reports

Alistair King

Today we launch "Word Search" functionality on Fame, the company database for the UK and Ireland.

This new feature allows you to perform highly tailored searches on text within the original documents of millions of British annual reports. And it goes well beyond the information that already feeds into the structured elements of the database.

As anyone who has trudged through some of the lengthier accounts submitted each year to HMRC will confirm, these documents can often contain some of the most revealing information in companies' financials. But, for file formatting reasons, they've traditionally been impossible to scan other than by eye – so this all-important but hidden-away material has remained extremely hard to find.

So, what's new and how can you capitalise on this with Fame?


Tagging – and its by-products

Searchability is something we've come to expect of text on the internet. But sometimes what we see as a piece of text, a computer will "see" as a meaningless image. This applies to unprocessed PDFs and similar file types, unless converted with expensive software that takes time to run.

But in 2010 HMRC adopted a new process that the tagging standard eXtensible Business Reporting Language (XBRL), and specifically its inline variant (iXBRL), can exploit. Typically, accounts are submitted in a normal viewable format, with lines of information conforming to the UK's prescribed accounting taxonomy. When iXBRL software is run over these submissions, it recognises terms from an enormous and detailed list and tags them accordingly. This enables a single document to provide both human-readable and – as a potentially time-saving by-product – machine-readable data.
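To make the "one document, two audiences" idea concrete, here is a minimal Python sketch that reads a tiny tagged fragment twice: once as the text a person would see, and once as the tagged facts a machine would pick out. The fragment, the "ex:" concept names and the context and unit identifiers are invented for illustration and are not taken from any real taxonomy or filing. Real documents also carry further attributes, such as format and scale, which are left out here.

```python
import xml.etree.ElementTree as ET

# Namespace used by Inline XBRL 1.1 elements.
IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Hypothetical fragment of a tagged report (illustrative names only).
SAMPLE = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:ex="http://example.com/taxonomy">
  <body>
    <p>Turnover for the year was
      <ix:nonFraction name="ex:Turnover" contextRef="FY2016" unitRef="GBP" decimals="0">1,250,000</ix:nonFraction>.
    </p>
    <p>
      <ix:nonNumeric name="ex:PensionNote" contextRef="FY2016">The company operates a defined benefit scheme.</ix:nonNumeric>
    </p>
  </body>
</html>
"""


def extract_facts(xhtml):
    """Return (kind, concept name, displayed text) for each tagged fact."""
    root = ET.fromstring(xhtml)
    facts = []
    for kind in ("nonFraction", "nonNumeric"):
        for element in root.iter(f"{{{IX_NS}}}{kind}"):
            text = "".join(element.itertext()).strip()
            facts.append((kind, element.get("name"), text))
    return facts


def visible_text(xhtml):
    """Return the human-readable text with whitespace collapsed."""
    root = ET.fromstring(xhtml)
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    print(visible_text(SAMPLE))
    for fact in extract_facts(SAMPLE):
        print(fact)
```

The same file yields both a readable sentence and a list of named facts, which is what makes the original document text searchable alongside the structured figures.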

Annual report information in the same tagged format makes its way to the Companies House database, and it's from this growing repository that Fame takes annual report data. We've engineered the new module to interrogate the vast dataset.


Searching this vast dataset in an instant

And it is vast. With 80% of UK-based companies now complying with the new process, we're approaching a critical mass. Including the recent archive, this brings the current number of searchable documents to around 8 million. So we were well motivated to build this tool.

How does it work?

In essence, Fame's "Word Search" feature bridges the gap between structured and unstructured data:

  • You select appropriate operators – "Must have", "Must not" or "Should have" – from an expandable list of dropdown menus, next to which you enter free text to create your combined search;
  • The "Should have" operator allows you to prioritise certain documents over others in your results based on whether they contain terms you deem non-compulsory; and
  • You can type the "~" proximity operator between words to further refine your search.

These often-sophisticated searches are then run across all of the original documents in our collection in an instant.
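The logic behind these operators can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Fame's implementation: the function names, the sample documents and the scoring rule (one point per matching "Should have" term) are all hypothetical, and the proximity check is only one plausible reading of what a "~" operator might do.

```python
import re

# Hypothetical mini-collection standing in for annual report text.
DOCS = {
    "A": "The company operates a defined benefit pension scheme and is a going concern.",
    "B": "The company operates a defined contribution pension scheme.",
    "C": "The defined benefit scheme was closed. The company has been dissolved.",
    "D": "A defined benefit scheme is in place for all staff.",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase):
    """True if the words of `phrase` appear consecutively in `tokens`."""
    words = tokenize(phrase)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def within(tokens, first, second, max_gap):
    """Assumed proximity test: the two words occur at most `max_gap` positions apart."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(a - b) <= max_gap for a in first_positions for b in second_positions)


def search(docs, must=(), must_not=(), should=()):
    """Filter on must / must_not, then rank by the number of `should` matches."""
    ranked = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        if not all(contains_phrase(tokens, p) for p in must):
            continue  # a compulsory term is missing
        if any(contains_phrase(tokens, p) for p in must_not):
            continue  # an excluded term is present
        score = sum(contains_phrase(tokens, p) for p in should)
        ranked.append((score, doc_id))
    # Highest score first; ties broken by document id for stable output.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in ranked]


if __name__ == "__main__":
    print(search(DOCS, must=["defined benefit"],
                 must_not=["dissolved"], should=["going concern"]))
```

In this sketch, "Must have" and "Must not" act as filters, while "Should have" only changes the order of the results, which matches the prioritising behaviour described in the list above.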

Here's an example search, along with the first few results:



Helping you to focus your research

The tool has many applications. You might be an accountant seeking business development opportunities with companies whose reports contain the phrase "defined benefit", a government officer tasked with a specific line of enquiry relating to an arcane word or phrase, or an analyst conducting statistical analysis on bulk sets of data that would otherwise be far less rich. Whatever your use case, your possibilities have expanded.

In summary, with Fame's "Word Search" you can:

  • Apply specific searches across the original documents of all registered company accounts that have been submitted through this system (> 80% of the total) in a matter of seconds, rather than going through them one at a time;
  • Discover particular legislative references, whether they relate to filing or regulatory requirements;
  • Compare results from these searches with Fame's structured data for the same accounts, gaining a better understanding of the figures within them, such as "cost of goods sold" and "turnover"; and
  • Perform very efficient, accurate and time-saving data interrogations, making use of multiple word-search steps and simple but versatile operators.

To discuss this new module, please contact your Bureau van Dijk account manager or email

Alistair King, Content Manager

Alistair contributes and edits material for all areas on this blog. With extensive knowledge of legal and company information, his specialist interests include compliance and corporate ownership.

How Bureau van Dijk can help you

Certainty is a highly prized commodity in business. Data might be getting bigger all the time, but this only makes extracting value from it more difficult.

In capturing and treating private company information we aim to give you more certainty – and help you make better decisions and work more efficiently.



Our solutions are designed to help with different business challenges and streamline your workflow. Many of our customers blend our information with their own internal data to get a more complete picture of the companies in their ecosystem.

Try our more certain approach –
welcome to the business of certainty.