Summary:

Financial regulators and company registration indexes are increasingly opening up their data. OpenCorporates is gluing that data together and visualizing it in a way that should not only help fight corruption, but clean up the data too.

As any tax fairness campaigner will tell you, there is no better shield against scrutiny than a careful blend of complexity and obscurity. Public filings should in theory take care of the obscurity element, but that leaves complexity – those labyrinthine networks of corporate subsidiaries and holding companies spread around multiple jurisdictions.

So, how do we shine a light on these structures and figure out who really controls what? How do we follow that money? That’s where OpenCorporates, a UK startup that blends publicly available data from around the world and presents it in an understandable format, comes in. The company is being incubated in the UK Open Data Institute, and has also received a grant from the Alfred P Sloan Foundation.

It’s a similar concept to that of Duedil, a London-based firm that buys, bakes and serves up corporate data so that VCs can do due diligence and businesses can read up on their suppliers. The difference is that OpenCorporates is open rather than paid-for, and it has far greater ambitions. Everything OpenCorporates serves up can be consumed for free, unless you’re going to stick it in a proprietary report or database that you don’t intend to give back to the community (OpenCorporates will launch its price list in a week’s time). It has corruption in its sights, and also aims to clean up data for financial regulators and other institutions around the world.

Feast your eyes

But what are we actually talking about here? For a taster, check this out:

OpenCorporates Goldman Sachs map

That’s a searchable map, done in collaboration with data visualization outfit Kiln, of Goldman Sachs’s insanely complex corporate structure – or at least a hefty chunk of it – based on data from public filings and company registrations in the U.S., New Zealand, the Cayman Islands, Luxembourg, the UK and so on. Every one of those dots is a company: Goldman has 1,475 subsidiaries registered in the U.S. and 739 in the Caymans alone.

“By visualising it by country, it shows particularly in the cases of Goldman Sachs and Morgan Stanley, just how critical the Cayman Islands is to those networks,” OpenCorporates co-founder and CEO (and former journalist) Chris Taggart told me. “That’s the sort of thing you could have done as an academic study based on this data, but maybe half a dozen people would have read it. This is an almost automatic byproduct of putting this into a single open dataset.”

Disparate sources

That’s not a typical OpenCorporates visualization, though — it’s more a demonstration of the power of the startup’s unified datasets. Here’s a more standard example, showing the structure of Barclays Bank and its myriad subsidiaries and shareholdings (at least, the ones OpenCorporates knows about with a high degree of confidence):

OpenCorporates Barclays map

This open data corporate network platform launched this week, taking in three key datasets as the initial ingredients: the New Zealand company register, the U.S. Federal Reserve’s National Information Center, and the U.S. Securities and Exchange Commission’s 10-K and 20-F filings.

Pulling this data together is not trivial. The New Zealand company register is pretty up-to-date and easy to access, but the Fed’s data is generally “locked away in horrendous PDFs”, as Taggart said in a blog post, and the SEC filings are “the most problematic of all, being published once a year, as arbitrary text”.

But when it’s pulled together, the data can be incredibly useful for a variety of players, from financial regulators and lawyers to tax fairness campaigners and journalists. “It shows the depth of those networks, up to 15 paths long, and I think people haven’t really understood the nature of that complexity — not just the size but the complexity of those chains,” Taggart told me.

“That’s really important because, when you have the case of an American company which is controlled by another American company but the route through that is by the Cayman Islands or let’s say Bermuda or something else, that really matters because the Cayman Islands is a different jurisdiction with different laws. What if there’s a coup in the Caymans tomorrow – what does that mean for the U.S. banking system?”
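
To make Taggart's point concrete, here is a minimal sketch of how a chain of control might be traced once the links sit in a single dataset. It is not OpenCorporates' code: the company names, the jurisdiction codes and the `control_chain` helper are all invented for illustration, and it assumes each link simply means "parent controls child".

```python
from collections import deque

# Hypothetical data: every company name and link here is invented.
JURISDICTION = {
    "US Parent Inc": "us",
    "Cayman Holdco Ltd": "ky",
    "US Operating LLC": "us",
    "NZ Services Ltd": "nz",
}
CONTROLS = {
    "US Parent Inc": ["Cayman Holdco Ltd", "NZ Services Ltd"],
    "Cayman Holdco Ltd": ["US Operating LLC"],
}


def control_chain(parent, target):
    """Breadth-first search for the shortest chain of control from parent to target."""
    queue = deque([[parent]])
    seen = {parent}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for child in CONTROLS.get(path[-1], []):
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None  # no chain of control found


def jurisdictions_on_chain(path):
    """List the jurisdiction of each company along a chain."""
    return [JURISDICTION[name] for name in path]


chain = control_chain("US Parent Inc", "US Operating LLC")
print(chain)
print(jurisdictions_on_chain(chain))
```

In this toy network, one American company controls another, but the only route between them runs through a Cayman holding company, which is exactly the pattern Taggart describes.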

Open data is better data

But there’s another crucial aspect to what OpenCorporates is doing: the company aims to help improve the quality of the data it’s sucking in. This is the big selling point that OpenCorporates is offering potential data sources, in the hope of getting them to add further ingredients to its big data stew.

Open source software provides a great analogy. There’s a reason that open source has taken over the server software and, to an extent, the security software industries – scrutiny reveals errors, and the result is code that’s far better tested and more secure than proprietary code.

“Open data is not just a common good, but it’s also about quality,” Taggart said. “You often get the names subtly wrong or the jurisdiction wrong — this is true of all the official data… Generally the stuff people have access to, whether it’s free or paid-for, they know it’s not good quality. And the reason it’s not good quality is because it’s closed.”

Not only does visualization of the data show up inconsistencies, but OpenCorporates also makes the quality of the data explicit by assigning “confidences” to links. For example, Fed data is more granular, structured and detailed than data from an SEC filing, so the resulting links get assigned a higher confidence of being accurate. These levels of confidence will come in handy when OpenCorporates starts soliciting crowdsourced data in a month or so.
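
As a rough illustration of the idea, a link's confidence could be derived from the source it came from. The sketch below is an assumption, not OpenCorporates' actual scheme: the article says only that Fed-derived links rate higher than SEC-derived ones, so the numeric scores, the `Link` class and the `high_confidence` filter are all hypothetical.

```python
from dataclasses import dataclass

# Illustrative scores only. The ordering (Fed above SEC) comes from the article;
# the numbers themselves are invented.
SOURCE_CONFIDENCE = {"fed_nic": 0.9, "sec_filing": 0.6}


@dataclass(frozen=True)
class Link:
    parent: str
    child: str
    source: str

    @property
    def confidence(self) -> float:
        """Confidence inherited from the dataset the link was extracted from."""
        return SOURCE_CONFIDENCE[self.source]


def high_confidence(links, threshold=0.8):
    """Keep only the links whose confidence meets the threshold."""
    return [link for link in links if link.confidence >= threshold]


# Hypothetical links between invented companies.
links = [
    Link("Bank Holdings Inc", "Bank Securities LLC", "fed_nic"),
    Link("Bank Holdings Inc", "Offshore Finance Ltd", "sec_filing"),
]
for link in high_confidence(links):
    print(link.parent, "->", link.child, link.confidence)
```

A display like the Barclays map could then show only the links above a chosen threshold, while lower-confidence links wait for corroboration from other sources.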

And the more data OpenCorporates sucks in, the more value it will be able to produce. The team’s goals are lofty, but there’s no doubt they’re worth aiming for. As Taggart said:

“If everyone did what New Zealand did and publish shareholding data we’d have a pretty fantastic [result]. It would improve competition by showing where the gaps and opportunities are. It would improve corruption and law enforcement, because you’d be able to see where companies are fraudulent and how they’re connected. It would improve shareholder value, because you could avoid things like Enron where the opacity of the network hid from shareholders what was going on.

“We’re gradually solving that information bit by bit, and I think the payoff for businesses, shareholders and regulators is tremendous, and I’m genuinely really proud of what we’ve done.”
