MyHeritage automates record-matching as genealogy wars heat up

When it comes to social networks, few are more important – and harder to pin down – than the family tree. So it’s no surprise that the fierce competition between the two leading platforms, MyHeritage and its larger rival, is getting ever more technologically advanced.

Derrick covered some of the techniques being used by MyHeritage’s rival back in June, and today we can reveal the latest weapon in MyHeritage’s own arsenal: automated record matching.

Both platforms lean heavily on records as a way of augmenting the drier names and dates that make up family trees, but the Israel-based MyHeritage – which already has its own angle by explicitly treating the service like a social network – reckons it now has the edge.

According to CEO Gilad Japhet, MyHeritage has had its Record Matching tech ready for some time, but needed to set up a server farm, then clear a backlog of four billion historical records (including the world’s largest historical newspaper collection, acquired through the company’s FamilyLink buy last year), before launching it today.

“They come from original documents, birth records, marriage certificates, passenger lists going through Ellis Island, tombstones – in a few cases user contributed, as some people take snapshots of gravestones and upload them – public information, census records, newspaper articles and books. Record Matching covers both text-based and structured records, those that can be filled into a regular database,” he told me.

As an example, let’s say you don’t know the date of birth or death for your grandfather, but you do know his name. MyHeritage has a big database of wills, but again, you’re lacking dates. So the service would use its already-existing Smart Matching technology to compare the known information with that on other family trees, perhaps pinning down dates through other relatives’ connections.

Then, armed with that, it would find what it can in those historical records, using semantic analysis to deal with the free-text newspaper cuttings for example.
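To make that two-step flow concrete, here is a minimal sketch in Python. It is not MyHeritage’s code: the class names, the function names, the exact-name comparison and the two-year slack are all assumptions made for illustration, and plain substring matching stands in for the semantic analysis Japhet describes. The people and records in the example are invented.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


@dataclass(frozen=True)
class Record:
    title: str
    text: str
    year: int


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(text.lower().split())


def fill_from_trees(person: Person, other_trees: List[Person]) -> Person:
    """Step 1 (hypothetical): borrow missing dates from same-named entries in other trees."""
    for other in other_trees:
        if normalize(other.name) != normalize(person.name):
            continue
        person = replace(
            person,
            birth_year=person.birth_year if person.birth_year is not None else other.birth_year,
            death_year=person.death_year if person.death_year is not None else other.death_year,
        )
    return person


def match_records(person: Person, records: List[Record], slack: int = 2) -> List[Record]:
    """Step 2 (hypothetical): keep records that mention the name and fall within the lifespan."""
    if person.birth_year is None or person.death_year is None:
        return []  # without dates there is nothing to narrow the search with
    earliest = person.birth_year - slack
    latest = person.death_year + slack
    wanted = normalize(person.name)
    return [
        r for r in records
        if wanted in normalize(r.text) and earliest <= r.year <= latest
    ]


# Invented example data: a grandfather known only by name.
grandfather = Person("Samuel Katz")
other_trees = [
    Person("samuel  katz", birth_year=1880),
    Person("Samuel Katz", death_year=1951),
    Person("Ruth Katz", birth_year=1910, death_year=1990),
]
records = [
    Record("Will", "Last will and testament of Samuel Katz", 1950),
    Record("Newspaper", "Samuel Katz opens a new shop", 1990),
    Record("Passenger list", "Ruth Katz, arriving at Ellis Island", 1920),
]

enriched = fill_from_trees(grandfather, other_trees)
matches = match_records(enriched, records)
```

In this toy run the dates come from two different relatives’ trees, and only the will survives the filter: the 1990 newspaper item falls outside the lifespan and the passenger list names someone else. A production system would need fuzzy name matching and some way of weighing conflicting dates, which is exactly where wrong matches creep in.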

The smart thing, and one that Japhet hopes will pull in more subscribers and pay-as-you-go credit users, is that Record Matching works automatically and provides snippets of information for free. If you’re a user, you’ll just get an email telling you what’s been found. If you want to see the full record, you pay, but it doesn’t require that step to prove its worth.

So why did MyHeritage decide to shun the cloud for all this?

“We found it wasn’t very efficient to run this in the cloud because the CPU power you get is typically smaller, as a lot of these servers are virtual,” Japhet said. “We wanted serious number-crunching capabilities, and found it more efficient for us to purchase high-end servers, put together a large farm, run it all and accumulate the matches. It’s an ongoing real-time system.”

Japhet also claims other advantages over its rival, an older and larger service (38 million family trees to MyHeritage’s 23 million). For one thing, he points out that MyHeritage is available in 38 languages and its rival in just half a dozen – that makes a difference when you consider the international aspect of genealogical research.

What’s more, MyHeritage intends to “launch a massive crowdsourcing based transcription system” for its users within the next year, he added. And so the battle for family history continues.

3 Responses to “MyHeritage automates record-matching as genealogy wars heat up”

  1. Mary Ann Walker Hubbell

    A. I never heard of MyHeritage until today.

    B. Gathering info that is out there in records and notifying you? Isn’t that already done on the rival site? For all the good, I HATE it when they connect the wrong family and idiots keep sharing the incorrect info.

  2. ourFamilyology

    Neither one of these services is concerned about people building a family tree that is reliable and backed up with evidence that supports important dates and relationships. They only care about building “Large” family trees, regardless of accuracy.