With nearly 400 million consumer profiles, Rapleaf is a key data provider to everyone from banks, retailers and anti-fraud firms to a whole lot of startups. Whichever way you look at it, Rapleaf is part of any Internet privacy conversation that affects you.


Earlier, I posted about San Francisco-based Internet information aggregator Rapleaf, a service that collects, sorts and repackages data about many of us who spend an inordinate amount of time on the Internet. I started poking around and discovered many startups that are using data from Rapleaf, but it’s not just startups. Just take a look at this article on Rapleaf in Fast Company from last year:

By accessing its database of 378,968,953 consumer email profiles, banks, retailers, and anti-fraud firms (all of which it counts among its clients) Rapleaf can quickly confirm legitimate customers and weed out scammers, cutting verification costs and improving the user experience. “Companies spend as much as $100 getting customers to their site. The goal is to filter out the bad people and keep as many good people as possible,” (Joel) Jewitt (Rapleaf’s VP of Business Development) says. “If a customer’s email address is attached to three or four social networking sites with 300 friends, the email likely isn’t fake and the retailer can put that person in the ‘good’ pile.”
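Jewitt's rule of thumb works out to a simple filter. The following is a minimal sketch of that idea, not Rapleaf's actual scoring. The `Profile` record, the `sort_customer` function and the exact cut-offs (at least three linked social networks and at least 300 friends) are hypothetical, read loosely from the quote.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """Hypothetical summary of what a data provider holds for one email address."""
    email: str
    social_networks: int  # social sites the address is linked to
    friends: int          # friend count reported on those sites


# Assumed cut-offs, read loosely from "three or four social networking sites with 300 friends".
MIN_NETWORKS = 3
MIN_FRIENDS = 300


def sort_customer(profile: Optional[Profile]) -> str:
    """Put an address in the 'good' pile if it looks established, otherwise flag it for review."""
    if profile is None:
        # Nothing on file: no social footprint vouches for this address.
        return "review"
    if profile.social_networks >= MIN_NETWORKS and profile.friends >= MIN_FRIENDS:
        return "good"
    return "review"


print(sort_customer(Profile("alice@example.com", social_networks=4, friends=300)))  # good
print(sort_customer(None))  # review
```

A real verifier would presumably weigh many more signals. The point here is only that a social footprint attached to an address is being used as evidence that the customer is real.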

One of our readers pointed out that because Rapleaf is sending data to these companies, which may be caching your information, there’s more information leaking out about you on the web. Opting out of Rapleaf’s service isn’t going to do you any good. Let’s put it bluntly: For better or worse, the genie is out of the bottle.

How Rapleaf Works

To better understand how, exactly, Rapleaf works, I did some investigating. On a basic level, Rapleaf is like a credit card company’s database. When you’re at a store and the cashier slides your credit card through, the store checks your card information against the credit card company’s database to make sure your card hasn’t expired and you have enough credit.

Rapleaf’s database contains email addresses. Say an airline offers a discount coupon, as long as you provide your email. When you sign up for the coupon, the airline looks up your email address in Rapleaf’s database; Rapleaf confirms the email is valid by checking it against your profile in its database; and the airline knows it can send you its email newsletter.
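In code, the airline's side of that exchange is little more than a keyed lookup. Here is a minimal sketch, assuming an in-memory dictionary stands in for Rapleaf's database. `PROFILE_DB`, `lookup` and `accept_coupon_signup` are hypothetical names, and the real service sits behind a network API that isn't described here.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a provider's database, keyed by email address.
PROFILE_DB: Dict[str, dict] = {
    "jane@example.com": {"linked_networks": ["Facebook", "Twitter"]},
}


def lookup(email: str) -> Optional[dict]:
    """Return the profile on file for an address, or None if the provider has none."""
    # Assumed normalization so " Jane@Example.com " and "jane@example.com" match.
    return PROFILE_DB.get(email.strip().lower())


def accept_coupon_signup(email: str) -> bool:
    """Airline side: like a card swipe, ask the provider whether the address checks out."""
    return lookup(email) is not None


print(accept_coupon_signup("Jane@Example.com"))    # True
print(accept_coupon_signup("nobody@example.com"))  # False
```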

When I contacted Rapleaf, the company said it has built its database by crawling the web, looking for connections and building profiles using its own technology. “Like Google, we crawl publicly available data on the web – as long as robots.txt allows search engines like us to crawl (we stop crawling if people disallow search engines),” CEO Auren Hoffman emailed. He added:

Rapleaf is working hard to protect consumers. We are a data company that, like 99 percent of data companies, is opt-out (rather than opt-in). But we are a white-hat data company who helps companies safely provide a more personalized experience to their customers. We try really hard to protect consumers – we’ve thought a lot about consumer protection and are proud of everything we are doing. However, we are open to ideas on how we can improve and I encourage your readers to email me at auren.hoffman@rapleaf.com with ideas on how we can improve and better protect consumers. While we cannot commit to implementing any idea from your readers, we can commit that we will consider all thought-out suggestions.

The company argues that what it does is no different from what various ad networks do, and that its policies are more consumer-friendly. You can opt out of Rapleaf by visiting this location, Hoffman said. Nevertheless, Rapleaf’s services are clearly in demand, judging by this response from CEO Hoffman:

Today we help hundreds of top retailers, hotels, advertising agencies, large brands, tech startups, educational organizations, and nonprofits personalize their customers’ experiences. (We sign NDAs with our customers so we cannot release their names.)

Think of Rapleaf as the provider of a FICO score for an email address. That email address comes with a Facebook ID, Flickr ID, Twitter account information and other social details. For a marketer, or even someone trying to hit you up for business, this is highly relevant data, since it allows them to target a customer and connect with that person socially. In another scenario, you can buy an email list of a million addresses for $1,000, check them against Rapleaf and end up with about 10,000 emails worth targeting. That’s a pretty good deal.
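The economics of that scenario come down to a few divisions. This quick calculation uses only the figures above. It leaves out whatever Rapleaf charges for the look-ups themselves, since those prices aren't given here.

```python
# Figures from the scenario above; Rapleaf's own query charges are not included.
list_price = 1_000         # dollars paid for the purchased list
list_size = 1_000_000      # addresses on the list
worth_targeting = 10_000   # addresses left after checking them against Rapleaf

cost_per_raw_address = list_price / list_size            # dollars per address as bought
match_rate = worth_targeting / list_size                 # share of the list worth targeting
cost_per_useful_address = list_price / worth_targeting   # dollars per address worth targeting

print(f"${cost_per_raw_address:.3f} per raw address")          # $0.001 per raw address
print(f"{match_rate:.0%} of the list is worth targeting")      # 1% of the list is worth targeting
print(f"${cost_per_useful_address:.2f} per useful address")    # $0.10 per useful address
```

Even after 99 percent of the list is thrown away, each address worth targeting costs about a dime before look-up fees.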

A Good Email ID Is Worth Money

In order for Rapleaf to be successful, it needs to keep growing its database of good email addresses, which is why it’s giving startups such as Facebook game makers and social CRM companies liberal access to its APIs. When a social CRM service such as Rapportive plugs into your Gmail account, its look-ups confirm to Rapleaf that your email address is valid. Since social CRMs create profiles of the people who email you, these services also confirm to Rapleaf that your friends’ addresses are valid. Technically, no data is exchanged, but the sheer quantity of look-ups is enough to beef up Rapleaf’s database.

Think of it this way: Companies like Rapportive, by making simple queries, are becoming the sources of the best and highest quality emails/IDs that Rapleaf has ever obtained. I think this is the crux of the problem. Here’s a question I sent to Rapleaf and the answer I received (emphasis mine).

Does Rapportive (and others like them, such as Gist) pay for the service? If yes, how much? What happens to the queries that originate from Rapportive? Say email x@x.com. Does that data get stored in your databases?

Unfortunately we’re not able to go into details about specific relationships because of our confidentiality agreements, but all of our customers pay us for our service.  We do have a free API (up to 1000 queries per month) that many companies use — but companies need to pay for Rapleaf for queries above that. We only allow companies to learn more about their existing customers (and we have never given out email addresses) and when they query their customers’ email, we return the most updated information Rapleaf has associated with that email. If this is a new email we have not seen before, it may be cached to provide better user experience in the future or it can be removed via opt-out.

Given that Rapleaf’s core competency is its ability to take email addresses, map them to data on the web and build a profile, I find the argument that data is cached for a better user experience hard to swallow. With nearly a billion email addresses in its database, any look-up helps Rapleaf cull the best emails from the giant morass of addresses. At least two companies I spoke to have declined to work with Rapleaf and refused its offer of free data, mostly because they found the workflow unsavory, to put it mildly.
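To see why look-ups alone can strengthen a database, here is a toy model of an email lookup service that behaves the way Rapleaf’s answer describes. Queries return whatever is on file, addresses it has never seen may be cached, opting out removes them, and the first 1,000 queries a month are free. Everything else (the `LookupService` class, the `times_queried` counter, the in-memory storage) is a hypothetical illustration, not Rapleaf’s implementation.

```python
from typing import Dict, Optional, Set

# Free-tier figure quoted in Rapleaf's answer; everything else here is hypothetical.
FREE_QUERIES_PER_MONTH = 1000


class LookupService:
    """Toy email lookup API that caches addresses it has not seen before."""

    def __init__(self) -> None:
        self.profiles: Dict[str, dict] = {}
        self.opted_out: Set[str] = set()
        # Per-client query counts; a real service would reset these each month.
        self.queries: Dict[str, int] = {}

    def query(self, client: str, email: str) -> Optional[dict]:
        """Return a copy of what is on file for `email`, caching new addresses as a side effect."""
        self.queries[client] = self.queries.get(client, 0) + 1
        if email in self.opted_out:
            return None
        if email not in self.profiles:
            # The client sent only an address, but the address is now on file.
            self.profiles[email] = {"times_queried": 0}
        # Repeated look-ups from different clients hint that the address is real and in use.
        self.profiles[email]["times_queried"] += 1
        return dict(self.profiles[email])

    def billable_queries(self, client: str) -> int:
        """Number of this client's queries that fall beyond the free tier."""
        return max(0, self.queries.get(client, 0) - FREE_QUERIES_PER_MONTH)

    def opt_out(self, email: str) -> None:
        """Drop any cached data for `email` and refuse to cache it again."""
        self.opted_out.add(email)
        self.profiles.pop(email, None)


service = LookupService()
service.query("social-crm", "friend@example.com")
service.query("facebook-game", "friend@example.com")
print(service.profiles["friend@example.com"])  # {'times_queried': 2}
```

The clients in this sketch never send anything but an address. Yet after a round of look-ups, the service knows which addresses are in active use and how often they come up.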

Rapleaf’s Startup Web

Regardless, here is a list of Internet startups that have access to data from Rapleaf. Clearly it is incomplete, and, for some of these companies, it is not clear if they send data back to Rapleaf (I’ve noted the companies that confirmed that they only look up data). I am going to update this post with more comments as I get them.

  • Rapportive. The CEO has confirmed that the company doesn’t pass any data back and forth.
  • eTacts. They say they are not passing information back to Rapleaf.
  • Gist. The CTO confirmed the company isn’t passing any information back to Rapleaf.
  • Flowtown. Co-founder Ethan Bloch left a comment indicating Flowtown doesn’t pass any information back to Rapleaf.
  • IntroMojo
  • SafetyWeb
  • SocialShield. Arad Rostampour denied passing any data back to Rapleaf.

As I said earlier, even if the companies aren’t passing any data, every time they do an email-based look-up against Rapleaf’s database, they are essentially helping make Rapleaf’s database more powerful.

Casting the Social Web

Verifying emails is one thing. But today, there is a lot more social information available, about demographics, interests, location and so on, that a company like Rapleaf could use to fill out its profiles. I’m as concerned about startups using Rapleaf’s API as I am about how the company continues to mine data from huge, data-rich social services such as LinkedIn. LinkedIn data is ending up on Rapleaf, and from there, it’s appearing on other services such as Flowtown. When I contacted LinkedIn, its spokesperson sent the following response:

As we’ve always said, our user data belongs to our users. It is provided by them and unless they have restricted it, is available on our site. We don’t share personally identifiable information with third parties without user consent. We also have teams that help protect our members’ professional profiles from scraping, spamming and any other activity that violates our terms of service. We don’t have any business relationship with Rapleaf.

However, LinkedIn data ends up at Rapleaf and, via Rapleaf, at other services through scraping of the publicly available data. Some people with knowledge of the subject believe that alternative tactics are being used to get around the API limitations of services such as LinkedIn. (If you know more, please get in touch with me.)

To be clear, I don’t have old-fashioned notions about privacy on the Internet. I know the realities of today’s Internet life. In order to enjoy the convenience of using web-based services, one has to make some sacrifices, and living socially online will eventually lead to an erosion of privacy. However, what I find egregious is how the information is surreptitiously collected all over the web, then aggregated to be sold, without our having any control over that data or any ability to look into it. Sure, we can opt out, but only if we know that we’re being profiled. (Ironically, you have to register to opt out.)

I don’t want to blame only Rapleaf: ad networks are doing this as well, giving it cutesy names like behavioral targeting. U.S. Reps. Edward Markey (D-Mass.) and Joe Barton (R-Texas) recently sent a letter to Mark Zuckerberg and Facebook, questioning him about privacy breaches at the social network. In August 2010, these same congressmen asked various web services for information on cookies and how they use them. Maybe they should consider looking at these data collectors as well. Perhaps they will conclude that this industry needs some kind of oversight.


  1. Is Facebook’s Proposed User ID Solution Sufficient? Thursday, October 21, 2010

    [...] Check out this awesome piece by Om Malik highlighting how Rapleaf works. Tags: Privacy [...]

  2. Regarding “how we can improve and better protect consumers” … how about providing a public interface where individuals can verify whether or not the information Rapleaf has collected is in fact correct?

  3. Irina Issakova (Rapleaf) Thursday, October 21, 2010


    My name is Irina Issakova and I work in Marketing at Rapleaf. Thanks for the comment! We actually have a way for people to see the information and manage it. You can go here to check it out:

    We continue to make improvements and appreciate any suggestions. You can email our CEO Auren Hoffman directly at auren.hoffman@rapleaf.com.

  4. Irina Issakova (Rapleaf) Thursday, October 21, 2010


    My name is Irina Issakova and I work in Marketing at Rapleaf. Thanks for your suggestion! We actually have a page where people can see the information and manage it. Check it out here:

    We continue to make improvements to this page and definitely appreciate suggestions. You can email Rapleaf CEO Auren Hoffman at auren.hoffman@rapleaf.com with ideas.

    1. Thank you for your follow-up, Irina.

      Yes, upon re-reading the article, I followed the link under the text “You can opt out …”.

      I appreciate the ability to investigate the status of information held on file. My only comment after doing so has to do with the fact that registering my e-mail address with Rapleaf was required prior to viewing the information.

      While I recognize that this requirement protects the holder of the e-mail account from public disclosure of Rapleaf’s “dossier” on a particular account, it also provides Rapleaf with an IP address which it can add to the information on file.

      What is Rapleaf’s policy with regard to the disclosure or use of IP addresses?

      1. That’s a great question about IP addresses. We do not collect, store or manage IP addresses.

        Our CEO Auren Hoffman wrote about this in July:

      2. @IrinaIssakova:

        Thank you for your reply.

        In the article which you referenced, Mr. Hoffman writes:

        “IP addresses should be thought of as privileged information. From our tests, IP addresses perfectly identify about 30% of U.S. households. That means that from IP address, a site can know your exact address.”

        This raises a contradiction: you cannot run ‘tests’ which determine that “IP addresses perfectly identify about 30% of U.S. households” *without* collecting and storing IP addresses, yes?!

    2. You actually can’t see most of the info Rapleaf has, even if you sign up. I did. Rapleaf only shows generic stuff, like that I like “social networking” — not the fact that it actually has my ID for Facebook, Flickr, Twitter, Livejournal, etc.

    3. @Irina_Issakova: While most would appreciate the open tone of your comment about being able to view and opt out of the data your service stores, it’s extremely ironic that *registration* (i.e. submitting data to you) is required to do this.

      As good as you make your company seem, its core is still rooted in an evil consumer data space.

      Perhaps you are one of the better ones, but any type of data gathering, especially ones that leverage and exploit publicly available content and social profiles for the purposes of selling insights or information to other companies – you simply can’t position yourself with any “good” sentiment. When one reaches your scale, you are extremely “evil” because more insights can be extracted about more people; data = power.

      Behavioral and profile-based targeting, or even something as simple as email verification, *requires* business practices which are questionable and which most consumers would oppose. Default opt-in is a morally controversial activity in today’s technologically connected world. In this case, unfortunately, you are what you do.

  5. We sign NDAs with our customers so we cannot release their names

    You only sign NDAs if you’re trying to keep something quiet. What are they trying to hide? Are these companies ashamed? They should be!

    I’ve just registered with Rapleaf and looked at my profile for one of my email addresses. All I can say is… wow. They have a lot of data on me. I’ve opted out…

    Great journalism, Om. Thanks.

    1. Pete

      Thanks for the kind words. And I agree, this whole NDA thing is a hairball and a lot more mess is hiding behind it. I am working on a follow-up post as well.

    2. Most companies require NDAs with vendors to prevent their competitors from learning about the details or existence of the vendor relationship.

      Signing an NDA in a customer/vendor relationship is not nearly as nefarious as you imply.

      Enterprise software companies sign NDAs with customers all the time, as a matter of routine.

  6. » What the web knows about me Pete Davies Thursday, October 21, 2010

    [...] suppose I shouldn’t be surprised. But wow. Thanks to this great piece by Om Malik, I’ve learned all about [...]

  7. I don’t see much of a concern with what Rapleaf does as long as they don’t violate any laws or anything. The information they gather is out there for the taking. Facebook is probably a much bigger privacy concern because they have real data. And there are other websites such as the http://www.dirtyphonebook.com that are probably even worse threats to personal privacy. Google also seems to be dumping their “Do no evil” pledge and should be closely watched.

    @Irina, I’m glad to see you mention that the suggestion about managing your personal information is a possibility. I didn’t know that about your service. Thanks.

  8. This article misses the point.

    The social networks have more value and scale faster if everyone is willing to make their information public. They also make this information available to search engines, because it helps them scale even faster. Search for someone’s name on Google and you will see their LinkedIn page. Enter an email in Facebook and you will see the public information on this person. This information is public.

    The problem isn’t Rapleaf and other companies that scrape publicly available data on the internet. In fact, Google’s whole business model is based on scraping content that does not belong to them.

    The problem is that this information is public, it was in the social networks’ best interest to make it public, and people are more interested in connecting with other people and do not grasp the implications of making their information public.

  9. Om, good stuff. Rapleaf is run by a Russian-born entrepreneur and has a decidedly Russian business plan. I have issues with their lack of transparency and feel like it’s the next Offerpal-like story built around Facebook. The data that’s important here is Facebook’s, not the other networks’, and Facebook needs to police and authenticate this in building its brand.

  10. Just as an experiment, I’ve decided to completely opt out (I think) of Rapleaf’s database. I’m not so concerned about privacy, security, etc. I just want to see if I notice what will happen with display ads, etc.

    1. So how will you be able to tell a difference?

      1. I’m actually not sure. Will I be served relevant display ads? Will I not be retargeted anymore? (I find retargeted ads annoying.) Can anyone point to a downside of opting out of Rapleaf altogether?
