Blog Post

Much Ado About Comscore…

Did you get a load of the PR game being played out between Jeevansathi and Bharatmatrimony, using comScore (NSDQ: SCOR) figures? On March 4th, Info Edge’s (BOM: 532777) matrimonial site issued a press release stating that it’s number one in the matrimonial market based on comScore numbers, claiming that it’s leading in “Unique Visitors, Total Page Views, Time Spent and Average Pages per Visitor”. A day later, its competitor from the Consim Group countered (doc file), quoting comScore, Alexa, and Juxtconsult to indicate that it’s number one, claiming that its number of unique visitors equals the sum of unique visitors received by all the national matrimony players (for that period) and is also double that of the nearest competitor.

But then how does comScore arrive at these numbers? We contacted both Info Edge and Consim for a complete comScore report with methodology, and after quite a few reminders, both companies sent us different figures – Jeevansathi (jpg) and Bharatmatrimony (Excel)…but no methodology. So we contacted comScore for details on their methodology, and Owen West, Account Director, sent back a marketing presentation with no specifics of methodology, just that “our panel size in India is currently 16,000 strong.” So how many from this fairly small sample of 16,000 use matrimonial sites anyway?

We’ve received no information from comScore, Info Edge or Consim about the sampling methodology for India, the demographic information (age, sex, income, education) of the audience surveyed (except that respondents are in the 15+ age group), or the geographic information (region, city size, population density, urban/rural)…And they expect this data to be taken seriously? At the end of the day, who benefits? I’d say that with the increased attention, both Consim and Info Edge benefit, and gullible publications and blogs take the bait without doing their own due diligence.

We’ve again asked comScore, Info Edge and Consim to explain the methodology. We’ll update you if that happens.

Update: Murugavel, CEO of Consim Info, has told contentSutra that “People always make a mistake (intentionally?) of comparing BharatMatrimony traffic leaving our 15 regional sites”, implying that the comScore data used by Jeevansathi did not include their 15 regional sites. However, there’s still nothing on the comScore methodology and demographic data.

Disclosure: I have an inconsequential number of shares of Info Edge

6 Responses to “Much Ado About Comscore…”

  1. Tameh Kaseer

    I think IMRB International’s WAM is covering the data for cyber cafes. From what I hear, IMRB has been very transparent with their methodology and sample recruitment.

    ComScore, I agree, has not even given a detailed explanation of their Universe. However, it’s still used country-wide across agencies. Hmm.. Interesting

  2. Nikhil, you are right: as long as there is no baseline study (to map demographics or other factors) done by any agency offline, and the universe representations are not there in the online sample in the panel, the findings of any study by any research company can be misleading.

    I am sure comScore being one of the oldest understands it, possibly today they are aggregating findings from India for region level or world level interest and their panel in India is to bring only a regional representations (APAC).

    I am sure when they look at Indian market specifically, they will make these corrections.

    Just an assumption though… may be comScore guys can clarify… they may not even bother to respond as they are anyway getting media mileage though negative :-)

    Nonetheless, "the need to become No.1 and even communicate that we are No.1" will remain prevalent among the Indian Internet Businesses as they are just graduating from Sales to Marketing Era. They also rely on PR as they borrow monies for so called marketing (basic awareness building) anyway from someone called VC who keeps pressurizing them for revenues… it is a vicious cycle today, I guess we should not worry much… all this will change over the period with the verticals within online industry maturing in business.

    About research companies… they are anyway in a parasite business, if the industry won't pay they can't improve their methodology… I doubt either of the above two subscribes to comScore access… the log-in is expensive. They must be using the bits of data that comScore, Alexa, JuxtConsult etc. distribute for free.

  3. comScore Media Metrix Methodology Overview

    Audience Measurement

    comScore provides industry-leading Internet audience measurement that reports details of online media usage, visitor demographics and online buying power for the home, work and university audiences across local U.S. markets and across the globe. Using proprietary data collection technology and cutting-edge methodology, comScore is able to capture great volumes of extremely granular data about online consumer behavior, including:
    • Actions (starts, stops, clicks etc.)
    • Audience behaviors (exposures, time spent etc.)
    • Consumer behaviors (shopping, commerce)
    • Online behaviors (IM, email, gaming, streaming etc.)
    comScore deploys passive, non-invasive measurement in its collection technologies; projects the data to the universe of persons online; and continuously strives to identify, understand, quantify, and eliminate bias to the maximum extent possible.
    The following are the core steps in the comScore methodology:
    1. Establish the universe via enumeration
    2. Obtain respondents via online recruitment
    3. Collect data
    4. Identify the User
    5. Projection and Bias Elimination

    Establishing the Universe via Enumeration
    comScore conducts a monthly enumeration survey by telephone collecting information on detailed demographics and Internet usage such as:
    • Personal demographics (age, gender, education, etc.)
    • Internet usage status
    • Connection speed
    • Census region
    • Household size
    • Computers in home
    • ISP
    • Operating System
    • AOL usage
    • Work usage
    Each month comScore uses data from the most recent wave of the survey and from the 11 preceding waves to estimate the proportion of households in the U.S. with at least one member using the Internet and also the average number of Internet users in these households. We then take the product of these two estimates and multiply by a Census-based estimate of the total number of households in the U.S. to get an estimate of the total number of Internet users.
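
The arithmetic in the paragraph above can be sketched roughly as follows. This is only an illustration: the `Wave` record, the pooling of the 12 waves by summing raw counts, and all the numbers are assumptions, since the overview does not say how the waves are combined.

```python
from dataclasses import dataclass


@dataclass
class Wave:
    """One monthly enumeration wave (hypothetical record layout)."""
    households_surveyed: int
    households_online: int           # households with at least one Internet user
    users_in_online_households: int  # Internet users counted in those households


def estimate_internet_users(waves, total_households):
    """Universe estimate = share of online households
    x average users per online household x total households."""
    recent = waves[-12:]  # most recent wave plus the 11 preceding ones
    surveyed = sum(w.households_surveyed for w in recent)
    online = sum(w.households_online for w in recent)
    users = sum(w.users_in_online_households for w in recent)
    share_online = online / surveyed
    avg_users = users / online
    return share_online * avg_users * total_households


# Made-up inputs, not comScore figures.
waves = [Wave(100, 50, 100)] * 12
print(estimate_internet_users(waves, total_households=1000))
```

Note that the description is written for the U.S. (Census region, Census-based household counts); nothing here says what the equivalent inputs are for India.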

    Obtain Respondents via Online Recruitment
    comScore uses an array of online recruitment techniques to acquire the members of its panel. These include affiliate programs and partnering with third party applications providers who meet comScore’s quality standards. To recruit people for the calibration panel, which is used to eliminate effects of bias in the panel recruited online, comScore adheres to stringent standards for recruiting a probability sample. In all cases panelists opt-in through a registration process that includes a stringent privacy practice.

    Collect Data
    During registration, panelists configure a software agent which allows comScore to “see” user activity at the machine or screen-side resulting in a view of the user experience, as opposed to site-centric measurement. This software yields not only the URLs of web pages requested by users but also information such as search strings, products purchased and referral requests. As a result, comScore can capture just about anything exchanged using the HTTP and HTTPS protocols and others such as streaming, AOL proprietary and IM environments.

    All of the panelist’s Internet activity is captured regardless of type of browser used (Note: the comScore panel only includes computers running a Win32 operating system, and does not include computers using other operating systems such as Macintosh or Linux). Activity is captured regardless of whether an Internet connection is established via a commercial Internet Service Provider (ISP) or an office-hosted LAN.

    Data capture and reporting are conducted in adherence to strict, industry-leading privacy protection policies. Data about user identity is stored in an encrypted, access-controlled database. Internet audience and behavior data is reported only in aggregate form.

    Identify the User
    Except for people in the calibration panel, comScore does not ask the people in its panel to identify themselves when they use the Internet. Instead, comScore infers who is at a computer at any point in time, using data that include biometric measurements (measurements of keystrokes and mouse clicks), the time of day that the computer is being used, and text strings in the data being accumulated (such as first names in forms being posted.) Consequently, comScore’s panelists are not constantly reminded that their Internet use is being monitored and so the monitoring is much less likely to influence their use of the Internet.

    Projection and Bias Elimination
    comScore calculates and applies weights to the data accumulated for panelists when aggregating the data to get the measurements it publishes. One purpose of these weights is to project measurements made across the Internet users in the panel to the much larger number who are not. The other purpose is to eliminate bias that may occur when online recruitment yields disproportionately few or many people from some segments of Internet users (for example, too many intensive Internet users and too few Internet users from high income households). Panelists from a segment that is more poorly represented get bigger weights and those from a segment that is over-represented get smaller weights.

    Targets for the distribution of these weights have two sources: the enumeration survey and the calibration panel. The enumeration survey provides targets for the distribution of weights across categories of demographic variables, such as gender, age group and household income. The calibration panel is a panel of Internet users recruited using methods that comply with stringent standards for obtaining a probability sample. comScore’s software is installed on the computers of people in the calibration panel, and the data accumulated for these panelists is used to derive targets for the distribution of weights across categories of behavioral variables, such as total minutes of Internet use.
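
To make the weighting idea concrete, here is a minimal sketch of the simplest version: one weight per segment, equal to the number of people in the universe for that segment divided by the number of panelists in it. The overview describes targets across several demographic and behavioral variables at once, so this single-variable version is a deliberate simplification; the segment names, function names and numbers are all invented.

```python
from collections import Counter


def segment_weights(panel, universe_by_segment):
    """panel maps panelist id -> segment. Each panelist in a segment
    stands for (universe size of segment / panelists in segment) people."""
    counts = Counter(panel.values())
    return {seg: universe_by_segment[seg] / n for seg, n in counts.items()}


def projected_unique_visitors(panel, visitors, universe_by_segment):
    """Sum the weights of the panelists who visited a site."""
    weights = segment_weights(panel, universe_by_segment)
    return sum(weights[panel[p]] for p in visitors)


# Invented example: heavy users are over-represented in the panel
# (3 of 4 panelists) relative to the universe (300 of 1,000 people).
panel = {"p1": "heavy", "p2": "heavy", "p3": "heavy", "p4": "light"}
universe = {"heavy": 300, "light": 700}

print(segment_weights(panel, universe))
print(projected_unique_visitors(panel, {"p1", "p4"}, universe))
```

The under-represented "light" panelist carries a much larger weight, which is also why a handful of panelists in a thin segment can swing a projected number.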

    Page View Definition and Methodology

    Instant Messengers Legacy Entities explanation for MMX [pdf]

    A Page View is defined as a page that has been fully loaded into a browser. In a general sense, a page consists of an HTML file plus all of the images/objects requested by the HTML. Page Views are also counted in online services (e.g. AOL and MSN Explorer) and applications, as long as the content is loaded into a page.

    Page View Counting Rules
    Page Views represent the act of a user requesting a page (e.g. HTTP, HTTPS, AOL) from a site and the transmittal of that page to the user through the browser or online service.

    What is Included in a Page View?
    • All pages with a browser status code between 200 and 299
    • In the case of redirects, only the destination page
    • Http:// and Https:// requests
    • AOL://protocols from the AOL Proprietary Service
    • MSN://protocols from the MSN Explorer Service
    • Framed Pages (see explanation below)
    • Locally cached pages (have a return code of 304)
    What is Not Included in a Page View?
    • All pages in a redirect, with the exception of the destination page (have a return code of 302)
    • URLs stopped by a user or partially downloaded pages
    • URLs with the following extensions: .GIF, .JPG, .VBS, .BMP, .ZIP, .RAM, .MOV, .MP3, .MP2, .AVI, .MPG, .WAV, .PDF, .PNG, .SWF
    • All URLs used for File Transfer Protocol ("FTP")
    • Ad Banners and Pop-Up or Pop-Under pages
    • Refreshed and auto-refreshed web pages
    • Web crawlers, spiders, bots or other automated engines
    • All streaming media URLs
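
The inclusion and exclusion lists above can be read as a filter over observed requests. The sketch below is one possible reading, not comScore's implementation: the function name and the boolean flags (how a refresh, an ad, a bot or a stream gets detected) are assumptions, and framed pages are left out because their explanation is not reproduced here.

```python
import os
from urllib.parse import urlsplit

# Extensions listed above as never counting toward Page Views.
EXCLUDED_EXTENSIONS = {
    ".gif", ".jpg", ".vbs", ".bmp", ".zip", ".ram", ".mov", ".mp3",
    ".mp2", ".avi", ".mpg", ".wav", ".pdf", ".png", ".swf",
}
# Schemes listed above as counted; anything else (e.g. ftp) is not.
COUNTED_SCHEMES = {"http", "https", "aol", "msn"}


def counts_as_page_view(url, status, *, fully_loaded=True, is_refresh=False,
                        is_ad_or_popup=False, is_bot=False, is_streaming=False):
    """Return True if a request would count as a Page View under the rules above."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in COUNTED_SCHEMES:
        return False
    if not fully_loaded or is_refresh or is_ad_or_popup or is_bot or is_streaming:
        return False
    extension = os.path.splitext(parts.path)[1].lower()
    if extension in EXCLUDED_EXTENSIONS:
        return False
    # 2xx responses and locally cached pages (304) count; a 302 hop in a
    # redirect chain does not, only the destination page does.
    return 200 <= status <= 299 or status == 304


print(counts_as_page_view("http://example.com/profile", 200))
print(counts_as_page_view("http://example.com/go", 302))
print(counts_as_page_view("http://example.com/photo.JPG", 200))
```
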
    Duration Methodology
    Based on millions of page duration observations, comScore has determined that the following heuristics best explain audience behavior:
    • The average time between 2 consecutive web pages is quite small: 26 seconds (or less than half of one minute).
    • For a general-purpose web site, 98% of pages have duration of less than 2 minutes. In addition, 99.8% of pages have durations of less than 10 minutes.
    • comScore conforms to the industry standard, which considers that any gap of more than 30 minutes is an indication of inactivity, and signals that a session has ended. In such a case, the last web page of the session gets a credit of 1 minute (double the web page average).
    • There are exceptions for games, news, and email sites, as well as instant messenger applications, which are detailed below.
    In general, there is a 2-minute cap applied to all web pages excluding the types of sites and applications listed in the fourth bullet above. This rule applies remarkably well as approximately 98% of total duration is captured within a 2-minute interval. However, we have noted some specific exemptions. In particular, content-intensive pages (news, careers etc.) sometimes have a greater than two-minute duration. An "engaged" user could spend longer than 2 minutes on a page to read a complex article, for example.

    We have identified this "engaged user" phenomenon in our data and can identify them through a process called URL pattern-matching. This means that if the next URL pattern requested by the user matches the current one, it is likely that the user is still active on the same site or site area. Once identified, these engaged users have the duration cap for that page extended to 10 minutes. For these types of pages, such as news or other content sites, the 10-minute rule works very well in capturing 100% of the cumulative pages and cumulative duration.

    The pattern matching process allows us to automatically identify when the engaged user rule applies without having to pre-select certain sites. This allows for a more consistent and fair treatment across the entire Internet.

    Exceptions to Duration Methodology

    NEWS SITES: The 2-minute cap is inadequate for news sites. Whereas 98% of typical pages fall below 2 minutes, only 83% of news site pages do so. On the other hand, 99.4% of news pages have duration of less than 10 minutes as the chart below illustrates. Consequently, a 10-minute cap is applied to News sites.

    E-MAIL SITES: A further exception was also empirically determined for e-mail sites. If someone is composing a complex e-mail online, it is entirely possible that a user could consume a considerable amount of time on a given page. We have determined that allowing a cap on e-mail sites of 30 minutes per page view is adequate to accurately represent over 99% of consecutive pages, and allowing for the growing trend of reading and writing e-mails offline, while eliminating the implausible "long tails" of the distribution where a user is likely to be inactive. The following chart illustrates the frequency distribution of duration for a typical e-mail site:

    The chart above implies the impact of differing potential duration caps on capturing the cumulative pages or duration behavior. For example, if this email site were capped at 5 minutes of duration per page, the data would reflect 96% of pages for the month yet only 57% of the recorded duration. The 5-minute cap in this example would understate this e-mail site's duration. However, a 30-minute cap accounts for 99.7% of the pages and 92.9% of the recorded duration.

    INSTANT MESSENGER APPLICATIONS: The distributions with respect to IM usage were similar to the e-mail applications. The comScore Media Metrix proprietary technology recognizes that many Instant Messengers pop up but are not actively used by the user. An "engaged IM user" is defined as someone who sends a message. The comScore technology allows comScore to see more data and provide a potentially more accurate read of the multi-tasking user experience. The following chart shows that IM pages have a high degree of variability.

    Because some IM sessions can have a high variation among the time interval between send commands measured by the comScore proprietary technology, we have extended the cap for the IM send interval to 30 minutes, again supported by empirical evidence that this number best reflects the capture of cumulative UV's and duration.

    GAME SITES: Game sites are also similar to instant messenger applications in that the interval between consecutive on-line events could be longer than normal. For instance a 2-minute cap covers only 82% of the pages, and it takes a 30-minute cap to cover over 99% of the pages. This is understandable since a user can download a game and play with it for a while without the need for a 'refresh' from the web or game server.
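
Pulling the duration rules together, a page's credited time might be computed along these lines. This is a sketch under stated assumptions: the site-type labels and function name are invented, the "engaged" flag stands in for the URL pattern-matching step, and the 30-minute cap for game sites is inferred from the text, which only says that it takes a 30-minute cap to cover over 99% of game pages.

```python
# Caps in minutes, as read from the methodology text above.
CAP_MINUTES = {"general": 2, "news": 10, "email": 30, "im": 30, "games": 30}
ENGAGED_CAP_MINUTES = 10      # cap extended when the next URL pattern matches
SESSION_GAP_MINUTES = 30      # a longer gap means the session has ended
LAST_PAGE_CREDIT_MINUTES = 1  # credit given to the last page of a session


def credited_minutes(gap_minutes, site_type="general", engaged=False):
    """Minutes credited to a page.

    gap_minutes is the time until the panelist's next page request,
    or None if no further request was observed.
    """
    if gap_minutes is None or gap_minutes > SESSION_GAP_MINUTES:
        return LAST_PAGE_CREDIT_MINUTES
    cap = CAP_MINUTES[site_type]
    if engaged:
        cap = max(cap, ENGAGED_CAP_MINUTES)
    return min(gap_minutes, cap)


print(credited_minutes(5))                # capped at the general 2 minutes
print(credited_minutes(5, engaged=True))  # engaged user, full 5 minutes
print(credited_minutes(45, "email"))      # session ended, 1-minute credit
```

The point for "Time Spent" comparisons is that the reported figure depends on how a site is classified, not only on what panelists did.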

    Minimum Reporting Standards

    comScore Media Metrix 2.0 employs Minimum Reporting Standards to:
    • Limit the number of smaller sites available in Media pick lists, and
    • Enforce cell-level reporting requirements.
    This section describes the criteria that determine whether an entity is available and reported during a given month.

    Entity Availability in Media Pick Lists

    After the entire universe of reportable entities is determined using the rules in the previous section, comScore then applies Entity Availability rules in order to keep the Media pick lists to manageable lengths. In order to be available in the Media Metrix pick list in a given month, an entity must meet one of the following criteria for any given month:
    1. An entity must have > 30 Raw UVs from the US panel, or
    2. An entity must have > 15 RAW UVs from the non-US panel
    An entity that meets either of the criteria is available in the Pick List for all locations. A property that does not meet these criteria will not be available for selection. Additionally, all children of this property will not be available for selection.

    An entity that by itself does not meet one of these criteria will not be available for selection in Media Metrix reports. A site that met the above criteria in previous months (but does not in the current month) would be available in the Media Metrix report output if its parent satisfies one or both of these criteria and the entity satisfies the Cell Level Reportability rules below.

    Cell Level Reportability

    Cell Level Reportability helps to ensure that we do not report measures based on less than reliable sample sizes for the entity.

    Any entity meeting requirement 1.) or 2.) above is also subject to cell level reporting requirements. A cell is defined as a specific value in a report that is based on a specific country, location and target audience.

    These cell-level requirements vary depending on the measure being analyzed:

    Sample Size "Insensitive" Measures

    Sample size "insensitive" measures are defined as measures that can be accurately projected and reported, even though the aggregate sample size for that sample is small. Sample size insensitive measures will be reported for cells that have > 1 Unprojected Unique Visitors. The sample size for a measure that describes the visitors to a web entity is effectively the number of visitors in the entire audience. So we can report UVs (and measures derived from UVs, such as % Reach and % Composition UVs) for any web entity.

    Sample size "insensitive" measures include:
    • Total UV
    • % Reach
    • % Composition UV
    • Composition Index
    • Average Daily Visitors
    Sample Size "Sensitive" Measures

    Sample size "sensitive" measures are those that can accurately describe an entity's visitors only if we have a reasonable number of visitors to the entity. To accurately describe visitors' behavior, a more robust sample size is required. Sample size sensitive measures will only be reported for cells that have > 30 Unprojected Unique Visitors. The 30 Unprojected UV requirement is implemented regardless of whether entity Availability rule 1.) or 2.) was implemented.

    Sample size "sensitive" measures include:
    • Average Usage Days per Visitor
    • % Composition Pages
    • % Composition Minutes
    • Total Pages Viewed
    • Total Minutes
    • Average Minutes per Visitor per Usage Day
    • Average Minutes per Visitor
    • Average Minutes per Page
    • Average Pages per Usage Day
    • Average Pages per Visitor
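
The pick-list and cell-level thresholds above boil down to a few comparisons. A minimal sketch, with hypothetical function names, assuming the raw and unprojected visitor counts are already known for the entity and cell in question:

```python
INSENSITIVE_MEASURES = {
    "Total UV", "% Reach", "% Composition UV", "Composition Index",
    "Average Daily Visitors",
}
SENSITIVE_MEASURES = {
    "Average Usage Days per Visitor", "% Composition Pages",
    "% Composition Minutes", "Total Pages Viewed", "Total Minutes",
    "Average Minutes per Visitor per Usage Day", "Average Minutes per Visitor",
    "Average Minutes per Page", "Average Pages per Usage Day",
    "Average Pages per Visitor",
}


def is_available_in_pick_list(raw_uv_us, raw_uv_non_us):
    """Entity availability: > 30 raw UVs (US panel) or > 15 raw UVs (non-US panel)."""
    return raw_uv_us > 30 or raw_uv_non_us > 15


def is_cell_reportable(measure, unprojected_uv):
    """Cell-level rule: > 1 unprojected UV for insensitive measures, > 30 for sensitive ones."""
    if measure in INSENSITIVE_MEASURES:
        return unprojected_uv > 1
    if measure in SENSITIVE_MEASURES:
        return unprojected_uv > 30
    raise ValueError(f"unknown measure: {measure}")


print(is_available_in_pick_list(raw_uv_us=0, raw_uv_non_us=16))
print(is_cell_reportable("Total UV", 2))
print(is_cell_reportable("Average Pages per Visitor", 30))
```

Read this way, a Unique Visitors figure can be reported off as few as two panelists, while measures such as Total Pages Viewed or Average Pages per Visitor need more than 30.
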
    Cutoff Rules for Media Metrix and XPC Entity Lists

    Media Metrix 2.0 leverages an expanded panel to report 500+ additional media entities previously unavailable in Media Metrix. These additional media entities, referred to as the 'XPC Entity List', will be reported in syndicated Media Metrix reports along with the currently available list of Media Metrix entities, the latter of which will continue to be reported off the Media Metrix panel.

    Properties that do not qualify for reporting off the Media Metrix panel are reported off the expanded XPC panel. To qualify for the XPC Entity List, a property must have achieved at least 70,000 Unique Visitors in the most recent month (projections are based on the expanded panel).

    The following rules are applied to determine from which panel properties (and their children) are reported:
    1. A property must have a three-month average greater than 120,000 projected UVs to be reported off the Media Metrix panel. The children of these properties will also be reported off the Media Metrix panel if they satisfy the cell level reportability standards outlined below.
    2. If a property does not have a three-month average greater than 120,000 projected UVs from the Media Metrix panel, but has at least 70,000 projected UVs for the current month, it and its children will be reported based off the XPC panel.
    3. If a property does not have > 70,000 projected UVs, it and its children will not be available in MM 2.0.
    XPC Entities will be represented in all Media Metrix reports with a lower case '(m)' next to the entity name to alert users that data for these sites are based upon the expanded XPC panel.
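
The three cutoff rules can be sketched as a single decision function. The function name and return values are invented for illustration. The text is not fully consistent about a property with exactly 70,000 projected UVs ("at least 70,000" in rule 2, "> 70,000" in rule 3); the sketch assumes the "at least" reading.

```python
def reporting_panel(three_month_avg_mmx_uv, current_month_xpc_uv):
    """Return which panel a property is reported off, or None if it is not reported.

    three_month_avg_mmx_uv: three-month average projected UVs (Media Metrix panel)
    current_month_xpc_uv:   current-month projected UVs (expanded XPC panel)
    """
    if three_month_avg_mmx_uv > 120_000:
        return "Media Metrix"
    if current_month_xpc_uv >= 70_000:  # assumption: "at least 70,000"
        return "XPC"  # shown with "(m)" next to the entity name
    return None


print(reporting_panel(150_000, 0))
print(reporting_panel(100_000, 70_000))
print(reporting_panel(100_000, 69_999))
```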

    Frequently Asked Questions

    The following are some FAQ's that will help explain the Reporting Standard rules:

    Q: Why isn't a particular media entity showing up in a MyMetrix report output?

    A: Because the media entity did not meet the cell level reportability cut-off rules in the country and month you have selected.

    Q: I cannot find a certain media entity in the pick lists or search on it, but when I expand its parent in a Key Measures report, I can see data for that entity. Why is that?

    A: If you can see a media entity in a report, this means it made the cell level reportability cut-off in that month. It was not in the pick list because it did not meet the entity availability cut-off.

    Q: So if a media entity didn't make the media pick list cut-off, but I can see it in a report, doesn't that mean the data available for the entity is invalid?

    A: No. We implement the entity availability cut-off merely to keep media pick lists to a manageable size. The cell-level rules ultimately prevent the reporting of unstable data. So if you see a number in the interface, it is based on valid sample. In the case of very small sites, sample size insensitive measures will be reported, but sample size sensitive measures will not.

    For More Information

    For methodology questions related to specific comScore products, review the product user guides available from the MyMetrix interface. Should you have additional questions please feel free to contact your account representative for more information.