Publishers Propose New Controls on Search Engines’ Access to Content

Publishers unveiled a proposal today to establish more flexible rules governing how search engines index their content. Currently, the AP notes, publishers can use a “robots.txt” file to tell a search engine not to index a given page. The plan, announced today at a gathering of a publishers’ consortium in NYC and known as the Automated Content Access Protocol (ACAP), would give publishers more say in what search engines could do. Rather than a simple do-or-do-not-index request, publishers could set specific rules, such as how long a search engine could retain content or which links it may follow. AP CEO Tom Curley said the technology could play an important role in blocking sites that distribute content without permission. A Google (NSDQ: GOOG) spokesperson said the company still needs to evaluate ACAP to ensure that it would work for a wide variety of websites, not just those backing it. Search Engine Land’s Danny Sullivan said robots.txt “certainly is long overdue for some improvements,” but he questioned whether ACAP would do much to prevent future legal battles.
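
For context, here is a minimal sketch of how today’s robots.txt mechanism works, i.e. the simple allow/deny model ACAP aims to extend. It uses Python’s standard urllib.robotparser; the site and rules in the snippet are made up for illustration:

    # A hedged sketch, assuming Python's standard urllib.robotparser;
    # the site and rules below are invented for illustration only.
    from urllib.robotparser import RobotFileParser

    # Today a publisher can say little more than "don't crawl these paths".
    robots_lines = [
        "User-agent: *",
        "Disallow: /archive/",
    ]

    parser = RobotFileParser()
    parser.parse(robots_lines)

    # A compliant search engine checks each URL before fetching and indexing it.
    print(parser.can_fetch("Googlebot", "https://example.com/news/story.html"))    # True
    print(parser.can_fetch("Googlebot", "https://example.com/archive/2006.html"))  # False

ACAP’s pitch is that this yes/no vocabulary is too coarse; the proposal would layer richer directives (retention periods, link-following rules) on top of it.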

2 Responses to “Publishers Propose New Controls on Search Engines’ Access to Content”

  1. It's disturbing that a group wanting to make changes to a system knows so little about how the system currently works. Publishers already have the ability to tell search engines which links not to follow by setting the REL attribute of the A tag to NOFOLLOW. They can also use a META tag with an HTTP-EQUIV of EXPIRES to tell browsers, caches, and robots when a page's content expires. And Google has introduced its own META directive, "unavailable_after", which tells its crawler when to stop indexing a page. (A sketch of these hooks follows below.)

    We don't need a new standard; we just need to educate people on how to use the ones that already exist.
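
    A rough sketch of these existing per-page controls, using Python's standard html.parser on a made-up snippet of markup; the tag values shown are illustrative, not a definitive spec:

        # Hedged sketch: scan invented HTML for the controls mentioned above --
        # rel="nofollow" links and expiry-related META tags.
        from html.parser import HTMLParser

        sample_html = """
        <html><head>
          <meta http-equiv="expires" content="Sat, 01 Dec 2007 00:00:00 GMT">
          <meta name="googlebot" content="unavailable_after: 01-Dec-2007 00:00:00 EST">
        </head><body>
          <a href="/syndicated/story.html" rel="nofollow">Don't follow this link</a>
        </body></html>
        """

        class DirectiveScanner(HTMLParser):
            def handle_starttag(self, tag, attrs):
                attr_map = dict(attrs)
                if tag == "a" and attr_map.get("rel") == "nofollow":
                    print("nofollow link:", attr_map.get("href"))
                elif tag == "meta" and attr_map.get("http-equiv", "").lower() == "expires":
                    print("page expires:", attr_map.get("content"))
                elif tag == "meta" and "unavailable_after" in attr_map.get("content", ""):
                    print("stop indexing after:", attr_map.get("content"))

        DirectiveScanner().feed(sample_html)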