Jimmy Wales, founder of the not-for-profit Wikipedia and of Wikia, a for-profit company based in San Mateo, Calif., is one of a growing number of people uneasy about Google's tightening grip on search. And he is doing something about it. Last week his company, Wikia, bought the distributed crawler Grub from LookSmart, and it plans to release Grub as open source. Not that LookSmart was really using it anyway, and LookSmart also does ad business with Wikia.
Wales’ bet: just as Linux became a migraine for the last generation’s monopolist, open-source search tools will keep companies like Google honest. It is not an easy task, for Google is firmly embedded in our digital lives.
“Search is part of the fundamental infrastructure of the Internet. And, it is currently broken,” Wales said back in December 2006, when Wikia announced the Search Wikia effort. “Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency.”
Wales launched Search Wikia earlier this year, and the Grub acquisition is part of that strategy. (You can run Grub on a Windows or Linux PC, either in the background or as a screensaver.) Following the announcement, we spoke with Wales, who argued that with Grub and other tools such as Lucene, the open-source indexing library, innovation around search can thrive.
By marrying the results these tools produce with the human context provided by Wikia’s wikis, search results could actually become useful once again. Grub, Lucene and Nutch (a web crawler based on Lucene) are the powder and spark of the open search revolution.
Grub is by no means the final move; it should be viewed as a first concrete step in a long-term strategy. Jeremie Miller, the inventor of Jabber and the XMPP protocol, who leads the Search Wikia effort (and is also Wikia’s CTO), gave a talk at OSCON about the architecture of open-source search. Miller pointed out that monolithic search can be broken into three components, and that interested parties could implement one or more of them.
The three components are factories, which crawl and present content; collectors, which rate and rank content from multiple sources; and brokers, which direct user queries to the collectors or factories. Miller believes this is a five-year process. Grub is one of many components that will be needed to build a truly open-source search infrastructure. The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo, for that matter) is the high cost of infrastructure.
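To make Miller's split a little more concrete, here is a toy sketch of how the three roles might sit behind separate interfaces so that different parties could build different pieces. Everything in it is a hypothetical illustration: the class names (StaticFactory, TermCountCollector, Broker), the in-memory "crawl" and the naive term-count ranking are assumptions, not anything Search Wikia, Grub or Miller's talk actually specifies.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    url: str
    text: str


class Factory(Protocol):
    """Hypothetical interface: a factory crawls and presents content."""
    def fetch(self) -> list[Document]: ...


class Collector(Protocol):
    """Hypothetical interface: a collector rates and ranks content."""
    def rank(self, query: str) -> list[tuple[float, str]]: ...


@dataclass
class StaticFactory:
    """Toy factory: 'crawls' a fixed in-memory corpus instead of the web."""
    pages: dict[str, str]

    def fetch(self) -> list[Document]:
        return [Document(url, text) for url, text in self.pages.items()]


@dataclass
class TermCountCollector:
    """Toy collector: gathers from several factories, scores by term counts."""
    factories: list[Factory]
    docs: list[Document] = field(default_factory=list)

    def refresh(self) -> None:
        # Pull content from every factory this collector subscribes to.
        self.docs = [d for f in self.factories for d in f.fetch()]

    def rank(self, query: str) -> list[tuple[float, str]]:
        terms = query.lower().split()
        scored = []
        for d in self.docs:
            words = d.text.lower().split()
            score = float(sum(words.count(t) for t in terms))
            if score > 0:
                scored.append((score, d.url))
        # Highest score first; ties broken alphabetically by URL.
        return sorted(scored, key=lambda s: (-s[0], s[1]))


@dataclass
class Broker:
    """Routes a user query to every collector and merges their rankings."""
    collectors: list[Collector]

    def search(self, query: str, limit: int = 10) -> list[str]:
        best: dict[str, float] = {}
        for c in self.collectors:
            for score, url in c.rank(query):
                # Keep the best score any collector gave this URL.
                best[url] = max(score, best.get(url, 0.0))
        ordered = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
        return [url for url, _ in ordered[:limit]]


if __name__ == "__main__":
    wiki = StaticFactory({
        "wiki/linux": "linux kernel open source",
        "wiki/grub": "grub distributed crawler open source crawler",
    })
    news = StaticFactory({"news/google": "google search ads"})
    collector = TermCountCollector([wiki, news])
    collector.refresh()
    broker = Broker([collector])
    print(broker.search("open source crawler"))  # prints ['wiki/grub', 'wiki/linux']
```

A toy like this runs on a laptop; the hard part, and the expensive one, is doing the same thing across billions of pages.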
Sure, Amazon’s EC2 service has helped, but it isn’t enough. Google, thanks to its money machine, has been able to build an infrastructure that lets it crawl, index and serve results at a faster pace. Even if a start-up comes up with a better algorithm, it still needs to sink millions into infrastructure just to get into the business, and more still to offer an experience as fast as the one most people associate with Google.
Grub, on the other hand, is a way to build a massive, distributed, user-contributed processing network. Another nascent but promising open-source P2P search engine, Yacy, is coming out of Germany. (Also check out Faroo, a German P2P search start-up.)
Can it work?
Wales faces an uphill climb. First, he has to ensure that enough people are using Grub and, more importantly, hacking enhancements to the software. At the same time, he has to address other concerns, such as those raised by a commentator on Search Engine Land and on other blogs.
While Google might be impossible to beat in a full-frontal assault, it is vulnerable to smaller, more focused attacks. Linux may not have killed Microsoft, but it has stolen opportunities from the OS giant, particularly in Internet infrastructure (data centers).
Open-source search can do precisely the same: take away opportunities from large search engines. Perhaps, as with Linux, we will see a shift away from Google, and venture capitalists, long scared by the prospect of competing with Google, will loosen their purse strings.
Just as Linux ended up spawning devices as diverse as TiVo and mobile phones, open-source search could lead to many more specialized search engines, also called vertical search engines. Today, building a good vertical search engine costs millions of dollars, and building and operating one is not for the faint of heart.
In an interview with Fast Company magazine earlier this year, Wales said:
“The other thing we’re looking to is some of the second-tier search companies,” he admits. “We’ve talked to–I can’t say who–different people, asking, would they be better off participating in a project that helps quality search results to become a commodity?”
Put another way, Wales is hoping the search incumbents suffer death by a thousand cuts.
More @ Resource Shelf.
46 comments so far
6:04 AM PT
Great article, very exciting stuff. You should fix the link to Grub, however. It’s grub.org not grub.oom. Also, looks like you forgot to close the link tag after nutch.org because it goes on for several paragraphs.
6:52 AM PT
You left an anchor tag open for multiple paragraphs.
7:12 AM PT
Wow. I agree a change to that extent will take time, but with the movement towards open source “everything” the internet might be ready to handle this. The real question is how far are hackers and software developers willing to go to really make this mean anything at all against giants like google?
7:19 AM PT
Andrew,
thanks for the catch. fixed it. something went wrong when posting in wordpress from my blog editor.
7:23 AM PT
Just when I blast you, you come out with a good, albeit dated, article.
nice.
7:24 AM PT
You need an editor. The grammar mistakes make this barely readable.
7:38 AM PT
The big issue with Wikia in particular is how many users will it find who are willing to contribute, considering that it is a for-profit entity. I for one, contributed heavily to Wikipedia, but will not touch Wikia.
On the other hand, Grub looks really interesting, and should be fantastic. However, I think Google won’t be too worried, since it is good enough for most people, and it is far too entrenched in our collective mindset.
7:39 AM PT
[...] Wikipedia founder Jimmy Wales would like you to help him build the revenue for his new “for profit” venture Wikia. Wikia has acquired the distributed crawler Grub from LookSmart and Wales plans to make it open source. He’d like to invite the community to line his coffers. [...]
8:07 AM PT
I bet Google is laughing hysterically now. Jimmy’s biting off more than he can chew. From the outside everybody’s a genius.
“You need an editor. The grammar mistakes make this barely readable.” He’s needs to learn how to write. In the short deadline world of blogging, the bloggers need to know grammar, know when to look things up in dictionaries (hyphen? open? closed?), and know the AP or NYT style manual.
8:48 AM PT
I for one look forward to replacing my Google toolbar with the Grub toolbar the day it is released. Google was good but now sucks, the search results have become so bad I have given up on searching
9:54 AM PT
“The grammar mistakes make this barely readable.”
Yup. I’m glad someone else noticed. Very annoying.
10:06 AM PT
You have mistakes in your article:
“…part of a growing number of people who are discomforted by the growing control Google HAS over search.” (the word HAS is missing)
“…it is impossible to get away from Google that WHICH is firmly embedded into our digital lives.”
(The word “which” is a more appropriate choice than “that”)
“…Wales launched Search Wikia earlier this year, and THE Grub acqusition is part of that strategy. ”
(the word THE is missing)
“…By marrying these search results and the human context provided by says Wikia wikis, the final search results could actually become useful once again”.
(What is supposed to follow the preposition “by”? This sentence has structure issues)
“…Miller in his talk pointed out that the monolithic search can be broken into three components…”
(Either use a comma after “Miller”, or restructure as…”In his talk, Miller pointed out…”)
“…The three components are - factories that crawl, present and present content;”
(You use the word “present” twice)
“Google, thanks to its money machine has been able to build an infrastructure that lets it crawl, index and show results at a faster pace”
(You should add a comma after “machine”)
“Even if a start-up comes up with a better alogrithm, it still needs to sink millions into infrastructure, to just get into the business, and offer a fast-experience most people associate with Google.”
(This is a run-on sentence and would be best as two)
“Grub, on the other hand is a way to build massive, distributed user-contributed processing network, and can help offset with the power of a wiki to form social consensus, the open source Search Wikia project has taken the next major step towards a future where search is open and transparent. ”
(This is a perfectly terrible sentence. Missing words (the word “a” before “massive”) and unconnected fragments. It is not clear what is being said)
“Another nascent but promising open source P2P search engine, Yacy, coming out of Germany”
(You are missing the word “is” in front of “coming”)
“Wales faces an uphill climb. First he has to ensure that there are enough people using Grub, and are more importantly are hacking enhacements to the software.”
(There are too many “ARE” words. This sentence also has structure problems)
“Perhaps like Linux, we will see a shift away from Google, and Venture Capitalists, for long scared by the prospect of competing with Google, will loosen their purse strings.”
(The preposition “for” is not needed)
10:12 AM PT
I agree with Bevis. You need an editor in a bad way. I am amazed that you have one awards for your writing.
Excellence in Journalism Ward
The gold award from American Society of Business Publication Editors
Senior Editor at Forbes.com
You have to be joking. There is no way that this article was written by a person with those credentials.
10:34 AM PT
I agree with the comments about the poor quality of the writing. Om, step it up! We expect much more from you!
11:44 AM PT
[...] Posted by comartslibrarian on July 30th, 2007 Wikipedia enters the search fray… Read more here. [...]
12:54 PM PT
At Om’s request I just went through and did a light copy-edit.
1:06 PM PT
hey guys,
sorry about the grammatical errors. I posted the wrong draft, and it was pretty late at night, so I apologize for the mistakes.
I am sorry for the errors!
1:26 PM PT
@Amazed Reader: if you are going to criticize other people’s writing you could at least do something about your own. “I am amazed that you have one awards for your writing.”. I think the word you wanted was “won”.
A few grammatical errors and some spelling mistakes renders the article unreadable to some people? Now, that is amazing.
2:08 PM PT
This is a little ballsy of Wales to go after Google in such a way. It’s a little weird to see Wikipedia acting like the proud beggar with the hat in their hand meanwhile their boss is trying to figure out how to get rich.
And “lack of freedom, lack of community, lack of accountability, lack of transparency” is pretty rich. Aren’t those the major problems with Wikipedia?
3:20 PM PT
[...] In a post today Om Malik quoted Jimmy Wales from something he said in December 2006: “Search is part of the fundamental infrastructure of the Internet. And, it is currently broken,” Wales said back in December 2006, when Wikia launched Search Wikia effort. “Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency.” [...]
3:45 PM PT
[...] Google vs Jimmy Wales, GigaOM [...]
6:40 PM PT
the irony of google is that if you type any subject, the results on the first page will most likely have a link to a wikipedia.org web page containing that subject.
7:45 PM PT
It seems like Grub wouldn’t be all that helpful for breaking news type content since there is an additional level of processing. If a search engine can’t be counted on for EVERY type of search, I don’t know how it will gain traction. However, I’d love to be proved wrong.
8:50 PM PT
It’s great that Wales is trying, but Wikipedia is fundamentally different from search - the difference between structured and unstructured data - so there’s no reason to believe that he will be any more likely to succeed than all the other smart people out there.
8:57 PM PT
The vast majority of readers are looking for up-to-date information on the Web from a trusted source. Om delivers on that promise, which is why I read his blog. I hope that Om doesn’t recruit “editors” to sanitize the personality of GigaOm (which would also slow things down) in order to appease a tiny minority of readers intoxicated with the finer points of English grammar.
9:12 PM PT
I was going to mention Nutch, but you mentioned it, oh well. I am a big fan of human powered search, but no one seems to have perfected this method yet. I had actually begun pursuing venture capital several years ago, but abandoned it for personal reasons at the time.
1:59 AM PT
[...] reads: Are Google’s days numbered? Wikia details plans for search rival to Google Google vs Jimmy Wales & Open Source Search Search Wikia Takes Steps To Crawl; Acquires Grub Wikia’s Outrageous Exploitation of the Human [...]
2:12 AM PT
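[...] NowPublic comes from Vancouver, Canada, a city where I spent eight years as a child. It has its charms, but I also know the city and its local Internet scene very well. The mood there is still far more subdued than in Silicon Valley, not even a third as lively as Seattle, less than three hours' drive away. We know Flickr came from Vancouver, but Stewart Butterfield was happy to leave for Silicon Valley. NowPublic doesn't have the kind of fame Jimmy Wales has, the kind that lets him stand up to Google's dominance while investors keep paying the bill. This US$10 million round shows that the "veteran ace" can be borderless. Even if Taiwanese entrepreneurs feel overseas capital is too remote, Taiwanese investors are not that far behind overseas ones, and such opportunities are not entirely absent; valuations are just somewhat lower, perhaps NT$2 million for 40% of your shares. But the "veteran ace advantage" still exists, so make good use of it rather than relying on a glib tongue; that advantage will give new entrepreneurs a natural, gentle glow. [...]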
3:49 AM PT
I consider myself as a bit of a litmus test for change - I’m usually in the last 10% of people to “get it”; and I am “getting” fed up with google giving me advertising results instead of search results. I’m sold on the idea that an OS project can take on google. Just getting someone like me to write a comment should be a sign that ordinary people are ready to support something new. And no, it’s not lost on me that I read this article from a link from google-news, who obviously find it funny that such an article is being ridiculed because of grammar - not content.
4:20 AM PT
Informative and interesting article.
I’ve tried installing GRUB on my PC. But, as I’m accessing net through a proxy, GRUB couldn’t pass through. I will try again on my laptop.
As already said, it's a long way to compete with search giants like Google, et al. Nevertheless, it's a credible start.
11:15 AM PT
Daily SearchCast, July 30, 2007: Microsoft Buys Ad Exchange; Search Wikia Gets Grub Crawler; Search Engines Try Games To Get More Queries & More!
Microsoft buys the AdECN display ad exchange; AuctionAds gets sold, too. Search Wikia cuts deal with LookSmart to get the Grub crawler in return for carrying ads. Everyone’s generating search traffic through games, and Microsoft and Google go head-to…
4:59 PM PT
[...] Google vs Jimmy Wales & Open Source Search Jimmy Wales, the founder of not-for-profit Wikipedia and for-profit, San Mateo, Calif.-based Wikia is part of a growing […] [...]
12:47 AM PT
We definitely need an alternative to Google for searching. And if Grub can fill the space, there's nothing like it, as it'll get input from the open-source communities. The search results should be oriented more toward the query's context rather than just keywords or phrases. Great article.
1:52 PM PT
[...] Posts: Google vs Jimmy Wales & Open Source Search Share This | Sphere | Print Posts | Topic: Web [...]
3:01 PM PT
[...] Google vs Jimmy Wales & Open Source Search [...]
7:24 PM PT
[...] related idea was announced last week by Wikipedia’s Jimbo Wales: the Search Wikia search engine is making [...]
11:06 AM PT
[...] back into that to get the raw data and avoid doing their own crawling. Even GigaOm says “The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo for that m…“ Let’s share those costs and further defray them by having a big player like Yahoo [...]
8:19 AM PT
[...] editors. So far, the company doesn’t have anything to show for its efforts, but it recently announced the purchase of the open source web crawler Grub from LookSmart (remember them?). As part of the company’s PR efforts, Wales has tried to make [...]
1:25 PM PT
It’s going to be pretty tough for open-source companies like this to compete against giants like Google, but hopefully Jimmy Wales will succeed in keeping Google honest.
12:46 AM PT
[...] rightfully so. Nick Carr writes For the past year, Chief Wikipedian Jimmy Wales has been doing a lot of trash-talking about taking on Google in the search business. Now Google’s striking [...]
7:41 AM PT
[...] directly with Google, as Wikipedia is doing with its new search engine, and it may come back on [...]
3:14 PM PT
[...] web-crawling technology. TechCrunch | ReadWriteWeb | GigaOm | Mashable | [...]
7:56 AM PT
I look forward to a future in which most people use the Grub toolbar or something else instead of the Google Toolbar, and Google is no longer the standard. But in reality, Google is the undisputed winner of the Internet, at least today.