Sun, 7 January 2007
Intro: Right before the 2006 holidays, Jimmy Wales, creator of the online encyclopedia Wikipedia, announced the Search Wikia project. This project will rely on search results based on the future site's community of users. In this podcast we take a look at popular search engine technologies and discuss the Search Wikia project concept.
Question: I know this project was really just announced. Before we get into the technology involved - can you tell us what phase the project is in?
Question: What makes this concept fundamentally different from what Google or Yahoo! are doing?
Question: This sounds a lot like digg - am I on the
Question: Can you provide a bit more detail on how Google works?
Question: That's Googlebot, how does the indexer work?
Question: So now that everything is indexed, can you describe the search query?
Question: Can you run us through, step by step, a Google search query?
1. The user accesses a Google web server at google.com and submits a query.
2. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book--it tells which pages contain the words that match any particular query term.
3. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.
4. The search results are returned to the user in a fraction of a second.
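The four steps above can be sketched in a few lines of Python. This is only a toy illustration, not Google's actual implementation: the sample documents and the names DOCS, build_index, search and snippet are all made up for this example. It shows the "index in the back of a book" idea (a map from each word to the documents that contain it) and a simple snippet lookup standing in for the doc servers.

```python
import re
from collections import defaultdict

# Hypothetical stored documents, standing in for the doc servers.
DOCS = {
    1: "Nutch is an open source search engine built on Lucene.",
    2: "Lucene is an information retrieval library written in Java.",
    3: "Wikipedia is an online encyclopedia edited by its community.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


def snippet(text, query, width=40):
    """Return a short excerpt starting near the first query term found.

    Uses a plain substring match, which is enough for a toy example.
    """
    lowered = text.lower()
    positions = [lowered.find(term) for term in tokenize(query)]
    positions = [p for p in positions if p >= 0]
    start = max(min(positions) - 10, 0) if positions else 0
    return text[start:start + width]


# Step 2: the index is built ahead of time, before any query arrives.
INDEX = build_index(DOCS)

if __name__ == "__main__":
    query = "lucene java"
    # Steps 2 to 4: look up matching ids, then fetch documents and snippets.
    for doc_id in search(INDEX, query):
        print(doc_id, snippet(DOCS[doc_id], query))
```

A real engine spreads the index and the documents across many machines and ranks the matches; this sketch only returns documents that contain all of the query terms.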
Question: OK, so now we know how Google and Yahoo! work. How will these new Search Wikia-type search engines work?
Two projects, Nutch and Lucene, along with some others, can now provide the background infrastructure needed to build a new kind of search engine, one that relies on human intelligence to do what algorithms cannot. Let's take a quick look at these projects.
Lucene: Lucene is a free and open source information retrieval API, originally implemented in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License.
Nutch: Nutch is a project to develop an open source search engine. It is supported by the Apache Software Foundation and has been a subproject of Lucene since 2005.
With Search Wikia, Jimmy Wales hopes to build on Lucene and Nutch by adding a social component. The result should be more intelligent, socially based search tools. Don't assume that Google, Yahoo!, Microsoft and the rest are not working on these kinds of technologies, too. It will be interesting to watch how these new technologies and methods are implemented.
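To make the "social component" idea a little more concrete, here is one purely hypothetical way a community signal could be mixed into algorithmic results. Nothing here comes from Search Wikia, Nutch or Lucene: the function names, the vote counts, the example.org URLs and the 50/50 weighting are all assumptions chosen for illustration.

```python
def blended_score(relevance, upvotes, downvotes, weight=0.5):
    """Blend an algorithmic relevance score (0..1) with a community approval ratio.

    weight is the share given to the community signal (hypothetical choice).
    """
    total = upvotes + downvotes
    # With no votes yet, use a neutral 0.5 so unrated pages are not penalized.
    approval = upvotes / total if total else 0.5
    return (1 - weight) * relevance + weight * approval


def rerank(results, weight=0.5):
    """Sort (url, relevance, upvotes, downvotes) tuples by blended score, best first."""
    return sorted(
        results,
        key=lambda r: blended_score(r[1], r[2], r[3], weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical results: the first page scores well algorithmically,
    # but the community prefers the second one.
    results = [
        ("http://example.org/a", 0.9, 1, 9),
        ("http://example.org/b", 0.6, 9, 1),
    ]
    for url, *_ in rerank(results):
        print(url)
```

Setting weight to 0 leaves the purely algorithmic order in place, while higher values let community votes move pages up or down.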
Wikipedia creator turns to search: http://news.bbc.co.uk/2/hi/technology/6216619.stm
Search Wikia website: http://search.wikia.com
Search Wikia Nutch website: http://search.wikia.com/wiki/Nutch
Lucene Website: http://lucene.apache.org/java/docs/
Wikipedia Website: http://wikipedia.org/