Gordon And Mike's ICT Podcast
Perspectives on Technology and Education from Gordon F. Snyder, Jr. & Mike Qaissaunee

Intro: Right before the 2006 holidays, Jimmy Wales, creator of the online encyclopedia Wikipedia, announced the Search Wikia project. The project will generate search results based on input from the future site's community of users. In this podcast we take a look at popular search engine technologies and discuss the Search Wikia project concept.

Question: I know this project was only just announced. Before we get into the technology involved, can you tell us what phase the project is in?
According to the BBC, Jimmy Wales is currently recruiting people to work for the company and buying hardware to get the site up and running.

Question: What makes this concept fundamentally different from what Google or Yahoo! are doing?
When Wales announced the project, he came right out and said it was needed because the existing search systems for the Net were "broken". They were broken, he said, because they lacked freedom, community, accountability and transparency.

Question: This sounds a lot like Digg. Am I on the right track?
Yes, you are. What you end up with is a Digg-like application, or what Wales is calling a "people-powered" search site.

Question: Can you provide a bit more detail on how Google works?
Googlebot is Google's web crawling robot. Googlebot finds pages in two ways: through an Add URL form, www.google.com/addurl.html, and by following links as it crawls the web.

Source: www.google.com 
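Google doesn't publish Googlebot's code, so here is a generic breadth-first crawler sketch, not Googlebot itself, to show the "follow links" half of the idea. The seed URLs stand in for pages submitted through the Add URL form, and the hypothetical `fetch` function plus the tiny in-memory `WEB` dictionary replace real HTTP requests so the sketch runs offline. Only Python's standard library is used.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl starting from submitted seed URLs.

    `fetch` is a hypothetical stand-in for an HTTP client: it maps a URL to
    its HTML text, or returns None if the page cannot be retrieved.
    Returns a dict of URL -> full page text, ready to hand to an indexer.
    """
    queue = deque(seeds)
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links like "/b"
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# A tiny hypothetical in-memory "web" so the sketch runs without network access.
WEB = {
    "http://a.example/": '<a href="/b">B</a> <a href="http://c.example/">C</a>',
    "http://a.example/b": '<a href="/">home</a>',
    "http://c.example/": "no links here",
}

if __name__ == "__main__":
    print(sorted(crawl(["http://a.example/"], WEB.get)))
```

Starting from one seed, the crawler discovers the other two pages only by following links, which is the second of the two discovery paths described above.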

Question: That's Googlebot, how does the indexer work?
Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google's index database. This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.

Source: www.google.com
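The index described above is what's usually called a positional inverted index. Here is a minimal sketch of that data structure, assuming simple lowercase word tokenization (Google's real tokenizer and storage format aren't described in the source): each term maps to the documents containing it and the word positions where it occurs, with terms kept in alphabetical order.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a positional inverted index.

    docs maps a document ID to its full text. The result maps each term to a
    list of (doc_id, positions) pairs, with terms in alphabetical order.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    # Sort terms alphabetically, and each term's postings by document ID.
    return {term: sorted(postings.items()) for term, postings in sorted(index.items())}


def lookup(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = {doc for doc, _ in index.get(terms[0], [])}
    for term in terms[1:]:
        result &= {doc for doc, _ in index.get(term, [])}
    return result


DOCS = {1: "Wikipedia is an encyclopedia", 2: "Search Wikia is a search engine"}

if __name__ == "__main__":
    index = build_index(DOCS)
    print(index["search"])             # [(2, [0, 4])]
    print(lookup(index, "is search"))  # {2}
```

Because every entry already lists its documents, answering a query is a few dictionary lookups and a set intersection rather than a scan of every page, which is the "rapid access" the answer refers to.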

Question: So now that everything is indexed, can you describe the search query?
The query processor has several parts, including the user interface (search box), the "engine" that evaluates queries and matches them to relevant documents, and the results formatter.

PageRank is Google's system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank.

Source: www.google.com 
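To make the PageRank idea concrete, here is a sketch of the textbook power-iteration formulation. It assumes the commonly cited damping factor of 0.85 and spreads the score of pages with no outgoing links evenly over all pages; Google's production ranking combines many more signals and its exact parameters aren't public, so treat this as an illustration of the concept only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their score evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Score held by dangling pages is shared equally by every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its score, split evenly, to the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this three-page example C collects links from both A and B, so it ends up with the highest score, and a page with a higher score would be listed first.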

Question: Can you run us through, step by step, a Google search query?
Sure. This is also taken from Google's site. Here are the steps in a typical query process:

1. The user accesses the Google web server at google.com and submits a query.

2. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book: it tells which pages contain the words that match any particular query term.

3. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.

4. The search results are returned to the user in a fraction of a second.

Source: www.google.com
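The four steps above can be sketched as a toy pipeline. The `IndexServer` and `DocServer` classes and the `handle_query` function are hypothetical names for illustration, all running in one process; in Google's system these are separate clusters of machines. For simplicity the sketch orders matching pages by ID instead of by PageRank, and builds each snippet from a few words around the first query-term hit.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class IndexServer:
    """Hypothetical index server: maps each term to the IDs of pages containing it."""

    def __init__(self, docs):
        self.postings = {}
        for doc_id, text in docs.items():
            for term in tokenize(text):
                self.postings.setdefault(term, set()).add(doc_id)

    def match(self, terms):
        """Return the IDs of pages that contain every query term."""
        sets = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*sets) if sets else set()


class DocServer:
    """Hypothetical doc server: stores page text and generates snippets."""

    def __init__(self, docs):
        self.docs = docs

    def snippet(self, doc_id, terms, width=5):
        """Return a few words on either side of the first query-term hit."""
        words = self.docs[doc_id].split()
        for i, word in enumerate(words):
            if any(token in terms for token in tokenize(word)):
                start = max(0, i - width)
                return " ".join(words[start:i + width + 1])
        return " ".join(words[:2 * width + 1])


def handle_query(query, index_server, doc_server):
    """Web server role (step 1): route a query through index and doc servers."""
    terms = set(tokenize(query))
    matches = index_server.match(terms)  # step 2: index servers find pages
    # Step 3: doc servers supply snippets; sorted by ID here, not by rank.
    return [(doc_id, doc_server.snippet(doc_id, terms)) for doc_id in sorted(matches)]


DOCS = {
    "wiki": "Jimmy Wales created the Wikipedia online encyclopedia",
    "search": "Search Wikia is a people-powered search site from Jimmy Wales",
}

if __name__ == "__main__":
    for doc_id, text in handle_query("Jimmy Wales", IndexServer(DOCS), DocServer(DOCS)):
        print(doc_id, "->", text)  # step 4: results returned to the user
```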

Question: OK, so now we know how engines like Google and Yahoo! work. How will a Search Wikia-type search engine work?
I can give some details based on what I've looked at. As we've said, the Search Wikia project will not rely on computer algorithms to determine how relevant web pages are to keywords. Instead, the results generated by the search engine will be decided and edited by the users.


A couple of projects, Nutch and Lucene, along with some others, can now provide the background infrastructure needed to build a new kind of search engine, one that relies on human intelligence to do what algorithms cannot. Let's take a quick look at these projects.


Lucene: Lucene is a free and open source information retrieval API, originally implemented in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. 


We mentioned Nutch earlier. Nutch is a project to develop an open source search engine. Nutch is supported by the Apache Software Foundation and has been a subproject of Lucene since 2005.

With Search Wikia, Jimmy Wales hopes to build on Lucene and Nutch by adding a social component. What we'll end up with is more intelligent, socially based search tools. Now, don't assume that Google, Yahoo!, Microsoft and all the rest aren't working on these kinds of technologies. It will be interesting to watch how these new technologies and methods are implemented.

Sources:

Wikipedia creator turns to search: http://news.bbc.co.uk/2/hi/technology/6216619.stm

How Google Works: http://www.googleguide.com/google_works.html

Search Wikia website: http://search.wikia.com

Search Wikia Nutch website: http://search.wikia.com/wiki/Nutch

Lucene Website: http://lucene.apache.org/java/docs/

Wikipedia Website: http://wikipedia.org/

Direct download: searchwikia_FINAL.mp3
Category:podcasts -- posted at: 12:42pm EDT