Behind the Isle of Wight Search


The Isle of Wight Search was created in 2004 and developed in four days flat.
Visit the website at www.iowsearch.co.uk.

The intention was to give our early employees a side project.
We achieved this to a certain degree, until LanP3, our music jukebox, took over that role.
Since the original development, one or two developers have gone in and added a few hours here and there to enhance it slightly or to fix a reported bug or typo, but it is mainly how it always was.

Here are some fast facts:

Programming Language: C# (with ASP.NET)
Platform: Microsoft .NET
Database: SQL Server 2000
Database size: around 800MB

Main Components

  • The Searching and Indexing Bot
  • The Database
  • The Data Management
  • The Website Front-end

The Bot
First we created an Assembly, along with a Console Application to serve as a test application. The Assembly for the bot was dubbed BIOW.
BIOW stands for "Bot in other words" or, as it has become known, "Bot (for) Isle of Wight".

Although BIOW may not compete in functionality with GoogleBot, it does much of what you would expect a bot to do. As an example, if you read the robots.txt questions here, you will find that we carry out the same activities.
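As a rough illustration of what honouring robots.txt involves, here is a minimal sketch. It is written in Python for brevity rather than in the C# the bot actually uses, and relies on Python's standard urllib.robotparser module. The robots.txt body, the example.com URLs and the build_rules helper are all assumptions made for the example, not BIOW's real code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real bot would download this
# from the root of the domain before crawling it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_body: str) -> RobotFileParser:
    """Parse a robots.txt body into a rule set the bot can query."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# "BIOW" is used as the user-agent name purely for illustration.
allowed = rules.can_fetch("BIOW", "http://example.com/index.html")
blocked = rules.can_fetch("BIOW", "http://example.com/private/page.html")
```

With these rules, allowed is True and blocked is False, so the bot would skip everything under /private/.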

The process involves fairly standard activities. I do just want to add that, if we had had another week or so, I would have split the process into two distinct layers: getting the data and processing it.

  • Get the robots.txt from the root of the domain
  • Apply the Rules
  • Get page links from each page
  • Insert each page link as an entry in the Database under the domain
  • Parse the content, checking for header tags and metadata
  • Strip common words in two layers: the first removes extremely common, useless words such as "a" and "the"; the second runs a little later, with more terms
  • Write two text fields to the database: one for all the HTML content, the other for text only (HTML stripped)
  • When the HTML is stripped, we do still increment "term hits" for terms that are bold, italic, underlined, or in h1, h2 or h3 headings
  • The number of links that point to this site? We don't bother. We felt it was too small a variable compared to the other things to look at
  • The bot cycles through all pages, applying rules and creating some data for us.
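The parsing steps above can be sketched roughly as follows. This is an illustrative Python version, not the original C# code. The stop-word set contains only the two words named above, and the "one extra hit" bonus for emphasised terms is an assumption, since the actual weighting is not documented here. TermCounter and index_page are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# First-layer stop words; only "a" and "the" are named in the text.
STOP_WORDS = {"a", "the"}
# Tags whose text earns extra term hits (bold, italic, underline, h1-h3).
EMPHASIS_TAGS = {"b", "i", "u", "h1", "h2", "h3"}


class TermCounter(HTMLParser):
    """Collects page links and weighted term hits while stripping HTML."""

    def __init__(self):
        super().__init__()
        self.hits = Counter()
        self.links = []
        self._emphasis_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self._emphasis_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self._emphasis_depth > 0:
            self._emphasis_depth -= 1

    def handle_data(self, data):
        # Assumed weighting: emphasised terms count twice.
        weight = 2 if self._emphasis_depth else 1
        for word in data.lower().split():
            term = word.strip(".,;:!?\"'()")
            if term and term not in STOP_WORDS:
                self.hits[term] += weight


def index_page(html: str):
    """Return (term hits, links) for one page of HTML."""
    counter = TermCounter()
    counter.feed(html)
    counter.close()
    return counter.hits, counter.links


page = '<h1>Cowes Week</h1><p>The ferry to Cowes. <a href="/ferry.html">a timetable</a></p>'
hits, links = index_page(page)
```

Here "cowes" ends up with three hits (two from the heading, one from the body text), "the" and "a" are dropped, and links holds the single entry "/ferry.html", ready to be stored under the domain.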


The Database


We use SQL Server 2000, and its Full Text Search is one of the tools we rely on. We did not want to spend an eternity designing our own.
The data is simple: Stores for Domain listings, stores for domain pages, and stores for page metadata and our own determinations.
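To make the three kinds of store a little more concrete, here is a minimal sketch. It uses Python with an in-memory SQLite database purely so the example is self-contained. The real system runs on SQL Server 2000, and every table, column and function name below is hypothetical.

```python
import sqlite3

# Hypothetical schema: one store each for domains, pages and per-page terms.
SCHEMA = """
CREATE TABLE domains (
    id   INTEGER PRIMARY KEY,
    host TEXT UNIQUE NOT NULL
);
CREATE TABLE pages (
    id         INTEGER PRIMARY KEY,
    domain_id  INTEGER NOT NULL REFERENCES domains(id),
    url        TEXT UNIQUE NOT NULL,
    html       TEXT,   -- full HTML content
    plain_text TEXT    -- same content with the HTML stripped
);
CREATE TABLE page_terms (
    page_id INTEGER NOT NULL REFERENCES pages(id),
    term    TEXT NOT NULL,
    hits    INTEGER NOT NULL,
    PRIMARY KEY (page_id, term)
);
"""


def open_index() -> sqlite3.Connection:
    """Create an empty in-memory index with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_page(conn, host, url, html, plain_text) -> int:
    """Store a page under its domain, creating the domain row if needed."""
    conn.execute("INSERT OR IGNORE INTO domains (host) VALUES (?)", (host,))
    domain_id = conn.execute(
        "SELECT id FROM domains WHERE host = ?", (host,)
    ).fetchone()[0]
    cursor = conn.execute(
        "INSERT INTO pages (domain_id, url, html, plain_text) VALUES (?, ?, ?, ?)",
        (domain_id, url, html, plain_text),
    )
    return cursor.lastrowid


def counts(conn):
    """Return (number of domains, number of pages)."""
    domains = conn.execute("SELECT COUNT(*) FROM domains").fetchone()[0]
    pages = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    return domains, pages


conn = open_index()
add_page(conn, "example.com", "http://example.com/", "<b>Hi</b>", "Hi")
add_page(conn, "example.com", "http://example.com/about", "<p>About</p>", "About")
totals = counts(conn)
```

After the two inserts, totals is (1, 2): one domain with two pages stored under it.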

Data Management
Using the same database for both indexing and searching makes it a lot slower for users. Corrupt, old or insignificant data can also cause problems. The data management side of the search engine is nowhere near where it should be, but we do have the following direct functions.

  • Get Counts of pages and domains
  • Clean out words and terms that the bot needed but that we no longer need
  • Defragment indexes on tables
  • Rebuild FT (Full Text) catalogues
  • Try to remove email addresses (we do this to stop them getting through to end users)
  • Backup, truncate, index logs, files.
  • Control some Bot options like Index Entire site, or just first level.
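As an example of one of these functions, the email clean-up could look something like the sketch below. It is written in Python for illustration and is not the production C# code. The regular expression is a deliberately simple assumption that catches common address shapes, not every valid address, and remove_emails is a hypothetical name.

```python
import re

# Simple pattern for common address shapes: local part, "@", domain, dot, TLD.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def remove_emails(text: str, placeholder: str = "[removed]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


cleaned = remove_emails("Contact bob@example.com today")
```

For the sample string, cleaned is "Contact [removed] today". Passing an empty placeholder drops the address entirely.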

The Website

This was the final requirement and, as time was running out, we built a simple user interface and a banner ads section.
About a year after it went up, we did go back and "play" with the website a bit more.

Conclusion

We did this in 4 days. It is not an earner of any kind. It is currently used by around 60 unique visitors each day. We do peak, though, during the Isle of Wight Festival, where we have reached 3000 visitors during the lead-up months, and during Cowes Week, where we came in at 3600. Impressions are huge; people come back and use it again and again.

We would enjoy spending more time on it, but as it does not earn anything, we find it hard to justify. However, we may in the future do something to improve it.