1- On page content, title, description and page headlines: When search engines first began searching web content, page content was the most significant consideration for the search engine. The AltaVista search engine, which was the biggest in 1998, used on page content for searching documents. The basic rule for AltaVista can still be considered today: the keyword should appear in the title, in the description and twice on the page, and as the page becomes larger the keyword should appear more often (the ideal page size for AltaVista was around 10K). The concept used to avoid search engine spam is that a page with nothing but repeated keywords is spam. The keyword should appear no more often than the natural density for that word; overuse tells the search engine that the page is trying to game it.
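To make the rule of thumb concrete, here is a minimal Python sketch of such a check. It is not how AltaVista actually worked. The function names, the single-word keyword restriction and the `max_density` cutoff of 5% are all assumptions made for illustration; only the "title, description, twice on the page" rule comes from the text above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_report(keyword, title, description, body, max_density=0.05):
    """Check one single-word keyword against the rule of thumb.

    max_density is a hypothetical cutoff for "natural density";
    the article does not give a number.
    """
    kw = keyword.lower()
    words = tokenize(body)
    count = words.count(kw)
    density = count / len(words) if words else 0.0
    return {
        "in_title": kw in tokenize(title),
        "in_description": kw in tokenize(description),
        "body_count": count,
        "enough_mentions": count >= 2,         # "twice on the page"
        "density": density,
        "looks_stuffed": density > max_density,  # possible keyword spam
    }
```

A page that passes would have `in_title`, `in_description` and `enough_mentions` all true while `looks_stuffed` stays false. The right cutoff depends on the word and the page length, so treat the default as a placeholder.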

 

2- Theme: An easy way to understand what a theme is would be to look at a site in terms of a book. A book is not just about what you would find on a single page. A site has many pages of information which relate to a group of terms or keywords. Two pages about “visiting Arizona” may look very similar to a computer algorithm. But if one site has only a single page based on a person's visit to Arizona and another site contains hundreds of pages on visiting Arizona attractions, the algorithm can compare the two sites and see that the site with hundreds of pages is the better choice.
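The site comparison described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a page counts as "on theme" when it contains every query word, and the site names and the helper `site_theme_counts` are made up for the example.

```python
from collections import Counter


def site_theme_counts(pages, query_terms):
    """Count, per site, how many pages contain every query term.

    pages is a list of (site, page_text) pairs.
    """
    terms = [t.lower() for t in query_terms]
    counts = Counter()
    for site, text in pages:
        words = set(text.lower().split())
        if all(t in words for t in terms):
            counts[site] += 1
    return counts


# Hypothetical data: one site with several Arizona pages, one with a single page.
pages = [
    ("bigguide.example", "visiting arizona canyons"),
    ("bigguide.example", "visiting arizona museums"),
    ("bigguide.example", "contact us"),
    ("diary.example", "my diary about visiting arizona"),
]
counts = site_theme_counts(pages, ["visiting", "arizona"])
best_site = counts.most_common(1)[0][0]
```

When two pages look equally good on their own, the site with the higher count would be preferred as the tie breaker.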

 

When thinking in terms of theme, don’t limit your thinking to the site! Pages can have their own theme based on the links they contain. The Excite search engine, which replaced AltaVista in usage in 1999, used link themes. From a mechanical point of view, it looked at all the pages that a page linked to and calculated how many of those pages contained information related to what the person was searching for. Excite nearly became the world's biggest search engine, but it and Yahoo were replaced by Google. If you try a few searches on Google it is easy to see that they do in fact consider themes. The first listing for “visiting Arizona” is not only first, but Google also gives you four other pages from that same site that relate to the search.
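A link theme score along these lines could look like the sketch below. It follows the mechanical description above, not Excite's real code, which I have not seen. The assumption here is that a linked page is "related" when it contains at least one of the query words, and `link_theme_score` and the sample pages are hypothetical.

```python
def link_theme_score(page, query_terms, links, page_text):
    """Fraction of the pages linked from `page` that mention a query term.

    links maps a page to the list of pages it links to.
    page_text maps a page to its text.
    """
    targets = links.get(page, [])
    if not targets:
        return 0.0
    terms = {t.lower() for t in query_terms}
    related = 0
    for target in targets:
        words = set(page_text.get(target, "").lower().split())
        if terms & words:  # at least one query term appears on the linked page
            related += 1
    return related / len(targets)


# Hypothetical example: a guide page linking to three other pages.
links = {"guide": ["canyon", "hotels", "recipes"]}
page_text = {
    "canyon": "grand canyon arizona hiking",
    "hotels": "arizona hotels in phoenix",
    "recipes": "chocolate cake recipe",
}
score = link_theme_score("guide", ["arizona"], links, page_text)
```

Two of the three linked pages mention the query word, so the guide page gets a score of two thirds for this search.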

 

3- Authority (AKA page rank) and natural database order: In the first few years of the 2000s, Inktomi (the search engine that powers Yahoo search) coined the terms authority hub and authority site (or content site). An authority, in simplest terms, is a site or page that has the most links from the Internet at large pointing towards it. A hub is a page (or site) that contains links to the information on a number of pages or sites (the Yahoo directory would be a hub). Content is a page with mostly information, not links.

 

I am grouping authority and natural database order together because, in terms of how databases work, there is a natural fit between the two. Determining how much authority a page has takes more computing work than scanning a database for pages that match the keywords.

 

How authority is created: Infoseek back in 1998 was using a rather simple means to determine authority: it was basically just counting how many links pointed to a given page. When a search matched two pages with the same title, description and similar page content, the page that had the highest number of links pointing towards it would be listed first. Google changed this with what they patented as PageRank, a much more complicated calculation. The PageRank concept can be understood in simple terms as saying that a link from a page that has hundreds of links pointing towards it is more important than a link from a page that has fewer links pointing towards it. So one link from slashdot.org would give a higher ranking than 100 links from www.mysmallsite.com.
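The difference between plain link counting and a PageRank style calculation can be shown with a small Python sketch. This is the simplified textbook version of the calculation, not what Google runs. The damping value of 0.85, the fixed 50 iterations, the handling of pages without outgoing links and all page names are assumptions chosen for the example.

```python
def inbound_counts(links):
    """Simple authority: count the links pointing at each page."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for target in targets:
                new[target] += damping * rank[p] / len(targets)
        rank = new
    return rank


# Hypothetical web: "big" has 20 pages linking to it and links to page A.
# Page B gets 3 links, but from pages nobody links to.
links = {f"fan{i}": ["big"] for i in range(20)}
links["big"] = ["A"]
for i in range(3):
    links[f"small{i}"] = ["B"]

counts = inbound_counts(links)
ranks = pagerank(links)
```

Counting links alone puts B ahead of A (3 links against 1), while the rank calculation puts A ahead of B because its single link comes from a page that itself has many links pointing towards it.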

 

4- Site structure: I almost did not include this as a basic concept, because it is basically the same concept as authority. But the title of this information, “Having the Search Engine correctly identify what web page to send traffic to”, kind of forces a direct look at site structure. In short, if you have a table of contents (more often than not your home page) and all of the chapters link back to the table of contents, your table of contents becomes the page with the most authority and therefore the page that should be listed first. I say “should” because if Slashdot links to a chapter, that link would give the chapter more authority than your TOC, although your TOC would get some authority through the link from that chapter. The result is that the chapter is listed first when it is relevant based on page content, and the table of contents is the more important page for searches that do not match the page that got the Slashdot link.
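The table of contents case is easy to sketch in Python if we fall back on simple link counting inside the site. The file names and the helpers `build_site` and `internal_authority` are hypothetical, and the sketch leaves out outside links such as the Slashdot example, which would need a weighted calculation.

```python
from collections import Counter


def build_site(toc, chapters):
    """The TOC links to every chapter and every chapter links back to the TOC."""
    links = {toc: list(chapters)}
    for chapter in chapters:
        links[chapter] = [toc]
    return links


def internal_authority(links):
    """Count the internal links pointing at each page."""
    return Counter(target for targets in links.values() for target in targets)


site = build_site("index.html", ["ch1.html", "ch2.html", "ch3.html"])
authority = internal_authority(site)
top_page = max(authority, key=authority.get)
```

With three chapters the table of contents collects three links and each chapter collects one, so the table of contents is the page with the most authority.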

 

Next: A search engine based on just authority can be easily imagined. Picture the natural chain of authority in an office. The president places documents in the highest directory, managers place documents in a directory below the president's, and employees place documents below that. If/when the president creates a 2nd document, the document with the latest creation date is listed first in the database. In this format documents can be searched from top to bottom, with the president's latest documents first, followed by manager documents, followed by employee documents. However, to improve the search from the viewpoint of somebody other than the president, we need to take a closer look at on page content: if I search for the title of a document, I want that document listed first.
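The office picture can be written down as a small Python sketch. The `Document` fields, the numeric levels and the sample documents are assumptions made for the example. The natural order sorts by authority level and then by newest creation date, and the search moves an exact title match to the front while leaving everything else in natural order.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    title: str
    level: int      # 0 = president, 1 = manager, 2 = employee
    created: date


def natural_order(docs):
    """Highest authority first; within a level, newest document first."""
    return sorted(docs, key=lambda d: (d.level, -d.created.toordinal()))


def search(docs, query):
    """Exact title matches first, everything else in natural order."""
    q = query.strip().lower()
    ordered = natural_order(docs)
    # sorted() is stable, so ties keep their natural (authority) order.
    return sorted(ordered, key=lambda d: d.title.lower() != q)


# Hypothetical office documents.
docs = [
    Document("Memo", 0, date(2004, 1, 1)),
    Document("Budget", 0, date(2004, 3, 1)),
    Document("Roadmap", 1, date(2004, 2, 1)),
    Document("Lunch Menu", 2, date(2004, 4, 1)),
]
by_authority = [d.title for d in natural_order(docs)]
by_title_search = [d.title for d in search(docs, "lunch menu")]
```

In natural order the employee's "Lunch Menu" comes last, but a search for that title moves it to the top while the president's and manager's documents keep their order behind it.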

 

The next document in this series will look at on page content with an eye on the Inktomi algos, which may be the best algos when it comes to on page content.

 

 

 
