Brian Wong (email@example.com)
Wed, 28 Jul 1999 02:00:35 -0700
From: David Woolley <firstname.lastname@example.org>
To: Brian Wong <email@example.com>
Cc: firstname.lastname@example.org <email@example.com>; MarcusJohn@aol.com <MarcusJohn@aol.com>
Date: Tuesday, July 27, 1999 11:45 PM
Subject: Re: SETI Helping outsiders learn, while focussing ourselves.
>Unless you do it the old way, using robots.txt, you can't rely on all
>spiders obeying - actually, I don't fully understand why MS invented
>this particular META tag, except possibly for people with only access
>to part of a site (robots.txt has to be at the root level).
The two mechanisms do seem redundant, but the differences are subtle.
According to the Robots Exclusion Protocol, see:
the robots.txt file controls robot visibility at the site level, while the META
tag controls visibility down to the page level. So use robots.txt for coarse
control and the META tag for fine control of where the bots are allowed to
visit. Large web sites can certainly make good use of
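
A rough sketch of the coarse versus fine distinction described above, in
Python. It assumes a well-behaved robot that checks the site-wide robots.txt
rules first and then looks for a robots META tag in each page it fetches. The
robot name "ExampleBot", the example.com URLs, the sample rules and the helper
names are all made up for illustration; real spiders may behave differently.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical site-level rules, as they might appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page that carries its own page-level robots META tag.
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Draft</title>
</head><body>...</body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def site_allows(robots_txt, agent, url):
    """Coarse control: is this URL open to the robot at the site level?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def page_directives(html):
    """Fine control: which directives does this one page declare?"""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/a.html"):
        print(url, site_allows(ROBOTS_TXT, "ExampleBot", url))
    print(sorted(page_directives(PAGE_HTML)))
```

The point of the split is that robots.txt is consulted before a page is
fetched, while the META tag can only be seen after the page has been
downloaded, so the tag suits authors who cannot edit the file at the site root.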
>by it; not that they would want to follow links to themselves.
>Also, in some contexts, I think it is possible to use pure HTML and some
>search engines may encourage this (others may consider it theft of service
>to bypass their home page, so check first).
to spider. But it seems to me that a spider that can't digest
resides within HTML comment blocks and hence should be ignored by anything
pocketed, search engines have accounted for this in their robots.
As for potential theft of service, this is certainly a possibility, and of
course it is always good form to check with the owners of the site before
using their links. In this specific situation, with the SL using AltaVista,
it seems a safe bet that AltaVista (Compaq) will allow the symbiosis, since
more web page hits translate into more ad profits.
This archive was generated by hypermail 2.0b3 on Sun Aug 01 1999 - 16:28:47 PDT