Re: SETI Helping outsiders learn, while focussing ourselves.


Brian Wong (bw@logis.com)
Wed, 28 Jul 1999 02:00:35 -0700


-----Original Message-----
From: David Woolley <david@djwhome.demon.co.uk>
To: Brian Wong <bw@logis.com>
Cc: seti@sni.net <seti@sni.net>; MarcusJohn@aol.com <MarcusJohn@aol.com>
Date: Tuesday, July 27, 1999 11:45 PM
Subject: Re: SETI Helping outsiders learn, while focussing ourselves.

>Unless you do it the old way, using robots.txt, you can't rely on all
>spiders obeying - actually, I don't fully understand why MS invented
>this particular META tag, except possibly for people with access to only
>part of a site (robots.txt has to be at the root level).

The two mechanisms do seem redundant, but the differences are subtle.
According to the Robot Exclusion Protocol (see
http://info.webcrawler.com/mak/projects/robots/exclusion.html),
robots.txt controls robot access at the site level, through a single file at
the site root, while the META tag allows control down to the individual page.
So use robots.txt for coarse control and the META tag for fine control over
where bots are allowed to go. Large web sites can certainly make good use of
both mechanisms.
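A minimal sketch of how a crawler might honor both layers, assuming Python's standard urllib.robotparser for the site-level robots.txt check and html.parser for the page-level META tag. The robots.txt content, the example.com URLs and the agent name ExampleBot are hypothetical, and real crawlers differ in which META directives they recognize.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical site-level rules, as they might appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute names arrive lowercased; values keep their case.
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def may_index(url, page_html, agent="ExampleBot"):
    # Coarse control: robots.txt decides whether the path may be fetched at all.
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    if not rp.can_fetch(agent, url):
        return False
    # Fine control: the page's own META tag can still opt out of indexing.
    meta = RobotsMetaParser()
    meta.feed(page_html)
    meta.close()
    return "noindex" not in meta.directives and "none" not in meta.directives
```

Checking robots.txt first means a disallowed page is never fetched, so its META tag is never seen; the META tag only refines what robots.txt already permits.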

>Javascript can be fatal to spiders, as they cannot follow links obfuscated
>by it; not that they would want to follow links to themselves.
>Also, in some contexts, I think it is possible to use pure HTML and some
>search engines may encourage this (others may consider it theft of service
>to bypass their home page, so check first).

Interesting. I imagine the ability to parse HTML/JavaScript varies from
spider to spider. But it seems to me that a spider that chokes on pages
containing JavaScript is inherently defective: JavaScript code generally
sits inside HTML comment blocks, so anything other than a JavaScript-capable
browser should simply skip over it. It would also behoove spider developers
to add some JavaScript awareness, since additional links may be hidden inside
the script code. No doubt the larger, deeper-pocketed search engines have
already accounted for this in their robots.
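To illustrate the point about links hidden in script code, here is a rough Python sketch (an assumption of mine, not how any particular search engine works). It collects ordinary <a href> links and, when asked, also scans script text for quoted strings that look like page URLs. The regex heuristic and the function name collect_links are hypothetical; the sketch does not execute JavaScript, so links assembled at run time would still be missed.

```python
import re
from html.parser import HTMLParser

# Heuristic only: quoted strings that start with "/" or "http(s)://",
# or that end in .htm/.html, are treated as candidate page links.
URL_IN_SCRIPT = re.compile(r"""["']((?:https?://|/)[^"'\s]+|[^"'\s]+\.html?)["']""")


class LinkCollector(HTMLParser):
    def __init__(self, scan_scripts):
        super().__init__()
        self.scan_scripts = scan_scripts
        self.links = []
        self._in_script = False
        self._script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "script":
            self._in_script = True
            self._script_text = []

    def handle_data(self, data):
        # html.parser treats <script> content as raw text, so the
        # <!-- ... --> wrapper arrives here as data, not as a comment.
        if self._in_script:
            self._script_text.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            if self.scan_scripts:
                text = "".join(self._script_text)
                self.links.extend(URL_IN_SCRIPT.findall(text))


def collect_links(html, scan_scripts=False):
    parser = LinkCollector(scan_scripts)
    parser.feed(html)
    parser.close()
    return parser.links
```

With scan_scripts=False the spider behaves like one that ignores script blocks entirely; turning it on picks up string-literal links such as a window.location target.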

As for potential theft of service, this is certainly a possibility, and it is
always good form to check with a site's owners before using their links. In
this specific situation, with the SL using AltaVista, it seems a safe bet that
AltaVista (Compaq) will allow the symbiosis, since more page hits translate
into more ad revenue.

Brian



This archive was generated by hypermail 2.0b3 on Sun Aug 01 1999 - 16:28:47 PDT