RobotUA

Defines a list of browser "user agent" names that will be recognised as crawler robots (search engine spiders).

Synopsis

RobotUA  ua_name [, ua_names...]

Scope

This directive is only available for use in the global (interchange.cfg) configuration file, and will affect all websites running under the Interchange installation.  It will not work in a website's local (catalog.cfg) configuration file. 

Description

The RobotUA directive defines a list of browser "user agent" names that will be recognised as crawler robots (search engine spiders).  Requests coming from the recognised spiders will cause Interchange to alter its behaviour to improve the chance of Interchange-served content being crawled and listed.

Note

The NotRobotUA list will be consulted before the list configured with RobotUA to help weed out false positive matches.

This directive accepts a wildcard list.  The "*" character represents any number of characters, and the "?" character represents any single character.  For example, "fo*ar" would match "foobar".
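The wildcard rules and the NotRobotUA precedence described above can be sketched in Python.  This is only an illustration, not Interchange's own code: the function names wildcard_list_to_regex and is_robot are hypothetical, and the sketch assumes that patterns are matched case-insensitively anywhere within the user agent string, which this page does not state explicitly.

```python
import re

def wildcard_list_to_regex(spec):
    """Turn a comma-separated wildcard list into one compiled regex.

    Returns None when the list contains no usable entries.
    """
    alternatives = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries, such as after a trailing comma
        # Escape regex metacharacters, then restore the two wildcards:
        # "*" means any number of characters, "?" means any single character.
        pattern = re.escape(item).replace(r"\*", ".*").replace(r"\?", ".")
        alternatives.append(pattern)
    if not alternatives:
        return None
    # Assumption: case-insensitive match anywhere in the user agent string.
    return re.compile("|".join(alternatives), re.IGNORECASE)

def is_robot(user_agent, robot_ua, not_robot_ua=""):
    """Consult the NotRobotUA list first, then the RobotUA list."""
    not_robot_re = wildcard_list_to_regex(not_robot_ua)
    if not_robot_re is not None and not_robot_re.search(user_agent):
        return False  # weeded out as a false positive
    robot_re = wildcard_list_to_regex(robot_ua)
    return robot_re is not None and robot_re.search(user_agent) is not None

# Usage with made-up user agent strings:
print(is_robot("Mozilla/5.0 (compatible; Googlebot/2.1)", "GoogleBot, Slurp,"))
print(is_robot("Wombat Browser 1.0", "Wombat", not_robot_ua="Wombat Browser"))
```

Under these assumptions the first call reports a robot and the second does not, because the NotRobotUA entry is checked before the RobotUA list.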

If a client is recognised as a robot, the following will be performed by Interchange:

  • The mv_tmp_session CGI value will be set true, causing sessions to be disabled and therefore avoiding the need to read session data from, and write it to, the disk.  This also causes Interchange to generate URIs without including a session ID.
  • The mv_session_id CGI value will be set to "nsession".
  • The mv_no_count CGI value will be set true, causing Interchange to generate URIs without including an incremental "page count" number.
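
The three changes listed above can be summarised in a small Python sketch.  It is a model of the documented effect only, not Interchange's implementation: the function name apply_robot_settings, the use of a plain dictionary for the CGI values, and the "page" key in the usage example are all assumptions made for illustration.

```python
def apply_robot_settings(cgi):
    """Return a copy of the CGI values with the robot overrides applied."""
    updated = dict(cgi)
    # Sessions disabled: no session data read or written, no session ID in URIs.
    updated["mv_tmp_session"] = True
    # Fixed placeholder session ID, as documented.
    updated["mv_session_id"] = "nsession"
    # No incremental "page count" number in generated URIs.
    updated["mv_no_count"] = True
    return updated

# Usage with a made-up request:
request_values = {"page": "index"}
robot_values = apply_robot_settings(request_values)
print(robot_values)
```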

Warning

Once you have discovered that you are serving a page to a robot, you should not use this knowledge to massively alter your page content in an attempt to improve your search results ranking.  Doing so stands a good chance of getting your site blacklisted by the search engine maintainers.

Example

RobotUA <<EOR
    ATN_Worldwide, AltaVista, Arachnoidea, Aranha, Architext, Ask, Atomz,
    BackRub, Builder, CMC, Contact, Digital*Integrity, Directory, EZResult,
    Excite, Ferret, Fireball, GoogleBot, Google-Sitemaps, Gromit, Gulliver,
    Harvest, Hubater, H?m?h?kki, INGRID, IncyWincy, Jack, KIT*Fireball,
    Kototoi, LWP, Lycos, Mediapartners, MegaSheep, Mercator, Nazilla,
    NetMechanic, NetScoop, Nutch, Ocelli, ParaSite, Refiner, RoboDude, Rover,
    Rutgers, Scooter, Slurp, Snappy, Spyder, T-H-U-N-D-E-R-S-T-O-N-E, Toutatis,
    Tv*Merc, Valkyrie, Voyager, W3C_Validator, Walker, WhizBang, Wire, Wombat,
    WordPress, Yahoo, Yandex, ZyBorg, adressendeutschland, archive, appie,
    asterias, bot, ccubee, cfetch, contact, crawl, collector, dogpile, fido,
    find, gazz, grab, griffon, holmes, index, legs, marvin, mirago, moget,
    newscan, ozelot, pagebull, retrieve, search, seek, speedy, silk, spider,
    suke, swish, tarantula, agent, topiclink, urllib, voyager, wget, whowhere,
    winona, worm, wwwster, xtreme,
EOR

Category:  Global config directives
Last modified by: Kevin Walsh
Modification date: Thursday 21 December 2006 at 11:50 AM (EST)