Difference between revisions of "Aboutus:Bot"

(How do I remove my AboutUs.org page?)
 
(47 intermediate revisions by 10 users not shown)
==What Is the AboutUsBot?==
  
The AboutUsBot gathers descriptive information about a website from several sources to build a wiki page. This pre-built page gives website owners and AboutUs.org contributors a head start in creating a useful and informative AboutUs.org page.
  
==How do I opt out of AboutUs.org?==
  
(Opt-out?  Since AboutUs.org uses lengthy passages of content from other sites, shouldn't site owners opt-IN?)
  
Using a '''robots.txt''' file, you can choose not to have your '''future''' AboutUs.org pages initialized with selected content from your website. This doesn't mean that we won't create a Wiki Page for your website.  Our users should still have the opportunity to contribute their own content describing your site, as well as adding their own reviews.
  
To prevent the AboutUsBot from collecting your site content in the future, please include the following lines in your /robots.txt file.
  
: '''User-agent: AboutUsBot'''
: '''Disallow: /'''
  
The AboutUsBot will include the following in its User-Agent string:
  
: '''Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)'''
  
Please note that the current AboutUsBot behavior is to visit each site only once to initialize the AboutUs.org page.
  
==Other supported Opt-Out methods==
The AboutUsBot will also honor a rule like this in your robots.txt file:

: '''User-agent: *'''
: '''Disallow: /'''

Yet another alternative would be to add either of these tags to your '''main page''':

: <META NAME="ROBOTS" CONTENT="NOINDEX">
: <META NAME="ROBOTS" CONTENT="NOFOLLOW">

These should give you plenty of ways to prevent AboutUsBot from including content from your site when creating new AboutUs.org pages.
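If you want to confirm which of these tags a page actually declares, you can scan its HTML for robots meta tags. The Python sketch below is only an illustration built on the standard-library `html.parser` module; `RobotsMetaFinder` and `robots_directives` are hypothetical names, it assumes directives are comma-separated, and it is not the AboutUsBot's own code.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names but not attribute values,
        # so NAME="ROBOTS" arrives here as ("name", "ROBOTS").
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # Assumption: several directives are separated by commas.
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the lower-cased robots meta directives found in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head><body></body></html>'
    print(sorted(robots_directives(page)))
```

A page carrying only the first tag above yields `noindex`. A page with `content="noindex, nofollow"` yields both directives.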
 
 
 
==What about my address?==

Even though your address may be publicly available in [http://whois.domaintools.com/aboutus.org WHOIS] and [http://www.alexa.com/data/details/main?q=aboutus.org&url=aboutus.org Alexa], if your website has a robots.txt file that denies access to AboutUsBot, we will honor your intentions and not publish your contact details on your AboutUs.org page.

Please be aware that if we have already published your address, it is because it was easily available to us through a popular third-party API service. Your address is probably fully visible in your WHOIS record; if you want to keep it private in the future, consider subscribing to an address protection service at your registrar.

==How do I remove my AboutUs.org page?==

If you completely erase your wiki page content, our editors will consider it vandalism and restore the page. If you would like to remove your website content (Title & Description) and contact details (Address & Contact), please remove only '''the content''' in those sections.

Other sections, including the reviews, thumbnail, language, external links, and user-contributed content, should remain.

: Ray -- so essentially, you will NOT honor your own words and statements about deleting a domain in its entirety and/or permanently blocking a domain name at its owner's request?

==How do I opt out of the thumbnails?==

We have an arrangement with the folks at [http://www.domaintools.com DomainTools.com] to provide thumbnails for AboutUs.org. Please check their website for information about their thumbnail service.

[[category:AboutUs Help]]

__NOTOC__

Latest revision as of 17:33, 18 March 2014

The main job of the AboutUs Bot is to generate basic initial pages and analysis about websites. The bot pulls initial page data once, when a page is first created. Website analysis may be pulled multiple times, but it is cached to prevent continuous access by the bot. We want the bot to be well behaved; if you see otherwise, please contact us and let us know.

User-Agent String

The AboutUs Bot User-Agent string contains the following:

AboutUsBot/VERSION (PURPOSE; http://www.aboutus.org/Aboutus:Bot; help@aboutus.org)

For example:

AboutUsBot/Harpy (Website Analysis; http://www.aboutus.org/Aboutus:Bot; help@aboutus.org)

The current AboutUs Bot version is Harpy.
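If you want to spot the bot in your server logs, the User-Agent format above can be matched with a regular expression. This is a minimal Python sketch under our own assumptions, not an official AboutUs tool: the field names `version`, `purpose`, `url`, and `email` and both helper functions are hypothetical, and the full pattern assumes the bot always sends the four fields in the order shown.

```python
import re

# Full pattern for the documented format:
#   AboutUsBot/VERSION (PURPOSE; URL; EMAIL)
# Group names are illustrative assumptions, not an official schema.
AGENT_RE = re.compile(
    r"AboutUsBot/(?P<version>[^\s;()]+)\s*"
    r"\((?P<purpose>[^;)]*);\s*(?P<url>[^;)]*);\s*(?P<email>[^)]*)\)"
)

# Looser check: only the "AboutUsBot/VERSION" product token.
VERSION_RE = re.compile(r"AboutUsBot/([^\s;()]+)")


def parse_aboutus_agent(user_agent):
    """Split a documented-format AboutUs Bot User-Agent into its fields, or return None."""
    match = AGENT_RE.search(user_agent)
    if match is None:
        return None
    return {field: value.strip() for field, value in match.groupdict().items()}


def aboutus_version(user_agent):
    """Return the version token if the string contains AboutUsBot/..., else None."""
    match = VERSION_RE.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    ua = ("AboutUsBot/Harpy (Website Analysis; "
          "http://www.aboutus.org/Aboutus:Bot; help@aboutus.org)")
    print(parse_aboutus_agent(ua))
    print(aboutus_version(ua))
```

The looser `aboutus_version` check also recognizes the older `Mozilla/5.0 (compatible; AboutUsBot/0.9; ...)` form shown in the earlier revision of this page. The full pattern does not match that form.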

Blocking the AboutUs Bot

Using a robots.txt file, you can prevent the AboutUs Bot from accessing your website. This doesn't mean that we won't create a page for your website; our members still have the opportunity to contribute their own content describing your site.

To prevent the AboutUs Bot from accessing your site in the future, please include the following lines in your /robots.txt file.

User-agent: AboutUsBot
Disallow: /

The AboutUs Bot will also honor a rule like this in your robots.txt file:

User-agent: *
Disallow: /

However, this rule will prevent all well-behaved bots, including Google's crawler, from crawling your site.
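To see the difference between the two rules before publishing them, you can test them locally. The sketch below uses Python's standard-library `urllib.robotparser` as a stand-in for a well-behaved crawler. It is not the AboutUs Bot's own robots.txt handling, the site URL is a hypothetical placeholder, and real crawlers may match User-agent lines slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Rule set 1: block only the AboutUs Bot.
ABOUTUS_ONLY = """\
User-agent: AboutUsBot
Disallow: /
"""

# Rule set 2: block every crawler that honors robots.txt.
EVERYONE = """\
User-agent: *
Disallow: /
"""


def can_fetch(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # load rules from text instead of fetching a URL
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/"  # hypothetical site
    for name, rules in [("AboutUsBot only", ABOUTUS_ONLY), ("everyone", EVERYONE)]:
        for agent in ["AboutUsBot/Harpy", "Googlebot/2.1"]:
            print(f"{name:16} {agent:18} allowed={can_fetch(rules, agent, url)}")
```

Under these assumptions, the first rule set refuses only AboutUsBot and still admits Googlebot. The wildcard rule set refuses both.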

For more information about robots.txt, read our robots.txt article.