Community Server and robots.txt

A couple of members have lately had issues (#1 + #2) with the way search engines have been indexing their sites.

There are a few ways to optimize a Community Server installation for search engines.

  1. Make sure that your Site Name and Description are set properly in the Control Panel.
  2. You may want to edit the resources.xml file to adjust the way your title tag is handled.
  3. Something that I don't think has been mentioned before: create a robots.txt file.

A Community Server installation has more than a few pages that really don't need to be indexed by a search engine robot. A robots.txt file asks well-behaved spiders to stay off the pages we don't want them to crawl.

Here is a starting point for a Community Server robots.txt file. If there are any files you think should be blocked from search engines, just drop a comment and I will update the post.

Just copy and paste this text into a file named robots.txt in the root of your site (so that it is served at /robots.txt):

#
#
#  robots.txt file for a Community Server based site.
#  Created by Sean Kearney http://www.carknee.com
#
#  

User-agent: *
#
###########################
# Pages with no real value
Disallow: /error.htm
Disallow: /login.aspx
Disallow: /logout.aspx
#
###########################
# Directories
Disallow: /controlpanel/
Disallow: /msgs/
# We want robots to reach user profiles, so the next line stays commented out
#Disallow: /user/
#
# Account pages under /user/ that have no search value
Disallow: /user/CreateUser.aspx
Disallow: /user/EmailForgottenPassword.aspx
Disallow: /user/EditProfile.aspx
#
# Non-essential content (FAQ)
Disallow: /languages/
Comments

Hi Sean, Thanks! Your blog is great! You've got yourself a new regular reader. So I just copied all that text into Notepad to create a robots.txt file. I am pretty technologically clueless, so please bear with my very basic questions... So I just copy what you wrote exactly as it is? The number signs and all (####)? Also, back to the feed problem and being listed in search engines. The feeds are located on some of the same pages as my content. For example: [TOP OF PAGE] Original article titles, w/ links to the articles [BOTTOM OF PAGE] Google feeds. So how do I exclude just that feed part of the page from being read? I just want it to read the site description you directed me to create in the Control Panel, and bring it up.

Okie, You should check out http://www.robotstxt.org/ to get an understanding of how the robots.txt file works. In short, it enables you to tell search engines what pages NOT to index. You may also want to look into Google's Webmaster Tools: http://www.google.com/webmasters/sitemaps/ That will allow you to tell what pages Google SHOULD index. It also provides a way to give descriptions to Google.

OkobojiCommunity says:

Thanks Sean, just got my CS site set up tonight, and thought somebody else might have a quick list of excluded URLs http://www.okobojicommunity.com