Community Server and robots.txt

A couple of members have lately had issues (#1 + #2) with the way search engines have been indexing their sites.

There are a few ways to optimize a Community Server installation for search engines:

  1. Make sure that your Site Name and Description are set properly in the Control Panel.
  2. You may want to edit the resources.xml file to adjust the way your title tag is handled.
  3. Create a robots.txt file, something I don't think has been mentioned before.

A Community Server installation has quite a few pages that really don't need to be indexed by a search engine. A robots.txt file asks well-behaved spiders to stay off the pages we don't want them to crawl.

Here is a starting point for a Community Server robots.txt file. If there are any files you think should be blocked from search engines, just drop a comment and I will update the post.

Just copy and paste this text into a robots.txt file at the root of your site (/robots.txt):

#
#
#  robots.txt file for a Community Server based site.
#  Created by Sean Kearney http://www.carknee.com
#
#  

User-agent: *
# No blank lines inside this record: under the original robots.txt
# convention a blank line ends the record, and strict parsers would
# ignore the rules that follow it.
#
###########################
# Pages with no real value
Disallow: /error.htm
Disallow: /login.aspx
Disallow: /logout.aspx
#
###########################
# Directories
Disallow: /controlpanel/
Disallow: /msgs/
# We want them to get to user profiles, so the line below is commented out
#Disallow: /user/
#
Disallow: /user/CreateUser.aspx
Disallow: /user/EmailForgottenPassword.aspx
Disallow: /user/EditProfile.aspx
#
# Non-essential content (FAQ)
Disallow: /languages/
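
If you want to sanity-check rules like these before uploading them, one option is to run them through a parser. Here is a minimal sketch using Python's standard urllib.robotparser module. The rule text is an abbreviated copy of the file above. The agent name "ExampleBot" and the test paths other than the ones listed in the file are made up for illustration, so swap in URLs from your own site. The last lines also show that this particular parser treats a blank line right after User-agent as the end of the record and drops the Disallow lines that follow; other crawlers may be more lenient.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the rules from the post, with no blank lines
# inside the "User-agent: *" record.
RULES = """\
User-agent: *
# Pages with no real value
Disallow: /login.aspx
# Directories
Disallow: /controlpanel/
Disallow: /user/CreateUser.aspx
"""


def build_parser(robots_text):
    """Parse robots.txt text locally, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def blocked(parser, path, agent="ExampleBot"):
    """Return True if the rules disallow `path` for the (made-up) agent."""
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(RULES)
    # Example paths only; the last two are hypothetical pages that
    # should stay crawlable.
    for path in (
        "/login.aspx",
        "/controlpanel/default.aspx",
        "/user/CreateUser.aspx",
        "/user/Profile.aspx",
        "/blogs/default.aspx",
    ):
        print(path, "blocked" if blocked(parser, path) else "allowed")

    # Same rules, but with a blank line inserted after User-agent.
    with_blank = RULES.replace("User-agent: *\n", "User-agent: *\n\n")
    broken = build_parser(with_blank)
    print("/login.aspx with blank line:",
          "blocked" if blocked(broken, "/login.aspx") else "allowed")
```

Disallow rules are matched as path prefixes, which is why blocking /controlpanel/ covers everything underneath it while the three /user/ rules leave the rest of the /user/ directory open.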
Published Tuesday, February 13, 2007 2:31 PM

Comments

# re: Community Server and robots.txt

Sunday, February 18, 2007 10:52 AM by okie5

Hi Sean, Thanks! Your blog is great!  You've got yourself a new regular reader.

So I just copied all that text into Notepad to create a robots.txt file.  I am pretty technologically clueless, so please bear with my very basic questions. Do I just copy what you wrote exactly as it is, number signs (####) and all?

Also, back to the feed problem and being listed in search engines.  The feeds are located on some of the same pages as I have content.  

For example:

[TOP of PAGE]

Original article titles, w/ links to the articles  

[BOTTOM OF PAGE]

Google feeds.  

So how do I exclude just that feed part of the page from being read? I just want it to read the site description you directed me to create in the control panel, and bring that up.

# CS Byte for February 18, 2007

Sunday, February 18, 2007 9:09 PM by Dave Burke

blog bits Sean Kearney with Search Engine Optimization tips and a sample robots.txt. Good advice from the

# re: Community Server and robots.txt

Monday, February 19, 2007 8:09 AM by Sean Kearney

Okie,

You should check out http://www.robotstxt.org/ to get an understanding of how the robots.txt file works. In short, it lets you tell search engines which pages NOT to crawl.

You may also want to look into Google's Webmaster Tools:

http://www.google.com/webmasters/sitemaps/

That will allow you to tell Google which pages it SHOULD index. It also provides a way to give descriptions to Google.

# Search Engine Optimization tips and a sample robots.txt

Wednesday, February 21, 2007 10:20 PM by Daily News List Blog

Sean Kearney with Search Engine Optimization tips and a sample robots.txt. Good advice from the man behind

# This Week's News for February 23, 2007

Friday, February 23, 2007 5:22 PM by Announcements

This week... A new Community Server-based site was released this week: NurseLinkup.com. A very clean

# re: Community Server and robots.txt

Sunday, March 18, 2007 1:35 AM by OkobojiCommunity

Thanks Sean, just got my CS site set up tonight, and thought somebody else might have a quick list of excluded URLs

http://www.okobojicommunity.com
