


The best WordPress SEO robots.txt

Thursday, 26 April 2012 @ 2:10

Robots.txt for Silo SEO

The robots.txt file is like a roadmap: each road is an access point, and robots.txt lets you control access to specific roads.

“Robots” refers to bots. A bot is usually an automated crawler that goes through your site by finding and following links, then drawing up a roadmap of what it finds.

The problem with bots is that they don’t know when or where to stop. They have zero intelligence: they just collect a site’s data and take it back home for their bigger brothers to crunch and analyze. Those bigger brothers are more powerful automated machines sitting in server farms, crunching numbers based on rule sets, in this case algorithms. These algorithms determine what your site is about and, eventually, its ranking in the SERPs (Search Engine Result Pages).

Both of them can get confused if your site structure is messy. WordPress creates so many overlapping roads that it can, and does, make our sites look more like a set of roundabouts.

 

Silo SEO WordPress Robots.txt

First, for the love of god, never let bots crawl your cgi-bin. User-agent: * means the rules that follow apply to every bot.

  • User-agent: *
  • Disallow: /cgi-bin
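A Disallow value is matched as a path prefix, so /cgi-bin covers everything under that directory and also any URL that merely starts with those characters. A minimal sketch to check this with Python’s standard urllib.robotparser, which does plain prefix matching only (it does not understand the * and $ wildcards used further down); example.com and the test paths are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The first two rules from the post, parsed from a string (no network fetch).
rules = """\
User-agent: *
Disallow: /cgi-bin
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Placeholder URLs: only the path part is compared against the rules.
for url in ("https://example.com/cgi-bin/counter.cgi",
            "https://example.com/cgi-bin.html",
            "https://example.com/about.html"):
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "blocked")
```

Both cgi-bin URLs come out blocked and /about.html stays allowed: the rule is a prefix, not a directory name.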

Next, we need to tell the bots not to bother crawling our private WordPress directories.

  • Disallow: /wp-admin
  • Disallow: /wp-includes
  • Disallow: /wp-content/plugins
  • Disallow: /wp-content/cache
  • Disallow: /wp-content/themes
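Because each line only blocks the prefix it names, these rules leave /wp-content/uploads (and anything else not listed) crawlable. A quick sketch to confirm that, again using urllib.robotparser; the domain and file paths are made-up placeholders:

```python
from urllib.robotparser import RobotFileParser

# The private WordPress directories from the list above.
rules = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

def crawlable(path: str) -> bool:
    """True if a generic bot ("*") may fetch this path on the placeholder domain."""
    return parser.can_fetch("*", "https://example.com" + path)

print(crawlable("/wp-admin/options.php"))              # False
print(crawlable("/wp-content/themes/style.css"))       # False
print(crawlable("/wp-content/uploads/2012/04/a.jpg"))  # True
```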

We also need to block access to our feeds. Why would we want the bots crawling through our feeds? We want them crawling our on-site content so it ranks well in the SERPs.

  • Disallow: /feed
  • Disallow: /*/feed
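The * in /*/feed is a wildcard that Googlebot understands but the original robots.txt convention did not. A rough sketch of Google-style matching, assuming the documented behavior that * matches any run of characters and a trailing $ anchors the end of the URL; rule_matches is an illustrative helper, not Google’s actual parser, and the paths are made up:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt match: '*' = any characters, trailing '$' = end of URL.

    Without a trailing '$' the pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape every literal piece, then rejoin the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None

for path in ("/feed", "/feed/", "/my-post/feed/", "/feedback.html", "/my-post.html"):
    blocked = [p for p in ("/feed", "/*/feed") if rule_matches(p, path)]
    print(f"{path:16} blocked by {blocked}")
```

This shows why both lines are needed: /feed alone does not catch per-post feeds such as /my-post/feed/, and /*/feed does not catch the main /feed. It also shows that /feed, being a prefix, would catch a hypothetical /feedback.html page too.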

Next up are comments. We want our comments treated as part of the on-site content, not crawled separately through the comment feed.

  • Disallow: /comments

You also don’t want the bots crawling author archives, because they just add more and more on-site duplicate content.

  • Disallow: /author

Another one that adds duplicate content is our tag archives.

  • Disallow: /tag

And, believe it or not, the date archives are also a problem for SEO, so let’s block the entire /archives path from the search engines.

  • Disallow: /archives

And just to make sure the bots don’t go near the year-based date archives, add these lines:

  • Disallow: /2010/*
  • Disallow: /2011/*
  • Disallow: /2012/*
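Hard-coding one line per year means next year’s archive slips through until someone remembers to add it. A small sketch of a generator for these lines; date_archive_rules is a hypothetical helper, not part of WordPress or any plugin. For Google-style matching the trailing * is redundant (/2012/ already matches everything under it), but the sketch keeps the post’s /YYYY/* form:

```python
from __future__ import annotations

from datetime import date

def date_archive_rules(first_year: int, last_year: int | None = None) -> list[str]:
    """Return one 'Disallow: /YYYY/*' line per year (hypothetical helper).

    last_year defaults to the current year, so re-running it picks up new years.
    """
    if last_year is None:
        last_year = date.today().year
    return [f"Disallow: /{year}/*" for year in range(first_year, last_year + 1)]

print("\n".join(date_archive_rules(2010, 2012)))
```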

You also don’t want any iframes being crawled. Note: this line is pointless unless you actually keep your iframes in an /iframes directory.

  • Disallow: /iframes

In Module 3.2 (WordPress SEO) of the Basic Bogan Training, we structure our sites using the .html extension, so you can block individual pages at the robots.txt level; for example, I don’t want my contact page, privacy policy or website agreement indexed. I don’t like my footprints getting indexed, so I block most of this stuff before it even reaches the index.

  • Disallow: /privacy-policy.html
  • Disallow: /web-site-agreement.html

You also don’t want your category pages being crawled. We cut these out in the Basic Bogan Training, but you can do it here as well. Note: don’t add this line to your robots.txt unless you have followed along with Module 3.2 (WordPress SEO) in the Basic Bogan Training.

  • Disallow: /category/*/*

And forget about indexing trackbacks.

  • Disallow: /*/trackback
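Under RFC 9309 an Allow or Disallow path has to start with /, and a stray space inside a pattern means it no longer says what you intended. A minimal lint sketch that flags both problems; lint_robots is a hypothetical helper, not an existing tool, and the sample text is made up for the check:

```python
def lint_robots(text: str) -> list[str]:
    """Flag Allow/Disallow values that don't start with '/' or contain whitespace."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0]          # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() not in ("allow", "disallow"):
            continue                          # ignore User-agent, Sitemap, etc.
        value = value.strip()
        if value and not value.startswith("/"):
            problems.append(f"line {number}: path should start with '/': {value!r}")
        if any(ch.isspace() for ch in value):
            problems.append(f"line {number}: whitespace inside pattern: {value!r}")
    return problems

sample = """\
User-agent: *
Disallow: */trackback
Disallow: /*.xlsx $
Disallow: /wp-admin
"""
for problem in lint_robots(sample):
    print(problem)
```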

Cool, now we are looking sweet in terms of WordPress Silo SEO.

But you also don’t want certain file types being indexed. For example, type this into Google:

filetype:xlsx

Scary, right? I remember doing all sorts of crazy stuff with this back in the day; people had no idea Google was indexing these file types.

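Here is a good starting list of file extensions to block. You can also make up your own file extensions and block them to keep files out of the search results, which works well; just keep in mind that robots.txt is publicly readable and only advisory, so it hides nothing from someone who already knows the URL.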

  • User-agent: Googlebot
  • Disallow: /*.php$
  • Disallow: /*.js$
  • Disallow: /*.inc$
  • Disallow: /*.css$
  • Disallow: /*.gz$
  • Disallow: /*.wmv$
  • Disallow: /*.cgi$
  • Disallow: /*.xhtml$
  • Disallow: /*.xlsx$
  • Disallow: /*.doc$
  • Disallow: /*.pdf$
  • Disallow: /*.zip$
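Typing these lines by hand is how a stray space ends up in front of the $. A small sketch that builds them in the exact /*.ext$ form; filetype_rules is a hypothetical helper. Paths in robots.txt are case-sensitive, so the sketch keeps each extension exactly as given (/*.pdf$ does not cover report.PDF):

```python
from __future__ import annotations

def filetype_rules(extensions: list[str]) -> list[str]:
    """Build one 'Disallow: /*.ext$' line per extension (hypothetical helper).

    '*' matches any path and '$' pins the match to the end of the URL.
    The extension's case is preserved because robots.txt paths are case-sensitive.
    """
    rules = []
    for ext in extensions:
        ext = ext.strip().lstrip(".")   # tolerate " xlsx " or ".pdf"
        if ext:
            rules.append(f"Disallow: /*.{ext}$")
    return rules

print("User-agent: Googlebot")
print("\n".join(filetype_rules(["php", "js", "css", "xlsx ", ".pdf", "zip"])))
```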

Because we blocked most of the wp-* directories, I suggest moving your uploads out of wp-content/uploads into a directory of their own; just create /images and allow it explicitly.

  • Allow: /images

Now add a link to your sitemap (take this line out for mass blog installs). You will need to install the XML Sitemap plugin to generate this file.

  • Sitemap: http://yourdomain.com/sitemap.xml.gz
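The Sitemap line is independent of any User-agent group, so it can sit anywhere in the file. As a quick sanity check (a sketch; RobotFileParser.site_maps() needs Python 3.8 or newer, and yourdomain.com is the post’s placeholder), the standard-library parser picks it up alongside the Allow rule:

```python
from urllib.robotparser import RobotFileParser

# Abridged version of the post's file: one blocked directory, the images
# allowance and the sitemap link.
rules = """\
User-agent: *
Disallow: /wp-content/plugins
Allow: /images
Sitemap: http://yourdomain.com/sitemap.xml.gz
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.site_maps())                                               # ['http://yourdomain.com/sitemap.xml.gz']
print(parser.can_fetch("*", "http://yourdomain.com/images/logo.png"))   # True
```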

That’s it, you’re now solid. Forget paying for Silo plugins or whatever; if you want more, I suggest you check out Module 3.2 (WordPress SEO) in the Basic Bogan Training so you can get your permalinks perfect for SEO.

Below is the full robots.txt file. Copy and paste it into a plain-text file called robots.txt, upload it to your site’s root directory, and the bots will treat your site as a Silo SEO WordPress blog.

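User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /feed
Disallow: /*/feed
Disallow: /comments
Disallow: /author
Disallow: /tag
Disallow: /archives
Disallow: /2010/*
Disallow: /2011/*
Disallow: /2012/*
Disallow: /iframes
Disallow: /privacy-policy.html
Disallow: /web-site-agreement.html
Disallow: /category/*/*
Disallow: /*/trackback

# Googlebot obeys only the most specific group that names it, so the
# general rules are repeated here before the file-type rules.
User-agent: Googlebot
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /feed
Disallow: /*/feed
Disallow: /comments
Disallow: /author
Disallow: /tag
Disallow: /archives
Disallow: /2010/*
Disallow: /2011/*
Disallow: /2012/*
Disallow: /iframes
Disallow: /privacy-policy.html
Disallow: /web-site-agreement.html
Disallow: /category/*/*
Disallow: /*/trackback
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$

User-agent: *
Allow: /images

Sitemap: http://yourdomain.com/sitemap.xml.gz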
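One thing worth checking in any file with a separate Googlebot group: RFC 9309 says a crawler obeys only the group that matches its own name and falls back to User-agent: * only when no group names it, while separate groups for the same name are merged; Google documents the same behavior. The sketch below uses hypothetical helpers (parse_groups, group_for) with simplified exact-name matching to show the consequence: Googlebot gets only what its own group lists, so any general rule you want Googlebot to follow has to appear inside the Googlebot group too.

```python
from __future__ import annotations

def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Collect (directive, path) rules per lower-cased user-agent.

    Groups naming the same user-agent are merged, as RFC 9309 requires.
    Comments, blank lines and Sitemap lines are ignored.
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []   # user-agents of the group being read
    saw_rule = False         # has the current group had a rule yet?
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if saw_rule:                 # a user-agent after rules opens a new group
                agents, saw_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            saw_rule = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups

def group_for(groups: dict[str, list[tuple[str, str]]], bot: str) -> list[tuple[str, str]]:
    """Simplified selection: exact (case-insensitive) name, else the '*' group."""
    return groups.get(bot.lower(), groups.get("*", []))

# An abridged layout where Googlebot's group lists only a file-type rule.
layout = """\
User-agent: *
Disallow: /wp-admin
Disallow: /feed
User-agent: Googlebot
Disallow: /*.pdf$
User-agent: *
Allow: /images
"""
groups = parse_groups(layout)
print(group_for(groups, "Googlebot"))      # [('disallow', '/*.pdf$')]
print(group_for(groups, "SomeOtherBot"))   # the merged '*' rules
```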