Robots.txt Matters for SEO: Robots.txt Explained
April 19th, 2009 by Adam

A robots.txt file is important for SEO

One of the goals in Search Engine Optimization (SEO) is to educate Search Engines (SEs) about your website.  There are many common methods to provide SEs with information about your site, such as page titles, meta tags (e.g., description, keywords), headings (e.g., h1, h2), and bold text.  However, one method that seems to be underused by many Chinese websites is the robots.txt file.

What can a robots.txt file do?

Simply put, a robots.txt file identifies any pages, folders, or files that you don’t want SEs to crawl.  It does this by communicating with a Search Engine’s “spider” (i.e., the program that browses your website for the purpose of indexing) and telling it where not to go.  In this way, you can guide spiders to the important content that you want indexed while steering them away from unimportant or irrelevant information.  Keep in mind that robots.txt is a request rather than a lock: well-behaved spiders honor it, but it does not password-protect anything.  Robots.txt is like a trusted host showing your house to invited guests, bringing them to the living room, kitchen, dining room and other important areas, but avoiding the laundry room and the junk closet.

How do I make a robots.txt file for my website?

One of the best things about the robots.txt file is that it’s incredibly easy to create or modify and doesn’t even require a strong mastery of English.  In fact, you can create a basic robots.txt file with only two phrases (see below).  All you need is a plain text editor, like Notepad, and you’re ready to generate your first robots.txt file.

Phrase 1 – Greeting all spiders with “User-agent: *”

User-agent: *

The above phrase indicates that the rules that follow apply to all Search Engine spiders.  It’s also possible to target the spiders from specific Search Engines, but my goal this time is to keep things nice and simple.  Type the phrase “User-agent: *” (without quotation marks) in your text editor.
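
If you’re curious how a general group and a spider-specific group interact, here is a minimal sketch using Python’s standard urllib.robotparser module.  The spider names “ExampleBot” and “OtherBot” and the /private/ folder are made up for illustration; they don’t belong to any real Search Engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for a made-up spider, one for everyone else.
RULES = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(RULES)

# ExampleBot matches its own group, so only /private/ is blocked for it.
print(parser.can_fetch("ExampleBot", "/private/notes.html"))
print(parser.can_fetch("ExampleBot", "/admin/"))

# Any other spider falls back to the "User-agent: *" group.
print(parser.can_fetch("OtherBot", "/admin/"))
print(parser.can_fetch("OtherBot", "/private/notes.html"))
```

When the file contains only the “User-agent: *” group, every spider gets the same answer, which is all this article needs.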

Phrase 2 – Blocking content from being indexed with “Disallow: /”

Disallow: /

This second phrase tells the spiders that whatever path follows is off-limits, so that it won’t be crawled.  You can enter a folder, a page, or even a file to be ignored.  For a folder, follow the folder name with another slash (e.g., Disallow: /images/); for a page or file, enter its URL location, excluding the domain (e.g., Disallow: /thankyou.html).  Be careful: “Disallow: /” on its own, with nothing after the slash, blocks your entire site.
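
To see how the different forms of Disallow behave, here is a small sketch with Python’s standard urllib.robotparser module.  The helper name is_allowed and the test paths (logo.png, index.html, contact.html) are my own placeholders, not part of any real site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if the given rules let `agent` crawl `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A bare slash matches every path, so the whole site is blocked.
block_all = "User-agent: *\nDisallow: /"
print(is_allowed(block_all, "/index.html"))

# A folder rule blocks everything under that folder, and nothing else.
block_folder = "User-agent: *\nDisallow: /images/"
print(is_allowed(block_folder, "/images/logo.png"))
print(is_allowed(block_folder, "/index.html"))

# A page rule blocks that one URL path.
block_page = "User-agent: *\nDisallow: /thankyou.html"
print(is_allowed(block_page, "/thankyou.html"))
print(is_allowed(block_page, "/contact.html"))
```

Each rule is matched against the start of the URL path, which is why the trailing slash on a folder name matters.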

Below is an example of what a simple robots.txt file might look like:

User-agent: *
Disallow: /images/
Disallow: /admin/
Disallow: /Scripts/
Disallow: /stats/
Disallow: /thankyou.html
Disallow: /template.html
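
Before uploading, you may want to double-check a file like the one above.  The following is a rough sketch, again using Python’s standard urllib.robotparser module; the helper name check_paths and the sample paths (index.html, logo.png, login.html, menu.js) are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this article.
EXAMPLE = """\
User-agent: *
Disallow: /images/
Disallow: /admin/
Disallow: /Scripts/
Disallow: /stats/
Disallow: /thankyou.html
Disallow: /template.html
"""

SITE = "http://www.tomorrowmorrow.com"


def check_paths(robots_txt: str, paths: list[str], agent: str = "*") -> dict[str, bool]:
    """Map each path to True (crawlable) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, SITE + path) for path in paths}


# Hypothetical paths, chosen only to exercise the rules above.
results = check_paths(EXAMPLE, [
    "/index.html",
    "/images/logo.png",
    "/admin/login.html",
    "/Scripts/menu.js",
    "/scripts/menu.js",
    "/thankyou.html",
])

for path, ok in results.items():
    print(f"{'allowed' if ok else 'blocked':8} {path}")
```

Note that this parser compares paths case-sensitively, so “Disallow: /Scripts/” does not cover a lowercase /scripts/ folder.  Write folder names exactly as they appear in your URLs.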

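The last step is to save the file as “robots.txt” and then upload it to your website’s root directory, so that it is reachable at the top level of your domain (e.g., http://www.tomorrowmorrow.com/robots.txt).  Spiders look for the file there on their own, so everything else is automatic!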
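
Because spiders only look for the file at the top level of the domain, it can help to work out that address from any page on your site.  Here is a minimal sketch using Python’s standard library; the function names and the /blog/some-post.html path are hypothetical, and load_live_rules needs network access, so it is defined but not called here.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def load_live_rules(page_url: str) -> RobotFileParser:
    """Download and parse the site's robots.txt (requires network access)."""
    parser = RobotFileParser()
    parser.set_url(robots_txt_url(page_url))
    parser.read()
    return parser


# The page path here is a made-up example.
print(robots_txt_url("http://www.tomorrowmorrow.com/blog/some-post.html"))
# prints: http://www.tomorrowmorrow.com/robots.txt
```

Once the file is uploaded, calling load_live_rules on your own site and then can_fetch on a few paths is one way to confirm that the rules you wrote are the rules spiders will see.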

With an effective robots.txt file in place, you can increase the chances of having more desirable pages indexed by preventing spiders from sneaking around in storage closets and other off-limits places.

If you have any questions regarding robots.txt, please feel welcome to e-mail me at adam(at)niulang-zhinv.com.

Adam
Co-Founder & VP of Services
打开后天SEO&Design

This article was originally published by 打开后天SEO&Design, www.tomorrowmorrow.com.  Please keep the link and attribution when reposting.

© 2009 打开后天SEO&Design