Dec 24, 2011

How to Create a WordPress Friendly robots.txt File

Let's get one thing clear. Robots.txt isn't just a fancy file for webmaster-purists and professional SEOs. In fact, every WordPress developer should know a thing or two about the file and why it's so important for every blog's SEO.

So first, here's the big question:

What is robots.txt and why is it important?

Speaking as Captain Obvious: it's simply a text file. But there is one interesting thing about it: it isn't displayed to the actual visitors anywhere on the blog itself.

Instead, it sits in the root directory of the blog and serves only one purpose. It is the file that search engines look at before they start crawling the contents of a blog. And the reason for looking at it is to find information on what they should and shouldn't be crawling.

So in essence, by using this file you can inform search engines what you want them to index and rank, and what you DON'T want them to index and rank.

The truth is that not every page (or area) of a blog is worth ranking. As a webmaster or a person working with WordPress you have to be able to identify those areas and use robots.txt as a place where you can speak to search engines directly, and let them know what's going on.
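Before getting into the WordPress specifics, here's the basic anatomy of the file (a minimal sketch; the directory name is just a placeholder): each record starts with a User-agent line naming the robot it applies to ("*" means every robot), followed by one or more Disallow/Allow lines listing URL paths.

User-agent: *         # this record applies to every robot
Disallow: /private/   # keep crawlers out of this (hypothetical) directory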

Creating robots.txt for WordPress

First of all, let me tackle the actual guidelines, which you can find at codex.wordpress.org, this page in particular: Robots.txt Optimization. There's an example file. Here's the thing … don't use it as a template!

I'm not saying that it's completely bad, but it can create a lot of problems for some WP blogs. It all depends on your settings: permalinks, category and tag bases, and so on. That's why you need to create robots.txt for each individual blog and be careful when you're dealing with a template of any kind.

Things you should always block

There are some parts of every WP blog that should always be blocked: the "cgi-bin" directory and the standard WP directories.

The "cgi-bin" directory is present on every web server, and it's the place where CGI scripts can be installed and then ran. Nowadays, some servers don't even allow access to this directory, but it surely won't do you any harm to include it in the Disallow directives inside the robots.txt file.

There are 3 standard WP directories (wp-admin, wp-content, wp-includes). You should block them because, essentially, there's nothing in them that search engines would find interesting.

But there's one exception. The wp-content directory has a subdirectory called "uploads". It's the place where everything you upload through the WP media uploader ends up. The standard approach here is to leave it unblocked.

Here are the directives to get the above done:

Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-content/uploads/
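Keep in mind that Disallow and Allow lines only take effect as part of a record that begins with a User-agent line. So in the actual file, the block above would look something like this (a sketch, assuming you want the rules to apply to all robots):

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-content/uploads/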

Notice the small difference from the template at the WP codex. They tell you to block "/wp-admin" (without the trailing "/" character). This can be problematic if you have your permalinks set to "/%postname%/" only. In that case, every post with a slug beginning with "wp-admin-" won't get indexed.

I know that only a small group of bloggers could create such posts (the "blogging about WordPress" group), but as a WP developer you can't make any assumptions about what's going to happen on the blog you're working on after it takes off. That's why it's better to remember the trailing "/" character here, as the sketch below shows.
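To illustrate the difference (the second path is a hypothetical post URL):

Disallow: /wp-admin    # blocks /wp-admin/ but also /wp-admin-tips/
Disallow: /wp-admin/   # blocks only the actual /wp-admin/ directory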

Things to block depending on your WP configuration

Every blog has a set of settings that are unique and need to be handled individually when creating the robots.txt file.

First thing is whether the blog uses categories or tags to structure the content … or both… or none.

In case you're using categories to structure your blog, make sure that tag archives are blocked from search engines. To get it done, first check what the "tag base" for tag archives is (Admin panel > Settings > Permalinks). If the field is blank, then the base is "tag". Use this base in a Disallow directive:

Disallow: /tag/

In case you're using tags to structure your blog make sure that category archives are blocked from search engines. Again, check the category base in the same place and then block it:

Disallow: /category/

In case you're using both categories and tags then don't do anything here.

In case you're using neither categories nor tags then block both of them by using their bases:

Disallow: /tag/
Disallow: /category/
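Whichever case applies, remember to use the actual bases from your permalink settings rather than the defaults. For example (with hypothetical custom bases set in Settings > Permalinks):

Disallow: /topics/   # custom category base "topics"
Disallow: /label/    # custom tag base "label"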

Why should you bother? An honest question. The main reason here is the duplicate content issue. For example, if you're not using categories, then your category archive looks exactly the same as your home page, i.e. there are two pages that are exactly the same but have different URLs:

yourdomain.com/
yourdomain.com/category/uncategorized

I'm sure I don't need to explain why that's bad. You have to make sure that such a situation doesn't happen.

Next up is the author archive. If you're dealing with a single-author blog, then there's no point in keeping the author archive available to the search engines. It creates the same duplicate content issue as the tag/category case. You can block the author archive by using:

Disallow: /author/

Files to block separately

WordPress uses a number of different files to display the content. Most of these don't need to be accessible to search engines.

The list most often includes: PHP files, JS files, INC files, CSS files. You can block them by using:

Disallow: /index.php # separate directive for the main script file of WP
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$

(The "$" character matches the end of an URL string.)

However, be careful with this. It's not advised to block any other files (images, text files, etc.). That's because even if such a file is not placed in the uploads directory you probably still want it to be recognized by the search engines.

Note. If you used the "Allow: /wp-content/uploads/" line earlier on, then all PHP, JS, INC, and CSS files inside the uploads directory will still be visible to the search engines; that's the nature of the Allow directive.

Things not to block

The final choice is of course up to you, but I would not block any images from Google image search. You can keep them accessible with a separate record:

User-agent: Googlebot-Image
Disallow:
Allow: / # not a standard use of this directive but Google prefers it this way here

Another robot to handle individually would be the Google AdSense robot, of course, only when you are a part of their program. In this case you need to make sure that it can see all the pages that your users can see. The easiest way of doing this is by using a very similar record:

User-agent: Mediapartners-Google
Disallow:
Allow: /
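These per-robot records work because a robot obeys only the record whose User-agent line matches it most specifically, and ignores all the others. A sketch of how the pieces fit together:

User-agent: *
Disallow: /wp-admin/

User-agent: Mediapartners-Google
Disallow:

# Mediapartners-Google follows only its own record, so it can crawl
# /wp-admin/ even though the "*" record blocks it for everyone else.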

Of course, the issue doesn't end with just these two examples. There are probably many more of them because every blog is different. Feel free to comment and point out some additional areas of a WP blog that shouldn't be blocked.

How to handle duplicate content

No matter what you do, your blog will always have some duplicate content. It's just how WP is constructed; you can't really prevent it. But you can still use robots.txt to prevent search engines from accessing it.

There are a number of duplicate content areas on every blog, for instance:

Search results

This is what a search result page URL usually looks like for a WP blog:

yourdomain.com/?s=phrase

(Sometimes there are also additional parameters after the search phrase.)

This is both duplicate content and content generated automatically, something Google really doesn't like. That's why it's good to block it by using:

Disallow: /*?

Apart from blocking the search results this directive blocks access to all URLs that include a question mark, but this shouldn't cause any problems when it comes to WordPress.
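To make the effect concrete, here's how the directive treats a few hypothetical URLs:

Disallow: /*?
# blocked:     /?s=phrase
# blocked:     /some-post/?replytocom=42
# not blocked: /some-post/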

Trackback URLs

Some blogs use trackback URLs that are essentially duplicate content of the original post. Here's an example of a normal post's URL and its trackback URL:

yourdomain.com/some-post/
yourdomain.com/some-post/trackback/

To prevent search engines from accessing such content you can use:

Disallow: /trackback/
Disallow: */trackback/

Now, why the duplicate directives? The fact is that the implementation of the Robots Exclusion Standard can vary between robots. By using these two lines you can be sure that the rule is understood by all of them.

RSS feeds

RSS feeds are just another example of content that's purely duplicate. You can eliminate them from search engine results by blocking the feed URLs.
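Following the same pattern as the trackback directives above (a sketch, assuming the default WP feed base "feed"):

Disallow: /feed/
Disallow: */feed/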
