Google crawling: how to optimize the crawl budget of your website

On Google’s official webmaster blog, Gary Illyes wrote about crawl budgets and how they affect your website. Prioritizing the pages that should be indexed helps Google spend its crawl budget on your most important pages, which can improve their rankings. Two factors influence the crawl budget of a website:
1. The crawl rate limit

Crawling is Googlebot’s main priority, but it should not degrade the experience of visitors to the site. The crawl rate limit therefore caps the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it waits between fetches.

The crawl rate is influenced by how quickly a website responds to requests: the faster the site responds, the more Googlebot can fetch. You can also limit the crawl rate in Google Search Console. Unfortunately, Google does not support the Crawl-delay directive in robots.txt that many other bots honor.
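As an illustration, here is a minimal robots.txt sketch; the delay value is a placeholder. The Crawl-delay line is honored by bots such as Bingbot but, as noted above, ignored by Googlebot, whose crawl rate can only be limited in Search Console.

```
# robots.txt -- illustrative sketch; the delay value is a placeholder
User-agent: *
# Seconds a crawler should wait between fetches.
# Respected by several other bots; Googlebot ignores this directive.
Crawl-delay: 10
```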

2. Crawl demand

The crawl demand represents Google’s interest in a website. URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in Google’s index. Google also attempts to prevent URLs from becoming stale in the index.

If a website moves to a new address, the crawl demand might increase in order to reindex the content under the new URLs.

The crawl rate limit and the crawl demand define the crawl budget as the number of URLs Googlebot can and wants to crawl.

How to optimize your crawl budget

Having many low-value-add URLs can negatively affect a site’s crawling and indexing. Here are some low-value-add URLs that should be excluded from crawling:

1. Pages with session IDs: If the same page can be reached under multiple session IDs, use a rel=canonical link element on these pages to show Google the preferred version of the page (see the HTML sketch after this list). The same applies to all other duplicate content pages on your site, for example print versions of web pages. Google will then ignore the duplicates.

2. Faceted navigation (filtering by color, size, etc.): Filtering pages by color, size and other criteria can also create a lot of duplicate content. Use your site’s robots.txt file to keep these duplicate URLs from being crawled; a robots.txt sketch follows at the end of this section.

3. Soft 404 pages: Soft 404 pages are error pages that show a “this page was not found” message but return the wrong HTTP status code “200 OK”. These error pages should return the HTTP status code “404 Not Found” instead.

4. Infinite spaces: For example, if your website has a calendar with a “next month” link, Google could follow these “next month” links forever. If your website contains automatically created pages that do not really add new content, add the rel=nofollow attribute to the links pointing to them (shown in the sketch below).

5. Low quality and spam content: Check whether your website contains thin or low-quality pages. If your website has very many pages, removing these pages can result in better rankings.
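
To make points 1 and 4 concrete, here is a minimal HTML sketch. The URLs are hypothetical placeholders, and the markup needs to be adapted to your own pages.

```
<!-- Point 1: on a duplicate (session-ID or print version of a page),
     declare the preferred URL with a canonical link element.
     example.com is a placeholder domain. -->
<link rel="canonical" href="https://www.example.com/products/blue-widget/">

<!-- Point 4: keep crawlers from following links into infinite spaces,
     such as a calendar's "next month" link. -->
<a href="/calendar/2025-08/" rel="nofollow">Next month</a>
```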

If you do not block these page types, you will waste server resources on unimportant pages that have no value. Excluding them, for example with robots.txt rules like the ones sketched below, helps make sure that Google indexes the important pages of your site.
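
For point 2, faceted navigation, robots.txt rules along these lines keep Googlebot away from filter URLs. The parameter names here are assumptions and have to match how your site actually builds its filter URLs.

```
# robots.txt -- hypothetical filter parameters, adapt to your URL structure
User-agent: *
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*&sort=
```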

What does this mean for your web page rankings on Google?

It’s likely that you do not have to worry about crawl budget. If Google indexes your pages on the same day they are published (or a day later), you do not have to do anything.

Google crawls websites with a few thousand URLs efficiently. If you have a very big site with tens of thousands of URLs, it is more important to prioritize what to crawl and how many resources the server hosting the site can allocate to crawling.

Crawling itself is not a ranking factor. Google’s ranking algorithms use many signals, but the crawl rate is not one of them.
