Tag Archives: Spider

Google’s new search index: Caffeine

On June 9th Google announced the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than Google’s last index, and it’s the largest collection of web content Google has offered. Whether it’s a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was ever possible before.

Some background for those of you who don’t build search engines for a living like us: when you search Google, you’re not searching the live web. Instead you’re searching Google’s index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here’s a good explanation of how it all works.)

So why did Google build a new search indexing system? Content on the web is blossoming. It’s growing not just in size and numbers; with the advent of video, images, news and real-time updates, the average webpage is also richer and more complex. In addition, people’s expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.

To keep up with the evolution of the web and to meet rising user expectations, Google built Caffeine. The image below illustrates how Google’s old indexing system worked compared to Caffeine:



Google’s old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, Google would analyze the entire web, which meant there was a significant delay between when Google found a page and when it made that page available to you.

With Caffeine, Google analyzes the web in small portions and updates its search index on a continuous basis, globally. As Google finds new pages, or new information on existing pages, it adds them straight to the index. That means you can find fresher information than ever before, no matter when or where it was published.
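The difference between refreshing a whole layer and updating continuously can be sketched with a toy inverted index (the "list in the back of a book" idea). This is only an illustration and not how Caffeine is implemented; the class `ToyIndex`, its methods and the example URLs are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny inverted index: maps each word to the set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> {url, ...}
        self.page_words = {}              # url -> words currently indexed for it

    def add_or_update(self, url, text):
        """Incremental approach: index one page right away, replacing its old version."""
        new_words = set(text.lower().split())
        old_words = self.page_words.get(url, set())
        for word in old_words - new_words:
            self.postings[word].discard(url)   # word no longer on the page
        for word in new_words - old_words:
            self.postings[word].add(url)       # word newly on the page
        self.page_words[url] = new_words

    def search(self, word):
        """Return the URLs whose indexed text contains the word."""
        return sorted(self.postings.get(word.lower(), set()))


def rebuild_from_scratch(all_pages):
    """Batch approach: a fresh index is only ready after every page is reprocessed."""
    index = ToyIndex()
    for url, text in all_pages.items():
        index.add_or_update(url, text)
    return index


index = ToyIndex()
index.add_or_update("example.com/a", "fresh coffee news")
index.add_or_update("example.com/b", "coffee forum post")
print(index.search("coffee"))   # both pages
index.add_or_update("example.com/a", "fresh tea news")
print(index.search("coffee"))   # only example.com/b remains
```

In the incremental version a changed page becomes searchable as soon as `add_or_update` returns, while the batch version has to walk every page before any change shows up.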

Caffeine lets Google index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
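The iPod comparison can be checked with a few lines of arithmetic. The storage total and iPod count come from the paragraph above; the iPod height of 4.1 inches is an assumption (roughly the size of a 160 GB iPod classic), so treat the mileage as a rough sanity check only.

```python
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes", from the post
IPOD_COUNT = 625_000            # from the post
IPOD_HEIGHT_INCHES = 4.1        # assumption: approximate height of one iPod
INCHES_PER_MILE = 63_360

gb_per_ipod = TOTAL_STORAGE_GB / IPOD_COUNT
stack_miles = IPOD_COUNT * IPOD_HEIGHT_INCHES / INCHES_PER_MILE

print(f"{gb_per_ipod:.0f} GB per iPod")       # 160 GB per iPod
print(f"{stack_miles:.1f} miles end to end")  # a little over 40 miles
```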

Google built Caffeine with the future in mind. Not only is it fresher, it’s a robust foundation that makes it possible for Google to build an even faster and more comprehensive search engine that scales with the growth of information online and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.

Breaking the Myth about Page Rank (PR)

The most difficult challenge most web designers face is getting traffic to their sites. There are plenty of companies that promise to send traffic your way. Sadly, most of this traffic is not qualified. Yes, your hit counter will move higher. However, if the traffic is not qualified, you may find you have unhappy visitors to your site. Unhappy visitors will not click on your ads or purchase your products.

Once you have optimized your site, consider submitting it to every search engine. If you want to get spidered quicker in Google, have a web page with a PR of 4 or higher point to your site. Your site will be spidered within a couple of days!

One myth I would like to bust is that PR is a measure of a web site. It’s not. I receive countless emails offering a reciprocal link with someone’s PR5 or PR6 site. Unless my link is appearing on the main page, or on a page that itself has PR6, I am not getting a share of PR6. Most likely, my link will appear on a page that has a PR2!

Page rank is Google’s ranking of that specific page’s relevance. Just because the main page has a PR of 4 does not make every page on the site a PR4. Beware of sites that claim exchanging links with them is to your benefit because they have a PR5 or PR6. Where is your link appearing? If it’s on a page that has a PR of 4 or 5 or 6, great!
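To see why the specific page matters, here is a minimal sketch based on the simplified formula from the original PageRank paper, in which a page passes on a damped portion of its own score divided equally among its outbound links. The scores below are hypothetical raw values, not the 0 to 10 PR figures quoted above, and Google’s real calculation is not public, so this is only an illustration of the "share" idea.

```python
DAMPING = 0.85  # damping factor suggested in the original PageRank paper


def share_passed_per_link(page_score, outbound_links, damping=DAMPING):
    """Score a single link passes on, under the simplified published formula."""
    if outbound_links < 1:
        raise ValueError("a page must have at least one outbound link")
    return damping * page_score / outbound_links


# Hypothetical raw scores: a strong home page versus a weak links page.
from_home_page = share_passed_per_link(page_score=6.0, outbound_links=20)
from_links_page = share_passed_per_link(page_score=2.0, outbound_links=40)

print(f"link on the home page passes about {from_home_page:.3f}")
print(f"link on the links page passes about {from_links_page:.3f}")
```

Under these assumed numbers, the link on the buried links page passes on only a fraction of what a link on the home page would, even though both pages belong to the same site.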

Reciprocal linking, if done properly, will ensure that your keywords are at the top of the search engine. If you have a popular keyword, you’ll need to have more back links. Pick your link partners properly, and ensure that they are linking to your keyword.

For example: if your site is www.frenzilla.com, consider sending requests to relevant higher ranking pages first, followed by lower ranking pages. Ask the web designers to link back so that the clickable text of the link is your keyword, not your site URL or site name.

Presuming your keyword is “best dining in new york”, having links that point to your site with your keywords in the anchor text will improve your search engine rankings dramatically.

Once you have established a collection of sites pointing to your site using your keywords, you will start receiving reciprocal link exchange requests from other sites. This is where you can start to be particular.

If you want to maintain an effective PR and attract better sites for linking, follow these tips:

a) Is it indexed?

Their site may be indexed, but is the page where they are placing your link indexed by Google? If you type in allinurl:www.sitename.com/links/right_here.html and there are no results, consider declining their offer. If the page your link appears on has not been indexed, there is no benefit whatsoever to you. If your pages have PR, they may consider placing your link on another page. If the page your link appears on is indexed but does not have PR, consider accepting their offer. While the page may not have PR today, it will in time.

b) How many neighbors?
The value of the page rank is shared among all of the links on that page. If you are splitting that PR with several other sites, your share of PR will be small, which doesn’t help you. Reconsider accepting any link exchange if your site is 1 of more than 30 to 40 sites that will appear on that page, unless it’s a very high PR page. Further, if there are too many links on that page, Google may consider the page to be part of a link farm, which may end up penalizing your site.

c) Is it relevant?
Google is big on relevancy. Ensure your links pages are relevant. If you operate a site about golf, having links from cooking sites will not help you establish your page rank. It may cost you more than you get in return.
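The three questions above can be turned into a small checklist. This is a minimal sketch: the `LinkOffer` record, the function names and the thresholds (`max_links=40`, `very_high_pr=6`) are assumptions made for illustration, and whether a page is indexed or relevant is something you still judge by hand.

```python
from dataclasses import dataclass
from urllib.parse import quote_plus


@dataclass
class LinkOffer:
    """Hypothetical description of a proposed reciprocal link."""
    page_url: str       # page where your link would be placed
    is_indexed: bool    # did the allinurl: check return a result?
    page_pr: int        # PR of that specific page, not of the site's home page
    links_on_page: int  # how many sites share the page
    is_relevant: bool   # is the page about the same topic as your site?


def allinurl_search_url(page_url):
    """Build a Google search URL for the allinurl: check described in tip (a)."""
    return "https://www.google.com/search?q=" + quote_plus("allinurl:" + page_url)


def evaluate(offer, max_links=40, very_high_pr=6):
    """Return (accept, reason). Both thresholds are illustrative assumptions."""
    if not offer.is_indexed:
        return False, "page is not indexed"
    if not offer.is_relevant:
        return False, "page is not relevant"
    if offer.links_on_page > max_links and offer.page_pr < very_high_pr:
        return False, "too many links on the page"
    return True, "worth accepting"


offer = LinkOffer("www.sitename.com/links/right_here.html", True, 0, 25, True)
print(allinurl_search_url(offer.page_url))
print(evaluate(offer))  # (True, 'worth accepting')
```

Note that an indexed page with no PR yet is still accepted, in line with tip (a).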

How to Find Good PR sites:

a) Do a search by typing in your keyword, then start asking the sites you find for reciprocal link exchanges. Take a look at their PR and go from there. Remember, it’s the number of sites that backlink to you that matters, not strictly the PR of the page. I would rather have 50 pages that have a PR1 pointing to my site than 5 pages that have a PR5. Of course, if you can get 50 pages that have a PR5 pointing to your site, you are laughing!

b) Take a look at your existing link partners and check out their links pages. It’s clear the people appearing on those links pages are interested in reciprocating.

c) Purchase software that will help find quality link partners.

It is important to attract higher PR sites when you are on a reciprocal link campaign. However, it’s not the most important thing when it comes to search engine rankings. It’s the backlinks that point back at you that are key. The more of those, the better off you will be for your keyword.

Remember: every page starts off as a PR0. Just because it’s new doesn’t mean it won’t get a higher PR once Google gets around to assessing a score. If the page your site appears on is indexed, and it’s a relevant site of quality, consider exchanging links. You’ll grow a large list of link partners in a short period of time, and increase your search engine rankings in the process.

Sitemaps – Uncover the Only Quick and Easy Site Submission Strategy

What is the easiest and quickest way to get your site indexed and listed by Google and Yahoo?

This is an eternal question, one that many frustrated SEO specialists and online business owners are still trying to figure out. The trick to answering this question is to find out exactly what Google and Yahoo want you to do to get listed; not what you THINK they want you to do. That being said, the tried and true methods to get your pages indexed and listed by Google and Yahoo have been the following:

1. Blogging and Pinging

2. Submit your website through the standard submission form provided by Yahoo and Google

3. Pay a search engine submission service to do it for you

4. Figure out how to create a Sitemap for each of the engines to get the spiders to come to your site (you still have to manually submit)

These are all decent methods. However, the problems with 3 of these methods are many, and here are just a few:

1. Blog and Ping and submission forms are all slow ways to get listed. Who has that kind of time?

2. Search Engine Submission services can be expensive

3. Using an automated search engine submission service can get you banned, if the search engines think that you are “spamming”

Of all the methods listed above, both Google and Yahoo prefer that you create a sitemap and then submit your site to them. You create a database file that contains information about ALL of your web pages. You then load that file onto your website and let Google know where that file is. Google is extremely specific about the format of that file: it must be XML.

By submitting your website in this way, you are cutting down on their overhead by a huge amount. You see, when you submit the old-fashioned way, using the standard submission form we talked about earlier, Google and Yahoo have to convert your information to the database XML format themselves. As you can imagine, this takes time, especially with the zillions of web pages that are submitted daily. So, those sites that are submitted in the manner and format that Google and Yahoo are already using will get their pages indexed faster. Do you see the logic here? Using this process, you are enabling them to visit many MORE of your web pages quickly and easily. This is exactly what Google and Yahoo want you to do.
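The sitemap file described above can be sketched in a few lines. This is a minimal example that assumes the public Sitemaps XML format (a `urlset` element containing one `url`/`loc` entry per page); the function name `build_sitemap` and the second page URL are hypothetical, and a real sitemap may also carry optional fields such as the last modification date.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    "http://www.frenzilla.com/",
    "http://www.frenzilla.com/links.html",  # hypothetical page
]
print(build_sitemap(pages))
```

You would save the output as a file such as sitemap.xml on your site and then tell the search engines where it lives.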

By submitting your sitemap in the preferred format, you will accomplish the following:

1. Save enormous amounts of your valuable time and money.

2. Avoid the risk of getting banned.

3. Get Google and Yahoo to spider the pages of your site faster.

So, if you want to get your site indexed, and therefore listed, faster, you must create and submit the sitemap. In theory, the search engines can list your pages whenever they want. In reality, until you get them to spider your web pages, you will never get them listed!