TrustRank: I guess Google is human after all

I wanted to follow up on a post I made the other day about the concept of trust in the Google algorithm and how that relates to SEO. I was reacting to the Friday whiteboard podcast and an earlier blog post at SEOMoz about how the Google algorithm has changed over the years. Specifically, Rand was talking about the rise in relative importance of Trust / Authority over other factors that had previously been very important for SEO. I was joking that Google's process for determining the rank of websites to show in their search results is simply a computer program, and unable to feel something so uniquely human as Trust.

Well, here's the update. Most people in the SEO world are familiar with the term PageRank. It is a proprietary algorithm used by Google in the process of ranking how relevant the websites in their database are to the search query string. We all know about PageRank. And there are many websites devoted to SEO tactics to help raise your PageRank in the hope of improving your position in the search results. Since your PageRank is visible for every page on your website (with quarterly updates), it is easy to have public debate about it.

But lesser known is a second algorithm called TrustRank. And this was the concept introduced by Rand in his SEOMoz whiteboard. When his graph shows that Trust / Authority is rising in importance to SEO, this means that a site's TrustRank score is being given more weight in the equation that determines search engine rankings. And this is not to be confused with the trademarked term "TrustRank" that Google introduced about the same time. If you watch the Matt Cutts video above, you will see that these are two different things.

So what is TrustRank?

There is a published paper by two Stanford University researchers and a person from Yahoo that introduces a set of equations called TrustRank. To put it in the simplest terms possible: TrustRank is the probability that a website is "good," calculated based on its proximity to other "good" websites. So it isn't really trust. But it is a numeric score that can be calculated for every website on the Internet.

Here is how TrustRank works. And I'm massively simplifying here because the matrix algebra that is behind the statistic is heavy. TrustRank starts with a sample of the Internet that is put in front of humans called Oracles. The Oracle makes a determination of 0 or 1 for each website in the sample, where 0 is a "bad" website and 1 is a "good" website. In the paper they call the bad ones spam sites, but you all know my feeling on using that term for websites. Based on this sample, all websites on the Internet are scored, and their TrustRank value is based on how many links away from a good site they are. The closer you are to a good site (i.e., fewer hops), the higher your TrustRank score. There are several assumptions built into this equation:

* Good websites rarely link to bad ones
* Bad websites frequently link to good ones
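
The propagation idea above can be sketched in a few lines of Python. This is a toy illustration, not the paper's exact formulation and certainly not anything Google or Yahoo is known to run: the function name `trustrank`, the tiny link graph, the damping value of 0.85, and the iteration count are all assumptions made up for the example. Trust starts on the hand-picked good seed and leaks out along links, getting weaker with every hop.

```python
def trustrank(links, good_seeds, damping=0.85, iterations=50):
    """Toy trust propagation (hypothetical helper, not the paper's code).

    links: dict mapping each page to the list of pages it links to.
    good_seeds: set of pages a human reviewer (the Oracle) marked as good.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}

    # All of the starting trust sits on the good seeds, split evenly.
    seed = {p: (1.0 / len(good_seeds) if p in good_seeds else 0.0) for p in pages}
    trust = dict(seed)

    for _ in range(iterations):
        # Each round, the seeds are topped up with a fixed amount of trust...
        nxt = {p: (1 - damping) * seed[p] for p in pages}
        # ...and every page passes a damped share of its trust down its links.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Made-up link graph: a chain leading away from the seed, plus two
# spam pages that link to each other and to the seed.
links = {
    "seed": ["a"],
    "a": ["b"],
    "b": ["c"],
    "c": [],
    "spam": ["seed", "spam2"],
    "spam2": ["spam"],
}
scores = trustrank(links, good_seeds={"seed"})

for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this made-up graph the seed ends up with the most trust, each hop away from it gets a little less, and the two spam pages get none at all even though one of them links to the seed. Trust only flows in the direction of the link.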

There is an entire science behind how the samples, called seed websites, are selected in the calculation of TrustRank. Specifically, they target sites that are very interconnected in an effort to have a sample that touches the majority of the Internet.
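
One way to find such well-connected candidates, and as I understand it the idea behind the "inverse PageRank" heuristic discussed in the paper, is to run a PageRank-style calculation on the link graph with every link reversed. Pages that can reach a lot of other pages then float to the top. The sketch below is only that, a sketch: the function name `inverse_pagerank`, the five-page graph, and the parameter values are assumptions of mine for illustration.

```python
def inverse_pagerank(links, damping=0.85, iterations=50):
    """Score pages by how much of the graph they can reach (toy version).

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}

    # Flip every link so that score flows from a linked-to page
    # back to the page that links to it.
    reversed_links = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            reversed_links[target].append(source)

    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        nxt = {p: (1 - damping) / n for p in pages}
        for page, targets in reversed_links.items():
            if not targets:
                continue
            share = damping * score[page] / len(targets)
            for target in targets:
                nxt[target] += share
        score = nxt
    return score


# Made-up graph: "hub" links out to three pages, one of which links further.
links = {
    "hub": ["a", "b", "c"],
    "a": ["d"],
    "b": [],
    "c": [],
    "d": [],
}
score = inverse_pagerank(links)

# The top-scoring pages would be handed to the human Oracle for review.
candidates = sorted(score, key=score.get, reverse=True)[:1]
print(candidates)
```

Here "hub" comes out on top because it reaches every other page in the toy graph. Note that this step only picks which pages to show the Oracle. Whether a candidate is good or bad is still a human call.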


So what are the real SEO take-aways from this rather academic discussion of TrustRank? The biggest is that you would like to have a direct link from a website in the sample seed that is judged good by the Oracle. You also need to stay tuned to the trends and be sure that your site is within the limits of what is considered good. Perhaps a stretch conclusion, but a safe one.

Alternatively, you could take the position that you simply need as many links as possible in the hope that one or more of them are close to positively evaluated seed sites. There is really no penalty for being linked to by bad sites. How can there be? First, it is an assumption of the model that bad sites often link to good ones. Second, if there were a penalty, it would be easy to take down your competitors. So a bad site linking to a bad site is indistinguishable by the algorithm from a bad site linking to a good site. But I'm thinking there is a broader conclusion that I will outline in a second post on this topic because it deserves deeper exploration.

Final point. While we know the full details from this paper about what TrustRank is, we have no evidence that it is in use by Yahoo or Google. As the video above explains, there is confusion about the email version of TrustRank that Google does use, and the search engine TrustRank introduced by Yahoo. I guess my big question is: What is Rand talking about? Is he implying that the TrustRank algo introduced by Yahoo is in use at Google?

