More details on the process of filtering and selecting phrases
Key Google searches:
N1 = "i hate hostx"
N2 = "hostx sucks"
N3 = "hostx sux"
N4 = "avoid hostx"
N5 = "problems with hostx"
P1 = "i love hostx"
P2 = "hostx rocks"
P3 = "hostx is great"
P4 = "hostx is * great"
P5 = "recommend hostx"
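As a rough illustration, the ten phrases above can be generated for any host name by substituting it for "hostx". This is only a sketch: the names `NEGATIVE`, `POSITIVE` and `build_queries` are made up for this example, and the surrounding double quotes are an assumption about how the exact-phrase searches were entered.

```python
# Phrase templates taken from the list above; "{host}" stands in for "hostx".
NEGATIVE = [
    "i hate {host}",
    "{host} sucks",
    "{host} sux",
    "avoid {host}",
    "problems with {host}",
]
POSITIVE = [
    "i love {host}",
    "{host} rocks",
    "{host} is great",
    "{host} is * great",
    "recommend {host}",
]


def build_queries(host):
    """Return the quoted search phrases for one host, keyed N1..N5 and P1..P5."""
    queries = {}
    for i, template in enumerate(NEGATIVE, start=1):
        queries[f"N{i}"] = '"' + template.format(host=host) + '"'
    for i, template in enumerate(POSITIVE, start=1):
        queries[f"P{i}"] = '"' + template.format(host=host) + '"'
    return queries
```

For example, `build_queries("tripod")["P4"]` gives `"tripod is * great"` (including the quotes).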
Phrase selection - The phrases were chosen according to how many applicable results Google returned for them. The more results a phrase returns, the more reliable the ratings. Example: "avoid hostx" appears more often than "hostx is a load of rubbish". See boxout for phrase details.
Initial filtering of hosts - Hundreds of hosts were initially checked to see if they appeared in Alexa's top 150,000 web sites. Around half of these made the 'short list' of 63 presented above. The other half were excluded because they were not popular enough according to Google.
Google's nested entries: - Nested entries aren't counted, which avoids counting several results from the same site (these are mostly duplicates).
Duplication: "fortunecity is * great" is a good example of duplicates from Google. Out of the 180 listed, only 10 aren't dupes! I specially developed software to remove all of these duplicates.
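The duplicate removal could look roughly like the sketch below. This is not the software actually used. It assumes each result is a (url, snippet) pair and that two results are duplicates when their snippets match after lowercasing and stripping punctuation and extra whitespace. The function names and the matching rule are illustrative assumptions.

```python
import re


def normalise(snippet):
    """Lowercase and collapse anything that isn't a letter or digit into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", snippet.lower()).strip()


def remove_duplicates(results):
    """Keep only the first result for each distinct normalised snippet.

    `results` is a list of (url, snippet) tuples (an assumed format).
    """
    seen = set()
    unique = []
    for url, snippet in results:
        key = normalise(snippet)
        if key not in seen:
            seen.add(key)
            unique.append((url, snippet))
    return unique
```

A stricter or looser rule (for instance, comparing only the sentence around the matched phrase) would change how many results survive, so the matching rule is worth stating alongside any counts.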
Name ambiguity: Special care had to be taken for hosts with names such as 'Tripod', as the same word can mean something to rest a camcorder on, and is also the name of a rock group and of a comedy group! In such cases, results were manually checked (tedious to say the least).
Cash incentives: - Filtering the negative comments was much easier than filtering the positive ones. This is because people may recommend a host for commission/affiliate purposes rather than because they genuinely think the host provider in question is particularly good. Host reviewing sites tend to do this a lot, and it's easy to see how bias can creep in, undermining the objectivity of the results.
For the top hosts, we sifted through every result to see whether people were promoting a host for monetary purposes (affiliate link). Dreamhost and Lunarpages had quite a lot of these, and one could argue that we should have 'counted' such results. Regardless, the positions on the table wouldn't have changed (Dreamhost and Lunarpages would have come out even further ahead had we not filtered them out).
Google's unpredictability: - Problems arose when the EXACT same search returned far fewer results at some times than at others. For example, a search for "tripod is great" can reveal either "1 - 87 of about 188" ...or... "1 - 100 of about 577", depending on what mood Google is in (or more technically, what data center Google fetches the search results from). Because of this, I always double and triple checked the results at various times, and used the larger number of results as the final data collection for my analysis. You'll see I've marked the 'total results' next to each rating in the table so you can verify them for yourself.
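To make the "use the larger number" rule concrete, here is a small sketch. It assumes the figure being compared is the "of about N" estimate in Google's summary line, and that the summaries from repeated runs have been saved as plain strings. The function names are hypothetical.

```python
import re


def parse_total(summary):
    """Extract the estimated total from a summary such as '1 - 87 of about 188'."""
    match = re.search(r"of about ([\d,]+)", summary)
    if match is None:
        raise ValueError(f"unrecognised summary: {summary!r}")
    # Totals may contain thousands separators, e.g. '1,230'.
    return int(match.group(1).replace(",", ""))


def final_count(summaries):
    """Return the largest estimated total seen across repeated runs of one search."""
    return max(parse_total(s) for s in summaries)
```

With the two "tripod is great" summaries quoted above, `final_count(["1 - 87 of about 188", "1 - 100 of about 577"])` returns 577.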
Sarcasm: - Supposedly positive comments can actually be sarcastic towards that host provider. I tend to find this the case particularly with the free hosts ;-) Example: "gotta love Geocities".
Service ambiguity: - Certain sites such as godaddy.com and pair.com were excluded from the results because they offer other services (such as domain registration or email) on top of web hosting.
'Recommend' search term - Rather than search for "recommend hostx" as is implied above, I used: "recommend hostx" -"not recommend hostx" -"wouldn't recommend hostx" -"don't recommend hostx" -"won't recommend hostx" -"no longer recommend hostx" -"cannot recommend hostx" -"we * recommend hostx" -"we recommend hostx". The last two phrases are filtered out because, although they seem positive, the word "we" tends to indicate affiliate promotion.
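The full 'recommend' query is long, so building it from a list of excluded prefixes makes it easier to reuse for each host. A minimal sketch, where `EXCLUDED_PREFIXES` and `recommend_query` are names invented for this example:

```python
# Prefixes whose "<prefix> hostx" phrases are excluded, in the order given above.
EXCLUDED_PREFIXES = [
    "not recommend",
    "wouldn't recommend",
    "don't recommend",
    "won't recommend",
    "no longer recommend",
    "cannot recommend",
    "we * recommend",
    "we recommend",
]


def recommend_query(host):
    """Build the 'recommend' search string for one host, with all exclusions."""
    parts = [f'"recommend {host}"']
    # Each excluded phrase is quoted and prefixed with '-' to remove it from results.
    parts += [f'-"{prefix} {host}"' for prefix in EXCLUDED_PREFIXES]
    return " ".join(parts)
```

Calling `recommend_query("hostx")` reproduces the search string quoted above.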