Clean Room Implementation of Google Page Rank Algorithm
Finally, a clean-room implementation of Google’s Page Rank Algorithm in Java, reverse-engineered from their numerous comments on Page Rank (or is it Pigeon Rank?).
    public static int getPageRank(String url) {
        // start off with a random low PR
        int pageRank = rand.getInt(0, 3);

        if (isHostedOn("google.com", url)) {
            pageRank++;
        } else if (isHostedOn("microsoft.com", url)) {
            pageRank--;
        }

        // Support valid pages
        if (isValidPage(url)) {
            pageRank += 1;
        }

        Map<String, Integer> tag_value = new HashMap<>();
        tag_value.put("b", 1);
        tag_value.put("h2", 2);
        tag_value.put("h1", 3);
        tag_value.put("strong", -1); // W3C sux!
        pageRank = calculateTagsPR(tag_value, pageRank);

        // Sergey said good news sites have
        // lots of nested tables
        int tablesOnPage = getTagCount("table");
        if (tablesOnPage >= 50) {
            pageRank += 2;
        }

        if (pageRank >= 5) {
            pageRank = 4; // helps selling AdWords
        }

        if (linksFrom("mattcutts.com", url) >= 4) {
            // I link to "clean" sites only
            // - Matt, Feb 2006
            pageRank += 2;
        }

        pageRank += countBacklinks(url) / 10000;

        List<String> blacklist1 = getList("c:\\chinese-government-censored.txt");
        List<String> blacklist2 = getList("c:\\larry-page-hatelist.txt");
        if (inArray(blacklist1, url) || inArray(blacklist2, url)) {
            pageRank = 0;
        }

        int d = dashesInUrl(url);
        pageRank = (d >= 3) ? pageRank - 1 : pageRank + 1;

        if (inString(url, "how to build a bomb")) {
            // added on request. 2004-12-01.
            String recipient = "peter@homelandsecurity.gov";
            String subject = "You might wanna check this…";
            sendMailTo(recipient, subject, url);
            // page might still be relevant
            pageRank++;
        }

        if ("June".equals(month()) || "October".equals(month())) {
            // makes people talk about
            // PR updates, good publicity
            pageRank -= randomNumber(1, 3);
        }

        if (checkIdenticalPageAndLinkColor) {
            // spammer! Googleaxe it!
            pageRank = 0;
        }

        if ("http://www.nytimes.com".equals(url)) {
            // just testing, pls remove tomorrow
            // - Frank, June 2003
            pageRank = 10;
        }

        // Don't show PR above 10
        if (pageRank > 10) pageRank = 10;

        return pageRank;
    }
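For anyone who wants to actually run part of the joke: the listing leans on helpers that are never defined, so here is a minimal sketch of the few rules that are deterministic (tag weights, the dash rule, and the cap at 10). The class name `PageRankSketch`, the `capAtTen` and `applyDashRule` helpers, and the extra `tagCounts` parameter on `calculateTagsPR` are assumptions made for illustration; the original never says how these helpers behave, so treat this as one plausible reading, not the real thing.

```java
import java.util.Map;

final class PageRankSketch {
    // Tag weights copied from the listing; everything else here is assumed.
    static final Map<String, Integer> TAG_VALUE =
        Map.of("b", 1, "h2", 2, "h1", 3, "strong", -1);

    // Assumed meaning of dashesInUrl: count the '-' characters in the URL.
    static int dashesInUrl(String url) {
        int count = 0;
        for (int i = 0; i < url.length(); i++) {
            if (url.charAt(i) == '-') {
                count++;
            }
        }
        return count;
    }

    // The dash rule from the listing: three or more dashes lose a point,
    // fewer than three gain one.
    static int applyDashRule(int pageRank, String url) {
        return dashesInUrl(url) >= 3 ? pageRank - 1 : pageRank + 1;
    }

    // Assumed meaning of calculateTagsPR: add weight * occurrences per tag.
    // The tagCounts parameter is hypothetical; the listing reads the page itself.
    static int calculateTagsPR(Map<String, Integer> tagValue,
                               Map<String, Integer> tagCounts,
                               int pageRank) {
        int result = pageRank;
        for (Map.Entry<String, Integer> entry : tagCounts.entrySet()) {
            result += tagValue.getOrDefault(entry.getKey(), 0) * entry.getValue();
        }
        return result;
    }

    // "Don't show PR above 10": only the upper bound is enforced, as in the listing.
    static int capAtTen(int pageRank) {
        return Math.min(pageRank, 10);
    }

    public static void main(String[] args) {
        int pr = 2;
        // One <h1> (+3) and two <strong> tags (-2): 2 + 3 - 2 = 3
        pr = calculateTagsPR(TAG_VALUE, Map.of("h1", 1, "strong", 2), pr);
        // Three dashes in the URL: 3 - 1 = 2
        pr = applyDashRule(pr, "http://buy-cheap-pills-now.example");
        System.out.println(capAtTen(pr)); // prints 2
    }
}
```

The random starting value, the blacklists and the seasonal PR shuffle are left out on purpose, since they depend on data and helpers the listing never shows.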
Modified (ported to Java, with normalization etc. added) from an idea and original code by Jack Tang.
Filed under Google, Headline News, How To, Humor, Tech Note, Web, Web Services




