Finally, a clean-room implementation of Google's PageRank algorithm in Java, reverse-engineered from their numerous comments on PageRank (or is it PigeonRank?).

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public static int getPageRank(String url) {
        // start off with a random low PR
        int pageRank = rand.getInt(0, 3);

        if (isHostedOn("google.com", url)) {
            pageRank++;
        } else if (isHostedOn("microsoft.com", url)) {
            pageRank--;
        }
        // Support valid pages
        if (isValidPage(url)) {
            pageRank += 1;
        }

        Map<String, Integer> tag_value = new HashMap<>();
        tag_value.put("b", 1);
        tag_value.put("h2", 2);
        tag_value.put("h1", 3);
        tag_value.put("strong", -1); // W3C sux!
        pageRank = calculateTagsPR(tag_value, pageRank);

        // Sergey said good news sites have
        // lots of nested tables
        int tablesOnPage = getTagCount("table");
        if (tablesOnPage >= 50) {
            pageRank += 2;
        }

        if (pageRank >= 5) {
            pageRank = 4; // helps selling AdWords
        }

        if (linksFrom("mattcutts.com", url) >= 4) {
            // I link to "clean" sites only
            // - Matt, Feb 2006
            pageRank += 2;
        }

        pageRank += countBacklinks(url) / 10000;

        List<String> blacklist1 = getList("c:\\chinese-government-censored.txt");
        List<String> blacklist2 = getList("c:\\larry-page-hatelist.txt");
        if (inArray(blacklist1, url) || inArray(blacklist2, url)) {
            pageRank = 0;
        }

        int d = dashesInUrl(url);
        pageRank = (d >= 3) ? pageRank - 1 : pageRank + 1;

        if (inString(url, "how to build a bomb")) {
            // added on request. 2004-12-01.
            String recipient = "peter@homelandsecurity.gov";
            String subject = "You might wanna check this…";
            sendMailTo(recipient, subject, url);

            // page might still be relevant
            pageRank++;
        }

        if (month().equals("June") || month().equals("October")) {
            // makes people talk about
            // PR updates, good publicity
            pageRank -= randomNumber(1, 3);
        }

        if (checkIdenticalPageAndLinkColor) {
            // spammer!! Googleaxe it!!
            pageRank = 0;
        }

        if (url.equals("http://www.nytimes.com")) {
            // just testing, pls remove tomorrow
            // - Frank, June 2003
            pageRank = 10;
        }

        // Don't show PR above 10
        if (pageRank > 10) {
            pageRank = 10;
        }

        return pageRank;
    }
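One gap worth noting: the listing caps the result at 10 but never guards the low end, so the various penalties can push the score below zero. Here is a minimal sketch of clamping both ends, assuming 0 is the intended floor (the listing never says so). `PageRankClamp`, `MIN_PR`, and `MAX_PR` are hypothetical names, not part of the original code.

```java
class PageRankClamp {
    // Assumed bounds: the listing only states the upper limit of 10;
    // the lower limit of 0 is a guess, not something the original defines.
    static final int MIN_PR = 0;
    static final int MAX_PR = 10;

    /** Clamps a raw score into the range [MIN_PR, MAX_PR]. */
    static int clamp(int rawPageRank) {
        return Math.max(MIN_PR, Math.min(MAX_PR, rawPageRank));
    }

    public static void main(String[] args) {
        System.out.println(clamp(-3)); // below the floor
        System.out.println(clamp(4));  // already in range
        System.out.println(clamp(12)); // above the cap
    }
}
```

Swapping the final `if` in the listing for a call like `clamp(pageRank)` would cover both bounds in one place.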

Modified (ported to Java, with normalization etc. added) from the idea and original code by Jack Tang.