Finally, a clean-room implementation of Google's PageRank algorithm in Java, reverse-engineered from their numerous comments on PageRank (or is it PigeonRank?).

    import java.time.LocalDate;
    import java.time.Month;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    private static final Random rand = new Random();

    // Helpers such as isHostedOn(), isValidPage() and getList() are not shown.
    public static int getPageRank(String url) {
        // start off with a random low PR (0 to 3)
        int pageRank = rand.nextInt(4);

        if (isHostedOn("google.com", url)) {
            pageRank++;
        } else if (isHostedOn("microsoft.com", url)) {
            pageRank--;
        }

        // Support valid pages
        if (isValidPage(url)) {
            pageRank += 1;
        }

        Map<String, Integer> tagValue = new HashMap<>();
        tagValue.put("b", 1);
        tagValue.put("h2", 2);
        tagValue.put("h1", 3);
        tagValue.put("strong", -1); // W3C sux!
        pageRank = calculateTagsPR(tagValue, pageRank);

        // Sergey said good news sites have
        // lots of nested tables
        int tablesOnPage = getTagCount("table");
        if (tablesOnPage >= 50) {
            pageRank += 2;
        }

        if (pageRank >= 5) {
            pageRank = 4; // helps selling AdWords
        }

        if (linksFrom("mattcutts.com", url) >= 4) {
            // I link to "clean" sites only
            // - Matt, Feb 2006
            pageRank += 2;
        }

        pageRank += countBacklinks(url) / 10000;

        List<String> blacklist1 = getList("c:\\chinese-government-censored.txt");
        List<String> blacklist2 = getList("c:\\larry-page-hatelist.txt");
        if (blacklist1.contains(url) || blacklist2.contains(url)) {
            pageRank = 0;
        }

        int d = dashesInUrl(url);
        pageRank = (d >= 3) ? pageRank - 1 : pageRank + 1;

        if (url.contains("how to build a bomb")) {
            // added on request. 2004-12-01.
            String recipient = "peter@homelandsecurity.gov";
            String subject = "You might wanna check this…";
            sendMailTo(recipient, subject, url);

            // page might still be relevant
            pageRank++;
        }

        Month month = LocalDate.now().getMonth();
        if (month == Month.JUNE || month == Month.OCTOBER) {
            // makes people talk about
            // PR updates, good publicity
            pageRank -= 1 + rand.nextInt(3); // 1 to 3
        }

        if (checkIdenticalPageAndLinkColor(url)) {
            // spammer!! Googleaxe it!!
            pageRank = 0;
        }

        if ("http://www.nytimes.com".equals(url)) {
            // just testing, pls remove tomorrow
            // - Frank, June 2003
            pageRank = 10;
        }

        // Don't show PR above 10
        if (pageRank > 10) pageRank = 10;

        return pageRank;
    }
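For contrast with the function above: the PageRank that Brin and Page actually published scores a page by the rank flowing in from the pages that link to it, computed by power iteration with a damping factor (0.85 in their paper). Below is a minimal sketch of that idea, assuming a small hypothetical link graph given as adjacency lists; the class and method names are made up for illustration and this is not Google's code. Pages with no out-links are treated as linking to every page, which is one common convention among several.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical class, not Google's implementation.
public class PageRankSketch {

    // links[i] lists the pages that page i links to (hypothetical input format).
    // Returns one score per page; the scores sum to 1.
    public static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from the uniform distribution

        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n); // random-jump ("teleport") share
            double danglingMass = 0.0;

            for (int page = 0; page < n; page++) {
                if (links[page].length == 0) {
                    // no out-links: collect this rank and spread it over all pages below
                    danglingMass += rank[page];
                } else {
                    // split the page's rank evenly across its out-links
                    double share = rank[page] / links[page].length;
                    for (int target : links[page]) {
                        next[target] += damping * share;
                    }
                }
            }
            for (int page = 0; page < n; page++) {
                next[page] += damping * danglingMass / n;
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical 4-page web: 0 -> 1,2; 1 -> 2; 2 -> 0; 3 -> 2
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

Running `main` prints four scores that sum to 1. Page 2, which three other pages link to, ends up highest, while page 3 has no in-links and keeps only the teleport share of 0.15 / 4.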

Modified (ported to Java, with normalization etc. added) from an idea and original code by Jack Tang.