Clean Room Implementation of Google Page Rank Algorithm
By Angsuman Chakraborty
August 17th, 2006

Finally, a clean-room implementation of Google's Page Rank algorithm in Java, reverse-engineered from their numerous comments on Page Rank (or is it Pigeon Rank?).
    public static int getPageRank(String url) {
        // start off with a random low PR
        int pageRank = rand.getInt(0, 3);

        if (isHostedOn("google.com", url)) {
            pageRank++;
        } else if (isHostedOn("microsoft.com", url)) {
            pageRank--;
        }

        // Support valid pages
        if (isValidPage(url)) {
            pageRank += 1;
        }

        java.util.Map<String, Integer> tagValue = new java.util.HashMap<>();
        tagValue.put("b", 1);
        tagValue.put("h2", 2);
        tagValue.put("h1", 3);
        tagValue.put("strong", -1); // W3C sux!
        pageRank = calculateTagsPR(tagValue, pageRank);

        // Sergey said good news sites have
        // lots of nested tables
        int tablesOnPage = getTagCount("table");
        if (tablesOnPage >= 50) {
            pageRank += 2;
        }

        if (pageRank >= 5) {
            pageRank = 4; // helps selling AdWords
        }

        if (linksFrom("mattcutts.com", url) >= 4) {
            // I link to "clean" sites only
            // - Matt, Feb 2006
            pageRank += 2;
        }

        pageRank += countBacklinks(url) / 10000;

        java.util.List<String> blacklist1 = getList("c:\\chinese-government-censored.txt");
        java.util.List<String> blacklist2 = getList("c:\\larry-page-hatelist.txt");
        if (inArray(blacklist1, url) || inArray(blacklist2, url)) {
            pageRank = 0;
        }

        int d = dashesInUrl(url);
        pageRank = (d >= 3) ? pageRank - 1 : pageRank + 1;

        if (inString(url, "how to build a bomb")) {
            // added on request, 2004-12-01
            String recipient = "peter@homelandsecurity.gov";
            String subject = "You might wanna check this…";
            sendMailTo(recipient, subject, url);
            // page might still be relevant
            pageRank++;
        }

        if (month().equals("June") || month().equals("October")) {
            // makes people talk about
            // PR updates, good publicity
            pageRank -= randomNumber(1, 3);
        }

        if (checkIdenticalPageAndLinkColor(url)) {
            // spammer!! Googleaxe it!!
            pageRank = 0;
        }

        if (url.equals("http://www.nytimes.com")) {
            // just testing, pls remove tomorrow
            // - Frank, June 2003
            pageRank = 10;
        }

        // Don't show PR above 10
        if (pageRank > 10) {
            pageRank = 10;
        }
        return pageRank;
    }
Modified (ported to Java, with normalization etc. added) from an idea and original code by Jack Tang.
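The "normalization" here boils down to the last step of the method, which caps the score at 10. For anyone who wants that step on its own, here is a minimal sketch. The class name `PageRankClamp`, the method name `clamp`, and the lower bound of 0 are assumptions made for illustration; the code above only enforces the upper bound, so a sufficiently unlucky page can still come out negative.

```java
/** Hypothetical helper for illustration; not part of the original post's code. */
final class PageRankClamp {
    // Assumed display range: the post only caps at 10, the floor of 0 is a guess.
    static final int MIN_RANK = 0;
    static final int MAX_RANK = 10;

    /** Forces a raw score into the range MIN_RANK..MAX_RANK, inclusive. */
    static int clamp(int rawScore) {
        return Math.max(MIN_RANK, Math.min(MAX_RANK, rawScore));
    }

    public static void main(String[] args) {
        System.out.println(clamp(-3)); // below the floor, prints 0
        System.out.println(clamp(4));  // already in range, prints 4
        System.out.println(clamp(12)); // above the cap, prints 10
    }
}
```

Swapping the final `if (pageRank > 10)` check for a call like this would keep the Microsoft-hosted, dash-heavy, June-crawled pages from showing a negative PR, assuming that is considered a bug rather than a feature.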
Filed under Google, Headline News, How To, Humor, Tech Note, Web, Web Services




