Clean Room Implementation of Google Page Rank Algorithm
By Angsuman Chakraborty
August 17th, 2006

Finally, a clean-room implementation of Google's Page Rank Algorithm in Java, reverse-engineered from Google's numerous comments on Page Rank (or is it Pigeon Rank?).
    public static int getPageRank(url) {
        // start off with a random low PR
        int pageRank = rand.getInt(0, 3);

        if ( isHostedOn('google.com', url) ) {
            pageRank++;
        } else if ( isHostedOn('microsoft.com', url) ) {
            pageRank--;
        }

        // Support valid pages
        if ( isValidPage(url) ) {
            pageRank += 1;
        }

        tag_value['b'] = 1;
        tag_value['h2'] = 2;
        tag_value['h1'] = 3;
        tag_value['strong'] = -1; // W3C sux!
        pageRank = calculateTagsPR(tag_value, pageRank);

        // Sergey said good news sites have
        // lots of nested tables
        tablesOnPage = getTagCount('table');
        if (tablesOnPage >= 50) {
            pageRank += 2;
        }

        if (pageRank >= 5) {
            pageRank = 4; // helps selling AdWords
        }

        if (linksFrom('mattcutts.com', url) >= 4) {
            // I link to "clean" sites only
            // - Matt, Feb 2006
            pageRank += 2;
        }

        pageRank += countBacklinks(url) / 10000;

        blacklist1 = getList('c:\chinese-government-censored.txt');
        blacklist2 = getList('c:\larry-page-hatelist.txt');
        if ( inArray(blacklist1, url) || inArray(blacklist2, url) ) {
            pageRank = 0;
        }

        d = dashesInUrl(url);
        pageRank = (d >= 3) ? pageRank - 1 : pageRank + 1;

        if (inString(url, "how to build a bomb")) {
            // added on request. 2004-12-01.
            recipient = "peter@homelandsecurity.gov";
            subject = "You might wanna check this…";
            sendMailTo(recipient, subject, url);
            // page might still be relevant
            pageRank++;
        }

        if (month() == "June" || month() == "October") {
            // makes people talk about
            // PR updates, good publicity
            pageRank -= randomNumber(1, 3);
        }

        if (checkIdenticalPageAndLinkColor) {
            // spammer!! Googleaxe it!!
            pageRank = 0;
        }

        if (url == "http://www.nytimes.com") {
            // just testing, pls remove tomorrow
            // - Frank, June 2003
            pageRank = 10;
        }

        // Don't show PR above 10
        if (pageRank > 10) pageRank = 10;

        return pageRank;
    }
Modified (ported to Java, with normalization etc. added) from an idea and original code by Jack Tang.
Filed under Google, Headline News, How To, Humor, Tech Note, Web, Web Services