My analysis of the actual problem with the JavaBlogs aggregator

Disclaimer
This analysis is based on observing my blog’s interaction with JavaBlogs. JavaBlogs, as you know, is a popular aggregator for Java feeds.
Overview
Many of us have noticed old posts from our blogs repeatedly popping up on JavaBlogs.
Details
RSS versions before 2.0 did not have a GUID, so preventing duplicate posts is slightly harder for them than for RSS 2.0 compliant feeds. My feed is RSS 2.0 compliant. Specifically, it sends a GUID as an element of each item. A GUID is supposed to be globally unique, so if I change my feed URL but keep my GUIDs the same, it shouldn’t matter.
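To make the GUID argument concrete, here is a minimal Python sketch (an illustration, not JavaBlogs’ actual code) of how an aggregator might derive one de-duplication key per item: the RSS 2.0 guid when present, otherwise the link, which is all a pre-2.0 feed offers. The helper name item_keys and the sample feed are hypothetical. Because the key comes from the item itself rather than from the URL the feed was fetched from, moving the feed to a new address leaves the keys unchanged.

```python
import xml.etree.ElementTree as ET
from typing import List


def item_keys(rss_xml: str) -> List[str]:
    """Return one de-duplication key per <item> in an RSS document.

    Hypothetical helper: uses the RSS 2.0 <guid> when present and falls
    back to <link>, the only identifier pre-2.0 feeds provide.
    """
    root = ET.fromstring(rss_xml)
    keys = []
    for item in root.iter("item"):
        guid = (item.findtext("guid") or "").strip()
        if guid:
            keys.append(guid)
        else:
            # No GUID: the link is the best identifier available.
            keys.append((item.findtext("link") or "").strip())
    return keys


FEED = """<rss version="2.0"><channel>
<item><title>A</title><link>http://example.com/a</link><guid isPermaLink="false">post-1</guid></item>
<item><title>B</title><link>http://example.com/b</link></item>
</channel></rss>"""

print(item_keys(FEED))  # ['post-1', 'http://example.com/b']
```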
What does WordPress send as the GUID? It sends the post’s permalink, for example http://blog.taragana.com/index.php/archive/whats-up-with-republican-java-geeks/.
Technically, permalinks are globally unique, unless I change my site structure. So if I start using .htaccess and change the permalink format to http://blog.taragana.com/archive/whats-up-with-republican-java-geeks/, I can expect reposting to happen, right? Yes, it does happen on JavaBlogs, and it has happened to me once or twice. However, it can still be prevented. More on that in a later post.
In any case, WordPress could also improve this situation by using an alphanumeric GUID value instead of permalinks, which may not be so permanent after all.
The more common problem is much simpler. Suppose you normally syndicate the 20 latest items in your feed. Then you suddenly decide to syndicate more, say 30. Suddenly, many of the old posts are republished! Neither the GUIDs nor the dates have changed; only the item count in the feed has. The reverse (reducing the number of items in a feed) probably causes the same thing, though I cannot remember for sure.
It appears that JavaBlogs maintains a database of past feed items, so it shouldn’t be hard to identify that a post is not new.
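The item-count scenario can be simulated with a hypothetical in-memory store standing in for the aggregator’s database (SeenStore and latest are illustrative names; a real service would persist the keys). Assuming the aggregator has polled the blog after every post, each item was recorded when it first appeared among the 20 latest, so widening the feed to 30 items, or shrinking it, should yield nothing new:

```python
from typing import Iterable, List, Set


class SeenStore:
    """Hypothetical stand-in for an aggregator's database of published items."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def new_items(self, keys: Iterable[str]) -> List[str]:
        """Return the keys never published before, then remember them."""
        fresh = [k for k in keys if k not in self._seen]
        self._seen.update(fresh)
        return fresh


def latest(n_posts: int, count: int) -> List[str]:
    """GUIDs of the `count` newest posts on a blog that has `n_posts` posts."""
    return [f"guid-{i}" for i in range(n_posts, max(n_posts - count, 0), -1)]


store = SeenStore()
for n in range(1, 31):  # poll after each of posts 1..30 while the feed holds 20
    store.new_items(latest(n, 20))

print(store.new_items(latest(30, 30)))  # [] : feed grew to 30 items, nothing is new
print(store.new_items(latest(31, 30)))  # ['guid-31'] : only the genuinely new post
print(store.new_items(latest(31, 10)))  # [] : shrinking the feed is harmless too
```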
It looks like a simple bug. Hopefully it will be fixed soon.
This article was prompted by a comment from Mr. Charles Miller, developer at JavaBlogs.
PS: On a different note, I think JavaBlogs’ policy of displaying a feed item again when its date has been updated is the correct implementation.
Filed under Java Software, Pro Blogging, Technology, Web, WordPress





March 18th, 2005 at 2:19 am
Tracking duplicates is a nightmare with all the various RSS flavors and buggy RSS feeds out there. My code for javacrawl.com currently runs the following query to check for a duplicate post: “…where (guid = ? OR (link = ? and title = ?))”. This works reasonably well, but it is still susceptible to the changing-link problem you mention here.
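A minimal sketch of this duplicate check, using Python’s built-in sqlite3 module with an assumed posts table (guid, link, title); the schema and the helper names is_duplicate and store_if_new are hypothetical, and javacrawl.com’s real code may differ. Since guid = NULL is never true in SQL, items without a GUID fall through to the link-and-title comparison, which is exactly where a changed link slips through:

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per post the crawler has already seen.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (guid TEXT, link TEXT, title TEXT)")


def is_duplicate(guid: Optional[str], link: str, title: str) -> bool:
    """Same GUID, or same link and same title, counts as a duplicate."""
    row = conn.execute(
        "SELECT 1 FROM posts WHERE (guid = ? OR (link = ? AND title = ?)) LIMIT 1",
        (guid, link, title),
    ).fetchone()
    return row is not None


def store_if_new(guid: Optional[str], link: str, title: str) -> bool:
    """Insert the post unless it is a duplicate; return True if inserted."""
    if is_duplicate(guid, link, title):
        return False
    conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (guid, link, title))
    return True


print(store_if_new("g1", "http://example.com/a", "A"))        # True
print(store_if_new("g1", "http://example.com/moved/a", "A"))  # False: same GUID
print(store_if_new(None, "http://example.com/b", "B"))        # True
print(store_if_new(None, "http://example.com/b", "B"))        # False: same link and title
print(store_if_new(None, "http://example.com/moved/b", "B"))  # True: link moved, no GUID
```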
I agree that using links as GUIDs is probably not the best approach unless they’re stable. An MD5 hash of the title plus the timestamp would be a reasonable way to go.
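Following the MD5 suggestion, a GUID could be computed from the title and publication timestamp and published with isPermaLink="false", which RSS 2.0 allows for GUIDs that are not URLs. This Python sketch assumes a "|" separator and a UTC ISO 8601 timestamp; stable_guid is a hypothetical helper, and the datetime passed in must be timezone-aware:

```python
import hashlib
from datetime import datetime, timezone


def stable_guid(title: str, published: datetime) -> str:
    """Hypothetical helper: hex MD5 of title plus publication time.

    The '|' separator and UTC ISO 8601 format are assumptions; any fixed,
    unambiguous encoding works as long as it never changes.
    """
    if published.tzinfo is None:
        raise ValueError("published must be timezone-aware")
    stamp = published.astimezone(timezone.utc).isoformat()
    return hashlib.md5(f"{title}|{stamp}".encode("utf-8")).hexdigest()


when = datetime(2005, 3, 18, 2, 19, tzinfo=timezone.utc)
guid = stable_guid("My analysis of the actual problem with JavaBlogs Aggregator", when)
print(f'<guid isPermaLink="false">{guid}</guid>')
```

Because the hash covers the title, editing a title later would change the value, so a blog engine would presumably compute the GUID once at publication time and store it rather than recompute it on every feed build.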
Another suggestion I would make to RSS producers: please, please implement responding with 304 (Not Modified) to the If-Modified-Since header. This saves a huge amount of CPU, disk and bandwidth resources on both ends.
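On the producer side, honoring If-Modified-Since amounts to comparing the request header with the feed’s last-modification time. A minimal Python sketch, with conditional_response as a hypothetical helper rather than any framework’s API; it truncates to whole seconds because HTTP dates carry no sub-second precision, and it sends the full feed whenever the header is missing or unparseable:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_response(
    last_modified: datetime, headers: Dict[str, str], body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for a feed request.

    Hypothetical helper: replies 304 with an empty body when the client's
    If-Modified-Since date is not older than the feed's last change.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    lm = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    out = {"Last-Modified": format_datetime(lm, usegmt=True)}

    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    ims = lowered.get("if-modified-since")
    since: Optional[datetime] = None
    if ims:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore it
    if since is not None and since.tzinfo is not None and lm <= since:
        return 304, out, b""
    return 200, out, body


modified = datetime(2005, 3, 18, 2, 19, tzinfo=timezone.utc)
status, hdrs, _ = conditional_response(modified, {}, b"<rss/>")
print(status)  # 200
status, _, body = conditional_response(
    modified, {"If-Modified-Since": hdrs["Last-Modified"]}, b"<rss/>"
)
print(status, body)  # 304 b''
```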
March 18th, 2005 at 2:56 am
Jason,
Thanks for the informative comments.
An MD5 of the title and timestamp sounds good; I cannot think of anything against it.
Responding with 304 would be a good way to reduce bandwidth clog, and it will ultimately benefit bloggers.
April 3rd, 2005 at 8:09 pm
It’s not just a problem with JavaBlogs!
Every time I ping Technorati that my blog has been updated, it takes every previous entry and spams the Technorati tags (i.e. the Java tag) as well! I do use RSS 2.0 and Rome 0.5 from Sun Microsystems to generate my own feeds, and I do use the and tags. I have used the permalink system, but since I control the code and can put anything in there, maybe I’ll start generating my own MD5 hash as suggested. If anyone wants to know whether that works, check out my website in about a week.
Otherwise, enjoy reading my entries from March 2005 for the ninetieth time.