What Matt Mullenweg (WordPress Author) Knows About You (WordPress & Akismet Plugin User)
I took a look at the data sent to Akismet, a WordPress plugin for comment spam protection, for each comment submitted on your blog (if you use this plugin). I recently started using Akismet, which comes from WordPress author Matt Mullenweg. I have to say I was surprised at the copious amount of data, some of it sensitive, being sent to Matt’s server for handling every single comment.
Tons of information that is useless for spam protection is being sent for every comment, most of which rarely, if ever, changes on a server.
Here is the data that was sent to the Akismet server for a single test comment on my blog. I have commented on it inline.
comment_post_ID=1128 // Why does he need this?
comment_author=Angsuman+Chakraborty
comment_author_email=angsuman%40taragana.com
comment_author_url=http%3A%2F%2Fblog.taragana.com%2F
comment_content=[Actual comment]
comment_type=
user_ID=1 // Why does he need this?
user_ip=59.93.245.60
user_agent=[Truncated]
referrer=[Truncated - Post url]
blog=http%3A%2F%2Fblog.taragana.com
CONTENT_LENGTH=98
// Isn’t it obvious? Why send it? Does it ever change?
CONTENT_TYPE=application%2Fx-www-form-urlencoded
// What is he doing with it? This information is useless for spam protection.
DOCUMENT_ROOT=[File system path]
// Why does he need this? Yet more useless junk.
HTTP_ACCEPT=[Truncated]
// Why does he need this?
HTTP_ACCEPT_CHARSET=[Truncated]
HTTP_ACCEPT_LANGUAGE=en-us%2Cen%3Bq%3D0.5
// Why does he need this?
HTTP_CONNECTION=keep-alive
HTTP_HOST=blog.taragana.com
// Why does he need this?
HTTP_KEEP_ALIVE=300
HTTP_REFERER=[Truncated]
HTTP_USER_AGENT=[Truncated]
// Why does he have to have my PATH information?
PATH=[PATH environment variable]
REMOTE_ADDR=59.93.245.60
REMOTE_PORT=1567
// How many times does it change on a server? Why does he need it?
// It contains file system information
SCRIPT_FILENAME=[Truncated]
// How many times does it change on a server?
SERVER_ADDR=69.36.187.98
// How many times does it change on a server? Why does he need it?
SERVER_ADMIN=Postmaster%40taragana.com
SERVER_NAME=blog.taragana.com
// How many times does it change on a server? What does he need it for?
SERVER_PORT=80
// How many times does it change on a server? What does he need it for?
SERVER_SIGNATURE=[Truncated]
// How many times does it change on a server? What does he need it for?
SERVER_SOFTWARE=[Truncated]
// How many times does it change on a server? What does he need it for?
GATEWAY_INTERFACE=CGI%2F1.1
// How many times does it change on a server? What does he need it for?
SERVER_PROTOCOL=HTTP%2F1.1
// How many times does it change on a server? What does he need it for?
// This is always POST!
REQUEST_METHOD=POST
// How many times does it change on a server? What does he need it for?
QUERY_STRING=
// How many times does it change on a server? What does he need it for?
REQUEST_URI=%2Fwp-comments-post.php
// How many times does it change on a server? What does he need it for?
SCRIPT_NAME=%2Fwp-comments-post.php
// Why does he need to know where I installed WordPress on my server?
PATH_TRANSLATED=[Truncated]
// How many times does it change on a server? What does he need it for?
PHP_SELF=%2Fwp-comments-post.php
// This is inane
argv=Array
// This is inane
argc=0
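The values in the dump are URL-encoded, which makes them hard to read at a glance. As a minimal sketch, assuming the fields travel as an ordinary form-urlencoded body joined with `&` (that layout is an assumption, and `decode_body` is a hypothetical helper name), a few of them can be decoded with Python's standard library:

```python
from urllib.parse import parse_qsl

# A few fields copied from the dump; joining them with "&" is an
# assumption about how the request body is laid out.
sample_body = (
    "comment_author=Angsuman+Chakraborty"
    "&comment_author_email=angsuman%40taragana.com"
    "&comment_author_url=http%3A%2F%2Fblog.taragana.com%2F"
    "&comment_type="
    "&HTTP_ACCEPT_LANGUAGE=en-us%2Cen%3Bq%3D0.5"
)


def decode_body(body: str) -> dict:
    """Decode a form-urlencoded body into a plain dict of strings."""
    # keep_blank_values=True preserves empty fields such as comment_type.
    return dict(parse_qsl(body, keep_blank_values=True))


decoded = decode_body(sample_body)
for name, value in decoded.items():
    print(f"{name} = {value!r}")
```

Decoded this way, `%40` becomes `@`, `+` becomes a space, and `%2F` becomes `/`, so the e-mail address and URLs appear exactly as they were typed.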
This huge amount of data (considering it is sent for every comment) can consume a not-so-insignificant portion of your bandwidth quota if you get lots of spam.
It is clear Matt & Co. haven’t made the effort to filter out the unnecessary information, even though they could easily do so.
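To illustrate how little work such filtering could take, here is a rough sketch of an allowlist. It is written in Python for brevity rather than in the plugin's PHP, and the names `ALLOWED_FIELDS` and `filter_payload`, as well as the choice of which fields to keep, are hypothetical. This is not Akismet's code, and which fields a spam filter really needs is for its authors to say.

```python
# Hypothetical allowlist: only fields that describe the comment and the
# commenter. The selection is an illustration, not Akismet's actual list.
ALLOWED_FIELDS = {
    "comment_author",
    "comment_author_email",
    "comment_author_url",
    "comment_content",
    "comment_type",
    "user_ip",
    "user_agent",
    "referrer",
    "blog",
}


def filter_payload(payload: dict) -> dict:
    """Return a copy of the payload with only allowlisted fields kept."""
    return {key: value for key, value in payload.items() if key in ALLOWED_FIELDS}


# A shortened version of the dump, with placeholders left as they are.
full_payload = {
    "comment_author": "Angsuman Chakraborty",
    "user_ip": "59.93.245.60",
    "blog": "http://blog.taragana.com",
    "DOCUMENT_ROOT": "[File system path]",
    "PATH": "[PATH environment variable]",
    "SERVER_PORT": "80",
    "REQUEST_METHOD": "POST",
}

filtered = filter_payload(full_payload)
print(sorted(filtered))
```

An allowlist fails safe: any server variable that is not explicitly named stays on the server, whereas a blocklist sends everything that nobody thought to exclude.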
Some of this information may also be used by hackers (the bad ones). Remember that all of it is submitted over the internet in cleartext.
Kind of makes you feel warm and fuzzy, doesn’t it?




April 8th, 2006 at 11:03 pm
Akismet’s privacy policy is available to the public here (legal translation coming soon):
http://akismet.com/privacy/
Matt would [probably] be glad if you were to contact him with your privacy/security concerns. If you send your inquiry through the Akismet contact form, he’ll usually respond within the week.
April 9th, 2006 at 6:00 pm
We do strip out potentially sensitive data, like your login cookie. The rest is entirely harmless, and actually quite useful in identifying spam. You can exclude it, but the effectiveness of Akismet will go down.
April 10th, 2006 at 9:36 am
Matt,
Thanks for the clarifications. However, I couldn’t understand why you need data that never changes for any user, like:
CONTENT_TYPE=application%2Fx-www-form-urlencoded
REQUEST_METHOD=POST
SERVER_PORT=80 // May very rarely change
SERVER_PROTOCOL=HTTP%2F1.1
GATEWAY_INTERFACE=CGI%2F1.1
etc.
Also, there are several pieces of data, like my server’s SCRIPT_FILENAME or PATH_TRANSLATED, for which I cannot see how they can help in analysing spam (irrespective of the algorithm you are using, which I personally think is a variant of naive Bayesian with manual blacklisting).
I could see you have a provision in the code to filter out certain data from the list. Why not use it to get only the data that you need?
Looking forward to your response.
Best,
Angsuman
April 10th, 2006 at 9:37 am
James,
I guess I reached him faster this way.
Thanks for your suggestions.
Best,
Angsuman
April 11th, 2006 at 12:14 pm
[...] In addition, over at Simple Thoughts, Angsuman Chakraborty wrote an interesting post entitled, “What Matt Mullenweg (WordPress Author) Knows About You (WordPress & Akismet Plugin User).” There, he figured out what kind of info Akismet sends back to interpret comments as spam / not spam. All this was very interesting, but it got me no further toward my goal of getting out of Akismet jail. My identity had been taken by a black box for unknown reasons, and there was no way to get it back. Granted, on the net it is very easy to change your identity, but I had been writing as myself for quite a while. Why would I want to give up what little, if any, reputation I have? Especially to the black box? [...]
January 16th, 2007 at 8:47 am
In my (maybe simple) view, this is the information required for analyzing spam:
comment_content # Yeah, sure…
comment_author* # All three together
blog_url (a splogger can easily remove that URL, so you still have his server’s IP number. But what about a splog like spammer-blog.wordpress.com? Got it? The IP is useless, too!)
And even the client’s IP and user-agent string are useless because of open proxies. Yeah, you can blacklist those IP numbers, but how many open proxies exist in the wide world? 100,000???
Well, I’ll remove all the information which you really don’t need to know from my blog (like absolute paths and such). Only I need to know where your scripts are installed, not you.
I know you can blacklist my ID number, so move on. I have other anti-spam plug-ins left to replace Akismet with.
And Akismet isn’t the ultimate death for spam comments, either.
I’m not against Matt and all the other people behind Akismet, but I really need to know why, why, why you need to know so much useless information from my blog. Why the comment ID? Why the absolute path of my script installation?
So long and all the best,
Roland
January 16th, 2007 at 8:50 am
An addition to my previous post: I’m saying this to Matt, not to Angsuman.
August 1st, 2007 at 5:53 pm
Don’t forget that Akismet is integrated into other tools too, such as the CakePHP framework, so some of that info will be relevant there.
I’m with you on the server path type of thing, but the actual calling script is probably important for identifying the weak points (or high-traffic points) on a site. More for future development than current spam detection.
I wouldn’t be blogging today if it wasn’t for Akismet and Bad Behaviour. As it is, I have all comments on moderation anyway… it’s that bad!