I have recently started using Akismet, a WordPress plugin for comment spam protection from WordPress author Matt Mullenweg, and I took a look at the data it sends to the Akismet server for each comment submitted on your blog. I have to say I was surprised at the copious amount of data, some of it sensitive, being sent to Matt’s server for every single comment.

Tons of information that is useless for spam protection is sent with every comment, and most of it rarely, if ever, changes on a server.

Here is the data that was sent to the Akismet server for a single test comment on my blog. I have commented on it inline.

comment_post_ID=1128 // Why does he need this?
comment_author=Angsuman+Chakraborty
comment_author_email=angsuman%40taragana.com
comment_author_url=http%3A%2F%2Fblog.taragana.com%2F
comment_content=[Actual comment]
comment_type=
user_ID=1 // Why does he need this?
user_ip=59.93.245.60
user_agent=[Truncated]
referrer=[Truncated - Post url]
blog=http%3A%2F%2Fblog.taragana.com
CONTENT_LENGTH=98

// Isn’t it obvious? Why send it? Does it ever change?
CONTENT_TYPE=application%2Fx-www-form-urlencoded

// What is he doing with it? This information is useless for spam protection.
DOCUMENT_ROOT=[File system path]

// Why does he need this? Yet more useless junk.
HTTP_ACCEPT=[Truncated]

// Why does he need this?
HTTP_ACCEPT_CHARSET=[Truncated]
HTTP_ACCEPT_LANGUAGE=en-us%2Cen%3Bq%3D0.5

// Why does he need this?
HTTP_CONNECTION=keep-alive
HTTP_HOST=blog.taragana.com

// Why does he need this?
HTTP_KEEP_ALIVE=300
HTTP_REFERER=[Truncated]
HTTP_USER_AGENT=[Truncated]

// Why does he have to have my PATH information?
PATH=[PATH environment variable]
REMOTE_ADDR=59.93.245.60
REMOTE_PORT=1567

// How many times does it change on a server? Why does he need it?
// It contains file system information
SCRIPT_FILENAME=[Truncated]

// How many times does it change on a server?
SERVER_ADDR=69.36.187.98

// How many times does it change on a server? Why does he need it?
SERVER_ADMIN=Postmaster%40taragana.com
SERVER_NAME=blog.taragana.com

// How many times does it change on a server? What does he need it for?
SERVER_PORT=80

// How many times does it change on a server? What does he need it for?
SERVER_SIGNATURE=[Truncated]

// How many times does it change on a server? What does he need it for?
SERVER_SOFTWARE=[Truncated]

// How many times does it change on a server? What does he need it for?
GATEWAY_INTERFACE=CGI%2F1.1

// How many times does it change on a server? What does he need it for?
SERVER_PROTOCOL=HTTP%2F1.1

// How many times does it change on a server? What does he need it for?
// This is always POST!
REQUEST_METHOD=POST

// How many times does it change on a server? What does he need it for?
QUERY_STRING=

// How many times does it change on a server? What does he need it for?
REQUEST_URI=%2Fwp-comments-post.php

// How many times does it change on a server? What does he need it for?
SCRIPT_NAME=%2Fwp-comments-post.php

// Why does he need to know where I installed WordPress on my server?
PATH_TRANSLATED=[Truncated]

// How many times does it change on a server? What does he need it for?
PHP_SELF=%2Fwp-comments-post.php

// This is inane
argv=Array

// This is inane
argc=0

This huge amount of data (considering it is sent for every comment) can consume a not insignificant portion of your bandwidth quota if you get lots of spam.
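To put a rough number on the bandwidth point, here is a minimal Python sketch that measures the form-encoded size of a set of fields and scales it by comment volume. It is only an illustration: the helper names (`payload_bytes`, `monthly_bytes`) and the figure of 500 comments per day are my own assumptions, and only a few fields whose values appear in full in the dump are included, so the result understates the real payload.

```python
from urllib.parse import urlencode

# A few fields from the dump whose values were shown in full.
# Truncated fields are left out, so the real request is larger than this.
sample_fields = {
    "comment_post_ID": "1128",
    "user_ID": "1",
    "blog": "http://blog.taragana.com",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "SERVER_ADDR": "69.36.187.98",
    "SERVER_ADMIN": "Postmaster@taragana.com",
    "REQUEST_URI": "/wp-comments-post.php",
}


def payload_bytes(fields):
    """Return the size in bytes of the fields as a form-encoded body."""
    return len(urlencode(fields).encode("ascii"))


def monthly_bytes(fields, comments_per_day, days=30):
    """Estimate bytes sent per month for a given daily comment volume."""
    return payload_bytes(fields) * comments_per_day * days


if __name__ == "__main__":
    # 500 comments per day is a hypothetical volume, not a measurement.
    print(payload_bytes(sample_fields), "bytes per comment (partial)")
    print(monthly_bytes(sample_fields, comments_per_day=500), "bytes per month")
```

Request headers and the response are not counted either, so treat the output as a lower bound.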

It is clear Matt & Co. haven’t taken the effort to filter out the unnecessary information, even though they can easily do so.
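Filtering of this kind can be as simple as an allowlist. The Python sketch below is not Akismet's code (the plugin itself is PHP); the `KEEP` set and the `filter_payload` name are hypothetical, and which fields are really needed for spam detection is my assumption, not something Akismet has documented here.

```python
# Hypothetical allowlist: the comment-related fields from the top of the dump.
# Which fields spam detection truly needs is an assumption.
KEEP = {
    "comment_author",
    "comment_author_email",
    "comment_author_url",
    "comment_content",
    "comment_type",
    "user_ip",
    "user_agent",
    "referrer",
    "blog",
}


def filter_payload(fields, keep=KEEP):
    """Return a copy of fields containing only the allowlisted keys."""
    return {key: value for key, value in fields.items() if key in keep}


if __name__ == "__main__":
    raw = {
        "comment_author": "Angsuman Chakraborty",
        "user_ip": "59.93.245.60",
        "PATH": "[PATH environment variable]",
        "DOCUMENT_ROOT": "[File system path]",
        "SERVER_ADMIN": "Postmaster@taragana.com",
    }
    # Server-specific entries such as PATH and DOCUMENT_ROOT are dropped.
    print(filter_payload(raw))
```

An allowlist is safer than a blocklist here, because any new server variable is left out by default instead of being sent along.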

Some of this information may also be used by hackers (bad ones). Remember that all of it is submitted over the internet in cleartext.

Kind of makes you feel warm and fuzzy, doesn’t it?