PubSub Search Down; PubSub Too?
September 12th, 2006
Lately whenever I am checking my blog stats in PubSub, I am getting "blog.taragana.com" has not been active in the past 30 days. blog.taragana.com has been very much active during this time.
Full Content RSS Feeds Compounding the Splog Problem?
October 21st, 2005
It appears to me full content RSS feeds are actually helping and compounding the splog (spam blog) problem. Splogs or spam blogs are blogs which are created by using syndicated content of other blogs, automatically aggregated, to increase their PR and AdSense dollars.
OpenDNS: A Valid Web 2.0 Business Model; Is It Good For you?
July 19th, 2006
I am intrigued by the OpenDNS business model. OpenDNS is a startup which offers DNS service to any user accessing the internet.
Second best use of Flash
April 4th, 2005
Previously I had posted on one of the best uses of Flash. Now I have seen the second best use of Flash: as a cookie creator. These cookies (persistent user information stored on client machines, accessible from the web server and often used to track a client's online activity) are hard to delete and are not amenable to normal methods of cookie deletion.
Spreading some click-love... The future of free web
February 1st, 2005
I have seen references in web-pages requesting visitors to show their appreciation or help by clicking on the advertiser's link. Such desperate methods are of questionable ethics, as they provide unwanted clicks for the advertisers, who pay for them.
Basic Details of Apple's Push Notification in iPhone OS 3.0
June 20th, 2009
With the update to the latest iPhone OS 3.0 available, Apple wants to put the Push Notification system through one last stress test. Apple has released a new support document that provides the details of Apple's Push Notification Service in iPhone OS 3.0.
iPhone 3G 8GB Model Costs 4 Times More in India!
August 20th, 2008
Vodafone sent the pricing for iPhone 3G 8GB and 16GB models in India and I have to tell you it is daylight robbery!
The iPhone 3G 8GB model costs Rs. 31,000 (~$738) from Vodafone whereas the 16GB model costs Rs.
Linux / Fedora Core: How To Use rdiff-backup To Pull Backups
May 26th, 2007
rdiff-backup is a popular, free, open source mirroring and incremental backup system for POSIX-based operating systems like Linux and Mac OS X. It uses the rsync algorithm through librsync, but it doesn't use rsync itself.
Should NYT Ask Readers To Pay for News? - Analysis of the Paid-Content Model
May 18th, 2005
Bobbie Johnson of the Guardian (a UK-based newspaper) thinks:
I understand that news operations are expensive things (I know, I work in one), but the NYT - which has a prestigious standing on the web - seems to be intent on undermining its own credibility with abandon. What I was expecting from him was to outline, given his mention of his knowledge of the news industry and the associated costs, a plan of action for New York Times.
Can RSS Feeds carry a Virus Payload?
June 12th, 2005
With the rapid proliferation of RSS Feeds and offline aggregators, it is presumable that virus writers will try to exploit this avenue to spread viruses. But the question is whether it is feasible.
In short, technically, a resounding YES.
HitTail Website Goes For Tailspin - Solutions
June 24th, 2006
HitTail is a nifty utility to get suggestions for writing your blog entries following the long tail model. All throughout the day I couldn't log in to their site, proving once again - If it sounds too good to be true...
Understanding Software Industry Ecosystem
April 8th, 2006
While reading about ecosystems, I found an uncanny similarity with (and gained insight into) software industry ecosystems. Like any other ecosystem, the Software industry ecosystem is in constant flux, with inherent forces that push it toward a stable state that includes niches that are well-defined and sensible with respect to applications.
How To Compare Product Ideas
September 17th, 2007
I am faced with the task of choosing between several strongly competing product ideas in the Web 2.0 space. It is not an easy task to evaluate strong competing ideas, each with a reasonable business model, in this space because of the unpredictability of the market.
Bye bye comment spam!
December 6th, 2004
Finally I took the step to get rid of comment spam without time consuming moderation, which I have been doing so far. My quest first started with a simple solution - moderation of all comments.
Microsoft executive says retail stores 'right next door to Apple' planned for fall
July 16th, 2009
Microsoft plans stores 'right next door to Apple'
SEATTLE - Microsoft Corp.'s chief operating officer says the software maker is planning to open retail stores "right next door to Apple" in the fall. The executive, Kevin Turner, also says Microsoft is "in the game for the long-term" and has hired a retail team.
March 24th, 2005 at 3:09 am
PubSub is a Prospective Search engine that matches your stored queries against data in the network as it changes. The matches are done in real-time. Prospective search uses the publish-subscribe protocol. In publish-subscribe you subscribe to things you are interested in, and when those things are published you get them. If you own information you want others to see, then you publish it. As a subscriber you own your subscriptions. If you no longer find your subscription useful (think it is spam) then delete it. When you publish you do not need to know who is subscribed to your information. In this way publish-subscribe is very loosely-coupled. Blogging is inherently publish-subscribe.
This is very different from email. In email, the owner of the information decides who to send it to, and they must know who you are. Deleting a subscription requires you to ask the sender (publisher) to stop sending you email.
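As a rough illustration of the loose coupling described above, here is a minimal Python sketch of a prospective-search broker. The `Broker` class, its method names, and the whole-word keyword matching are assumptions made for this example; nothing here reflects how PubSub itself is built.

```python
from collections import defaultdict


class Broker:
    """Toy prospective-search broker: stored queries are matched against new posts."""

    def __init__(self):
        # keyword -> set of subscriber names (the subscriber owns these entries)
        self.subscriptions = defaultdict(set)
        # subscriber name -> list of delivered (source, text) pairs
        self.inbox = defaultdict(list)

    def subscribe(self, subscriber, keyword):
        self.subscriptions[keyword.lower()].add(subscriber)

    def unsubscribe(self, subscriber, keyword):
        # Deleting a subscription needs no cooperation from any publisher.
        self.subscriptions[keyword.lower()].discard(subscriber)

    def publish(self, source, text):
        # The publisher never names its recipients; the broker matches stored queries.
        words = set(text.lower().split())
        recipients = set()
        for keyword, subscribers in self.subscriptions.items():
            if keyword in words:
                recipients |= subscribers
        for subscriber in recipients:
            self.inbox[subscriber].append((source, text))
        return recipients


broker = Broker()
broker.subscribe("reader", "java")
first = broker.publish("blog.example.org", "New Java release notes")
broker.unsubscribe("reader", "java")
second = broker.publish("blog.example.org", "More Java news")
```

In this sketch `first` contains the subscriber and `second` is empty, because the subscriber removed the stored query without ever contacting the publisher.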
Richard Treadway
March 24th, 2005 at 3:39 am
Richard,
I am aware of the publish-subscribe paradigm and I am very much aware of the technologies involved. In fact, if you want, I can probably build the core pubsub engine in a couple of days.
The challenge here is not related to the paradigm but to its specific implementation in PubSub. Suppose I subscribe for “java”. Any post that mentions the word “java” will be displayed to me in real-time. And as you can guess I would not be the only one to have an interest in Java.
It would be trivial for spammers to create free blogs (they already do) which are auto-populated (they already do) with spam posts by a bot (using, say, the Blogger API etc.). Additionally, the posts contain keywords of interest to a large segment of the population. Then they are registered in PubSub.
Such bots can easily generate thousands of posts in a very short time and virtually flood PubSub with spam posts laden with keywords people have subscribed to. So instead of seeing relevant Java posts now I (and others) will see info about organ enhancement pills. You get the picture. And it isn’t pretty.
The problem with PubSub is that it has no way of telling honest bloggers from spammers, much like the email system.
Publish-Subscribe works great when publishers are trusted. Unfortunately the reality in this scenario is otherwise.
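To make the flooding scenario concrete, here is a small Python sketch. The post data, the `matching_posts` helper, and the `trusted_sources` parameter are all hypothetical; the filter only illustrates what "trusted publishers" would mean in practice and is not a description of any PubSub feature.

```python
def matching_posts(posts, keyword, trusted_sources=None):
    """Return posts containing keyword; optionally keep only trusted sources."""
    keyword = keyword.lower()
    hits = [p for p in posts if keyword in p["text"].lower().split()]
    if trusted_sources is not None:
        hits = [p for p in hits if p["source"] in trusted_sources]
    return hits


# One honest post, plus a bot that stuffs the subscribed keyword into 1000 posts.
posts = [{"source": "honest-blog.example", "text": "Tuning the Java garbage collector"}]
posts += [
    {"source": f"splog-{i}.example", "text": "java pills cheap java offer"}
    for i in range(1000)
]

unfiltered = matching_posts(posts, "java")
filtered = matching_posts(posts, "java", trusted_sources={"honest-blog.example"})
print(len(unfiltered), len(filtered))  # prints "1001 1"
```

With pure keyword matching the single relevant post is buried under a thousand spam matches; once only trusted publishers are admitted, it is the only one left.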
Let me know if you are clear about the problem space.
Angsuman
PS. I have a solution too but it's getting late and I am tired
April 3rd, 2005 at 8:28 am
Well, it seems that spamming on weblogs has different features from email spamming. It is hard to prevent anyone with an email address from sending spam, since email was designed for easy communication between two parties. It is, however, possible to filter out blog spam using the link structure among blogs and websites. A group of spammers may even form a cluster, but I believe the percentage of outsiders linking into that cluster is almost zero, while a non-spamming cluster of blogs may have a nice balance of incoming and outgoing links with outsiders.
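The link-structure idea can be sketched in a few lines of Python. The link list, the cluster memberships, and the `outsider_inbound_ratio` helper are invented for illustration, and measuring "outsiders linking in" as a share of all links pointing at cluster members is one possible reading of the claim, not an established metric.

```python
def outsider_inbound_ratio(links, cluster):
    """Fraction of links pointing at cluster members that originate outside the cluster."""
    inbound = [(src, dst) for src, dst in links if dst in cluster]
    if not inbound:
        return 0.0
    from_outside = sum(1 for src, _ in inbound if src not in cluster)
    return from_outside / len(inbound)


# Hypothetical link graph: each pair is (linking site, linked site).
links = [
    ("spam1", "spam2"), ("spam2", "spam3"), ("spam3", "spam1"), ("spam1", "spam3"),
    ("blogA", "blogB"), ("blogB", "blogA"), ("news", "blogA"), ("wiki", "blogB"),
]
spam_cluster = {"spam1", "spam2", "spam3"}
honest_cluster = {"blogA", "blogB"}

spam_ratio = outsider_inbound_ratio(links, spam_cluster)
honest_ratio = outsider_inbound_ratio(links, honest_cluster)
print(spam_ratio, honest_ratio)  # prints "0.0 0.5"
```

In this toy graph the spam cluster only links to itself, so its ratio is zero, while half of the links into the honest cluster come from outsiders.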
April 3rd, 2005 at 1:01 pm
The case you are making is not one of push versus pull, but one of only listening to a set of “trusted” feeds. I thought you could limit the scope of your PubSub subscriptions to attain the same effect, using SOURCE: .. except that you get the information faster.
April 4th, 2005 at 5:53 am
@Eugene Would you please elaborate further on your cluster theory and how it specifically affects this situation?
Many spammers have Blogger accounts and so do decent people. In Blogger there is a link to randomly connect you to any other blog, a model which directly breaks your clustering idea.
I may subscribe to high quality blogs only thereby lending some credence to your cluster theory.
However, PubSub doesn’t care about any clusters. So as soon as I subscribe to the keyword “Java”, I immediately subject myself to any spam that contains the keyword Java. Not only that, it now comes to my desktop and I have no control over it, other than to disable/uninstall PubSub.
Hope that clarifies…
April 4th, 2005 at 5:56 am
@Per I think PubSub is actually pushing content to my desktop. This is in contrast to the normal model with RSS, where I subscribe to selected feeds and fetch contents from their site; directly or through bloglines etc.
We may argue over the nomenclature but the core problem with the PubSub model remains.
I am not aware of a SOURCE option to restrict data to certain feeds only. Can you please elaborate?
Even if it is available and I do use it, then PubSub loses its value, because I can as easily fetch my selected feeds directly or through bloglines. PubSub doesn’t have any strategic advantage in this scenario.
April 5th, 2005 at 10:11 am
I believe that the random incoming links to spammers’ blogs inserted by Blogger will be insignificant in the blogosphere. It is also easy to ignore the next-blog feature, since it is the same for all Blogger users. Nor is it hard to ignore the ads from Google, Yahoo, MSN and so on inside blogs.
There exist algorithms to cluster random graphs. For example http://micans.org/mcl/lit/index.html . It is very possible to cluster the whole blogosphere, then apply a pattern recognition algorithm to identify spammer clusters, which I believe will work better than email spam detectors.
Also, if you have ever played around with the PageRank algorithm, you should know that PageRank is just a special case of clustering a random graph. But the PageRank algorithm did not consider clustering, and that’s how the Google bomb was invented. I believe Google has started to identify those clusters set up by spammers.
April 5th, 2005 at 10:49 am
I assume that the graph structure of the blogosphere can help us distinguish spamming blogs from regular blogs. A pubsub system can therefore build a reputation system on clusters, and a publisher’s reputation becomes the criterion for whether subscribers receive their publications. I think it is possible to build such a system to weed out spammers. If this can be achieved, spammers will at the same time be discouraged in the blogosphere, which may create a positive feedback dynamic. A content-based pubsub system would then be a better model than a pull-based aggregator model.
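A reputation-gated delivery step might look like the following Python sketch. The cluster assignments, the reputation scores, the 0.5 threshold, and the `deliver` function are all assumptions made for the example; how clusters and scores would actually be computed is left open here.

```python
def deliver(posts, keyword, cluster_of, reputation, threshold=0.5):
    """Keep keyword matches whose publisher belongs to a sufficiently reputable cluster."""
    keyword = keyword.lower()
    delivered = []
    for source, text in posts:
        if keyword not in text.lower().split():
            continue
        cluster = cluster_of.get(source)
        # Unknown publishers get reputation 0.0 (an assumption of this sketch).
        if reputation.get(cluster, 0.0) >= threshold:
            delivered.append((source, text))
    return delivered


# Hypothetical cluster membership and per-cluster reputation scores.
cluster_of = {"blogA": "c1", "spam1": "c2"}
reputation = {"c1": 0.8, "c2": 0.0}
posts = [
    ("blogA", "Java tips for beginners"),
    ("spam1", "java pills cheap"),
    ("newbie", "My first java program"),
]

result = deliver(posts, "java", cluster_of, reputation)
```

Here only the post from `blogA` is delivered. The sketch also shows a cost of the approach: a brand-new honest blog (`newbie`) is dropped until it earns a reputation.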