Any reasonable web application that deals with some subset of web data (RSS feeds, web pages, product pricing data, etc.) has to use distributed data processing to get anywhere. Unless you have very deep pockets and/or strong VC funding (which is rarer than a bottle of Mouton Rothschild Pauillac Premier Cru First Growth these days), you will have to opt for consumer-grade hardware (instead of the big iron that more companies opted for during the dot-com era) and run it under a distributed computing framework like Hadoop or GridGain, or an older (more mature) high-throughput computing system like OpenPBS (ignore the marketing talk to download the free version) or Condor.
