strip_tags() in PHP has several problems. It doesn't recognize that CSS inside style tags is not document text, so it removes the tags but leaves the CSS behind. It does not remove HTML entities or the content of script tags either. It also fails on invalid HTML, where a broken or partial tag can cause more text to be removed than expected. In short, strip_tags() is not advisable except for trivial cases. The best solution I have come across is by uersoy at tnn dot net:


function html2txt($document){
  $search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
                 '@<style[^>]*?>.*?</style>@si',     // Strip style tags properly
                 '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
                 '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
  );
  $text = preg_replace($search, '', $document);
  return $text;
}
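
To make the first two complaints concrete, here is a minimal sketch of what strip_tags() leaves behind, followed by entity decoding as a separate step. The sample markup and the variable names are made up for illustration; only strip_tags() and html_entity_decode() are built-in PHP functions.

```php
<?php
// Sample markup, made up for this illustration.
$html = '<style>p { color: red; }</style>'
      . '<p>Fish &amp; chips</p>'
      . '<script>alert("hi");</script>';

// strip_tags() drops the tags but keeps everything between them,
// so the CSS and the JavaScript end up in the "text".
$stripped = strip_tags($html);
echo $stripped, "\n";
// p { color: red; }Fish &amp; chipsalert("hi");

// Entities survive tag stripping; decode them as a separate step.
$decoded = html_entity_decode('Fish &amp; chips', ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";
// Fish & chips
```

The same html_entity_decode() step can be applied to the string returned by html2txt(), since its patterns also leave entities untouched.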