Showing posts from November, 2010

Stripping HTML from text in SQL Server–Version 3

I’ve used the HTML stripping function for SQL Server available at lazycoders.blogspot.com, which is the second version of the one originally published at blog.sqlauthority.com. But neither version removes comments in a case like this:

<!-- <b>hello world</b> --> Hello

which is more or less the markup that MS Word generates. Here is the function with that fixed (changes are in bold):

ALTER FUNCTION [dbo].[DeHtmlize]
(
    @HTMLText varchar(MAX)
)
RETURNS varchar(MAX)
AS
BEGIN
    DECLARE @Start int
    DECLARE @End int
    DECLARE @Length int

    -- Replace the HTML entity &amp; with the '&' character (this needs to be done first, as
    -- '&' might be double encoded as '&amp;amp;')
    SET @Start = CHARINDEX('&amp;', @HTMLText)
    SET @End = @Start + 4
    SET @Length = (@End - @Start) + 1

    WHILE (@Start > 0 AND @End > 0 AND @Length > 0)
    BEGIN
        SET @HTMLText = STUFF(@HTMLText, @Start, @Le