I’ve used the HTML stripping function for SQL Server available in lazycoders.blogspot.com, which is the second version of the originally published in blog.sqlauthority.com. But neither one removes the comments in this case:
<!-- <b>hello world</b> --> Hello
which is more or less the code that MS Word generates.
Well, the function with that fixed is this (changes are in bold):
ALTER FUNCTION [dbo].[DeHtmlize] ( @HTMLText varchar(MAX) ) RETURNS varchar(MAX) AS BEGIN DECLARE @Start int DECLARE @End int DECLARE @Length int -- Replace the HTML entity & with the '&' character (this needs to be done first, as -- '&' might be double encoded as '&amp;') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity < with the '<' character SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '<') SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity > with the '>' character SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '>') SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity & with the '&' character SET @Start = CHARINDEX('&amp;', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&amp;', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity with the ' ' character SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, ' ') SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Replace any <br> tags with a newline SET @Start = CHARINDEX('<br>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10)) SET @Start = CHARINDEX('<br>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace any <br/> tags with a newline SET @Start = CHARINDEX('<br/>', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)') SET @Start = CHARINDEX('<br/>', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace any <br /> tags with a newline SET @Start = CHARINDEX('<br />', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)') SET @Start = CHARINDEX('<br />', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Remove anything between comments <!-- --> SET @Start = CHARINDEX('<!--', @HTMLText) SET @End = CHARINDEX('-->', @HTMLText, CHARINDEX('<!--', @HTMLText)) SET @Length = (@End - @Start) + 3 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('<!--', @HTMLText) SET @End = CHARINDEX('-->', @HTMLText, CHARINDEX('<!--', @HTMLText)) SET @Length = (@End - @Start) + 3 END -- Remove anything between <whatever> tags SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END GO
Hope this helps. Enjoy!
This was perfect, thank you!
ReplyDeleteHello, great post, It still leaves the following in the field:
ReplyDeleteCHAR(13) + CHAR(10).
Can you update the function to remove this as well?
Many Thanks.
Patrick
This comment has been removed by the author.
ReplyDeleteHi There,
ReplyDeleteThis function is working fine for converting output to text.
But we need some tags allowed in the text to format it like
<b>,<li>, <font>, <a> with there closing matching tags.
Please help.
Regards