
Full Text Searching XMLs (Lucene.NET Version)

Someone asked a question on a Spanish-language DBMS mailing list: he was trying to use SQL Server Full-Text Search to index arbitrary objects in his application. The objects were serialized to XML and indexed with the FTS engine, but things became complicated once he tried to write the queries.

Anyway, this post is not about SQL Server but about Lucene.NET. I started playing with it again and ended up with this: Download Source Code. I think the following test shows everything the mini-framework built on top of Lucene can do:

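[Fact] // assumes xUnit.net, which matches the Assert.Equal(expected, actual) style
public void MixObjects()
{
    // A Foo with a description and two Bar children, plus a standalone Bar
    var obj = new Foo()
        .SetDescription("hello world")
        .AddBar("text containing the word happy")
        .AddBar("some comments the word programming");

    var obj2 = new Bar("some comments");

    Engine.IndexObject(obj, 5);
    Engine.IndexObject(obj2, 5);

    // Diagnostic output of the index contents
    Engine.DumpIndexInfo();

    // Path query: only the Bar nested inside Foo contains "happy"
    Assert.Equal(1, Engine.LookForAll()
                        .Where(@"Foo\Bars\Bar\Comments", "happy")
                        .Search().ResultCount);

    // Bar\Comments matches the standalone Bar, not the Bars nested in Foo
    Assert.Equal(1, Engine.LookForAll()
                        .Where(@"Bar\Comments", "comments")
                        .Search().ResultCount);

    // No object has these words under that path
    Assert.Equal(0, Engine.LookForAll()
                        .Where(@"Bar\Comments", "nonexistent text")
                        .Search().ResultCount);

    // FreeWhere searches every field; LookFor<T>() restricts results to type T
    Assert.Equal(1, Engine.LookFor<Foo>()
                        .FreeWhere("comments")
                        .Search().ResultCount);

    Assert.Equal(1, Engine.LookFor<Bar>()
                        .FreeWhere("comments")
                        .Search().ResultCount);

    // Without a type restriction, both objects match
    Assert.Equal(2, Engine.LookForAll()
                        .FreeWhere("comments")
                        .Search().ResultCount);
}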
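The downloadable source isn't reproduced in this post, so here is a minimal, hypothetical sketch of the idea the test relies on: each object, once serialized to XML, becomes a set of fields keyed by the full element path from the root. It uses System.Xml.Linq and a naive in-memory word match instead of Lucene.NET, the names (PathIndex, Flatten, CountWhere, CountFreeWhere, Demo) are invented for illustration, and the rule that every query word must appear in the field is an assumption. Because paths start at the root element, @"Bar\Comments" matches the standalone Bar but not the Bars nested inside Foo, which is consistent with the counts asserted in the test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch, NOT the mini-framework from the post and NOT Lucene:
// each XML-serialized object is flattened into (path, text) fields, where the
// path joins element names with '\' as in @"Foo\Bars\Bar\Comments".
public sealed class PathIndex
{
    // One entry per indexed object: its root element name plus its fields.
    private readonly List<(string Type, List<(string Path, string Text)> Fields)> _docs =
        new List<(string Type, List<(string Path, string Text)> Fields)>();

    public static IEnumerable<(string Path, string Text)> Flatten(XElement root)
    {
        foreach (var element in root.DescendantsAndSelf())
        {
            // Only leaf elements with text are treated as searchable fields.
            if (element.HasElements || string.IsNullOrWhiteSpace(element.Value))
                continue;

            // AncestorsAndSelf walks upwards, so reverse it to get root-first order.
            var path = string.Join("\\",
                element.AncestorsAndSelf().Reverse().Select(e => e.Name.LocalName));
            yield return (path, element.Value);
        }
    }

    public void Index(XElement root) =>
        _docs.Add((root.Name.LocalName, Flatten(root).ToList()));

    // Assumption: a field matches when it contains every word of the query.
    private static bool Matches(string text, string query)
    {
        var words = Tokenize(text);
        return Tokenize(query).All(w => words.Contains(w));
    }

    // Naive lowercase, whitespace-only tokenizer (Lucene would use an analyzer).
    private static string[] Tokenize(string s) =>
        s.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Loosely mirrors .Where(path, text): only fields at exactly that path count.
    public int CountWhere(string path, string query, string type = null) =>
        _docs.Count(d => (type == null || d.Type == type) &&
                         d.Fields.Any(f => f.Path == path && Matches(f.Text, query)));

    // Loosely mirrors .FreeWhere(text): any field of the object may match.
    public int CountFreeWhere(string query, string type = null) =>
        _docs.Count(d => (type == null || d.Type == type) &&
                         d.Fields.Any(f => Matches(f.Text, query)));
}

public static class Demo
{
    public static void Main()
    {
        var index = new PathIndex();
        index.Index(new XElement("Foo",
            new XElement("Description", "hello world"),
            new XElement("Bars",
                new XElement("Bar", new XElement("Comments", "text containing the word happy")),
                new XElement("Bar", new XElement("Comments", "some comments the word programming")))));
        index.Index(new XElement("Bar", new XElement("Comments", "some comments")));

        Console.WriteLine(index.CountWhere(@"Foo\Bars\Bar\Comments", "happy"));   // 1
        Console.WriteLine(index.CountWhere(@"Bar\Comments", "comments"));         // 1
        Console.WriteLine(index.CountWhere(@"Bar\Comments", "nonexistent text")); // 0
        Console.WriteLine(index.CountFreeWhere("comments", "Foo"));               // 1
        Console.WriteLine(index.CountFreeWhere("comments", "Bar"));               // 1
        Console.WriteLine(index.CountFreeWhere("comments"));                      // 2
    }
}
```

In a Lucene-based version, each (path, text) pair would presumably be added as a field named after its path, leaving tokenizing and matching to Lucene's analyzer and query parser rather than the naive Tokenize/Matches helpers used here.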
