Ticket #32 (defect)

Opened 4 years ago

Last modified 4 years ago

Default search fields not registered

Status: reopened

Reported by: rent.lupin.road@gmail.com
Assigned to: krys
Priority: normal
Milestone:
Component: TurboLucene
Version: 0.2.2
Keywords: default field
Cc:

Default search fields are set in app.cfg:

turbolucene.search_fields = [ 'content', 'title' ]

Where content and title are:

Field('title', event.title, STORE, UN_TOKENIZED)
Field('content', event.content, COMPRESS, TOKENIZED)

Documents are successfully created, indexed and stored, but searches only return results when a field keyword is given. Suppose both the title and the content contain the word "cheese".

* searching for "cheese" returns 0 results
* searching for "title:cheese" returns 0 results
* searching for "content:cheese" returns 1 result

It appears that the turbolucene.search_fields configuration parameter is not being recognised, and searching on the STORED, UN_TOKENIZED title is not working at all...

Versions of software: TurboLucene-0.2.2-py2.5, TurboGears-1.0.4b2-py2.5, PyLucene-2.2.0-1

Change History

12/10/07 23:37:32: Modified by krys

  • status changed from new to assigned.

Hi there,

It is my understanding that with UN_TOKENIZED your whole title will be treated as a single string. So unless the title is exactly "cheese", it will not match the search term "cheese".

Try changing the title to TOKENIZED and see if that works. TOKENIZED means split the string up on word boundaries and index each word separately. This should fix title:cheese not returning any results.
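To make the difference concrete, here is a toy model of the two indexing modes. It is only a sketch: the helper names index_terms and matches are made up for illustration, and splitting on whitespace is a rough stand-in for a real analyzer, which would also handle things like punctuation and stop words.

```python
def index_terms(value, tokenized):
    """Return the set of terms a toy index would store for one field value."""
    if tokenized:
        # Rough stand-in for an analyzer: lowercase and split on whitespace.
        return set(value.lower().split())
    # Untokenized: the entire value is kept as one single term, verbatim.
    return {value}


def matches(term, value, tokenized):
    """True if a single-term query would hit this field value."""
    return term in index_terms(value, tokenized)


title = "Cheese of the world"
print(matches("cheese", title, tokenized=False))  # False
print(matches("cheese", title, tokenized=True))   # True
print(matches("Cheese of the world", title, tokenized=False))  # True
```

In this model the untokenized title only matches a query for the complete, exact string, which is consistent with "title:cheese" returning nothing.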

As for the default search fields, I'm not exactly sure what's going on there, but I do know that if the data was not indexed properly (i.e. as expected) then searching will not produce the expected results. I just double checked the code and by all appearances it *should* work. :)

I suggest trying the above change to the title field and then re-indexing your data. If the default search is still not working properly, post back to this ticket and let me know. We'll keep working at it. If I don't hear from you, I'll assume it all worked. :)

Hope this helps, Krys

12/11/07 00:11:18: Modified by rent.lupin.road@gmail.com

Hi, thanks for that! Setting the title field to be TOKENIZED did help.

I now have search results returned as expected for queries such as "title:cheese" and "content:cheese", but "cheese" returns an empty list.

Obviously, indexing is working to some extent as I am getting results back when I specify fields, and I can see words I'm expecting when I inspect the index files: I can upload them if that helps?

Do you think this is a problem with TurboLucene or should I raise it with the PyLucene guys?

Thanks

12/12/07 00:47:59: Modified by krys

Hey,

Glad I could help! :)

I'm really not sure about the search_fields thing. The only thing I can think of is to try printing the value in your code to make sure it is actually being interpreted as a list.

The TurboLucene code just blindly passes the value to PyLucene. And since it works (worked) for me, the only thing I can think of is that it's not getting into TurboGears as a list of strings.

If it is coming in correctly (which I would assume because configobj is a great library), then I'm stumped. Unless you want to share your whole app and data so I can try to reproduce the problem, but I am certainly not asking you to do that.

Anyway, I suggest testing to make sure that the setting in app.cfg is, in fact, coming through into Python correctly. Beyond that, I'm not sure.
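Something along these lines could do that check. It is a minimal sketch: describe_search_fields is a made-up helper, not part of TurboLucene or TurboGears, and the value passed in is assumed to be whatever your config lookup for turbolucene.search_fields returns.

```python
def describe_search_fields(value):
    """Report whether a config value looks like a list of field names."""
    if isinstance(value, str):
        # The setting arrived as raw text instead of being parsed as a list.
        return "string, not a list: %r" % value
    if isinstance(value, (list, tuple)) and all(
            isinstance(item, str) for item in value):
        return "ok: %d field(s)" % len(value)
    # Anything else, including None when the key is missing entirely.
    return "unexpected type: %s" % type(value).__name__


print(describe_search_fields(['content', 'title']))
print(describe_search_fields("[ 'content', 'title' ]"))
print(describe_search_fields(None))
```

Logging the result once at startup should be enough to tell a correctly parsed list from a plain string or a missing value.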

Hope this helps. Sorry I cannot give a better answer.

Let me know how things go, okay?

Take care, Krys

01/05/08 19:15:37: Modified by krys

  • status changed from assigned to closed.
  • resolution set to worksforme.

01/21/08 19:54:34: Modified by rent.lupin.road@gmail.com

  • status changed from closed to reopened.
  • resolution deleted.

Hi, it does seem that the configuration changes I'm making in dev.cfg and/or app.cfg aren't visible to TurboLucene. I've changed _Searcher.run to add some debug output:

...
        searcher = IndexSearcher(_get_index_path(language))
        search_fields = config.get('turbolucene.search_fields', ['id'])
        log.debug("search_fields: %s" % search_fields)
        parser = MultiFieldQueryParser(search_fields, _analyzer_factory(
          language))
        default_operator = getattr(parser.Operator, config.get(
          'turbolucene.default_operator', 'AND').upper())
        log.debug("operator: %s" % default_operator)
        parser.setDefaultOperator(default_operator)
        try:
            log.debug("query: %s" % query)
            log.debug("parsed query: %s" % parser.parse(query))
            hits = searcher.search(parser.parse(query))
...

Searching for 'wikipedia', with this config in dev.cfg (same effect in app.cfg):

turbolucene.search_fields = [ 'content', 'title' ]
turbolucene.default_operator = 'OR'

I get this output:

search_fields: ['id']
operator: AND
query: wikipedia
parsed query: id:wikipedia
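Both logged values equal the fallbacks passed to config.get in the snippet above (['id'] and 'AND'), which suggests the keys are simply absent when the lookup runs rather than present with a bad value. A lookup with a sentinel can make that distinction explicit. This is only a sketch: get_with_source is a made-up helper, and a plain dict stands in for the real config object.

```python
_MISSING = object()  # sentinel: distinguishes "key absent" from any real value


def get_with_source(settings, key, default):
    """Return (value, source), where source is 'config' or 'default'."""
    value = settings.get(key, _MISSING)
    if value is _MISSING:
        return default, "default"
    return value, "config"


# Plain dict standing in for the framework's config object (an assumption).
settings = {}
print(get_with_source(settings, "turbolucene.search_fields", ["id"]))
# (['id'], 'default')

settings["turbolucene.search_fields"] = ["content", "title"]
print(get_with_source(settings, "turbolucene.search_fields", ["id"]))
# (['content', 'title'], 'config')
```

Logging the source next to the value would show directly whether the setting from dev.cfg or app.cfg is reaching the lookup at all.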

FYI, I also changed run to return a dictionary of ids and scores - ranking is one of the great things in PyLucene, please please expose it!!!