Search Lucene performance on a large index

mozillalives
Posts: 5
Joined: Mon Apr 27, 2009 4:15 pm

Post by mozillalives » Sat May 16, 2009 2:06 pm

Does anyone have any tips on using Zend Lucene with a large (230MB) index? I've had to push the PHP memory limit up to 128MB to process queries, and a query still takes about 6-8 seconds on a fully optimized index. I was able to get better results when I limited the number of results returned, but I would prefer not to resort to that.

dsb1971
Posts: 3
Joined: Mon May 18, 2009 6:59 am
Location: Germany

Re: Search Lucene performance on a large index

Post by dsb1971 » Mon May 18, 2009 7:11 am

Could it be that you are using Lucene like a database? That is not what it is meant for!
What is the design of your index?

Maybe you are suffering from a design flaw: did you store large amounts of data in the index? It is better to let Lucene index the content without storing the content itself in the index (use "unstored" for large fields). Just store a unique key in the index, get your hits, and then fetch the real content from the original database, file, or whatever using your unique key.
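
To illustrate the "store only a key, fetch the content elsewhere" design, here is a minimal, library-neutral sketch in Python. It is not Zend_Search_Lucene code: the `articles` table, the toy inverted index, and the `search` helper are all hypothetical stand-ins chosen for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical stand-in for the "original database" that holds the full content.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, "Tuning search", "Large indexes need careful field design."),
        (2, "Cooking pasta", "Boil water, then add salt."),
        (3, "Index design", "Store keys in the index, not whole documents."),
    ],
)

# Toy inverted index: term -> set of primary keys. The text is tokenized
# for searching, but only the key is kept in the index (the "unstored" idea).
index = defaultdict(set)
for key, title, body in db.execute("SELECT id, title, body FROM articles").fetchall():
    text = (title + " " + body).lower().replace(",", " ").replace(".", " ")
    for term in text.split():
        index[term].add(key)


def search(term):
    """Look up matching keys in the index, then fetch display data from the database."""
    keys = sorted(index.get(term.lower(), ()))
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    rows = db.execute(
        f"SELECT id, title FROM articles WHERE id IN ({placeholders}) ORDER BY id",
        keys,
    )
    return rows.fetchall()
```

Calling `search("design")` returns only the ids and titles needed for a results page; the bodies never leave the database unless a hit is actually displayed.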
mozillalives wrote:I was able to get better results when I limited the number returned, but I would prefer not to resort to that
From my point of view it doesn't make sense not to limit the number of hits. A user who runs a search doesn't want to get a list of 2000 hits. So I wonder: in which context do you use the Lucene search?
Best regards from Germany,
Daniel Schlichtholz

mozillalives
Posts: 5
Joined: Mon Apr 27, 2009 4:15 pm

Re: Search Lucene performance on a large index

Post by mozillalives » Mon May 18, 2009 1:54 pm

No, I'm not using Lucene as a database (I read those warnings too). No, I'm not storing lots of info in the index, just what I need to display a simple results page. And no, I don't want to limit the number of results. I'm not going to presume what the user wants in his search. Also note in the docs that it states

It doesn't give the 'best N' results, but only the 'first N'.

http://framework.zend.com/manual/en/zen ... s-limiting

I'd rather give the user all the results to his query.

dsb1971
Posts: 3
Joined: Mon May 18, 2009 6:59 am
Location: Germany

Re: Search Lucene performance on a large index

Post by dsb1971 » Mon May 18, 2009 4:03 pm

mozillalives wrote:Also note in the docs that it states

It doesn't give the 'best N' results, but only the 'first N'.

http://framework.zend.com/manual/en/zen ... s-limiting

I'd rather give the user all the results to his query.
Ouch, you are right. I hadn't read this before.
In that case I would also fetch all results, so that the hits are correctly ordered by score, and then display only the first n hits programmatically.
I have no idea how to speed this up.
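
The difference between "first N" and "best N" can be shown with a small, library-neutral Python sketch. The hit list and both helper functions are hypothetical; this is not the Zend_Search_Lucene API, only an illustration of why a result limit that stops early can miss the highest-scoring documents.

```python
import heapq
from itertools import islice

# Hypothetical hits as (document key, relevance score), in index order.
hits = [
    ("doc-a", 0.12),
    ("doc-b", 0.87),
    ("doc-c", 0.45),
    ("doc-d", 0.93),
    ("doc-e", 0.30),
]


def first_n(hit_iter, n):
    """Stop after n hits: cheap, but ignores the score entirely."""
    return list(islice(hit_iter, n))


def best_n(hit_iter, n):
    """Scan every hit, but keep only the n highest-scoring ones."""
    return heapq.nlargest(n, hit_iter, key=lambda hit: hit[1])
```

Here `first_n(hits, 2)` yields `doc-a` and `doc-b`, while `best_n(hits, 2)` yields `doc-d` and `doc-b`. Note that `best_n` still has to visit every hit, so it bounds the memory used for ranking but does not by itself make the underlying search faster.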

I am building an index over 5 fields of 1.8 million records on an IBM iSeries. Maybe we can compare the size of the index and the speed of the queries after I have set it up.
Best regards from Germany,
Daniel Schlichtholz

mozillalives
Posts: 5
Joined: Mon Apr 27, 2009 4:15 pm

Re: Search Lucene performance on a large index

Post by mozillalives » Thu Jul 23, 2009 11:13 pm

Just to follow up on this topic, I never did solve my slow index/search times. Instead I ran across this article

http://www.miximum.fr/tutos/192-integrer-solr-a-symfony

about integrating Solr into a PHP project and decided to give that a try. It was pretty easy to set up and the performance was spectacular. It wasn't the solution I was looking for, but it helped me out of a tight spot and I hope it can help someone else too.
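
For anyone curious what talking to Solr looks like, it is queried over plain HTTP, so a client mostly needs to build a URL. The sketch below is an assumption-laden illustration, not the setup from the article: the base URL, the `articles` core name, and the `build_select_url` helper are hypothetical, while `q`, `start`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port, and core name to your install.
SOLR_BASE = "http://localhost:8983/solr/articles"


def build_select_url(query, start=0, rows=10, base=SOLR_BASE):
    """Build a URL for Solr's select handler, asking for one page of JSON results."""
    params = {
        "q": query,      # the search expression
        "start": start,  # offset of the first result to return
        "rows": rows,    # page size; Solr ranks by score before paging
        "wt": "json",    # response format
    }
    return f"{base}/select?{urlencode(params)}"
```

Because the ranking happens on the server before `start` and `rows` are applied, paging through results this way returns the best-scoring hits first without loading the whole result set into PHP's memory.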
