Approach to Incrementally Index Database Data from Multi-Table Join in
Lucene with No Unique Key
I have a particular SQL join such that:
select DISTINCT ... 100 columns
from ... 10 tables, some left joins
Currently I export the result of this query to XML using Toad (later I'll
query it straight from Java). I use Java to parse the XML file, and I use
Lucene (Java) to index it and to search the index. This works great: I get
results 6-10 times faster than querying the database directly.
I need to think of a way to incrementally update this index when the data
in the database changes.
Because I am joining tables (especially with left joins), I'm not sure I can
derive a unique business key combination for an incremental update. On the
other hand, because I am using DISTINCT, I know that each row is a unique
combination of field values. Given this, I thought I could store the
hashCode of a document as a field of that document, and call updateDocument
on the IndexWriter like this:
public static void addDoc(IndexWriter w, Row row) throws IOException {
    // Row is simply a Java representation of a single row from the above query
    Document document = new Document();
    document.add(new StringField("fieldA", row.fieldA, Field.Store.YES));
    ...
    String hashCode = String.valueOf(document.hashCode());
    document.add(new StringField("HASH", hashCode, Field.Store.YES));
    w.updateDocument(new Term("HASH", hashCode), document);
}
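(One caveat I noticed with the hash idea: Lucene's Document does not override hashCode(), so document.hashCode() is an identity hash that can change between JVM runs, meaning the same row would not reliably map to the same term. A deterministic key would have to be computed from the field values themselves. A minimal sketch of that idea, using a hypothetical contentHash helper that is not part of the original code:)

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowHash {

    // Hypothetical helper: hash the row's field values in a fixed order,
    // with a separator byte between fields, so identical row content always
    // yields the same key across runs.
    static String contentHash(String... fieldValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String v : fieldValues) {
                // Placeholder for SQL NULLs coming out of the left joins
                md.update((v == null ? "\u0000" : v).getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0x1F); // unit separator between fields
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }
}
```

The resulting hex string could then be stored in the "HASH" field and used as the updateDocument term instead of document.hashCode().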
Then I realized that updateDocument was actually deleting the document
with the matching hash code and adding the identical document again, so
this wasn't of any use.
What is the way to approach this?