First, you enumerate your data source and push all of its documents to the index using the Push API. Then, you need to update the index regularly by adding new documents and deleting the documents that have been removed from the source. Sometimes the source does not notify you of deletions, so you do not know which documents to delete from the index. In that case, your only solution is to compare the documents present in the index with the documents present in the data source, and then delete the documents that are in the index but no longer in the source. To do so, we provide a
Using this class, you create an object that calculates the set of items present in a data source A and NOT in a data source B, so that you know exactly which items to delete from the index. This calculation is performed in a Java heap buffer and swaps
to disk if there is not enough memory available. Item enumeration is processed
through Cursor objects, which are enumerators. This allows you to use the
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.UUID;
// Imports for PushAPI, PushAPIException, SyncedEntry, EnumerationMode, Cursor,
// IterableCursor, Subtractor, Logger and FileUtils are omitted here;
// take them from your CloudView SDK and its bundled libraries.

/**
 * Iterator on the CloudView index
 */
private Cursor<byte[]> getCursorFromCloudview(final PushAPI papi) throws PushAPIException {
    return new Cursor<byte[]>() {
        private Iterator<SyncedEntry> syncedEntries =
                papi.enumerateSyncedEntries("", EnumerationMode.NOT_RECURSIVE_ALL).iterator();

        @Override
        public void close() throws IOException {
            // nothing to release
        }

        @Override
        public byte[] next() throws Exception {
            // returning null ends the enumeration
            if (!syncedEntries.hasNext()) {
                return null;
            }
            final SyncedEntry entry = syncedEntries.next();
            final String uri = entry.getUri();
            return uri.getBytes("UTF-8");
        }
    };
}

/**
 * Iterator on a fake data source
 * which contains 9995 documents with uris from "uri0" to "uri9994"
 */
private Cursor<byte[]> getCursorFromDataSource() {
    return new Cursor<byte[]>() {
        private int index = 0;
        private int len = 9995;

        @Override
        public void close() throws IOException {
            // nothing to release
        }

        @Override
        public byte[] next() throws Exception {
            // returning null ends the enumeration
            if (index >= len) {
                return null;
            }
            final String uri = "uri" + index++;
            return uri.getBytes("UTF-8");
        }
    };
}

/**
 * Sample code to show how to find the documents that need to be deleted
 * from the CloudView index.
 * In this sample code we already have 10000 (10K) documents in the CloudView index
 * with uris from "uri0" to "uri9999"
 */
private void checkItemsToDelete(final PushAPI papi, final Logger logger) throws Exception {
    // working directory used by the subtractor
    final File workdir = new File(System.getProperty("java.io.tmpdir"), UUID.randomUUID().toString());
    try {
        // this cursor will enumerate on 10000 documents (the CloudView index)
        final Cursor<byte[]> cursorFromCloudview = getCursorFromCloudview(papi);
        try {
            // this cursor will enumerate on 9995 documents (the data source)
            final Cursor<byte[]> cursorFromDataSource = getCursorFromDataSource();
            try {
                // maximum amount of memory consumed in Java HEAP (5 MB)
                final int ramBudget = 5 * 1024 * 1024;
                // create a subtractor object
                final Subtractor sub = new Subtractor(workdir, "subtractorName", ramBudget, logger);
                // computes the difference: items that are in source1 and NOT in source2
                final Cursor<byte[]> itemsInCloudviewAndNotInDataSource =
                        sub.sub(cursorFromCloudview, cursorFromDataSource);
                try {
                    // loop on all items that are in the CloudView index and no longer in the data source
                    for (final byte[] bytes : new IterableCursor<byte[]>(itemsInCloudviewAndNotInDataSource)) {
                        final String s = new String(bytes, "UTF-8");
                        papi.deleteDocument(s); // delete the document to update the index
                    }
                } finally {
                    itemsInCloudviewAndNotInDataSource.close();
                }
            } finally {
                cursorFromDataSource.close();
            }
        } finally {
            cursorFromCloudview.close();
        }
    } finally {
        FileUtils.deleteQuietly(workdir);
    }
}
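To make the "in A and NOT in B" calculation concrete, the following is a minimal sketch of the same set difference written with plain JDK collections only. It is not how the Subtractor is implemented: it keeps every URI in memory and never swaps to disk, so it only suits small sources. The class name UriDifferenceSketch and the methods urisToDelete and fakeUris are hypothetical names introduced for this illustration; the URI ranges are the ones used in the sample above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; it is not part of the CloudView API.
class UriDifferenceSketch {

    /** Returns the URIs that are in the index but not in the source, in index order. */
    public static List<String> urisToDelete(Iterable<String> indexUris, Iterable<String> sourceUris) {
        // Load the source URIs into a hash set for fast membership tests.
        final Set<String> inSource = new HashSet<>();
        for (final String uri : sourceUris) {
            inSource.add(uri);
        }
        // Keep every index URI that the source no longer contains.
        final List<String> toDelete = new ArrayList<>();
        for (final String uri : indexUris) {
            if (!inSource.contains(uri)) {
                toDelete.add(uri);
            }
        }
        return toDelete;
    }

    /** Builds the fake URI list "uri0" to "uri(count - 1)". */
    public static List<String> fakeUris(int count) {
        final List<String> uris = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            uris.add("uri" + i);
        }
        return uris;
    }

    public static void main(String[] args) {
        final List<String> index = fakeUris(10000);  // stands in for the index content
        final List<String> source = fakeUris(9995);  // stands in for the data source
        System.out.println(urisToDelete(index, source));
        // prints [uri9995, uri9996, uri9997, uri9998, uri9999]
    }
}
```

With 10000 URIs in the index and 9995 in the source, the five remaining URIs are the ones a connector would pass to deleteDocument. For large sources, the memory budget and disk swapping described above are the reason to prefer the provided class over an in-memory set like this one.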