How do I convert a file system index to a memory index?
Author: Deron Eriksson
Description: This Java tutorial shows how to convert a Lucene file system index to an in-memory index.
Tutorial created using:
Windows XP || JDK 1.5.0_09 || Eclipse Web Tools Platform 2.0 (Eclipse 3.3.0)
In certain situations, if you have enough memory, it can be useful to copy an index from the file system into memory. Searches against an in-memory index can be faster, since they do not need to read from the hard drive. This can be accomplished with the RAMDirectory(Directory directory) constructor of the RAMDirectory class, passing the FSDirectory to be copied as the constructor's parameter. This constructor makes a copy of the Directory that is passed in. As a result, the new RAMDirectory is independent of the original directory: changes made to the original directory after the copy has taken place are not reflected in the RAMDirectory.

We will demonstrate this with the following project. Two text files in the filesToIndex directory will be indexed. The first one, deron-foods.txt, lists some foods that I like.

deron-foods.txt
```
Here are some foods that Deron likes:
hamburger
french fries
steak
mushrooms
artichokes
```

The second text file, nicole-foods.txt, lists some foods that Nicole likes.

nicole-foods.txt
```
Here are some foods that Nicole likes:
apples
bananas
salad
mushrooms
cheese
```

The LuceneFileSystemToRamDemo class is our demonstration class. Its main() method first creates a file system index via the createFileSystemIndex() method. This index uses an FSDirectory, which is obtained via:

```java
fileSystemDirectory = FSDirectory.getDirectory(INDEX_DIRECTORY);
```

Next, main() copies the file system index into memory via the aforementioned RAMDirectory constructor:

```java
memoryDirectory = new RAMDirectory(fileSystemDirectory);
```

Finally, it performs 5 searches using the file system index and measures the total time, and then performs the same 5 searches using the memory index and measures the total time. I won't describe the details of the search process, since these are covered in another tutorial.
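Conceptually, the RAMDirectory(Directory) constructor reads every file of the source Directory into memory, which is why the copy behaves as a snapshot that later changes on disk do not affect. The sketch below is a stdlib-only analogy of that idea, not Lucene's implementation: the class SnapshotDemo, its snapshot() method and the file name data.txt are hypothetical, and it uses java.nio.file, so it assumes Java 7 or later rather than the JDK 1.5 used for this tutorial.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stdlib-only analogy (not Lucene code): copies every regular file
 * in a directory into memory, similar in spirit to how RAMDirectory(Directory)
 * copies every file of its source Directory.
 */
public class SnapshotDemo {

    /** Reads each regular file in dir into a map keyed by file name. */
    public static Map<String, byte[]> snapshot(Path dir) throws IOException {
        Map<String, byte[]> copy = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // readAllBytes returns a fresh array, so the map owns its own copy
                    copy.put(file.getFileName().toString(), Files.readAllBytes(file));
                }
            }
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapshotDemo");
        Path file = dir.resolve("data.txt");
        Files.write(file, "before".getBytes(StandardCharsets.UTF_8));

        // Take the in-memory copy
        Map<String, byte[]> inMemory = snapshot(dir);

        // Change the on-disk file after the copy has been taken
        Files.write(file, "after".getBytes(StandardCharsets.UTF_8));

        System.out.println("in memory: " + new String(inMemory.get("data.txt"), StandardCharsets.UTF_8));
        System.out.println("on disk:   " + new String(Files.readAllBytes(file), StandardCharsets.UTF_8));

        // Clean up the temporary file and directory
        Files.delete(file);
        Files.delete(dir);
    }
}
```

Running main() prints "in memory: before" followed by "on disk:   after", mirroring how a RAMDirectory keeps the index contents it copied even if the file system index changes afterwards.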
LuceneFileSystemToRamDemo.java
```java
package avajava;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Date;
import java.util.Iterator;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

public class LuceneFileSystemToRamDemo {

    public static final String FILES_TO_INDEX_DIRECTORY = "filesToIndex";
    public static final String INDEX_DIRECTORY = "indexDirectory";

    public static final String FIELD_PATH = "path";
    public static final String FIELD_CONTENTS = "contents";

    public static Directory fileSystemDirectory = null;
    public static Directory memoryDirectory = null;

    public static void main(String[] args) throws Exception {
        // Build the index on disk, then copy it into memory
        createFileSystemIndex();
        memoryDirectory = new RAMDirectory(fileSystemDirectory);

        // Run the same searches against both directories
        doSearches(fileSystemDirectory);
        doSearches(memoryDirectory);
    }

    public static void doSearches(Directory directory) throws IOException, ParseException {
        long start = new Date().getTime();
        searchIndex(directory, "mushrooms");
        searchIndex(directory, "steak");
        searchIndex(directory, "steak AND cheese");
        searchIndex(directory, "steak and cheese");
        searchIndex(directory, "bacon OR cheese");
        long end = new Date().getTime();
        System.out.println("TOTAL SEARCH TIME (using " + directory.getClass().getSimpleName()
                + ") in milliseconds:" + (end - start));
    }

    public static void createFileSystemIndex() throws CorruptIndexException, LockObtainFailedException,
            IOException {
        Analyzer analyzer = new StandardAnalyzer();
        boolean recreateIndexIfExists = true;
        fileSystemDirectory = FSDirectory.getDirectory(INDEX_DIRECTORY);
        IndexWriter indexWriter = new IndexWriter(fileSystemDirectory, analyzer, recreateIndexIfExists);
        File dir = new File(FILES_TO_INDEX_DIRECTORY);
        File[] files = dir.listFiles();
        for (File file : files) {
            Document document = new Document();

            // Store the path untokenized so it can be displayed for each hit
            String path = file.getCanonicalPath();
            document.add(new Field(FIELD_PATH, path, Field.Store.YES, Field.Index.UN_TOKENIZED));

            // Index (but do not store) the file contents; the Reader is consumed by addDocument()
            Reader reader = new FileReader(file);
            try {
                document.add(new Field(FIELD_CONTENTS, reader));
                indexWriter.addDocument(document);
            } finally {
                reader.close();
            }
        }
        indexWriter.optimize();
        indexWriter.close();
    }

    public static void searchIndex(Directory directory, String searchString) throws IOException,
            ParseException {
        System.out.println("Searching for '" + searchString + "'");
        IndexSearcher indexSearcher = new IndexSearcher(directory);
        try {
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
            Query query = queryParser.parse(searchString);
            Hits hits = indexSearcher.search(query);
            System.out.println("Number of hits: " + hits.length());

            // Hits loads documents lazily, so iterate before closing the searcher
            Iterator<Hit> it = hits.iterator();
            while (it.hasNext()) {
                Hit hit = it.next();
                Document document = hit.getDocument();
                String path = document.get(FIELD_PATH);
                System.out.println("Hit: " + path);
            }
        } finally {
            // Release the index files opened by this searcher
            indexSearcher.close();
        }
    }
}
```

Let's look at the console output from the execution of LuceneFileSystemToRamDemo.
Console Output
```
Searching for 'mushrooms'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak AND cheese'
Number of hits: 0
Searching for 'steak and cheese'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'bacon OR cheese'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
TOTAL SEARCH TIME (using FSDirectory) in milliseconds:125
Searching for 'mushrooms'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak AND cheese'
Number of hits: 0
Searching for 'steak and cheese'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'bacon OR cheese'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
TOTAL SEARCH TIME (using RAMDirectory) in milliseconds:16
```

In the console output, notice that the 5 searches against the file system index took 125 milliseconds, while the same 5 searches against the memory index took 16 milliseconds. This is what we would expect, since the file system index searches required hard disk I/O, while the memory index searches ran against an index that was already in memory. Keep in mind that this is a single, small measurement: because the file system searches run first, part of their time may also come from one-time JVM costs such as loading Lucene's search classes.
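Because a single wall-clock measurement like the one above is easily skewed by whichever workload runs first, a fairer comparison warms up each workload before timing it. The following is a minimal sketch of that approach, assuming Java 8 or later; SearchTimer and averageNanos are hypothetical names, not part of Lucene or this tutorial, and the workload in main() is only a stand-in for doSearches(), which would need wrapping in a Runnable because it declares checked exceptions.

```java
/**
 * Hypothetical timing helper (not part of Lucene or the tutorial): runs a task
 * a few times untimed, then reports the average time of the timed runs.
 */
public class SearchTimer {

    // Written by the stand-in workload so the JIT cannot discard it as dead code
    static volatile long sink;

    /** Returns the average nanoseconds per run over timedRuns runs, after warmupRuns untimed runs. */
    public static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        // Untimed runs absorb one-time costs such as class loading and JIT compilation
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        // System.nanoTime() is intended for measuring elapsed time, unlike new Date()
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; with the tutorial class this might instead be
        // () -> { try { doSearches(memoryDirectory); } catch (Exception e) { throw new RuntimeException(e); } }
        Runnable workload = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            sink = sum;
        };
        System.out.println("Average ns per run: " + averageNanos(workload, 5, 20));
    }
}
```

Timing both directories this way, each with its own warm-up runs, separates the steady-state search cost from one-time startup costs; the exact numbers will vary by machine.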