Writing the Java code
- source code for inspiration or modification: grepcode.com
- a tutorial on solr.pl
- each filter must have its own factory class, which is what the Solr schema references
- the core functionality of the filter class is in the incrementToken() method: it should return true when a token value has been set in the CharTermAttribute and false once all tokens from the input have been processed
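The filter-plus-factory pattern described above can be sketched as follows. This is a minimal, hypothetical example (the class names are invented, not from these notes) written against the Lucene 4.x API, where TokenFilterFactory lives in org.apache.lucene.analysis.util:

```java
import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

// Hypothetical example filter: upper-cases every token it receives.
final class UpperCaseExampleFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    UpperCaseExampleFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        // Pull the next token from the wrapped stream; false means the input is exhausted.
        if (!input.incrementToken()) {
            return false;
        }
        // Modify the term buffer in place; the CharTermAttribute now holds the token to emit.
        char[] buffer = termAtt.buffer();
        for (int i = 0; i < termAtt.length(); i++) {
            buffer[i] = Character.toUpperCase(buffer[i]);
        }
        return true;
    }
}

// The matching factory that Solr instantiates from the schema, e.g.
// <filter class="my.custom.package.UpperCaseExampleFilterFactory"/>.
public class UpperCaseExampleFilterFactory extends TokenFilterFactory {
    public UpperCaseExampleFilterFactory(Map<String, String> args) {
        super(args); // consumes common parameters such as luceneMatchVersion
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new UpperCaseExampleFilter(input);
    }
}
```

Note that the factory's superclass constructor removes the parameters it recognizes from the args map, which is why a leftover-parameter check at the end of the constructor works.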
Testing the code
- a simple testing class inside the jar (alongside the tokenizer sources)
File MainClass.java
package my.custom.package;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenizerFactory;

public class MainClass {
    public static void main(String[] args) throws IOException {
        Map<String, String> param = new HashMap<>();
        param.put("luceneMatchVersion", "LUCENE_44");

        System.out.print("Enter the string to analyze: ");
        // read the string to analyze from standard input
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String input = null;
        try {
            input = br.readLine();
        } catch (IOException ioe) {
            System.out.println("IO error trying to read your input!");
            System.exit(1);
        }

        TokenizerFactory stdTokenFact = new StandardTokenizerFactory(param);
        Tokenizer tokenizer = stdTokenFact.create(new StringReader(input));

        // the factory consumed the parameters it read from the map, so set them again
        param.put("luceneMatchVersion", "LUCENE_44");
        param.put("minGramSize", "2");
        param.put("maxGramSize", "512");
        CustomNGramFilterFactory customNGramFactory = new CustomNGramFilterFactory(param);
        TokenStream tokenStream = customNGramFactory.create(tokenizer);

        CharTermAttribute termAttrib = tokenStream.getAttribute(CharTermAttribute.class);
        tokenStream.reset();
        while (tokenStream.incrementToken()) {
            System.out.println(termAttrib.toString());
        }
        tokenStream.end();
        tokenStream.close();
    }
}
- build the whole jar and run the test: java -classpath /path/to/my-jar.jar my.custom.package.MainClass
- further testing can be done on the Analysis page for the collection in the Solr administration web panel
Running and deploying
For a single Solr instance, add the directory containing your jar to solrconfig.xml:
<lib dir="/path/to" regex=".*\.jar" />
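Once the jar is loaded, the filter factory can be wired into a field type in the schema. A sketch using the CustomNGramFilterFactory from the test program above (the field type name is an example):

```xml
<fieldType name="text_custom_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="my.custom.package.CustomNGramFilterFactory" minGramSize="2" maxGramSize="512"/>
  </analyzer>
</fieldType>
```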
For SolrCloud it is wiser to store the jar in the special system-level collection (.system), to avoid managing the jar on every node of the cluster, and to manage it using the Blob Store API - see the Solr documentation
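The Blob Store workflow is roughly the following (the blob and collection names are examples; the .system collection must already exist, and the nodes must be started with runtime libraries enabled, e.g. -Denable.runtime.lib=true):

```shell
# Upload the jar to the .system collection under a blob name
# (a version number is assigned automatically on each upload)
curl -X POST -H 'Content-Type: application/octet-stream' \
  --data-binary @/path/to/my-jar.jar \
  'http://localhost:8983/solr/.system/blob/my-custom-filter'

# Attach the uploaded blob to a collection as a runtime library
curl 'http://localhost:8983/solr/mycollection/config' \
  -H 'Content-type: application/json' \
  -d '{"add-runtimelib": {"name": "my-custom-filter", "version": 1}}'
```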