Writing the Java code
- source code for inspiration or modification: grepcode.com
- a tutorial on solr.pl
- each filter must have its own factory class, which is what the Solr schema references
- the core functionality of the filter class is in the incrementToken() method: it should return true when a token value has been set in the CharTermAttribute and false once all tokens from the input have been processed
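The filter-plus-factory pattern described above can be sketched as follows. This is a minimal, hypothetical example (the class names are invented, not from these notes) written against the Lucene 4.x API, where TokenFilterFactory lives in org.apache.lucene.analysis.util:

```java
import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

// Hypothetical example filter: upper-cases every token it receives.
final class UpperCaseExampleFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    UpperCaseExampleFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        // Pull the next token from the wrapped stream; false means the input is exhausted.
        if (!input.incrementToken()) {
            return false;
        }
        // Modify the term buffer in place; the CharTermAttribute now holds the token to emit.
        char[] buffer = termAtt.buffer();
        for (int i = 0; i < termAtt.length(); i++) {
            buffer[i] = Character.toUpperCase(buffer[i]);
        }
        return true;
    }
}

// The matching factory that Solr instantiates from the schema, e.g.
// <filter class="my.custom.package.UpperCaseExampleFilterFactory"/>.
public class UpperCaseExampleFilterFactory extends TokenFilterFactory {
    public UpperCaseExampleFilterFactory(Map<String, String> args) {
        super(args); // consumes common parameters such as luceneMatchVersion
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new UpperCaseExampleFilter(input);
    }
}
```

Note that the factory's superclass constructor removes the parameters it recognizes from the args map, which is why a leftover-parameter check at the end of the constructor works.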
Testing the code
- a simple testing class inside the jar (alongside the tokenizer sources)
File MainClass.java
package my.custom.package;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenizerFactory;

public class MainClass {
    public static void main(String[] args) throws IOException {
        Map<String, String> param = new HashMap<>();
        param.put("luceneMatchVersion", "LUCENE_44");

        System.out.print("Enter the string to analyze: ");
        // read the string to analyze from standard input
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String input = null;
        try {
            input = br.readLine();
        } catch (IOException ioe) {
            System.out.println("IO error trying to read your input!");
            System.exit(1);
        }

        TokenizerFactory stdTokenFact = new StandardTokenizerFactory(param);
        Tokenizer tokenizer = stdTokenFact.create(new StringReader(input));

        // the factory consumed the parameters it read from the map, so set them again
        param.put("luceneMatchVersion", "LUCENE_44");
        param.put("minGramSize", "2");
        param.put("maxGramSize", "512");
        CustomNGramFilterFactory customNGramFactory = new CustomNGramFilterFactory(param);
        TokenStream tokenStream = customNGramFactory.create(tokenizer);

        CharTermAttribute termAttrib = tokenStream.getAttribute(CharTermAttribute.class);
        tokenStream.reset();
        while (tokenStream.incrementToken()) {
            System.out.println(termAttrib.toString());
        }
        tokenStream.end();
        tokenStream.close();
    }
}
- build the whole jar and run the test: java -classpath /path/to/my-jar.jar my.custom.package.MainClass
- further testing can be done on the Analysis page for the collection in the Solr administration web panel
Running and deploying
For a single Solr instance, add the directory containing your jar to solrconfig.xml:
<lib dir="/path/to" regex=".*\.jar" />
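Once the jar is loaded, the filter factory can be wired into a field type in the schema. A sketch using the CustomNGramFilterFactory from the test program above (the field type name is an example):

```xml
<fieldType name="text_custom_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="my.custom.package.CustomNGramFilterFactory" minGramSize="2" maxGramSize="512"/>
  </analyzer>
</fieldType>
```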
For SolrCloud it is wiser to store the jar in the special system-level collection (.system), to avoid managing the jar on every node of the cluster, and to manage it using the Blob Store API - see the Solr documentation
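The Blob Store workflow is roughly the following (the blob and collection names are examples; the .system collection must already exist, and the nodes must be started with runtime libraries enabled, e.g. -Denable.runtime.lib=true):

```shell
# Upload the jar to the .system collection under a blob name
# (a version number is assigned automatically on each upload)
curl -X POST -H 'Content-Type: application/octet-stream' \
  --data-binary @/path/to/my-jar.jar \
  'http://localhost:8983/solr/.system/blob/my-custom-filter'

# Attach the uploaded blob to a collection as a runtime library
curl 'http://localhost:8983/solr/mycollection/config' \
  -H 'Content-type: application/json' \
  -d '{"add-runtimelib": {"name": "my-custom-filter", "version": 1}}'
```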