
Custom Lucene/Solr filter.


4 September 2015, 15:35 · Reads: 713
How to write, test and use your own filter.

Writing the Java code

  • source code for inspiration or modification - grepcode.com
  • tutorial on solr.pl
  • each filter must have its own factory so it can be referenced from the Solr schema
  • the core functionality of the filter class lives in the incrementToken() method - it should return true when a value is set in CharTermAttribute and false once all tokens from the input have been processed
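The incrementToken() contract described above can be sketched in plain Java. This is a simplified stand-in with no Lucene dependency - a real filter extends org.apache.lucene.analysis.TokenFilter and writes into CharTermAttribute; the class and method names below are illustrative only:

```java
import java.util.Arrays;
import java.util.Iterator;

// Stand-in for a TokenFilter: wraps an upstream token source and
// transforms each token (here: upper-casing) as it passes through.
class UpperCaseFilterSketch {
    private final Iterator<String> input;
    private String term;  // stand-in for CharTermAttribute

    UpperCaseFilterSketch(Iterator<String> input) {
        this.input = input;
    }

    // Mirrors TokenStream.incrementToken(): advance to the next token,
    // store it in the "attribute", and return true; return false once
    // the input is exhausted.
    boolean incrementToken() {
        if (!input.hasNext()) {
            return false;
        }
        term = input.next().toUpperCase();
        return true;
    }

    String term() {
        return term;
    }
}

public class FilterContractDemo {
    public static void main(String[] args) {
        UpperCaseFilterSketch filter =
            new UpperCaseFilterSketch(Arrays.asList("hello", "solr").iterator());
        // The consumer loop looks the same as with a real TokenStream.
        while (filter.incrementToken()) {
            System.out.println(filter.term());
        }
    }
}
```

The consumer loop is identical to the one used against a real TokenStream later in this article; only the attribute access differs.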

Testing the code

  • a simple testing class inside the jar (alongside the tokenizer sources)

File MainClass.java

package my.custom.package;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.Map;
import java.util.HashMap;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class MainClass
{
    public static void main(String[] args) throws IOException {
        Map<String, String> param = new HashMap<>();
        param.put("luceneMatchVersion", "LUCENE_44");

        System.out.print("Enter the string to analyze: ");

        //  open up standard input
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        String input = null;

        //  read the input string from the command line; readLine() may throw
        //  IOException, so wrap it in try/catch
        try {
            input = br.readLine();
        } catch (IOException ioe) {
            System.out.println("IO error trying to read your input!");
            System.exit(1);
        }

        TokenizerFactory stdTokenFact = new StandardTokenizerFactory(param);
        Tokenizer tokenizer = stdTokenFact.create(new StringReader(input));

        // the factory above consumes (removes) the args it reads, so set them again
        param.put("luceneMatchVersion", "LUCENE_44");
        param.put("minGramSize", "2");
        param.put("maxGramSize", "512");
        CustomNGramFilterFactory customNGramFactory = new CustomNGramFilterFactory(param);
        TokenStream tokenStream = customNGramFactory.create(tokenizer);

        // getAttribute is generic, so no cast is needed
        CharTermAttribute termAttrib = tokenStream.getAttribute(CharTermAttribute.class);

        tokenStream.reset();

        while (tokenStream.incrementToken()) {

            //System.out.println("CharTermAttribute Length = " + termAttrib.length());

            System.out.println(termAttrib.toString());
        }

        tokenStream.end();
        tokenStream.close();
    }
}
  • build the whole jar and run the test: java -classpath /path/to/my-jar.jar my.custom.package.MainClass
  • further testing can be done on the Analysis page of the collection in the Solr admin web UI

Running and deploying

For a single Solr instance, add the path to your jar to solrconfig.xml (note the regex must be a full regular expression, not a glob):
<lib dir="/path/to" regex=".*\.jar" />

For SolrCloud it is wiser to store your jar in the special system-level collection (to avoid managing the jar on every node of the cluster) and manage it with the Blob Store API - see the documentation
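As a sketch of that workflow (the collection name, blob name, host, and port below are hypothetical; using runtime libs also requires Solr to be started with -Denable.runtime.lib=true, per the Blob Store documentation), the jar could be uploaded to the .system collection and then registered for a collection:

```shell
# upload the jar as a blob into the .system collection
curl -X POST -H 'Content-Type: application/octet-stream' \
     --data-binary @my-jar.jar \
     'http://localhost:8983/solr/.system/blob/my-filter'

# register version 1 of the uploaded blob as a runtime lib for the collection
curl 'http://localhost:8983/solr/mycollection/config' \
     -H 'Content-Type: application/json' \
     -d '{"add-runtimelib": {"name": "my-filter", "version": 1}}'
```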

Created 4 September 2015 at 15:37:07 by mira. Edited 45 times, last on 4 September 2015 at 22:32:35 by mira.

