Write a Java Custom Semantic Processor
The difference from the Java Custom Tokenizer lies in the input:
The tokenizer receives a text chunk to process.
The Java Custom Semantic Processor, by contrast, must get its tokens from the pipeline (see Sample Semantic Processor).
Derive your MySemanticProcessor class from com.exalead.pdoc.analysis.JavaCustomSemanticProcessor and implement:
@PropertyLabel(value = "JavaCustomSemanticProcessor")
@CVComponentConfigClass(configClass = MySemanticProcessorConfig.class)
@CVComponentDescription(value = "My semantic processor in Java")
public class MySemanticProcessor extends com.exalead.pdoc.analysis.JavaCustomSemanticProcessor {

    public MySemanticProcessor(MySemanticProcessorConfig config) throws Exception;

    /**
     * Called when a new document is about to get processed.
     */
    public void newDocument();

    /**
     * Called when there is no more input to process in the current document.
     * This is the last chance to attach annotations to the document if needed.
     */
    public void endDocument();

    /**
     * Called at initialization to retrieve the annotation tags that are planned to be used during processing.
     * Only declared annotations will be accessible on tokens retrieved with getNextToken().
     * @return the list of all annotation tags needed or null if none
     */
    public String[] declareAnnotations();

    /**
     * Called when a new input chunk is to be processed.
     * The processor must pump tokens from the pipe using getNextToken()
     * and return them once processed to the pipe with pushToken().
     *
     * @param chunk the chunk text
     * @param language the chunk language
     * @param context the chunk context
     * @see getNextToken(), newAnnotation(), pushToken()
     */
    public void processChunk(String chunk, int language, String context) throws Exception;
}
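The pump loop described in the processChunk() comment can be sketched as follows. This is a minimal illustration only, not the product's implementation: it does not use the real com.exalead classes. SketchToken, SketchProcessorBase, feed() and PassThroughProcessor are hypothetical stand-ins that mimic only the getNextToken() and pushToken() names shown on this page, so that the loop can be compiled and run on its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for AnnotatedToken; the real class is not reproduced here.
final class SketchToken {
    final String form;
    SketchToken(String form) { this.form = form; }
}

// Hypothetical stand-in for JavaCustomSemanticProcessor.
// It mimics only the two helper names used by the pump loop.
abstract class SketchProcessorBase {
    private final Deque<SketchToken> input = new ArrayDeque<>();
    final List<String> output = new ArrayList<>();

    // Assumption: a test hook. In the real product the pipeline supplies the tokens.
    void feed(String... forms) {
        for (String form : forms) {
            input.add(new SketchToken(form));
        }
    }

    // Returns the next token, or null once the input is exhausted.
    protected final SketchToken getNextToken() {
        return input.poll();
    }

    // Rejects null or empty tokens, then records the token form as output.
    protected final void pushToken(SketchToken token) {
        if (token == null || token.form == null || token.form.isEmpty()) {
            throw new IllegalArgumentException("invalid token");
        }
        output.add(token.form);
    }

    public abstract void processChunk(String chunk, int language, String context) throws Exception;
}

// Pumps every token from the pipe and pushes it back unchanged.
class PassThroughProcessor extends SketchProcessorBase {
    int seen = 0;

    @Override
    public void processChunk(String chunk, int language, String context) {
        SketchToken token;
        while ((token = getNextToken()) != null) {
            seen++;            // a real processor would inspect or annotate the token here
            pushToken(token);  // the token must not be used after this call
        }
    }
}
```

The point of the sketch is the loop shape: keep calling getNextToken() until it returns null, and push every token back, otherwise it is lost to the downstream processors.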
The JavaCustomSemanticProcessor class also provides helper methods:
package com.exalead.pdoc.analysis;

public abstract class JavaCustomSemanticProcessor extends CustomDocumentProcessor {
    ...

    /**
     * Pump the next token from the input stream.
     * @return the next token from the pipe or null if end of input is reached
     */
    protected final AnnotatedToken getNextToken();

    /**
     * Allocate a new annotation with the provided tag, value and length.
     * The annotation is either created or recycled from a previous use.
     *
     * @param tag the new annotation tag
     * @param displayForm the new annotation value
     * @param nbTokens the new annotation length
     * @return a fresh or recycled annotation
     * @pre tag must have been declared in declareAnnotations()
     * @pre displayForm is not null
     * @pre nbTokens > 0
     */
    protected final Annotation newAnnotation(String tag, String displayForm, int nbTokens) throws InvalidAnnotationException;

    /**
     * Send a token to the output stream.
     *
     * - validity of the token is checked
     * - the token is added to the output buffer
     * - if needed, the output buffer is flushed
     * - the token is recycled
     *
     * In all cases, the token and its annotations are not usable anymore after the call.
     *
     * @param token A token allocated through a call to newToken()
     * @pre token is not null
     * @pre token form is not null nor empty
     * @pre token type is defined
     * @see newToken(), newAnnotation()
     */
    protected final void pushToken(AnnotatedToken token) throws InvalidTokenException;

    /**
     * Attach an annotation to the currently processed document
     *
     * @param annotation the annotation to attach
     * @pre annotation must have been allocated with newAnnotation()
     * @see newAnnotation()
     */
    protected final void addDocumentAnnotation(Annotation annotation) throws InvalidAnnotationException;

    ...
}
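As a rough illustration of how these helpers might fit together, the sketch below counts tokens per document and attaches the count as a document annotation in endDocument(). It is built on assumptions: DemoToken, DemoAnnotation, DemoProcessorBase, init(), feed(), the "tokencount" tag and the choice of nbTokens = 1 are all hypothetical, and the stand-in classes only imitate the preconditions listed in the comments above (declared tag, non-null value, nbTokens > 0). The real classes may behave differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for AnnotatedToken and Annotation.
final class DemoToken {
    final String form;
    DemoToken(String form) { this.form = form; }
}

final class DemoAnnotation {
    final String tag;
    final String displayForm;
    final int nbTokens;
    DemoAnnotation(String tag, String displayForm, int nbTokens) {
        this.tag = tag;
        this.displayForm = displayForm;
        this.nbTokens = nbTokens;
    }
}

// Hypothetical stand-in for JavaCustomSemanticProcessor, limited to the helpers listed above.
abstract class DemoProcessorBase {
    private final Deque<DemoToken> input = new ArrayDeque<>();
    private final Set<String> declared = new HashSet<>();
    final List<String> output = new ArrayList<>();
    final List<DemoAnnotation> documentAnnotations = new ArrayList<>();

    public abstract String[] declareAnnotations();

    // Assumption: stands in for the initialization step that collects declared tags.
    void init() {
        String[] tags = declareAnnotations();
        if (tags != null) {
            declared.addAll(Arrays.asList(tags));
        }
    }

    // Assumption: a test hook replacing the real pipeline input.
    void feed(String... forms) {
        for (String form : forms) {
            input.add(new DemoToken(form));
        }
    }

    protected final DemoToken getNextToken() { return input.poll(); }

    protected final void pushToken(DemoToken token) { output.add(token.form); }

    // Enforces the documented preconditions: declared tag, non-null value, nbTokens > 0.
    protected final DemoAnnotation newAnnotation(String tag, String displayForm, int nbTokens) {
        if (!declared.contains(tag) || displayForm == null || nbTokens <= 0) {
            throw new IllegalArgumentException("invalid annotation: " + tag);
        }
        return new DemoAnnotation(tag, displayForm, nbTokens);
    }

    protected final void addDocumentAnnotation(DemoAnnotation annotation) {
        documentAnnotations.add(annotation);
    }
}

// Counts the tokens of each document and attaches the total when the document ends.
class TokenCountProcessor extends DemoProcessorBase {
    private int count;

    @Override
    public String[] declareAnnotations() { return new String[] { "tokencount" }; }

    public void newDocument() { count = 0; }

    public void processChunk(String chunk, int language, String context) {
        DemoToken token;
        while ((token = getNextToken()) != null) {
            count++;
            pushToken(token);
        }
    }

    public void endDocument() {
        // nbTokens = 1 is an arbitrary choice that satisfies nbTokens > 0.
        addDocumentAnnotation(newAnnotation("tokencount", Integer.toString(count), 1));
    }
}
```

Resetting the counter in newDocument() and emitting the annotation in endDocument() follows the lifecycle described above, where endDocument() is the last chance to attach annotations to the document.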