jvntextpro
Class JVnTextPro

java.lang.Object
  extended by jvntextpro.JVnTextPro

public class JVnTextPro
extends java.lang.Object

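Vietnamese text-processing toolkit providing sentence segmentation, sentence tokenization, word segmentation, and part-of-speech tagging.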


Field Summary
 CompositeUnicode2Unicode convertor
          Converter from composite Unicode to precomposed Unicode.
 
Constructor Summary
JVnTextPro()
          Creates a new JVnTextPro instance.
 
Method Summary
 boolean initPosTagger(java.lang.String modelDir)
          Initializes the part-of-speech (POS) tagger for Vietnamese.
 boolean initSegmenter(java.lang.String modelDir)
          Initializes word segmentation for Vietnamese.
 boolean initSenSegmenter(java.lang.String modelDir)
          Initializes sentence segmentation for Vietnamese; returns true if the initialization is successful and false otherwise.
 void initSenTokenization()
          Initializes sentence tokenization.
 java.lang.String posTagging(java.lang.String text)
          Performs POS tagging.
 java.lang.String postProcessing(java.lang.String text)
          Performs post-processing for word segmentation: breaks invalid Vietnamese words into single syllables.
 java.lang.String process(java.io.File infile)
          Processes a file and returns the processed text. Pipeline: sentence segmentation, tokenization, tone recovery, word segmentation.
 java.lang.String process(java.lang.String text)
          Processes the text and returns the processed text. Pipeline: sentence segmentation, tokenization, word segmentation, part-of-speech tagging.
 java.lang.String senSegment(java.lang.String text)
          Performs sentence segmentation.
 java.lang.String senTokenize(java.lang.String text)
          Performs sentence tokenization.
 java.lang.String wordSegment(java.lang.String text)
          Performs word segmentation.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

convertor

public CompositeUnicode2Unicode convertor
Converter from composite Unicode to precomposed Unicode.
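The field's class name suggests it turns composite Unicode (base letters followed by combining diacritics) into precomposed characters. The Java sketch below approximates that effect with the JDK's java.text.Normalizer in NFC form. This is an assumption for illustration, not the implementation of CompositeUnicode2Unicode, and the class name ComposedUnicodeSketch is hypothetical.

```java
import java.text.Normalizer;

// Sketch only: approximates a composite-to-precomposed Unicode converter
// using the JDK's built-in NFC normalization. The real CompositeUnicode2Unicode
// class may rely on its own mapping tables instead.
class ComposedUnicodeSketch {

    /** Recombines base letters and combining diacritics into precomposed characters. */
    static String toPrecomposed(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "e" + combining dot below (U+0323) + combining circumflex (U+0302)
        String composite = "Vi" + "e\u0323\u0302" + "t";
        String precomposed = toPrecomposed(composite);
        System.out.println(precomposed.length());            // 4
        System.out.println(precomposed.equals("Vi\u1EC7t")); // true
    }
}
```

NFC also handles mixed input, such as a precomposed "ê" followed by a combining dot below, because it reorders and recombines marks canonically.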

Constructor Detail

JVnTextPro

public JVnTextPro()
Creates a new JVnTextPro instance.

Method Detail

initSenSegmenter

public boolean initSenSegmenter(java.lang.String modelDir)
Initializes sentence segmentation for Vietnamese.

Parameters:
modelDir - the directory containing the sentence segmentation model
Returns:
true if the initialization is successful and false otherwise

initSegmenter

public boolean initSegmenter(java.lang.String modelDir)
Initializes word segmentation for Vietnamese.

Parameters:
modelDir - the directory containing the word segmentation model
Returns:
true if the initialization is successful and false otherwise

initPosTagger

public boolean initPosTagger(java.lang.String modelDir)
Initializes the part-of-speech (POS) tagger for Vietnamese.

Parameters:
modelDir - the directory containing the POS tagging model
Returns:
true if the initialization is successful and false otherwise
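All three initializers report success or failure through their boolean result, so a caller should check each return value before processing text. The sketch below uses a hypothetical VnInitializable interface that mirrors the documented signatures, so it compiles without the JVnTextPro jar; initAll and the separate directory arguments are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the three documented init methods.
// It exists only so this sketch compiles on its own; JVnTextPro itself
// does not implement it.
interface VnInitializable {
    boolean initSenSegmenter(String modelDir);
    boolean initSegmenter(String modelDir);
    boolean initPosTagger(String modelDir);
}

class InitCheckSketch {

    /** Runs all three initializers and returns the names of those that reported failure. */
    static List<String> initAll(VnInitializable p, String senDir, String segDir, String posDir) {
        List<String> failed = new ArrayList<>();
        if (!p.initSenSegmenter(senDir)) failed.add("sentence segmenter");
        if (!p.initSegmenter(segDir)) failed.add("word segmenter");
        if (!p.initPosTagger(posDir)) failed.add("POS tagger");
        return failed;
    }
}
```

A small adapter class delegating each method to a JVnTextPro instance would satisfy the interface; an empty result list means every component loaded.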

initSenTokenization

public void initSenTokenization()
Initializes sentence tokenization.


process

public java.lang.String process(java.lang.String text)
Processes the text and returns the processed text. Pipeline: sentence segmentation, tokenization, word segmentation, part-of-speech tagging.

Parameters:
text - text to be processed
Returns:
processed text
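The documented pipeline for process(String) is a chain of String-to-String stages. The following sketch shows that composition in isolation; PipelineSketch and run are hypothetical names, and the stages used when testing it are placeholders rather than library calls.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of the documented stage order for process(String): sentence
// segmentation, tokenization, word segmentation, POS tagging. Each stage is
// a String -> String function, matching the documented method signatures.
class PipelineSketch {

    /** Applies the stages left to right, feeding each stage's output to the next. */
    static String run(String text, List<UnaryOperator<String>> stages) {
        String result = text;
        for (UnaryOperator<String> stage : stages) {
            result = stage.apply(result);
        }
        return result;
    }
}
```

With a JVnTextPro instance vn whose models are initialized, the stages could be the method references vn::senSegment, vn::senTokenize, vn::wordSegment and vn::posTagging. Whether process(String) also applies postProcessing or the convertor internally is not stated on this page.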

process

public java.lang.String process(java.io.File infile)
Processes a file and returns the processed text. Pipeline: sentence segmentation, tokenization, tone recovery, word segmentation.

Parameters:
infile - the input data file
Returns:
processed text
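A minimal sketch of the file variant, assuming the file is UTF-8 text that is read in full and then passed to a String-based processing function. The encoding JVnTextPro actually uses, and how it performs the tone-recovery step listed above, are not shown here; FileProcessSketch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Sketch, assuming UTF-8 input read whole and handed to a String-based
// processor (for example, a method reference to process(String)).
class FileProcessSketch {

    static String processFile(Path file, UnaryOperator<String> processor) throws IOException {
        String text = Files.readString(file, StandardCharsets.UTF_8);
        return processor.apply(text);
    }
}
```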

senSegment

public java.lang.String senSegment(java.lang.String text)
Performs sentence segmentation.

Parameters:
text - the text to be split into sentences
Returns:
the text with sentences segmented
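To illustrate the task (not the library's method), here is a naive rule-based sentence splitter: it breaks after '.', '!' or '?' when whitespace and an uppercase letter follow, and emits one sentence per line. The one-sentence-per-line output is an assumption of this sketch, not the documented return format.

```java
import java.util.regex.Pattern;

// Naive rule-based baseline, NOT the library's model: split after '.', '!'
// or '?' when whitespace is followed by an uppercase letter (any script).
class SentenceSplitSketch {

    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[.!?])\\s+(?=\\p{Lu})");

    /** Returns the sentences of the input, one per line. */
    static String split(String text) {
        return String.join("\n", BOUNDARY.split(text.trim()));
    }
}
```

Rules like this mis-split abbreviations such as "TP. Hồ Chí Minh", a typical case a trained segmentation model is meant to handle.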

senTokenize

public java.lang.String senTokenize(java.lang.String text)
Performs sentence tokenization.

Parameters:
text - the text to be tokenized
Returns:
the tokenized text
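Sentence tokenization splits a sentence into tokens, chiefly separating punctuation from words. The regex tokenizer below is a hypothetical minimal stand-in, not the rule set that initSenTokenization loads: it keeps runs of letters (including combining marks) together, keeps decimal numbers such as 3.5 intact, and emits every other non-space character as its own token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal regex tokenizer for illustration only; the library's actual
// tokenization rules are not documented on this page.
class TokenizeSketch {

    // A token is a run of letters/combining marks, a number with optional
    // internal '.' or ',' groups, or a single other non-space character.
    private static final Pattern TOKEN = Pattern.compile(
            "[\\p{L}\\p{M}]+|\\p{N}+(?:[.,]\\p{N}+)*|[^\\s\\p{L}\\p{M}\\p{N}]");

    /** Returns the tokens of the sentence separated by single spaces. */
    static String tokenize(String sentence) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(m.group());
        }
        return sb.toString();
    }
}
```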

wordSegment

public java.lang.String wordSegment(java.lang.String text)
Performs word segmentation.

Parameters:
text - the text to be segmented into words
Returns:
text with words segmented, syllables in words are joined by '_'
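The documented output joins the syllables of a multi-syllable word with '_'. The sketch below produces that format with a greedy longest-match lookup against a small dictionary; this baseline technique and the dictionary are assumptions for illustration, not the model that initSegmenter loads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Dictionary-based greedy longest-match segmenter: a simple baseline that
// emits the documented format (syllables of one word joined by '_').
class WordSegmentSketch {

    static String segment(String text, Set<String> dict, int maxSyllables) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return "";
        String[] syl = trimmed.split("\\s+");
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < syl.length) {
            // Start with the widest window allowed (at least one syllable).
            int len = Math.max(1, Math.min(maxSyllables, syl.length - i));
            // Shrink the window until it matches a dictionary word or is one syllable.
            while (len > 1) {
                String candidate = String.join(" ", Arrays.copyOfRange(syl, i, i + len));
                if (dict.contains(candidate.toLowerCase(Locale.ROOT))) break;
                len--;
            }
            words.add(String.join("_", Arrays.copyOfRange(syl, i, i + len)));
            i += len;
        }
        return String.join(" ", words);
    }
}
```

Greedy matching is fast but cannot resolve genuinely ambiguous splits, which is where a statistical segmenter can differ.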

posTagging

public java.lang.String posTagging(java.lang.String text)
Performs part-of-speech (POS) tagging.

Parameters:
text - the text to be tagged with parts of speech (words must already be segmented)
Returns:
the POS-tagged text
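A sketch for consuming the tagger's output, assuming whitespace-separated tokens of the form word/TAG. That format is an assumption not stated on this page, so check real output before relying on it; the tags P and V in the test are example values only. Splitting at the last '/' keeps words that themselves contain a slash intact.

```java
import java.util.ArrayList;
import java.util.List;

// Parses tagged text ASSUMING tokens look like word/TAG, separated by whitespace.
class PosOutputSketch {

    record Tagged(String word, String tag) {}

    static List<Tagged> parse(String tagged) {
        List<Tagged> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            int slash = token.lastIndexOf('/');
            if (slash <= 0 || slash == token.length() - 1) {
                result.add(new Tagged(token, ""));   // no usable tag on this token
            } else {
                result.add(new Tagged(token.substring(0, slash), token.substring(slash + 1)));
            }
        }
        return result;
    }
}
```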

postProcessing

public java.lang.String postProcessing(java.lang.String text)
Performs post-processing for word segmentation: breaks invalid Vietnamese words into single syllables.

Parameters:
text - the word-segmented text
Returns:
the post-processed text
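The documented behavior can be sketched as follows: each '_'-joined token that is not a valid word is split back into its syllables. How JVnTextPro decides validity is not stated here, so the sketch uses a hypothetical lexicon of accepted segmented words.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Splits any multi-syllable token not found in the (hypothetical) lexicon
// back into single syllables; single-syllable tokens pass through unchanged.
class PostProcessSketch {

    static String breakInvalidWords(String segmented, Set<String> lexicon) {
        return Arrays.stream(segmented.trim().split("\\s+"))
                .map(tok -> (!tok.contains("_") || lexicon.contains(tok))
                        ? tok
                        : tok.replace('_', ' '))
                .collect(Collectors.joining(" "));
    }
}
```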