java.lang.Object
  jvntextpro.JVnTextPro
public class JVnTextPro
The Class JVnTextPro.
Field Summary | |
---|---|
CompositeUnicode2Unicode |
convertor
The convertor. |
Constructor Summary | |
---|---|
JVnTextPro()
Instantiates a new JVnTextPro. |
Method Summary | |
---|---|
boolean |
initPosTagger(java.lang.String modelDir)
Initialize the POS tagger for Vietnamese. |
boolean |
initSegmenter(java.lang.String modelDir)
Initialize word segmentation for Vietnamese. |
boolean |
initSenSegmenter(java.lang.String modelDir)
Initialize sentence segmentation for Vietnamese. Returns true if the initialization is successful and false otherwise. |
void |
initSenTokenization()
Initialize the sentence tokenization. |
java.lang.String |
posTagging(java.lang.String text)
Do POS tagging. |
java.lang.String |
postProcessing(java.lang.String text)
Do post-processing for word segmentation: break invalid Vietnamese words into single syllables. |
java.lang.String |
process(java.io.File infile)
Process a file and return the processed text. Pipeline: sentence segmentation, tokenization, tone recovery, word segmentation. |
java.lang.String |
process(java.lang.String text)
Process the text and return the processed text. Pipeline: sentence segmentation, tokenization, word segmentation, part-of-speech tagging. |
java.lang.String |
senSegment(java.lang.String text)
Do sentence segmentation. |
java.lang.String |
senTokenize(java.lang.String text)
Do sentence tokenization. |
java.lang.String |
wordSegment(java.lang.String text)
Do word segmentation. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
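The summary above lists three model-loading methods that return a boolean, so a caller can check each result before calling process. The following is a minimal sketch of that pattern, not the library's own usage example. It makes several assumptions: the `TextPro` interface and `StubTextPro` class are hypothetical stand-ins that mirror the documented signatures so the code compiles without the JVnTextPro jar, the model directory paths are placeholders, and initSegmenter and initPosTagger are assumed to follow the same true-on-success convention that the summary states only for initSenSegmenter.

```java
import java.util.ArrayList;
import java.util.List;

class InitSketch {

    /** Hypothetical interface mirroring the documented JVnTextPro signatures. */
    interface TextPro {
        boolean initSenSegmenter(String modelDir);
        boolean initSegmenter(String modelDir);
        boolean initPosTagger(String modelDir);
        void initSenTokenization();
        String process(String text);
    }

    /** Initializes every stage and returns the names of the stages that reported failure. */
    static List<String> initAll(TextPro pro, String senDir, String segDir, String posDir) {
        List<String> failed = new ArrayList<>();
        if (!pro.initSenSegmenter(senDir)) failed.add("sentence segmenter");
        pro.initSenTokenization(); // returns void, so there is nothing to check
        if (!pro.initSegmenter(segDir)) failed.add("word segmenter");
        if (!pro.initPosTagger(posDir)) failed.add("POS tagger");
        return failed;
    }

    /** Stand-in so the sketch runs without the library; it loads no models. */
    static class StubTextPro implements TextPro {
        public boolean initSenSegmenter(String modelDir) { return !modelDir.isEmpty(); }
        public boolean initSegmenter(String modelDir) { return !modelDir.isEmpty(); }
        public boolean initPosTagger(String modelDir) { return !modelDir.isEmpty(); }
        public void initSenTokenization() { }
        public String process(String text) { return text; } // placeholder: echoes the input
    }

    public static void main(String[] args) {
        TextPro pro = new StubTextPro();
        // Placeholder paths; the real model directory layout is not given in this page.
        List<String> failed = initAll(pro, "models/sen", "models/seg", "models/pos");
        if (failed.isEmpty()) {
            System.out.println(pro.process("Example input text."));
        } else {
            System.err.println("Failed to initialize: " + failed);
        }
    }
}
```

With the actual library on the classpath, a `JVnTextPro` instance would take the place of the stub, since it exposes methods with the same names and parameter types.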
Field Detail |
---|
public CompositeUnicode2Unicode convertor
Constructor Detail |
---|
public JVnTextPro()
Method Detail |
---|
public boolean initSenSegmenter(java.lang.String modelDir)
modelDir
- the model dir
public boolean initSegmenter(java.lang.String modelDir)
modelDir
- the model dir
public boolean initPosTagger(java.lang.String modelDir)
modelDir
- the model dir
public void initSenTokenization()
public java.lang.String process(java.lang.String text)
text
- text to be processed
public java.lang.String process(java.io.File infile)
infile
- data file
public java.lang.String senSegment(java.lang.String text)
text
- text to have sentences segmented
public java.lang.String senTokenize(java.lang.String text)
text
- text to be tokenized
public java.lang.String wordSegment(java.lang.String text)
text
- text to be segmented into words
public java.lang.String posTagging(java.lang.String text)
text
- text to be tagged with parts of speech (words must already be segmented)
public java.lang.String postProcessing(java.lang.String text)
text
- the text
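Because posTagging expects text whose words are already segmented, the individual stage methods are order-sensitive when called by hand. The sketch below shows one way to chain String-to-String stages in a fixed order. It is an illustration under assumptions: `runStages` is a hypothetical helper, the lambdas are placeholders that only append a marker, and it is assumed (not confirmed by this page) that each stage accepts the previous stage's output unchanged.

```java
import java.util.List;
import java.util.function.UnaryOperator;

class StageChainSketch {

    /** Applies each stage to the output of the previous one, left to right. */
    static String runStages(String text, List<UnaryOperator<String>> stages) {
        String current = text;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Placeholder stages that only record the order in which they ran.
        // With the real library these could be method references on an
        // initialized instance, e.g. pro::senSegment, pro::senTokenize,
        // pro::wordSegment, pro::posTagging.
        List<UnaryOperator<String>> stages = List.of(
                s -> s + " [senSegment]",
                s -> s + " [senTokenize]",
                s -> s + " [wordSegment]",
                s -> s + " [posTagging]");
        System.out.println(runStages("input", stages));
    }
}
```

Keeping the stages in a list makes the ordering explicit, so word segmentation cannot accidentally be skipped before tagging.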