JVnTextPro: A Java-based Vietnamese Text Processing Tool


JVnTextPro is an open-source Java tool for Vietnamese Natural Language Processing (NLP), built on Conditional Random Fields (CRFs) and Maximum Entropy (Maxent) models. It consists of several steps (sub-problem tools) for preprocessing and processing Vietnamese text, arranged as a pipeline in which the output of one step is the input to the next. The sub-problem tools are sentence segmentation, sentence tokenization, word segmentation, and Part-of-Speech tagging. We hope the tool will be useful to the Vietnamese NLP community, and we highly appreciate any bug reports, comments, and suggestions that help fix errors and improve accuracy.
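The pipeline arrangement described above can be illustrated with a minimal sketch. This is not JVnTextPro's API: the class name PipelineSketch, its method names, the toy lexicon, and the tag labels are all hypothetical, and simple rules stand in for the trained CRF and Maxent models. The sample text is written without diacritics only to keep the source ASCII-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: rule-based stand-ins for the four pipeline steps.
class PipelineSketch {
    // Hypothetical lexicon of two-syllable words (a trained model would replace this).
    static final Set<String> LEXICON =
        new HashSet<>(Arrays.asList("sinh vien", "Ha Noi"));

    // Hypothetical tag table; the tag labels are made up for this sketch.
    static final Map<String, String> TAGS = new HashMap<>();
    static {
        TAGS.put("Toi", "P");
        TAGS.put("la", "V");
        TAGS.put("hoc", "V");
        TAGS.put("sinh_vien", "N");
        TAGS.put("Ha_Noi", "N");
        TAGS.put(".", "PUNCT");
    }

    // Step 1: split raw text into sentences after '.', '!' or '?' plus whitespace.
    static List<String> segmentSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    // Step 2: surround punctuation with spaces so each mark is its own token.
    static String tokenize(String sentence) {
        return sentence.replaceAll("([.,!?;:])", " $1 ").trim().replaceAll("\\s+", " ");
    }

    // Step 3: join two adjacent syllables with "_" when the pair is in the lexicon.
    static String segmentWords(String tokenized) {
        String[] syllables = tokenized.split(" ");
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < syllables.length) {
            if (i + 1 < syllables.length
                    && LEXICON.contains(syllables[i] + " " + syllables[i + 1])) {
                words.add(syllables[i] + "_" + syllables[i + 1]);
                i += 2;
            } else {
                words.add(syllables[i]);
                i += 1;
            }
        }
        return String.join(" ", words);
    }

    // Step 4: attach a tag to every word as word/TAG; unknown words get UNK.
    static String tagWords(String segmented) {
        List<String> tagged = new ArrayList<>();
        for (String word : segmented.split(" ")) {
            tagged.add(word + "/" + TAGS.getOrDefault(word, "UNK"));
        }
        return String.join(" ", tagged);
    }

    // The pipeline: each step consumes the output of the previous one.
    static List<String> process(String text) {
        List<String> result = new ArrayList<>();
        for (String sentence : segmentSentences(text)) {
            String tokens = tokenize(sentence);
            String words = segmentWords(tokens);
            result.add(tagWords(words));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : process("Toi la sinh vien. Toi hoc o Ha Noi.")) {
            System.out.println(line);
        }
    }
}
```

Because each step takes only the previous step's output, any single stage could be swapped for a statistical model without touching the others, which is the main appeal of a pipeline design.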

Project Managers:

Cam-Tu Nguyen (1,2) (ncamtu at gmail dot com)

Xuan-Hieu Phan (1) (pxhieu at gmail dot com)

Thu-Trang Nguyen (1) (trangnt84 at gmail dot com)

1. College of Technology, Vietnam National University, Hanoi

2. Graduate School of Information Sciences (GSIS), Tohoku University, Japan

Development Team:

The current version (2.0) of JVnTextPro was developed by the project managers and is released under the terms of the GNU General Public License. We welcome anyone who would like to develop this tool for the benefit of the NLP community in general, and the Vietnamese NLP community in particular, to join us. Please contact the project managers for further details.


- Version 2.0




Related links:

Researchers who use this tool to run experiments should include the following citation:

Cam-Tu Nguyen, Xuan-Hieu Phan and Thu-Trang Nguyen, "JVnTextPro: A Java-based Vietnamese Text Processing Tool", 2010.

We would like to thank for hosting this project.

Last updated: July 19, 2010