Friso
Friso is a Chinese tokenizer written in C. It uses the popular MMSEG algorithm to segment Chinese text. The library is easy to embed in other programs, for example in MySQL or PHP.
- Supports UTF-8 encoding only
- Accuracy: 98.41%
- Supports custom dictionary
- Supports Chinese, English, and mixed Chinese-English text
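To give a rough feel for dictionary-based segmentation over UTF-8 text, here is a minimal sketch of plain forward maximum matching, the simple matching step that MMSEG builds on. It is not Friso's implementation and does not include MMSEG's chunk-based disambiguation rules. The names `TOY_DICT`, `utf8_char_len`, `longest_match`, and `fmm_segment` are hypothetical, and the four-word dictionary is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy dictionary (hypothetical entries; not Friso's lexicon format). */
static const char *const TOY_DICT[] = { "研究", "研究生", "生命", "起源" };
static const size_t TOY_DICT_LEN = sizeof TOY_DICT / sizeof TOY_DICT[0];

/* Byte length of the UTF-8 character that starts with lead byte c. */
static size_t utf8_char_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1; /* invalid lead byte: consume it as a single byte */
}

/* Byte length of the longest dictionary word that is a prefix of s, or 0. */
static size_t longest_match(const char *s)
{
    size_t best = 0;
    for (size_t i = 0; i < TOY_DICT_LEN; i++) {
        size_t n = strlen(TOY_DICT[i]);
        if (n > best && strncmp(s, TOY_DICT[i], n) == 0) {
            best = n;
        }
    }
    return best;
}

/*
 * Segment UTF-8 text by forward maximum matching. Each token is written
 * to out followed by "/ ". Returns the number of tokens, or -1 if out
 * is too small.
 */
int fmm_segment(const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);

    size_t used = 0;
    size_t remaining = strlen(text);
    int count = 0;

    while (remaining > 0) {
        size_t n = longest_match(text);
        if (n == 0) {
            /* No dictionary word here: emit one UTF-8 character. */
            n = utf8_char_len((unsigned char)*text);
            if (n > remaining) n = remaining; /* truncated sequence */
        }
        if (used + n + 3 > out_size) return -1; /* token + "/ " + NUL */
        memcpy(out + used, text, n);
        used += n;
        out[used++] = '/';
        out[used++] = ' ';
        text += n;
        remaining -= n;
        count++;
    }
    if (out_size == 0) return -1;
    out[used] = '\0';
    return count;
}
```

Calling `fmm_segment("研究生命起源", buf, sizeof buf)` with this toy dictionary fills `buf` with `研究生/ 命/ 起源/ `, because the greedy match takes `研究生` first and leaves `命` stranded. Ambiguities like this are what MMSEG's additional rules (comparing candidate chunks of several words) are meant to resolve.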
How to use the library in your C program
// Run the friso test program, passing the path to friso.ini:
friso -init friso.ini
friso -init /c/friso/friso.ini
#include "friso.h"

friso_t friso;
friso_task_t task;

// Load the configuration; __path__ is the path to friso.ini.
friso = friso_new_from_ifile(__path__);
task = friso_new_task();
friso_set_text(task, "text to be tokenized");

// Fetch tokens one at a time until friso_next() returns NULL.
while (friso_next(friso, friso->mode, task) != NULL) {
    //printf("%s[%d,%d]/ ", task->hits->word, task->hits->type, task->hits->offset);
    printf("%s/ ", task->hits->word);
    // Newly recognized words are freed here by the caller.
    if (task->hits->type == __FRISO_NEW_WORDS__) {
        FRISO_FREE(task->hits->word);
    }
}

// Release
friso_free_task(task);
friso_free(friso);