Friso

Friso is a Chinese tokenizer written in C. It uses the popular mmseg algorithm to segment Chinese text into words. The library is easy to embed in other programs, for example MySQL or PHP.
  • Supports UTF-8 encoding only
  • Accuracy: 98.41%
  • Supports custom dictionary
  • Supports Chinese, English, and mixed Chinese-English text
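
The mmseg algorithm mentioned above builds on dictionary-based maximum matching. As a rough illustration only, here is a minimal sketch of plain forward maximum matching over UTF-8 text. It is not Friso's implementation and leaves out mmseg's chunk-based disambiguation rules; the toy dictionary and the names TOY_DICT, utf8_char_len, next_token_len, and segment are hypothetical and do not belong to the Friso API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dictionary; a real tokenizer loads a large lexicon. */
static const char *TOY_DICT[] = { "研究", "研究生", "生命", "起源" };
static const size_t TOY_DICT_LEN = sizeof(TOY_DICT) / sizeof(TOY_DICT[0]);

/* Byte length of the UTF-8 character that starts with lead byte c. */
static size_t utf8_char_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1; /* invalid lead byte: consume a single byte */
}

/* Byte length of the next token: the longest dictionary entry that is a
 * prefix of text, or one UTF-8 character if no entry matches.
 * Returns 0 at the end of the string. */
static size_t next_token_len(const char *text) {
    size_t remaining = strlen(text);
    size_t best, i;
    if (remaining == 0) return 0;
    best = utf8_char_len((unsigned char)text[0]);
    if (best > remaining) best = remaining; /* truncated final character */
    for (i = 0; i < TOY_DICT_LEN; i++) {
        size_t n = strlen(TOY_DICT[i]);
        if (n > best && n <= remaining && strncmp(text, TOY_DICT[i], n) == 0) {
            best = n;
        }
    }
    return best;
}

/* Copy the tokens of text into out, separated by '/'.
 * out must hold at least 2 * strlen(text) + 1 bytes.
 * Returns the number of tokens written. */
static size_t segment(const char *text, char *out) {
    size_t count = 0;
    assert(text != NULL && out != NULL);
    while (*text != '\0') {
        size_t n = next_token_len(text);
        if (count > 0) *out++ = '/';
        memcpy(out, text, n);
        out += n;
        text += n;
        count++;
    }
    *out = '\0';
    return count;
}
```

With this toy dictionary, segment("研究生命起源", out) writes 研究生/命/起源: the greedy match takes 研究生 first and leaves 命 stranded as a single character. The full mmseg algorithm reduces this kind of error by comparing candidate chunks of up to three words before committing to the first one.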

How to use the library in your C program

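To run the Friso test program from the command line, pass the path to friso.ini with -init:

friso -init friso.ini
friso -init /c/friso/friso.ini

To tokenize text from your own code:
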
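friso_t friso;
friso_task_t task;
// Load the configuration; __path__ stands for the path to friso.ini.
friso = friso_new_from_ifile(__path__);
// Create a task and attach the text to tokenize.
task = friso_new_task();
friso_set_text( task, "text to be tokenized" );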

// Fetch tokens one at a time until friso_next returns NULL.
while ( ( friso_next( friso, friso->mode, task ) ) != NULL ) {
    //printf("%s[%d,%d]/ ", task->hits->word, task->hits->type, task->hits->offset );
    printf("%s/ ", task->hits->word );
    // The caller frees the word buffer of newly recognized words.
    if ( task->hits->type == __FRISO_NEW_WORDS__ ) {
        FRISO_FREE( task->hits->word );
    }
}

// Release the task and the Friso instance.
friso_free_task( task ); 
friso_free( friso );

Last edited Dec 30, 2012 at 4:45 PM by stevenbburns, version 2