On Sat, Apr 12, 2008 at 5:19 AM, Teshome Kassie <tkheran@yaho...> wrote:
> Hello all;
> Does SenseClusters support UTF-8?
Great question. Unfortunately, I think the answer is no. The main issue
is not so much SenseClusters as it is Text::NSP, which is what we use
for a significant portion of our feature extraction needs.
There has been considerable discussion regarding how to make Text::NSP
better at handling different character sets. If you are interested in
the history of that discussion, you can see the most recent version of
it here:
The short version is that I've decided that the right thing to do is to use the
Perl module Encode in Text::NSP to provide full Unicode support. The only
drawback is that this requires a bit of work, and right now it hasn't risen
high enough in the queue. But it's getting there, especially since SenseClusters
has such a heavy dependence on Text::NSP.