
14 April 2008 • 4:48AM -0400

Re: [Senseclusters-developers] senseclusters-developers Digest, Vol 7, Issue 2
by Ted Pedersen


On Sat, Apr 12, 2008 at 5:19 AM, Teshome Kassie <tkheran@yaho...> wrote:
> Hello all,
> Does SenseClusters support UTF-8?
> Teshome

Great question. Unfortunately, I think the answer is no. The main issue
is not so much SenseClusters itself as Text::NSP, which we use for a
significant portion of our feature extraction.

There has been considerable discussion about how to make Text::NSP
better at handling different character sets. If you are interested in
the history of that discussion, you can see the most recent version of
it here:

The short version is that I've decided that the right thing to do is to
use the Perl module Encode in Text::NSP to provide full Unicode support.
The only drawback is that this requires a bit of work, and right now it
hasn't risen high enough in the queue. But it's getting there,
especially since SenseClusters depends so heavily on Text::NSP.
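
To make the Encode idea concrete, here is a minimal sketch of the general
pattern: decode UTF-8 bytes into Perl characters, tokenize, and encode
again on output. This is an illustration under stated assumptions, not
how Text::NSP works today or how the planned change would necessarily
look. The helper name tokens_from_utf8 and the sample string are
hypothetical; only decode and encode come from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Text::NSP): decode raw UTF-8 bytes into
# Perl characters, then pull out word tokens with a Unicode-aware \w.
sub tokens_from_utf8 {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> characters
    return $text =~ /(\w+)/g;             # all word tokens, in list context
}

my $bytes  = "caf\xC3\xA9 au lait";       # "café au lait" as UTF-8 bytes
my @tokens = tokens_from_utf8($bytes);

# Encode back to UTF-8 bytes before printing.
print encode('UTF-8', join('|', @tokens)), "\n";
print length($tokens[0]), "\n";           # counts characters, not bytes
```

Without the decode step, the two bytes of the accented letter are treated
as separate non-word bytes, so a token such as "café" would be cut short.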

So, that's the long-term solution I have planned. Unfortunately that
doesn't help much in the shorter term.

Sorry I don't have a better answer. Other suggestions are most welcome.


Ted Pedersen

senseclusters-developers mailing list
