
nutch-developers@lists.sourceforge.net, 1 June 2005, 4:18AM -0400

[Nutch-dev] Possible deadlock in PDFBox parser - with a fix.
by Andrzej Bialecki

Hi,

First, the symptoms: while testing sites with many PDFs, I saw the
Fetcher gradually slow down until it became stuck. This was repeatable.
A thread dump showed all threads waiting somewhere in PDFBox code
(which is used by parse-pdf). In an email exchange, the author of
PDFBox (Ben Litchfield) confirmed that there is a problem in the latest
official release of PDFBox which can result in such behaviour.
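
To check for the same symptom from inside a running JVM, one option is
to count how many live threads currently have a stack frame in PDFBox
classes. The following is only a minimal sketch, not part of Nutch or
PDFBox: the class name StuckThreadReport and the helper countThreadsIn
are hypothetical, and the package prefix "org.pdfbox." is an assumption
about the PDFBox release in use. It relies only on the standard
Thread.getAllStackTraces() call.

```java
import java.util.Map;

class StuckThreadReport {

    // Counts threads that have at least one stack frame in a class whose
    // fully qualified name starts with classPrefix. Each thread is
    // counted at most once.
    static int countThreadsIn(Map<Thread, StackTraceElement[]> dump,
                              String classPrefix) {
        int count = 0;
        for (StackTraceElement[] stack : dump.values()) {
            for (StackTraceElement frame : stack) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    count++;
                    break; // this thread is already counted
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed package prefix; pass a different one as the first argument.
        String prefix = args.length > 0 ? args[0] : "org.pdfbox.";
        // Snapshot of the stack traces of all live threads in this JVM.
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        System.out.println(countThreadsIn(dump, prefix) + " of "
                + dump.size() + " threads have a frame in " + prefix + "*");
    }
}
```

The snapshot only covers the JVM it runs in, so the helper would have to
be called from within the fetching process (for example from a
monitoring thread). A count that keeps growing and never drops would be
consistent with the behaviour described above, but it does not by itself
prove a deadlock.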

If you have experienced such problems, the fix is to use the latest CVS
version of PDFBox, where this problem is believed to be fixed.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
Nutch-developers@list...
https://lists.sourceforge.net/lists/listinfo/nutch-developers
