boilerpipe


ID Status Summary
84 New How to debug the result? Type-Defect Priority-Medium
83 New Different result when using Web Api and the source api? Type-Defect Priority-Medium
82 New Unsupported content type: null Type-Defect Priority-Medium
81 New Boilerpipe is conflicting with CyberNeko library Type-Defect Priority-Medium
80 New Performance issues with UnicodeTokenizer Type-Defect Priority-Medium
79 New Missing ImageExtractor in downloadable 1.2 jar file Type-Defect Priority-Medium
78 New IllegalArgumentException for many web pages Type-Defect Priority-Medium
77 New Fail to extract main content on some page, get footnote instead Type-Defect Priority-Medium
76 New Incomplete extraction of article Type-Defect Priority-Medium
75 New It's not working for a news site Type-Defect Priority-Medium
74 New Xerces for Android jar file needed Type-Defect Priority-Medium
73 New Missing Maven 1.2.0 Type-Defect Priority-Medium
72 New Extract article from non-english text Type-Defect Priority-Medium
71 New Limit the parsing depth of the html parsing to avoid out of memory situations Type-Defect Priority-Medium
70 New Server returned HTTP response code: 403 for URL (SOLVED) please use this codeline. Type-Defect Priority-Medium
69 New Incomplete extraction of text with special characters Type-Defect Priority-Medium
68 New How to use boilerpipe to get some text with a hyperlink from the web page? Type-Defect Priority-Medium
67 New Program does not terminate for badly formatted/syntactically incorrect HTML input Type-Defect Priority-Medium
66 New boilerpipe - issues detected Type-Defect Priority-Medium
65 New BoilerplateBlockFilter ignores labelToKeep Type-Defect Priority-Medium
64 New Never-ending loop Type-Defect Priority-Medium
63 New Difference WebApi - Api Type-Defect Priority-Medium
62 New Hotpatched nekohtml classes cause library incompatibilities Type-Defect Priority-Medium
61 New ContentFusion can change the order of document text Type-Defect Priority-Medium
60 New Faulty XML encoding of characters in <script> tags in <head> Type-Defect Priority-Medium
59 New Runtime Error while using boilerpipe in android Type-Defect Priority-Medium
58 New Extract article HTML from given HTML source? Type-Defect Priority-Medium
57 New BoilerPipe for Android Type-Defect Priority-Medium
56 New Output as JSON Type-Defect Priority-Medium
55 New Can not parse NYtimes pages Type-Defect Priority-Medium
54 New Web api codes? Type-Defect Priority-Medium
53 New Incorrect characters in Extractor output Type-Defect Priority-Medium
52 New Please push 1.2 to maven central Type-Defect Priority-Medium
51 New No tag in svn for 1.2? Type-Defect Priority-Medium
50 Started StackOverflowError when page includes another <body> part in <noframes> Type-Defect Priority-Medium OpSys-All
49 New Article Image Type-Defect Priority-Medium
48 New hybrid extractor? Type-Defect Priority-Medium
47 New Errors deploying to Android Type-Defect Priority-Medium
46 New Library does not produce same results as http://boilerpipe-web.appspot.com/ Type-Defect Priority-Medium
45 Duplicate Ignore FORM tags in HTMLHighlighter Type-Defect Priority-Medium
44 New Ignore FORM tags in HTMLHighlighter Type-Enhancement Priority-Medium
43 New DocumentTitleMatchClassifier should include the « and • characters Type-Defect Priority-Medium
42 Duplicate Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/filters/heuristics/DocumentTitleMatchClassifier.java Type-Patch
41 Fixed Title detection: Treat non-breaking space as whitespace Type-Enhancement
40 Duplicate Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/DefaultTagActionMap.java Type-Patch
39 Duplicate Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/CommonTagActions.java Type-Patch
38 Duplicate Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/BoilerpipeHTMLContentHandler.java Type-Patch
37 Fixed timeout and fallback strategy for boilerpipe Type-Defect Priority-Medium
36 Fixed ImageExtractor doesn't detect alternative images for Object plugins Type-Defect Priority-Medium
35 WontFix word counting code does not account for & being special html symbol. Type-Defect Priority-Medium
34 Fixed Add 'getInstance' accessor for ImageExtractor Type-Enhancement Priority-Low
33 Accepted Bad xml format in html output from Web API Type-Enhancement Priority-Low
32 Done Documentation - How to output html extract fragment instead of text? Type-Other Priority-Medium
31 New Support HTML5 elements Type-Defect Priority-Medium
30 Invalid Outputs html instead of plain text for certain urls Type-Defect Priority-Medium
29 Fixed boilerpipe crash Type-Defect Priority-Medium
28 Invalid UTF characters are not handled correctly Type-Defect Priority-Medium
27 Fixed Add 1.2.0 release to maven repository Type-Task Priority-Medium
26 Verified Tags missing in output html Type-Defect Priority-Medium
25 WontFix Feature Request - api to return character offsets of non-boilerplate text Type-Enhancement Priority-Medium
24 Verified Boilerpipe fails (but not web api edition) Type-Defect Priority-Medium
23 WontFix Encoding problem (input is interpreted as Latin-1) Type-Defect Priority-Medium
22 Fixed Page not being parsed correctly <li> the issue. Type-Defect Priority-Medium
21 Invalid Included nekohtml 1.9.9 missing LostText class Type-Defect Priority-Medium
20 Accepted Feature request: Run boilerpipe as a command line tool Type-Enhancement Priority-Medium
19 New Code for Google app-engine? Type-Enhancement Priority-Low
18 Done Description of different extractors? Type-Other Priority-Low
17 Fixed Precursory header tags missing Type-Defect Priority-Medium
16 New Better support for non-english pages Type-Defect Priority-Medium
15 New Title empty when parsing with TagSoup Type-Defect Priority-Medium
14 Fixed boilerpipe-web: Charset encoding problem Type-Defect Priority-Medium
13 Accepted Missing Maven dependency Type-Enhancement Priority-Low OpSys-All
12 Fixed Possible improvement to TerminatingBlocksFinder Type-Enhancement Priority-Medium OpSys-All Performance
11 WontFix Unconventional operator used for boolean logic Type-Other Priority-Low
10 Fixed Links on boilerpipe homepage are broken Type-Defect Priority-Low OpSys-All
9 Fixed Add clone method to TextBlock Type-Enhancement Priority-Low
8 Done Can you fix or promote the bug fix of NekoHTML (#2909310) ? Type-Defect Priority-Medium OpSys-All
7 Invalid Exclude Script tags Type-Defect Priority-Medium
6 WontFix 2 to 3 mins taken for some URLs Type-Defect Priority-Low
5 Fixed INSTALL.txt in src directory Type-Enhancement Priority-Medium
4 Fixed Ability to keep inline HTML in extracted content Type-Enhancement Priority-Low OpSys-All
3 WontFix IDN <-> ACE Domain Names Type-Defect Priority-Low OpSys-All
2 Verified Encoding problem? – Strange garbage introduced Type-Defect Priority-Medium OpSys-All
1 Verified DefaultExtractor.INSTANCE.getText(html): Removes leading special character when it is coded in ASCII Type-Defect Priority-Medium OpSys-All