ID |
Status |
Summary |
84
|
New |
How to debug the result?
Type-Defect
Priority-Medium
|
83
|
New |
Different result when using Web Api and the source api?
Type-Defect
Priority-Medium
|
82
|
New |
Unsupported content type: null
Type-Defect
Priority-Medium
|
81
|
New |
Boilerpipe is conflicting with CyberNeko library
Type-Defect
Priority-Medium
|
80
|
New |
Performance issues with UnicodeTokenizer
Type-Defect
Priority-Medium
|
79
|
New |
Missing ImageExtractor in downloadable 1.2 jar file
Type-Defect
Priority-Medium
|
78
|
New |
IllegalArgumentException for many web pages
Type-Defect
Priority-Medium
|
77
|
New |
Fails to extract main content on some pages, gets footnote instead
Type-Defect
Priority-Medium
|
76
|
New |
Incomplete extraction of article
Type-Defect
Priority-Medium
|
75
|
New |
It's not working for a news site
Type-Defect
Priority-Medium
|
74
|
New |
Xerces for Android jar file needed
Type-Defect
Priority-Medium
|
73
|
New |
Missing Maven 1.2.0
Type-Defect
Priority-Medium
|
72
|
New |
Extract article from non-english text
Type-Defect
Priority-Medium
|
71
|
New |
Limit the parsing depth of the html parsing to avoid out of memory situations
Type-Defect
Priority-Medium
|
70
|
New |
Server returned HTTP response code: 403 for URL (SOLVED), please use this code line.
Type-Defect
Priority-Medium
|
69
|
New |
Incomplete extraction of text with special characters
Type-Defect
Priority-Medium
|
68
|
New |
How to use boilerpipe to get some text with a hyperlink from the web page?
Type-Defect
Priority-Medium
|
67
|
New |
Program does not terminate for badly formatted/syntactically incorrect HTML input
Type-Defect
Priority-Medium
|
66
|
New |
boilerpipe - issues detected
Type-Defect
Priority-Medium
|
65
|
New |
BoilerplateBlockFilter ignores labelToKeep
Type-Defect
Priority-Medium
|
64
|
New |
Never-ending loop
Type-Defect
Priority-Medium
|
63
|
New |
Difference WebApi - Api
Type-Defect
Priority-Medium
|
62
|
New |
Hotpatched nekohtml classes cause library incompatibilities
Type-Defect
Priority-Medium
|
61
|
New |
ContentFusion can change the order of document text
Type-Defect
Priority-Medium
|
60
|
New |
Faulty XML encoding of characters in <script> tags in <head>
Type-Defect
Priority-Medium
|
59
|
New |
Runtime Error while using boilerpipe on Android
Type-Defect
Priority-Medium
|
58
|
New |
Extract article HTML from given HTML source?
Type-Defect
Priority-Medium
|
57
|
New |
BoilerPipe for Android
Type-Defect
Priority-Medium
|
56
|
New |
Output as JSON
Type-Defect
Priority-Medium
|
55
|
New |
Can not parse NYtimes pages
Type-Defect
Priority-Medium
|
54
|
New |
Web api codes?
Type-Defect
Priority-Medium
|
53
|
New |
Incorrect characters in Extractor output
Type-Defect
Priority-Medium
|
52
|
New |
Please push 1.2 to maven central
Type-Defect
Priority-Medium
|
51
|
New |
No tag in svn for 1.2?
Type-Defect
Priority-Medium
|
50
|
Started |
StackOverflowError when page includes another <body> part in <noframes>
Type-Defect
Priority-Medium
OpSys-All
|
49
|
New |
Article Image
Type-Defect
Priority-Medium
|
48
|
New |
hybrid extractor?
Type-Defect
Priority-Medium
|
47
|
New |
Errors deploying to Android
Type-Defect
Priority-Medium
|
46
|
New |
Library does not produce same results as http://boilerpipe-web.appspot.com/
Type-Defect
Priority-Medium
|
45
|
Duplicate |
Ignore FORM tags in HTMLHighlighter
Type-Defect
Priority-Medium
|
44
|
New |
Ignore FORM tags in HTMLHighlighter
Type-Enhancement
Priority-Medium
|
43
|
New |
DocumentTitleMatchClassifier should include the « and • characters
Type-Defect
Priority-Medium
|
42
|
Duplicate |
Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/filters/heuristics/DocumentTitleMatchClassifier.java
Type-Patch
|
41
|
Fixed |
Title detection: Treat non-breaking space as whitespace
Type-Enhancement
|
40
|
Duplicate |
Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/DefaultTagActionMap.java
Type-Patch
|
39
|
Duplicate |
Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/CommonTagActions.java
Type-Patch
|
38
|
Duplicate |
Patch for /trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/BoilerpipeHTMLContentHandler.java
Type-Patch
|
37
|
Fixed |
timeout and fallback strategy for boilerpipe
Type-Defect
Priority-Medium
|
36
|
Fixed |
ImageExtractor doesn't detect alternative images for Object plugins
Type-Defect
Priority-Medium
|
35
|
WontFix |
Word counting code does not account for & being a special HTML symbol.
Type-Defect
Priority-Medium
|
34
|
Fixed |
Add 'getInstance' accessor for ImageExtractor
Type-Enhancement
Priority-Low
|
33
|
Accepted |
Bad xml format in html output from Web API
Type-Enhancement
Priority-Low
|
32
|
Done |
Documentation - How to output HTML extract fragment instead of text?
Type-Other
Priority-Medium
|
31
|
New |
Support HTML5 elements
Type-Defect
Priority-Medium
|
30
|
Invalid |
Outputs html instead of plain text for certain urls
Type-Defect
Priority-Medium
|
29
|
Fixed |
boilerpipe crash
Type-Defect
Priority-Medium
|
28
|
Invalid |
UTF characters are not handled correctly
Type-Defect
Priority-Medium
|
27
|
Fixed |
Add 1.2.0 release to maven repository
Type-Task
Priority-Medium
|
26
|
Verified |
Tags missing in output html
Type-Defect
Priority-Medium
|
25
|
WontFix |
Feature Request - api to return character offsets of non-boilerplate text
Type-Enhancement
Priority-Medium
|
24
|
Verified |
Boilerpipe fails (but not web api edition)
Type-Defect
Priority-Medium
|
23
|
WontFix |
Encoding problem (input is interpreted as Latin-1)
Type-Defect
Priority-Medium
|
22
|
Fixed |
Page not being parsed correctly <li> the issue.
Type-Defect
Priority-Medium
|
21
|
Invalid |
Included nekohtml 1.9.9 missing LostText class
Type-Defect
Priority-Medium
|
20
|
Accepted |
Feature request: Run boilerpipe as a command line tool
Type-Enhancement
Priority-Medium
|
19
|
New |
Code for Google app-engine?
Type-Enhancement
Priority-Low
|
18
|
Done |
Description of different extractors?
Type-Other
Priority-Low
|
17
|
Fixed |
Precursory header tags missing
Type-Defect
Priority-Medium
|
16
|
New |
Better support for non-english pages
Type-Defect
Priority-Medium
|
15
|
New |
Title empty when parsing with TagSoup
Type-Defect
Priority-Medium
|
14
|
Fixed |
boilerpipe-web: Charset encoding problem
Type-Defect
Priority-Medium
|
13
|
Accepted |
Missing Maven dependency
Type-Enhancement
Priority-Low
OpSys-All
|
12
|
Fixed |
Possible improvement to TerminatingBlocksFinder
Type-Enhancement
Priority-Medium
OpSys-All
Performance
|
11
|
WontFix |
Unconventional operator used for boolean logic
Type-Other
Priority-Low
|
10
|
Fixed |
Links on boilerpipe homepage are broken
Type-Defect
Priority-Low
OpSys-All
|
9
|
Fixed |
Add clone method to TextBlock
Type-Enhancement
Priority-Low
|
8
|
Done |
Can you fix or promote the bug fix of NekoHTML (#2909310) ?
Type-Defect
Priority-Medium
OpSys-All
|
7
|
Invalid |
Exclude Script tags
Type-Defect
Priority-Medium
|
6
|
WontFix |
2 to 3 mins taken for some URLs
Type-Defect
Priority-Low
|
5
|
Fixed |
INSTALL.txt in src directory
Type-Enhancement
Priority-Medium
|
4
|
Fixed |
Ability to keep inline HTML in extracted content
Type-Enhancement
Priority-Low
OpSys-All
|
3
|
WontFix |
IDN <-> ACE Domain Names
Type-Defect
Priority-Low
OpSys-All
|
2
|
Verified |
Encoding problem? – Strange garbage introduced
Type-Defect
Priority-Medium
OpSys-All
|
1
|
Verified |
DefaultExtractor.INSTANCE.getText(html): Removes leading special character when it is coded in ASCII
Type-Defect
Priority-Medium
OpSys-All
|