Drop "sgmllib3k"? #328

Closed
buhtz opened this issue Jan 23, 2023 · 4 comments

buhtz commented Jan 23, 2023

sgmllib3k = "^1.0.0"

Hi Kurt,
Based on my research in your repo, I would say that you dropped "sgmllib3k" support long ago. So can this dependency in pyproject.toml be removed, too?

Owner

kurtmckee commented Jan 23, 2023

It's still needed for parsing corrupt XML:

import sgmllib # type: ignore[import]

I've been working on migrating to lxml and Python's builtin html.parser in a very similar XML parsing project, kurtmckee/listparser. I'm hoping that my experience with that migration can translate to an update to feedparser's XML parsing, too.
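To illustrate the idea of falling back to a forgiving parser for corrupt XML, here is a minimal sketch using only the standard library. It is not feedparser's or listparser's actual code: `DescriptionCollector` and `extract_descriptions` are hypothetical names, and the "try a strict parser first, then a loose one" flow is an assumption made for this example. `html.parser.HTMLParser` stands in for the loose parser role that `sgmllib` plays today.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class DescriptionCollector(HTMLParser):
    """Hypothetical loose parser: gathers text inside <description> tags."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag == "description":
            self._inside = True
            self.descriptions.append("")

    def handle_endtag(self, tag):
        if tag == "description":
            self._inside = False

    def handle_data(self, data):
        # Only keep text that appears while a <description> is open.
        if self._inside:
            self.descriptions[-1] += data


def extract_descriptions(document):
    """Return (mode, descriptions); mode is "strict" or "loose"."""
    try:
        # Strict path: succeeds only for well-formed XML.
        root = ET.fromstring(document)
    except ET.ParseError:
        # Loose path (assumed fallback): tolerate mismatched or unclosed tags.
        collector = DescriptionCollector()
        collector.feed(document)
        collector.close()
        return "loose", collector.descriptions
    return "strict", [el.text or "" for el in root.iter("description")]
```

A well-formed document goes through the strict branch, while a document with a mismatched closing tag makes `ET.fromstring` raise `ParseError` and is handled by the loose branch instead.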

Author

buhtz commented Jan 23, 2023

Good that I asked first.

I used GitHub's "search in this repo" feature, but this piece of code wasn't shown to me.

@AndreasEnge

May I kindly ask to keep this bug report open until the problem is fixed? I am working on the feedparser package in the GNU Guix distribution, where building sgmllib3k currently fails; my impression is that it is incompatible with Python 3.10, as this happens during the check phase:

FAIL: test_declaration_junk_chars (test_sgmllib.SGMLParserTestCase)

Traceback (most recent call last):
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/test_sgmllib.py", line 310, in test_declaration_junk_chars
    self.check_parse_error("")
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/test_sgmllib.py", line 127, in check_parse_error
    parser.feed(source)
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/sgmllib.py", line 98, in feed
    self.goahead(0)
  File "/tmp/guix-build-python-sgmllib3k-1.0.0-1.7999646.drv-0/source/sgmllib.py", line 168, in goahead
    k = self.parse_declaration(i)
  File "/gnu/store/i0d555a5fd7isi606aqqmbp5zgy9jh6p-python-3.10.7/lib/python3.10/_markupbase.py", line 134, in parse_declaration
    raise AssertionError("unexpected %r char in declaration" % rawdata[j])
AssertionError: unexpected '$' char in declaration

So I have doubts that sgmllib3k (assuming we simply disabled its tests) would still parse corrupt XML...

Andreas

@kurtmckee
Owner

It installs and works with feedparser on Python 3.10.
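For anyone who wants to separate "the sgmllib3k test suite fails" from "the parser cannot be used at all", a rough smoke test along these lines might help. This is only a sketch: `smoke_test_sgmllib` and the sample markup are made up for illustration, and it assumes nothing beyond `sgmllib.SGMLParser` having `feed()` and `close()` methods. It does not reproduce the failing test from the traceback above.

```python
import importlib
import sys


def smoke_test_sgmllib(sample="<a href=x>unclosed <b>markup"):
    """Report whether sgmllib imports and can consume a malformed snippet."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "available": False,
        "error": None,
    }
    try:
        # sgmllib is provided by the third-party sgmllib3k package on Python 3.
        sgmllib = importlib.import_module("sgmllib")
    except ImportError:
        return report
    report["available"] = True
    parser = sgmllib.SGMLParser()
    try:
        parser.feed(sample)
        parser.close()
    except Exception as exc:
        # Record the exception type rather than guessing at its meaning.
        report["error"] = type(exc).__name__
    return report
```

Calling `smoke_test_sgmllib()` in the target environment returns a small dictionary with the Python version, whether the module could be imported, and the name of any exception raised while feeding the sample.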
