Debian's UDD feeds freak out feedparser #112
Comments
What behavior do you expect from feedparser in this case? Should the invalid entries be silently ignored? Should feedparser produce entries without a link? Maybe UDD should be fixed? That feed is not valid.
it should:
I do this in feed2exec:

    if not item.get('id'):
        item['id'] = item.get('title')

it's just a dumb heuristic, but it works better than crashing on an arbitrary feed. at the very least, i would want feedparser to be robust (i.e. not crash) on bad content. delivering a non-empty feed is extra...
Hmm, that heuristic would work in this particular case, but in the wild repeated entry titles are pretty common (e.g., http://www.pusheen.com/rss), so I wouldn't want it built into feedparser except on an opt-in basis. As a feedparser user I'd rather have no ID than a heuristic that I can't fix. My first inclination for a heuristic would have been to use the item date as a final fall-back, but that doesn't work for this feed either. :-/ So maybe skipping
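To make the opt-in idea concrete, here is a minimal sketch of a caller-controlled fallback chain. It is not part of feedparser or feed2exec: `assign_fallback_ids` and its `fallbacks` parameter are hypothetical names, and the code assumes only that entries behave like dicts, as the feed2exec snippet above already does.

```python
def assign_fallback_ids(entries, fallbacks=()):
    """Fill in a missing 'id' from the first non-empty fallback key.

    Hypothetical helper. With the default empty `fallbacks`, no entry
    is modified, so any heuristic is strictly opt-in. Returns the
    entries that still have no id so the caller can skip or fix them.
    """
    unresolved = []
    for entry in entries:
        if entry.get("id"):
            continue  # already has a usable id
        for key in fallbacks:
            value = entry.get(key)
            if value:
                entry["id"] = value
                break
        else:
            # no fallback key produced a non-empty value
            unresolved.append(entry)
    return unresolved


# Made-up entries: one with an id, one with only a title, one with nothing usable.
entries = [
    {"id": "a", "title": "first"},
    {"title": "second", "link": ""},
    {"title": "", "link": ""},
]
unresolved = assign_fallback_ids(entries, fallbacks=("link", "title"))
```

Because titles repeat in real feeds, a caller who opts in to `"title"` still has to handle duplicate ids; the helper only guarantees that it never invents an id the caller did not ask for.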
yep, i don't mind rolling my own heuristics here... i guess what i need here is for feedparser to ... er... not crash. :)
@anarcat, are you still seeing this behavior? If so, I'll jump in on this and work to get feedparser to quit crashing. Re: GUID heuristics, feedparser won't be updated to inject GUIDs, but you're right, feedparser shouldn't be crashing!! =)
i still get the same error as originally reported. should i send a PR to get the failing unit test in place? to reproduce, you simply need to do this:
and run the test suite.
Perfect, I'll try to get this fixed.
On May 7, 2018 1:36:10 PM UTC, anarcat <***@***.***> wrote:
i still get the same error as originally reported. should i send a PR
to get the failing unit test in place?
to reproduce, you simply need to do this:
```
wget -O tests/illformed/udd.xml \
  'https://udd.debian.org/dmd/?email1=anarcat%40debian.org&email2=&email3=&packages=&ignpackages=photofloat&nosponsor1=on&format=rss#todo'
```
and run the test suite.
FYI: There is also another problem with Debian-related feeds. Please open a bug report on for Debian against the
My personal UDD todo list breaks feedparser. If you add the tests to the "illformed" directory, tox says:
the problem seems to be there is no `guid` field and an empty `link` field on some entries, which breaks (reasonable) expectations from feedparser...
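To check a feed for the condition described here without going through feedparser, a rough sketch using only the standard library could look like the following. The sample document is made up for illustration and is not the actual UDD output, and `find_unidentifiable_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample: the second item has no <guid> and an empty <link>.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>sample</title>
    <item><title>ok</title><link>https://example.org/1</link><guid>1</guid></item>
    <item><title>no guid, empty link</title><link></link></item>
  </channel>
</rss>"""


def find_unidentifiable_items(rss_text):
    """Return the titles of <item> elements with no <guid> and no usable <link>."""
    root = ET.fromstring(rss_text)
    titles = []
    for item in root.iter("item"):
        # findtext returns None for a missing element and "" for an empty one
        guid = (item.findtext("guid") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not guid and not link:
            titles.append(item.findtext("title", default=""))
    return titles


problem_titles = find_unidentifiable_items(SAMPLE_RSS)
```

Running this against a saved copy of a feed lists the entries that a consumer cannot identify by `guid` or `link`, which is the combination reported as breaking the parser here.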