failed to parse some url #215

Closed
richshaw2015 opened this issue Jun 28, 2020 · 1 comment

@richshaw2015

For example, this feed URL fails to parse:
https://www.mihu.live/index.php/feed/

feedparser returns:

{'feed': {},
 'entries': [],
 'bozo': 1,
 'headers': {'Server': 'nginx',
  'Date': 'Sun, 28 Jun 2020 03:05:07 GMT',
  'Content-Type': 'text/html',
  'Transfer-Encoding': 'chunked',
  'Connection': 'close',
  'Vary': 'Accept-Encoding',
  'Content-Encoding': 'gzip'},
 'href': 'https://www.mihu.live/index.php/feed/',
 'status': 403,
 'encoding': 'utf-8',
 'bozo_exception': xml.sax._exceptions.SAXParseException('syntax error'),
 'version': '',
 'namespaces': {}}
@kurtmckee
Owner

It appears that feedparser's built-in HTTP client is not interacting with the mihu.live server in a nice way. The server is returning HTTP 403 and an HTML document, not an actual feed.

If you install the requests package and use that, mihu returns a valid feed. As I intend to scrap the custom HTTP client in feedparser and use an established package like requests, I am going to close this issue -- no time will be invested to fix feedparser's HTTP client code. Please try using this to make your code more robust:

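import requests
import feedparser

# Download the feed with requests instead of feedparser's built-in HTTP client.
doc = requests.get('https://www.mihu.live/index.php/feed/')
# Pass the downloaded text to feedparser, which then only has to parse it.
result = feedparser.parse(doc.text)
print(result.entries[1].title)  # Currently outputs "Oracle Live SQL——SQL在线练习平台"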
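
The failure reported in this issue (HTTP 403 with a `text/html` body and no entries) can be spotted from the parse result before any entry is read. The following is only a minimal sketch of such a check, not part of feedparser: `describe_fetch_problem` is a hypothetical helper name, and it assumes the result is a dict-like object carrying the `status`, `headers`, `bozo`, and `entries` keys shown in the output above.

```python
def describe_fetch_problem(result):
    """Return a short reason why a feedparser-style result is unusable, or None.

    Hypothetical helper: it only reads the keys shown in the issue's output.
    """
    status = result.get("status")
    # Header names may differ in case between HTTP clients, so normalize them.
    headers = {str(k).lower(): v for k, v in result.get("headers", {}).items()}
    content_type = headers.get("content-type", "")

    if status is not None and status >= 400:
        return f"server returned HTTP {status}"
    if content_type.startswith("text/html"):
        return f"server returned {content_type}, not a feed"
    if result.get("bozo") and not result.get("entries"):
        return "document could not be parsed as a feed"
    return None


# The result reported in this issue, reduced to the fields the check reads.
reported = {
    "entries": [],
    "bozo": 1,
    "headers": {"Server": "nginx", "Content-Type": "text/html"},
    "status": 403,
}
print(describe_fetch_problem(reported))  # server returned HTTP 403
```

The status check comes first because an error status explains the other symptoms: the HTML error page is what makes the parser set `bozo`.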
