Issue 2: error when parsing this feed
Status:  Fixed
Owner:  ----
Closed:  Yesterday
Type-Defect
Priority-Medium


Reported by yura.smolsky, Apr 19, 2007
What steps will reproduce the problem?
1. Try to parse this feed from the command line with feedparser (see the attached
file).

I expect to see the parsed feed, but the script produces an error.

I took the latest version of feedparser.py from SVN. I ran it with Python 2.4.3 on
Windows XP.


content.xml
56.6 KB   Download
Comment 1 by Florian.Steinel, Dec 04, 2007
feed error with traceback:

>>> d = feedparser.parse("http://ftp.gnome.org/pub/GNOME/LATEST.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "feedparser.py", line 2623, in parse
    feedparser.feed(data)
  File "feedparser.py", line 1441, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 134, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.4/sgmllib.py", line 296, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python2.4/sgmllib.py", line 336, in finish_endtag
    self.unknown_endtag(tag)
  File "feedparser.py", line 476, in unknown_endtag
    method()
  File "feedparser.py", line 1217, in _end_description
    value = self.popContent('description')
  File "feedparser.py", line 700, in popContent
    value = self.pop(tag)
  File "feedparser.py", line 641, in pop
    output = _resolveRelativeURIs(output, self.baseuri, self.encoding)
  File "feedparser.py", line 1594, in _resolveRelativeURIs
    p.feed(htmlSource)
  File "feedparser.py", line 1441, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 129, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.4/sgmllib.py", line 283, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "/usr/lib/python2.4/sgmllib.py", line 314, in finish_starttag
    self.unknown_starttag(tag, attrs)
  File "feedparser.py", line 1589, in unknown_starttag
    _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)
  File "feedparser.py", line 1460, in unknown_starttag
    strattrs = u''.join([u' %s="%s"' % (key, value) for key, value in uattrs]).encode(self.encoding)
LookupError: unknown encoding: 
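
Editor's note: the last frame of the traceback calls .encode(self.encoding), and the error message shows an empty encoding name. The sketch below reproduces that exception type in isolation and shows one defensive way to handle it. The helper encode_with_fallback and its utf-8 default are hypothetical, not part of feedparser or of the patch attached later in this thread, and the syntax assumes Python 2.6+ or 3.

```python
def encode_with_fallback(text, encoding, default="utf-8"):
    """Encode text, using `default` when `encoding` is empty or unknown.

    Hypothetical helper for illustration; not part of feedparser.
    """
    try:
        # An empty string or None is replaced by the default up front.
        return text.encode(encoding or default)
    except LookupError:
        # A non-empty but unrecognised codec name also lands here.
        return text.encode(default)


# An empty encoding name raises the same exception type as in the traceback.
try:
    u"abc".encode("")
except LookupError as exc:
    print("LookupError: %s" % exc)

# With the fallback, the same call succeeds.
print(repr(encode_with_fallback(u' href="x"', "")))
```

This only masks the symptom at the point of failure; it does not explain why the encoding was empty in the first place.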

LATEST.xml
34.8 KB   Download
Comment 2 by danielle.davout, Dec 10, 2007
I have no problem with this URL using Python 2.4.4 and feedparser
(__version__ = "4.1"# + "$Revision: 1.92 $"[11:15] + "-cvs") on Debian etch.
Comment 3 by Florian.Steinel, Dec 11, 2007
(In reply to comment #2)
Danielle, the http://ftp.gnome.org/pub/GNOME/LATEST.xml feed is OK now, but sometimes
it is not.
To reproduce the error, you have to use the attached LATEST.xml.
Comment 4 by vaidhy, Apr 26, 2008
The problem is a bug in Python's sgmllib. See
http://mail.python.org/pipermail/python-bugs-list/2007-February/037082.html for more
details. I have attached a patch for feedparser.py that resolves this issue; please
apply it.

feedparser.patch
748 bytes   Download
Comment 5 by adewale, Yesterday (26 hours ago)
This bug has been fixed as of Python 2.5.2 on Ubuntu.

The following code works:
>>> import feedparser
>>> f = feedparser.parse("http://feedparser.googlecode.com/issues/attachment?aid=-7827582398651082781&name=LATEST.xml")
>>> f.feed
{'lastbuilddate': u'Tue, 04 Dec 2007 08:12:41 +0000', 'publisher': u'webmaster@gnome.org', 'subtitle': u"A list of recent files released on GNOME's FTP site", 'links': [{'href': u'http://ftp.gnome.org/pub/GNOME/', 'type': 'text/html', 'rel': 'alternate'}], 'title': u'GNOME FTP Releases', 'subtitle_detail': {'base': u'http://feedparser.googlecode.com/issues/attachment?aid=-7827582398651082781&name=LATEST.xml', 'type': 'text/html', 'value': u"A list of recent files released on GNOME's FTP site", 'language': None}, 'title_detail': {'base': u'http://feedparser.googlecode.com/issues/attachment?aid=-7827582398651082781&name=LATEST.xml', 'type': 'text/plain', 'value': u'GNOME FTP Releases', 'language': None}, 'link': u'http://ftp.gnome.org/pub/GNOME/', 'publisher_detail': {'email': u'webmaster@gnome.org'}}

I'm marking this closed since the problem is in the Python libraries and they've been 
fixed.
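
Editor's note: to check whether a given interpreter is at least the version reported as fixed above, a minimal sketch follows. The threshold (2, 5, 2) comes only from this comment (Python 2.5.2 on Ubuntu); whether other builds or distributions carry the same fix is an assumption. The name has_reported_fix is hypothetical.

```python
import sys

# Assumption: taken from the comment above; other builds may differ.
FIXED_IN = (2, 5, 2)


def has_reported_fix(version_info=None):
    """Return True if the interpreter version is at least FIXED_IN."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); ignore release level and serial.
    return tuple(version_info[:3]) >= FIXED_IN


print(has_reported_fix())
```

The check compares version numbers only; it does not inspect sgmllib itself.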
Status: Fixed