Today

  • 14 hours ago
    issue 198 (<image> not parsed for http://www.spacetelescope.org/rss/vod...) reported by th.perl
    What steps will reproduce the problem?
      >>> import feedparser
      >>> f = feedparser.parse('http://www.spacetelescope.org/rss/vodcast.xml')
      >>> f.feed.image
      {'href': None}
      >>> print f.feed.image.href
    What is the expected output?
      http://www.spacetelescope.org/design/hcsquarelogo144.jpg
    What do you see instead?
      None
    What version of the product are you using?
      python-feedparser, Ubuntu package version 4.1-14
    On what operating system?
      Ubuntu Linux 9.10
    The XML file is attached in case the original feed changes.

Yesterday

  • 34 hours ago
    issue 57 (.modified attribute raises AttributeError) commented on by corbinbs
    Here's a patch that sets modified to None if the Last-Modified header was not supplied. This allows me to run the first example and avoid the AttributeError:
      Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
      [GCC 4.0.1 (Apple Inc. build 5493)] on darwin
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import feedparser
      >>> d = feedparser.parse('http://news.ycombinator.com/rss')
      >>> d.modified
      >>> d.modified is None
      True
      >>>
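A minimal sketch of the behaviour the patch describes, assuming the response headers arrive as a plain dict: the hypothetical helper `extract_modified` always returns a value, `None` when no Last-Modified header was sent, so a caller can test `d.modified is None` instead of catching AttributeError. This is not the patch itself, and it returns the raw header string without parsing the date.

```python
def extract_modified(headers):
    """Return the raw Last-Modified header value, or None if it was not supplied.

    `headers` is assumed to be a plain dict of HTTP response headers
    (a hypothetical input shape, not feedparser's internal structure).
    """
    # HTTP header names are case-insensitive, so compare them in lower case.
    for name, value in headers.items():
        if name.lower() == 'last-modified':
            return value
    return None


# The result always carries a 'modified' key, so the lookup never fails.
result = {'modified': extract_modified({'Content-Type': 'application/rss+xml'})}
print(result['modified'] is None)  # True
```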

Last 7 days

  • Jan 05, 2010
    issue 197 (segmentation fault during feed parsing) reported by nikolay.panov
    I have tried to parse a feed from http://google.dirson.com/rss.php:
      $ python
      Python 2.5.4 (r254:67916, Nov 19 2009, 19:46:21)
      [GCC 4.3.4] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import feedparser
      >>> feedparser.__version__
      '4.2-pre-303-svn'
      >>> len(feedparser.parse('/tmp/dirson.xml'))
      [1] 16610 segmentation fault python
    It is very strange, but I see this issue only when lxml-2.2.4 and mechanize-0.1.11 are installed (the same happens if httplib2 is installed). I also ran strace on the Python interpreter; the last lines are the following:
      stat64("http://my.netscape.com/publish/formats/rss-0.91.dtd", 0xbfb4f844) = -1 ENOENT (No such file or directory)
      --- SIGSEGV (Segmentation fault) @ 0 (0) ---
      +++ killed by SIGSEGV +++
    Then I removed the DTD declaration from the feed and got the following:
      $ python
      Python 2.5.4 (r254:67916, Nov 19 2009, 19:46:21)
      [GCC 4.3.4] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import feedparser
      >>> feedparser.__version__
      '4.2-pre-303-svn'
      >>> len(feedparser.parse('/tmp/dirson-nodtd.xml'))
      6
    I have also tested this feed with http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fgoogle.dirson.com%2Frss.php and got the following message: "The use of this DTD has been deprecated by Netscape". So it seems that something is wrong with parsing the deprecated DTD.
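The reporter's workaround was to delete the DTD declaration by hand. A minimal sketch of doing the same before parsing, using Python 3's standard-library ElementTree rather than feedparser; `strip_doctype` and the sample feed are hypothetical, and this only sidesteps the external DTD reference without explaining the crash.

```python
import re
import xml.etree.ElementTree as ET

# Matches a DOCTYPE declaration with an optional internal subset [...].
# Simplification: it does not handle '>' inside quoted identifiers.
DOCTYPE_RE = re.compile(r'<!DOCTYPE[^>\[]*(\[[^\]]*\])?\s*>', re.IGNORECASE)


def strip_doctype(xml_text):
    """Remove the first DOCTYPE declaration so no external DTD is referenced."""
    return DOCTYPE_RE.sub('', xml_text, count=1)


FEED = '''<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91"><channel><title>Example</title></channel></rss>'''

cleaned = strip_doctype(FEED)
root = ET.fromstring(cleaned)
print(root.tag, root.find('channel/title').text)
```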
  • Jan 05, 2010
    issue 137 (UnicodeDecodeError on parse('http://bloomington.craigslist.o...) commented on by joe.boyle   -   I'm also using Feedparser 4.1 to parse a craigslist feed and am experiencing the same issue. Does this bug tracker support "will not fix" or "cannot reproduce" states? "Fixed" is misleading.
  • Jan 01, 2010
    issue 196 (Error parsing this feed with non-English characters) commented on by wal...@ninua.com   -   Right. The latest download version breaks as the stack trace I submitted earlier shows. But the latest SVN version is much better. It passes the invalid character along to the client program to deal with it.
  • Jan 01, 2010
    issue 196 (Error parsing this feed with non-English characters) Status changed by adewale   -   I've checked and we don't mark the feed as bozo. However since all the data is there I'm going to mark this as fixed.
    Status: Fixed
  • Jan 01, 2010
    issue 151 (UTF-8 encoding fails) Status changed by adewale   -   Marking as fixed since the error can't be reproduced using Feedparser 4.1.
    Status: Fixed
  • Jan 01, 2010
    issue 137 (UnicodeDecodeError on parse('http://bloomington.craigslist.o...) Status changed by adewale   -   I'm marking this as fixed since Craigslist seems to have fixed its feed. I can't reproduce the Unicode error using Feedparser 4.1 or the latest version of the code.
    Status: Fixed
  • Jan 01, 2010
    issue 121 (Title fails to parse correctly in utf8/Arabic feeds) Status changed by adewale
      >>> f = feedparser.parse("http://www.razanghazzawi.com/feed/")
      >>> for e in f.entries: print e.title
    This generates the correct titles from that feed as of today.
    Status: Fixed
  • Jan 01, 2010
    issue 114 (UnicodeEncodeError) Status changed by adewale   -   I'm marking this bug as invalid since it seems to be a bug in the underlying Python library. If you have a feed that reliably reproduces the problem, I'd be happy to re-open this bug.
    Status: Invalid
  • Jan 01, 2010
    issue 128 (UnicodeDecodeError when parsing http://www.projekt6.de/?feed...) commented on by djc.ocht...@gmail.com   -   Ping on this, since the project seems to be alive again.

Last 30 days

  • Dec 31, 2009
    issue 73 (parse_declaration returns incorrect result on partial <![CDA...) Status changed by adewale   -   Fixed in revision 303
    Status: Fixed
  • Dec 31, 2009
    r303 (Fixed infinite loop caused by incomplete CDATA block. This w...) committed by adewale   -   Fixed infinite loop caused by incomplete CDATA block. This was the same bug as: http://code.google.com/p/feedparser/issues/detail?id=73
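An infinite loop of the kind fixed in r303 typically comes from a scanner that makes no progress when a `<![CDATA[` section has no closing `]]>`. A minimal sketch of the safe pattern, assuming a hypothetical scanner that signals an incomplete section with `None`; it is not feedparser's or sgmllib's actual `parse_declaration` code.

```python
CDATA_OPEN = '<![CDATA['
CDATA_CLOSE = ']]>'


def parse_cdata(rawdata, i):
    """Parse a CDATA section starting at rawdata[i].

    Returns (text, next_index), or None when the closing ']]>' is missing.
    """
    start = i + len(CDATA_OPEN)
    end = rawdata.find(CDATA_CLOSE, start)
    if end < 0:
        return None  # incomplete section: report it instead of faking success
    return rawdata[start:end], end + len(CDATA_CLOSE)


def cdata_sections(rawdata):
    """Collect every complete CDATA section; stop at the first incomplete one."""
    sections = []
    i = rawdata.find(CDATA_OPEN)
    while i >= 0:
        parsed = parse_cdata(rawdata, i)
        if parsed is None:
            # Breaking here guarantees termination: retrying at the same
            # offset would never make progress.
            break
        text, i = parsed
        sections.append(text)
        i = rawdata.find(CDATA_OPEN, i)
    return sections


print(cdata_sections('<a><![CDATA[one]]><![CDATA[two'))  # ['one']
```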
  • Dec 31, 2009
    issue 143 (feedparser goes in to indefinate loop for reading bad xml fi...) changed by adewale   -  
    Status: Duplicate
  • Dec 31, 2009
    issue 73 (parse_declaration returns incorrect result on partial <![CDA...) commented on by adewale   -   Issue 143 has been merged into this issue.
  • Dec 30, 2009
    issue 196 (Error parsing this feed with non-English characters) commented on by adewale   -   Did it mark the feed as a bozo? If so, we can close this bug.
  • Dec 29, 2009
    issue 60 (Feedparser fails on some picasa feeds) Status changed by adewale   -  
    Status: Fixed
  • Dec 29, 2009
    issue 60 (Feedparser fails on some picasa feeds) commented on by adewale   -   I'm marking this as fixed since the problem can no longer be reproduced. The current version (in Subversion) correctly renders all the titles in that feed. dannychai: If you have a picasa feed that reproduces the problem please re-open this bug and attach the feed. jonathan.ruta: I can't delete bugs. The best I can offer you is that the closing of this bug means people won't accidentally stumble across it.
  • Dec 28, 2009
    issue 188 (Problem with parsing Media RSS of Yahoo) commented on by prashantchaudharry   -   Thanks :)
  • Dec 28, 2009
    issue 143 (feedparser goes in to indefinate loop for reading bad xml fi...) commented on by nikolay.panov
    This bug is still reproducible on my system:
      Python 2.5.2 (r252:60911, Jan 4 2009, 17:40:26)
      [GCC 4.3.2] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import feedparser
      >>> feedparser.__version__
      '4.2-pre-301-svn'
      >>> feedparser.parse('/tmp/test.xml')
      ^\[2] 12525 quit python
  • Dec 27, 2009
    issue 175 (simple typo in specification of "valid W3CDTF (numeric timez...) Status changed by adewale   -   feedparser._parse_date_w3dtf('2003-12-31T10:14:55-08:00') returns (2003, 12, 31, 18, 14, 55, 2, 365, 0) because that timezone is 8 hours _behind_ UTC. So when we _add_ the 8 hours to 10 we get 18. The bug report is incorrect. See http://www.w3.org/TR/NOTE-datetime for more information.
    Status: Invalid
  • Dec 27, 2009
    r302 (Reverting revision 300. The original documentation was corre...) committed by adewale   -   Reverting revision 300. The original documentation was correct and the bug was wrong. feedparser._parse_date_w3dtf('2003-12-31T10:14:55-08:00') returns (2003, 12, 31, 18, 14, 55, 2, 365, 0) because that timezone is 8 hours _behind_ UTC. So when we _add_ the 8 hours to 10 we get 18.
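The arithmetic behind r302 can be checked independently of feedparser with Python 3's standard library: a local time at UTC-08:00 becomes UTC by adding 8 hours. `w3cdtf_to_utc_tuple` is a hypothetical helper, not `_parse_date_w3dtf` itself, and it only handles the full date-time form with a numeric offset.

```python
from datetime import datetime


def w3cdtf_to_utc_tuple(value):
    """Convert a W3CDTF timestamp with a numeric offset to a UTC 9-tuple."""
    # datetime.fromisoformat (Python 3.7+) accepts offsets such as '-08:00';
    # it does not accept a trailing 'Z' before Python 3.11.
    moment = datetime.fromisoformat(value)
    # utctimetuple() normalises to UTC by subtracting the offset:
    # 10:14:55 - (-08:00) = 18:14:55. Its tm_isdst field is always 0.
    return tuple(moment.utctimetuple())


print(w3cdtf_to_utc_tuple('2003-12-31T10:14:55-08:00'))
# (2003, 12, 31, 18, 14, 55, 2, 365, 0)
```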
  • Dec 27, 2009
    issue 14 ([ 1598443 ] Failure to support ISO8601/W3CDTF fractional sec...) Status changed by adewale   -   Applied patch in revision 301
    Status: Fixed
  • Dec 27, 2009
    r301 (Changed handling of fractional seconds to comply with the sp...) committed by adewale   -   Changed handling of fractional seconds to comply with the spec: http://www.w3.org/TR/NOTE-datetime which says there can be _1_ or more digits rather than the 0 or more that the code was dealing with.
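The change in r301 comes down to the quantifier on the fraction digits: one or more rather than zero or more. A minimal sketch of that rule from the W3C note as a standalone Python pattern; `W3CDTF_SECONDS` is hypothetical, is not the expression feedparser uses, and covers only the form that includes seconds.

```python
import re

# Full W3CDTF form YYYY-MM-DDThh:mm:ss.sTZD: the fraction is optional, but
# when its '.' is present it must be followed by one or more digits.
W3CDTF_SECONDS = re.compile(
    r'^(\d{4})-(\d{2})-(\d{2})'   # date
    r'T(\d{2}):(\d{2}):(\d{2})'   # time to the second
    r'(?:\.(\d+))?'               # fraction: '\d+' rather than '\d*'
    r'(Z|[+-]\d{2}:\d{2})$'       # TZD
)

for candidate in ('1997-07-16T19:20:30.45+01:00',
                  '1997-07-16T19:20:30+01:00',
                  '1997-07-16T19:20:30.+01:00'):
    match = W3CDTF_SECONDS.match(candidate)
    # Group 7 holds the fraction digits (None when no fraction is given).
    print(candidate, '->', match.group(7) if match else 'rejected')
```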
  • Dec 27, 2009
    issue 105 (feedparsertest fails under python 2.5.2) changed by adewale   -  
    Status: Duplicate
  • Dec 27, 2009
    issue 90 (HTML character references are not correcly converted by sgml...) commented on by adewale   -   Issue 105 has been merged into this issue.
  • Dec 27, 2009
    issue 168 (Value of tag 'id' is wrong) changed by adewale   -  
    Status: Duplicate
  • Dec 27, 2009
    issue 47 ([ 1440553 ] misparses core elements within extension element...) commented on by adewale   -   Issue 168 has been merged into this issue.
  • Dec 27, 2009
    issue 175 (simple typo in specification of "valid W3CDTF (numeric timez...) Status changed by adewale   -   Fixed in revision 300
    Status: Fixed
  • Dec 27, 2009
    r300 (Minor correction to documentation to fix: http://code.google...) committed by adewale   -   Minor correction to documentation to fix: http://code.google.com/p/feedparser/issues/detail?id=175
  • Dec 27, 2009
    issue 174 (Error while parsing example ) Status changed by adewale   -   The current version of the code in Subversion does not have this problem. I'm marking this fixed since the next release will be soon.
    Status: Fixed
  • Dec 27, 2009
    issue 119 (Ignores multiple authors in Atom feeds) changed by adewale   -  
    Status: Duplicate
  • Dec 27, 2009
    issue 42 ([ 1458381 ] 4.1: No support for multiple authors) commented on by adewale   -   Issue 119 has been merged into this issue.
  • Dec 27, 2009
    issue 178 (Truncated link attributes) Status changed by adewale   -  
    Status: Fixed
  • Dec 27, 2009
    issue 188 (Problem with parsing Media RSS of Yahoo) Status changed by adewale
      >>> import feedparser
      >>> news_rss_url = "http://rss.ent.yahoo.com/movies/thisweek.xml"
      >>> f = feedparser.parse(news_rss_url)
      >>> f.entries[0].media_thumbnail
      [{'url': u'http://l.yimg.com/eb/ymv/us/img/hv/photo/movie_pix/sony_pictures_classics/the_white_ribbon/thewhiteribbon_smallposter-th.jpg', 'width': u'50', 'height': u'74'}]
    The above code does what you want with the current version of the codebase. We now preserve attributes in namespaced elements. Please file a separate bug about the lack of support for repeated namespaced elements.
    Status: Fixed
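The fix preserves the attributes of namespaced elements such as `media:thumbnail`, while repeated namespaced elements are noted as still unsupported. A minimal sketch of reading both the attributes and the repeats with Python 3's ElementTree rather than feedparser; the sample feed and the `media_thumbnails` helper are hypothetical, and the namespace URI is the standard Media RSS one.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = 'http://search.yahoo.com/mrss/'  # Media RSS namespace URI

# Hypothetical sample item with two thumbnails (placeholder URLs).
SAMPLE = '''<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel><item>
    <title>Example</title>
    <media:thumbnail url="http://example.com/small.jpg" width="50" height="74"/>
    <media:thumbnail url="http://example.com/large.jpg" width="100" height="148"/>
  </item></channel>
</rss>'''


def media_thumbnails(xml_text):
    """Return, per <item>, a list of attribute dicts for each media:thumbnail."""
    root = ET.fromstring(xml_text)
    result = []
    for item in root.iter('item'):
        # ElementTree addresses namespaced tags as '{namespace-uri}localname'.
        thumbs = item.findall('{%s}thumbnail' % MEDIA_NS)
        result.append([dict(thumb.attrib) for thumb in thumbs])
    return result


print(media_thumbnails(SAMPLE)[0][0]['width'])  # 50
```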
  • Dec 27, 2009
    issue 143 (feedparser goes in to indefinate loop for reading bad xml fi...) Status changed by adewale
    The current version of the codebase in Subversion gives this output for both of the above feeds:
      {'feed': {}, 'encoding': 'utf-8', 'bozo': 1, 'version': '', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('Document is empty\n',)}
    This bug only happens with Feedparser 4.1. I'm marking this bug as closed, but please email the development list (http://groups.google.com/group/feedparser-dev) if you can provide a repeatable regression test for this bug.
    Status: Fixed
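Instead of looping, the current code returns a result flagged with `bozo` together with the `SAXParseException` that stopped the parse. A minimal sketch of that pattern using Python 3's standard-library `xml.sax`; `parse_with_bozo` is hypothetical, and the exception message depends on the SAX driver, so it may differ from the "Document is empty" text quoted in this entry.

```python
import xml.sax


def parse_with_bozo(data):
    """Parse bytes with a SAX parser, flagging failures instead of raising.

    Hypothetical helper modelled on the bozo result shape; it only checks
    well-formedness and does not extract any feed content.
    """
    result = {'bozo': 0, 'feed': {}, 'entries': []}
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        # Record the error and return normally, so malformed input
        # ends the parse instead of hanging or crashing the caller.
        result['bozo'] = 1
        result['bozo_exception'] = exc
    return result


empty = parse_with_bozo(b'')
print(empty['bozo'], type(empty['bozo_exception']).__name__)
```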
  • Dec 27, 2009
    issue 142 (AttributeError thrown even if feed does have the attribute d...) Status changed by adewale   -   Marking as WontFix since I can't reproduce the bug with the current version of the codebase. If you get a feed that repeatedly reproduces the error, please re-open this issue and attach that file.
    Status: WontFix
  • Dec 27, 2009
    issue 127 (The <media:title> overrides the 'content' field) Status changed by adewale   -   This bug cannot be reproduced since the feed it points to has changed. Please re-open it if you have a test feed that still triggers the bug using the current version of the code.
    Status: WontFix
  • Dec 27, 2009
    issue 120 (Universal feed Parser strips of media from media:thumbnail) changed by adewale   -  
    Status: Duplicate
  • Dec 27, 2009
    issue 100 (Media RSS attributes not kept?) commented on by adewale   -   Issue 120 has been merged into this issue.
  • Dec 27, 2009
    issue 76 (<media:title> tag misparsed as <title>) commented on by adewale   -   Issue 116 has been merged into this issue.
  • Dec 27, 2009
    issue 116 (Issue 76 patch not in 4.1) changed by adewale   -   I'm merging this into issue 76 on the grounds that the patch from there is already in the repository and will be in the next release.
    Status: Duplicate
  • Dec 27, 2009
    issue 101 (IndexError: pop from empty list) commented on by adewale
    I can't reproduce your bug with the current codebase. Instead I get:
      >>> f = feedparser.parse("index-error-pop-empty-list.xml")
      >>> f
      {'feed': {}, 'encoding': 'utf-8', 'bozo': 1, 'version': '', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('Document is empty\n',)}
  • Dec 27, 2009
    issue 90 (HTML character references are not correcly converted by sgml...) commented on by adewale   -   I'll commit the patch if someone can find a test feed that triggers the bug.
  • Dec 27, 2009
    issue 59 (atom:summary isn't populated) Status changed by adewale   -  
    Status: Fixed
  • Dec 27, 2009
    issue 31 ([ 1501902 ] Atom 1.0 link missing .txt) Status changed by adewale   -   Fixed in revision 299
    Status: Fixed
  • Dec 27, 2009
    r299 (Minor correction in RFC url to fix: http://code.google.com/p...) committed by adewale   -   Minor correction in RFC url to fix: http://code.google.com/p/feedparser/issues/detail?id=31
  • Dec 27, 2009
    issue 24 ([ 1546854 ] Support for media urls) Status changed by adewale   -   Both media:content and media:thumbnail are in the codebase. I'm closing this bug but please re-open it if you have a test feed that shows that url support isn't working.
    Status: Fixed
  • Dec 27, 2009
    issue 23 ([ 1559875 ] Link parsing is buggy, produces garbage for RSS ...) Status changed by adewale
    The current trunk works:
      >>> import feedparser
      >>> f = feedparser.parse("annotated-rss20.xml")
      >>> f.feed.links
      [{'href': u'http://example.org/', 'type': 'text/html', 'rel': 'alternate'}]
    Status: Fixed
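The working output normalises the RSS 2.0 channel `<link>` into a list of dicts with `href`, `type`, and `rel` keys. A minimal sketch of producing that shape with Python 3's ElementTree; `channel_links` is hypothetical and simply assumes every plain RSS link is an HTML alternate page, matching the output in this entry rather than reproducing feedparser's code.

```python
import xml.etree.ElementTree as ET


def channel_links(rss_text):
    """Return RSS 2.0 <channel><link> values as a list of link dicts.

    Each plain RSS link is treated as an HTML alternate page (an assumption
    that mirrors the output shown in this entry, not feedparser's code).
    """
    channel = ET.fromstring(rss_text).find('channel')
    return [
        {'href': link.text.strip(), 'type': 'text/html', 'rel': 'alternate'}
        for link in channel.findall('link')
        if link.text and link.text.strip()  # skip empty <link/> elements
    ]


print(channel_links('<rss version="2.0"><channel>'
                    '<link>http://example.org/</link></channel></rss>'))
```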
  • Dec 27, 2009
    issue 8 ([ 1651355 ] Feed cannot be parsed) commented on by adewale   -   Can you try to reproduce this problem, please? None of the feeds above trigger this bug in Python 2.5.2.
 
Hosted by Google Code