Issues with the Media-RSS implementation #195

Open
azmeuk opened this issue Nov 16, 2019 · 5 comments

azmeuk commented Nov 16, 2019

Hello,
I noticed some issues with the Media-RSS implementation. Before trying to fix them, I would like to discuss them here.

media:group is ignored

According to the Media-RSS specification, the <media:group> tag groups several representations of the same media. However, my understanding is that feedparser simply ignores this tag and treats every <media:content> as a separate media. The specification says:

It allows grouping of media:content elements that are effectively the same content, yet different representations. For instance: the same song recorded in both the WAV and MP3 format. It's an optional element that must only be used for this purpose.

def _start_media_group(self, attrs_d):
    # don't do anything, but don't break the enclosed tags either
    pass

def _start_media_content(self, attrs_d):
    context = self._get_context()
    context.setdefault('media_content', [])
    context['media_content'].append(attrs_d)
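
To make the expected grouping concrete, here is a minimal sketch that uses only the standard library rather than feedparser. The sample item, its URLs, and the `collect_media` helper are all hypothetical; the point is only that the contents of a <media:group> end up as one media with several representations, while a standalone <media:content> stays a media of its own.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
NS = {"media": MEDIA_NS}

# Hypothetical sample: one song offered in two formats, plus an unrelated image.
SAMPLE_ITEM = """
<item xmlns:media="http://search.yahoo.com/mrss/">
  <media:group>
    <media:content url="http://www.foo.com/song.wav" type="audio/x-wav"/>
    <media:content url="http://www.foo.com/song.mp3" type="audio/mpeg"/>
  </media:group>
  <media:content url="http://www.foo.com/cover.jpg" type="image/jpeg"/>
</item>
""".strip()


def collect_media(item):
    """Return a list of media; each media is a list of representations."""
    media = []
    for child in item:
        if child.tag == f"{{{MEDIA_NS}}}group":
            # Every media:content inside the group describes the same media.
            media.append([dict(c.attrib) for c in child.findall("media:content", NS)])
        elif child.tag == f"{{{MEDIA_NS}}}content":
            # A standalone media:content is a media with a single representation.
            media.append([dict(child.attrib)])
    return media


media = collect_media(ET.fromstring(SAMPLE_ITEM))
```

With the current behaviour described above, the same item would instead yield three unrelated entries in a flat `media_content` list.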

The description is set on the feed entry

The <media:description> tag belongs to the media, but feedparser updates the feed entry description.

def _start_media_description(self, attrs_d):
    self._start_description(attrs_d)

def _end_media_description(self):
    self._end_description()
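
As an illustration of the scoping I have in mind, here is a rough standard-library sketch, not a feedparser patch. The sample item, the `parse_item` helper, and the `media_description` key name are assumptions made for the example. A <media:description> found inside a <media:content> is stored on that media, and the entry keeps its own description.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
NS = {"media": MEDIA_NS}

# Hypothetical sample: the entry and the media each carry their own description.
SAMPLE_ITEM = """
<item xmlns:media="http://search.yahoo.com/mrss/">
  <description>Entry-level summary</description>
  <media:content url="http://www.foo.com/movie.mov" type="video/quicktime">
    <media:description>Trailer for the movie</media:description>
  </media:content>
</item>
""".strip()


def parse_item(item):
    """Keep the entry description and the media descriptions apart."""
    entry = {"description": item.findtext("description"), "media_content": []}
    for content in item.findall("media:content", NS):
        media = dict(content.attrib)
        desc = content.findtext("media:description", namespaces=NS)
        if desc is not None:
            # Stored on the media itself, not on the entry (key name is an assumption).
            media["media_description"] = desc.strip()
        entry["media_content"].append(media)
    return entry


entry = parse_item(ET.fromstring(SAMPLE_ITEM))
```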

Some tags are missing

For instance, the <media:subtitle> tag is not handled by feedparser.

Attributes are ignored

Even when a tag is handled, many of the attributes defined in the Media-RSS specification are ignored. For instance, <media:description> can be either plain text or HTML, but feedparser does not distinguish between the two.
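
For example, the specification gives <media:description> a `type` attribute whose value is "plain" or "html", with "plain" as the default. A minimal sketch of honouring it could look like the following; the `parse_description` helper and the sample elements are hypothetical, and this is standard-library code rather than feedparser internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: one HTML description and one without a type attribute.
HTML_DESC = (
    '<media:description xmlns:media="http://search.yahoo.com/mrss/" type="html">'
    "&lt;b&gt;Live&lt;/b&gt; recording</media:description>"
)
PLAIN_DESC = (
    '<media:description xmlns:media="http://search.yahoo.com/mrss/">'
    "Live recording</media:description>"
)


def parse_description(elem):
    """Return the text together with its declared type ("plain" when absent)."""
    return {"type": elem.get("type", "plain"), "value": (elem.text or "").strip()}


html_desc = parse_description(ET.fromstring(HTML_DESC))
plain_desc = parse_description(ET.fromstring(PLAIN_DESC))
```

A consumer could then decide whether the value needs HTML sanitising or can be used as plain text.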

So...

I would like to tackle these issues, but doing so could cause backward compatibility problems. How should I handle this? I believe Media-RSS is not widely used, and the simplest option for me is to break compatibility so that feedparser correctly follows the specification.
What do you think?

buhtz commented Dec 11, 2019

Could you please give us a short description of what Media-RSS is for? A real use case would help us understand.

azmeuk commented Dec 11, 2019

Of course. Media-RSS is used to describe media, such as audio or video files, and their metadata (thumbnails, description, number of views or plays, rating, links to the media in different formats, etc.).

It is used in every YouTube feed (example) and in PeerTube feeds (example, though support should improve in an upcoming version).

@chaimae26

I have the same issue. Did you solve it?

azmeuk commented Jan 9, 2020

Actually, this would take some time to fix. I am willing to write a patch, but I would like to be sure that it will be merged in the end before I start.

@kurtmckee What do you think?

@o-felixz

This is something we are very interested in as well, especially when it comes to children of media:content, such as media:title (for example, associating image titles with the images themselves).

I have started work on a patch, but the changes are currently breaking (see example below).

Main changes:

  1. media:group (not part of the example below) and media:content are now containers, as expected. A media:group may contain media:content elements.
  2. media:{x} now generates media_{x} keys instead of {x} keys. The keys previously known as media_{x} are now known as media_{x}_details (this is mainly to make tags distinguishable from attributes of the parent media:{x}).
  3. media:title is no longer used as a fallback for a missing title (a consequence of 2. above; fixable, but probably violating expectations?).
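
To show what the three changes above mean for consuming code, here is a small sketch that works on a trimmed copy of the proposed output from this comment. The `credits_by_role` helper is hypothetical and not part of the patch.

```python
# Trimmed copy of the proposed parser output ("Parsed data WITH changes").
entry = {
    "title": "The latest video from an artist",
    "media_content": [
        {
            "url": "http://www.foo.com/movie.mov",
            "type": "video/quicktime",
            "media_credit_details": [
                {"role": "producer", "content": "producer's name"},
                {"role": "artist", "content": "artist's name"},
            ],
            "media_credit": "artist's name",
            "media_rating_details": {"content": "nonadult"},
            "media_rating": "nonadult",
        }
    ],
}


def credits_by_role(content):
    """Map each credit role to its value for a single media:content."""
    return {c["role"]: c["content"] for c in content.get("media_credit_details", [])}


first = entry["media_content"][0]
roles = credits_by_role(first)
```

Credits and ratings now hang off the media they describe, so code that previously read `credit` or `rating` from the entry itself would have to be updated.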

Any thoughts on these changes and how they affect the parsed data?

@azmeuk Is this in line with what you had in mind or were you planning on something different?

@kurtmckee Is this in line with the project as a whole?


Input file
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"
     xmlns:dcterms="http://purl.org/dc/terms/">

  <channel>
    <title>Music Videos 101</title>
    <link>http://www.foo.com</link>
    <description>Discussions of great videos</description>
    <item>
      <title>The latest video from an artist</title>
      <link>http://www.foo.com/item1.htm</link>
      <media:content url="http://www.foo.com/movie.mov" fileSize="12216320" type="video/quicktime" expression="full">
        <media:player url="http://www.foo.com/player?id=1111" height="200" width="400" />
        <media:hash algo="md5">dfdec888b72151965a34b4b59031290a</media:hash>
        <media:credit role="producer">producer's name</media:credit>
        <media:credit role="artist">artist's name</media:credit>
        <media:category scheme="http://blah.com/scheme">
          music/artistname/album/song
        </media:category>
        <media:text type="plain">
          Oh, say, can you see, by the dawn's early light
        </media:text>
        <media:rating>nonadult</media:rating>
        <dcterms:valid>
          start=2002-10-13T09:00+01:00;
          end=2002-10-17T17:00+01:00;
          scheme=W3C-DTF
        </dcterms:valid>
      </media:content>
    </item>
  </channel>
</rss>
Parsed data WITHOUT changes
[
  {
    "title": "The latest video from an artist",
    "title_detail": {
      "type": "text/plain",
      "language": null,
      "base": "",
      "value": "The latest video from an artist"
    },
    "links": [
      {
        "rel": "alternate",
        "type": "text/html",
        "href": "http://www.foo.com/item1.htm"
      }
    ],
    "link": "http://www.foo.com/item1.htm",
    "media_content": [
      {
        "url": "http://www.foo.com/movie.mov",
        "filesize": "12216320",
        "type": "video/quicktime",
        "expression": "full"
      }
    ],
    "media_player": {
      "url": "http://www.foo.com/player?id=1111",
      "height": "200",
      "width": "400",
      "content": ""
    },
    "media_hash": {
      "algo": "md5"
    },
    "media_credit": [
      {
        "role": "producer",
        "content": "producer's name"
      },
      {
        "role": "artist",
        "content": "artist's name"
      }
    ],
    "credit": "artist's name",
    "tags": [
      {
        "term": "music/artistname/album/song",
        "scheme": "http://blah.com/scheme",
        "label": null
      }
    ],
    "media_text": {
      "type": "plain"
    },
    "media_rating": {
      "content": "nonadult"
    },
    "rating": "nonadult",
    "validity": "start=2002-10-13T09:00+01:00;\n          end=2002-10-17T17:00+01:00;\n          scheme=W3C-DTF",
    "validity_start": "2002-10-13T09:00+01:00",
    "validity_start_parsed": [
      2002,
      10,
      13,
      8,
      0,
      0,
      6,
      286,
      0
    ]
  }
]
Parsed data WITH changes
[
  {
    "title": "The latest video from an artist",
    "title_detail": {
      "type": "text/plain",
      "language": null,
      "base": "",
      "value": "The latest video from an artist"
    },
    "links": [
      {
        "rel": "alternate",
        "type": "text/html",
        "href": "http://www.foo.com/item1.htm"
      }
    ],
    "link": "http://www.foo.com/item1.htm",
    "media_content": [
      {
        "url": "http://www.foo.com/movie.mov",
        "filesize": "12216320",
        "type": "video/quicktime",
        "expression": "full",
        "media_player": {
          "url": "http://www.foo.com/player?id=1111",
          "height": "200",
          "width": "400",
          "content": ""
        },
        "media_hash": {
          "algo": "md5"
        },
        "media_credit_details": [
          {
            "role": "producer",
            "content": "producer's name"
          },
          {
            "role": "artist",
            "content": "artist's name"
          }
        ],
        "media_credit": "artist's name",
        "tags": [
          {
            "term": "music/artistname/album/song",
            "scheme": "http://blah.com/scheme",
            "label": null
          }
        ],
        "media_text": {
          "type": "plain"
        },
        "media_rating_details": {
          "content": "nonadult"
        },
        "media_rating": "nonadult",
        "validity": "start=2002-10-13T09:00+01:00;\n          end=2002-10-17T17:00+01:00;\n          scheme=W3C-DTF",
        "validity_start": "2002-10-13T09:00+01:00",
        "validity_start_parsed": [
          2002,
          10,
          13,
          8,
          0,
          0,
          6,
          286,
          0
        ]
      }
    ]
  }
]
Output diff
...
    "media_content": [
      {
        "url": "http://www.foo.com/movie.mov",
        "filesize": "12216320",
        "type": "video/quicktime",
-        "expression": "full"
-      }
-    ],
+        "expression": "full",
        "media_player": {
          "url": "http://www.foo.com/player?id=1111",
          "height": "200",
...
-    "media_credit": [
+       "media_credit_details": [
          {
            "role": "producer",
            "content": "producer's name"
          },
          {
            "role": "artist",
            "content": "artist's name"
          }
        ],
-    "credit": "artist's name",
+        "media_credit": "artist's name",
...
        "media_text": {
          "type": "plain"
        },
-    "media_rating": {
+        "media_rating_details": {
          "content": "nonadult"
        },
-    "rating": "nonadult",
+        "media_rating": "nonadult",
        "validity": "start=2002-10-13T09:00+01:00;\n          end=2002-10-17T17:00+01:00;\n          scheme=W3C-DTF",
...
+  }
+]
