Export to GitHub

mp4v2 - issue #98

MP4Read returns 0 on filename with a umlaut


Posted on May 1, 2011 by Massive Cat

What steps will reproduce the problem? 1. Create a file with a non-ascii character (a umlaut for example) 2. Call MP4Read to open the file

What is the expected output? What do you see instead? The call to MP4Read fails (returns 0 for the handle)

Please use labels and text to provide additional information. I suspect this is something to do with the way I'm calling it. I see the project is set to build Unicode, so that is the way I was passing the string in (I also tried converting to UTF-8 but no joy). My app is written in VB.Net and I'm using a standard string. Works fine as long as there aren't any non-ascii characters.

Any help would be appreciated,

Dan

Comment #1

Posted on May 2, 2011 by Quick Kangaroo

On windows, I expect the filename argument to be UTF-8 encoded. If that's not working, are there any log messages that help indicate what's going on?

Comment #2

Posted on May 2, 2011 by Swift Lion

Comment deleted

Comment #3

Posted on May 2, 2011 by Happy Lion

Looks like regression is in r453; let me see if I can get a fix in...

to reproduce, use mp4info with any file containing an umlaut on windows. not sure what the state of things is on *nix, but windows definitely does not work.

Comment #4

Posted on May 2, 2011 by Happy Lion

I see in r410 of the branch where the code was changed:

  • replace MultiByteToWideChar with custom code to remove differences across versions of windows and for improved error detection and illegal character handling

...could you explain what differences someone may see? At the moment, I'm in favor of reverting to the previous code, which is much more straight-forward (and works--I tested it).

Comment #5

Posted on May 2, 2011 by Quick Kangaroo

I'll write up more explanation soon...I'm a little squeezed for the next couple of days. As I look at this particular issue, I think the problem may be in the way mp4info processes its command line arguments. I'll look more closely fairly soon. I'd really like to keep the code that's there, but I'll justify why in more detail, also soon.

Comment #6

Posted on May 2, 2011 by Quick Kangaroo

From the Remarks section of:

http://msdn.microsoft.com/en-us/library/dd319072%28v=vs.85%29.aspx

Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function on valid UTF-8 strings will behave the same way as on earlier Windows operating systems.

In other words, on pre-Vista error handling of malformaed surrogate pairs isn't right. And since we didn't set MB_ERR_INVALID_CHARS, we don't learn about other kinds of invalid byte sequences. Reverting to the code that uses MultiByteToWideChar, but using MB_ERR_INVALID_CHARS is tempting, but even that has different behavior on different versions of windows. From the notes about that flag:

Starting with Windows Vista, the function does not drop illegal code points if the application does not set this flag.

Windows 2000 with SP4 and later, Windows XP: Fail if an invalid input character is encountered. If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION.

So, the point of the code I wrote is to detect and log invalid byte sequences in a consistent way across all versions of Windows.

Comment #7

Posted on May 2, 2011 by Massive Cat

But since neither the utility code in the project (mp4file, mp4info, ...) doesn't work with the new code, nor does the code I wrote in VB.Net (which used to work) there's been some change in the calling convention. I can easily debug into the MP4VB2 dll and see that the string that I'm passing in is single byte for all characters (see the attached image).

However, the a umlaut ([13]) shows as -28, since it's a signed character. Perhaps this is the problem? Seems to me UTF-8 shouldn't interpret the signed bit. Should these be declared as unsigned char, or is it something else.

I'll be glad to help debug this if someone can give me an idea of what to look for.

Dan

Attachments

Comment #8

Posted on May 3, 2011 by Quick Kangaroo

I'm not sure what version of File_win32.cpp you're using, but it's different than mine. r453 calls CreateFileW in StandardFileProvider::open like this:

_handle = CreateFileW( filename, access, share, NULL, crdisp, flags, NULL );

but the screenshot you attached shows:

_handle = CreateFileW( win32::Utf8ToWideChar( name ), access, share, NULL, crdisp, NULL );

What version of File_win32.cpp have you got?

Comment #9

Posted on May 3, 2011 by Massive Cat

I guess I hadn't sync'ed in a while. So I just sync'ed up and now when I build I get unresolved externals for mp4v2::impl::log.* (errorf, setVerbosity, ...) and mp4v2::platform::win32::Utf8ToWideChar. I see the definition of Utf8ToWideChar in platform_win32_impl.h, but I can't see where log is defined. Do I need some new define or something to get this to link?

Comment #10

Posted on May 3, 2011 by Quick Kangaroo

I don't see any mention of Utf8ToWideChar in platform_win32_impl.h (http://code.google.com/p/mp4v2/source/browse/trunk/libplatform/platform_win32_impl.h). Are there other files to sync as well?

Comment #11

Posted on May 3, 2011 by Massive Cat

I'm totally confused now. The file I had after sync'ing didn't match the one at the link. I see there's a subdir p in the link above. What URL should I be using to sync? I was using http://mp4v2.googlecode.com/svn/trunk, is that not the correct one? I've attached the platform_win32_impl.h I got when I sync'ed.

Confused,

Dan

Attachments

Comment #12

Posted on May 3, 2011 by Quick Kangaroo

I think the extra p in the URL is for browsing. What does "svn info" say when you run it in the root of your local repo?

For what it's worth, I tested mp4info with a file with an umlaut in the name (U+00E4) and it can't read the file. I invoked mp4info with CreateProcess so I'm confident I passed a properly UTF-8 encoded version of U+00E4 (which is 0xC3A4) as part of the filename. When I looked at argv inside mp4info, I see something else (0xE4, which in my case is followed by 0x2E -- the . separating name from extension). So, when E4 gets intepreted as the first byte of a 3 byte UTF-8 encoding, 2E should be a continuation character but isn't since only [0x80,0xBF] is valid.

I didn't revert to an older version of the tree to see what happens there, but the solution as I see it is to use (on windows only) the slightly painful combination of GetCommandLineW, CommandLineToArgvW, and then convert each argument from UTF-16 to UTF-8. When I did that, the proper UTF-8 makes it lower down and the file is opened fine.

My plan is to figure out a good way/place to put the code for retrieving the the UTF-8 encoding of the command line, and then call it from mp4info and the other tools. I'd love to hear confirmation from someone else that this is a good idea, and possibly even suggestions for how to structure the code.

Comment #13

Posted on May 4, 2011 by Massive Cat

First let me say that I'm a novice about SVN. I have only used it with MP4V2, and then only to sync the changes to the source tree. I use RapidSVN and the result from the info display is below.

FWIW, I don't really care about the utilities, and fixing them isn't the main problem. I suspect most people use the library, not the utilities. I need to figure out what the correct way is to call the program from my VB.Net code. I'll see if converting from UTF-16 to UTF-8 makes a difference.

But I'm really confused about the correct way to get the files from SVN. Before when I sync'ed, I got all the correct files, so I'm confused as to what has changed such that my files now don't match what's in the tree. Any hints on what I should try here would be most appreciated.

The results from RapidSVN info:

Name: URL: http://mp4v2.googlecode.com/svn/trunk/util Repository: http://mp4v2.googlecode.com/svn Repository UUID: 6e6572fa-98a6-11dd-ad9f-f77439c74b79 Revision: 461 Node Kind: directory Last Changed Author: kidjan Last Changed Rev: 461 Last Changed Date: 2011-04-26 19:42 GMT Text Last Updated: 1970-01-01 00:00 GMT Properties Last Updated: 1970-01-01 00:00 GMT Checksum:

Comment #14

Posted on May 4, 2011 by Massive Cat

If it helps, after finally getting the latest rev to build, I get an set of log statements when trying to open a file with an a umlaut:

mp4v2::platform::win32::Utf8ToFilename::Utf8DecodeChar: 0x74 is not a valid continuation character in a UTF-8 encoding mp4v2::platform::io::StandardFileProvider::open: CreateFileW(\?\C:\Downloads\ätest.m4v) failed (2) mp4v2::impl::MP4File::Open: open(C:\Downloads\ätest.m4v) failed (....\src\mp4file.cpp,395)

Comment #15

Posted on May 5, 2011 by Quick Kangaroo

This is basically the same error I get when I pass a UTF-8 encoded string to mp4info. I'd still like some feedback on my proposal to fix it (http://code.google.com/p/mp4v2/issues/detail?id=98#c12).

-DB

Comment #16

Posted on May 5, 2011 by Massive Cat

OK, I've figured out how to make my code work, but I'm not sure what it means with respect to the correct solution. The problem appears to be that it's not easy (or at least I don't understand how to) turn a unicode string into a utf-8 string. When I look at my string, it has 228 for the a umlaut, which is fine for unicode. But if I try to encode that string into utf-8 using encoding.utf8.getbytes and encoding.utf8.getstring, it turns the a umlaut into a question mark. What I did was use encoding.utf8.getbytes, then built the char array using asc(byte(i)) then turning that into a string. Using this, it works!

Maybe this is similar to what you were talking about dbyron? Now that I understand how to make this work, I'll do some more research about the "correct" way to accomplish this in .Net.

dbyron, you seem to be a windows guy, I'd like to talk to you about a performance issue on windows, that I have a fix for. Should I just open up a new issue (which I did some time back, I think before you got involved) or would you rather discuss it via email?

Dan

Comment #17

Posted on May 5, 2011 by Quick Kangaroo

I don't know any VB or .Net so I'm not sure I can be much help there. I'm not sure talking about something as a "unicode string" makes sense though. Any string is in a particular encoding, but as I understand it, there is no encoding called "unicode." Natively windows uses (at least at the Win32 API level), UTF-16, so maybe that's what you're starting with?

228 is 0xE4 and a with an umlaut is U+00E4 so it makes sense up to there.

When you say the umlaut turns into a question mark, what do you mean? What are the actual values of the bytes that come back? It sounds like you've got a solution, and it may be the correct one.

Sure, send me an email with other questions. I'll see what I can do.

Comment #18

Posted on May 5, 2011 by Massive Cat

I found the more correct solution, I was using type string in calling MP4Read/Modify, so I always had to turn it back into a string, which turned it back into UTF-16, which didn't work. The solution was to change the calling signature to byte() for the filename and use encoding.utf8.getbytes. It now works like a charm.

I'll send you (dbyron) and email later today or tomorrow to talk about the other issue, but I think (from my perspective at least) you can close this. Of course you'll still need to fix the utilities, but it sounds like you've got a solution for that.

Dan

Comment #19

Posted on May 5, 2011 by Massive Cat

dbyron, the link to your userid is broken. And what's your email address?

Dan

Comment #20

Posted on May 5, 2011 by Quick Kangaroo

http://code.google.com/u/dbyron0/ works for me, but only when I'm logged in. I'd rather not publish my email address here to protect from spam. That link has a captcha to at least make it a little harder to get to.

If that doesn't work, start another issue.

Thanks.

-DB

Comment #21

Posted on May 5, 2011 by Massive Cat

I'm logged in and when I click on that link I get a 404. I can click on mine or kidjan and they work.

You can just email me at temp@danhinsley.com and I'll get back to you with my regular email address.

Dan

Comment #22

Posted on May 5, 2011 by Massive Cat

I got your email, but when I tried to reply to it, it wasn't valid. Can you include your email address in the body of the text?

Comment #23

Posted on May 14, 2011 by Happy Lion

Ah sorry for the false alert, dbyron. I forgot that the command line utils don't handle unicode correctly; I tested within our application, that does call the API directly, and all is good. Thanks for following through though.

Dana, do you have C# interop definitions defined? if so, would you be willing to donate those to the library? and yes, when passing strings as utf8 in .NET, you have to take a few extra steps, but it sounds like you figured that out ;)

Comment #24

Posted on May 15, 2011 by Massive Cat

I use VB.Net, not C#. I've included the API Definitions I use in VB. Due to the way that arrays of strings and structures with strings are passed, I had to create a C++ DLL that sits between some of the API (like MP4GetItems, reading and setting chapters, ...) and my VB App. I could provide that as well if people wanted to access the library from VB.

Here are the definitions of the API's I call directly from my VB App.

Friend Declare Function MP4Read Lib "libMP4V2.dll" (ByVal filename As Byte(), ByVal verbosity As Integer) As IntPtr
Friend Declare Function MP4Modify Lib "libMP4V2.dll" (ByVal filename As Byte(), ByVal verbosity As Integer, ByVal flags As Integer) As IntPtr
Friend Declare Sub MP4TagsSetName Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetArtist Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetAlbum Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetReleaseDate Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetGenre Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetDescription Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetLongDescription Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVShow Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVNetwork Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVEpisodeID Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVSeason Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
Friend Declare Sub MP4TagsSetTVEpisode Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
Friend Declare Sub MP4TagsSetSortName Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetSortArtist Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetSortTVShow Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetCopyright Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetEncodingTool Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetKeywords Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetCategory Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetGrouping Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetComments Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetContentRating Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetPodcast Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetGapless Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetMediaType Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetCompilation Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetTrack Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As trackDisk)
Friend Declare Sub MP4TagsSetDisk Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As trackDisk)
Friend Declare Sub MP4TagsSetHDVideo Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetContentID Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
Friend Declare Sub MP4TagsRemoveArtwork Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal flags As Integer)
Friend Declare Sub MP4TagsStore Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal file As IntPtr)
Friend Declare Sub MP4Close Lib "libMP4V2.dll" (ByVal file As IntPtr)

Comment #25

Posted on Jun 26, 2011 by Happy Lion

Going to close this issue now--I think the remaining "need unicode support in command line tools" aspects of it are probably better represented in issue 108.

Comment #26

Posted on Jun 26, 2011 by Massive Cat

Sounds good to me. I fixed my app and I'm not sure the command line tools are all that important.

Comment #27

Posted on Jan 18, 2012 by Grumpy Lion

@dana,

I am interested in your API definitions to use libmp4v2 through vb.net...specifically how to save artwork and save information for the "rating" tag (PG, PG-13 etc).

I have made a post on the discussion group regarding this, but it's moderated so it might appear a little later.

Thanks

Comment #28

Posted on Jan 18, 2012 by Massive Cat

I've attached two files, one with the routines I use and one with the declares. Let me know if you have any questions.

Dan

Attachments

Comment #29

Posted on Jan 19, 2012 by Grumpy Lion

@dana,

Thanks for the attachments, but I'm still a little confused.

I see you have declarations for "MP4V2VBWrapper.dll", but what/where is this dll?

I guess the functions i'm looking for are:

Friend Declare Sub VBMP4SetRating Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
Friend Declare Sub VBMP4SetMOVI Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
Friend Declare Sub VBMP4SetKind Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
Friend Declare Sub VBMP4SetCoverArt Lib "MP4V2VBWrapper.dll" (ByVal buff() As Byte, ByVal len As Integer)

Right now in my code I only have code to write the coverArt (using .net 4.0 so I have to use CallingConvention.Cdecl or it throws an error):

<DllImport("libMP4V2.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.Cdecl)>
Friend Shared Sub MP4TagsAddArtwork(ByVal tags As IntPtr, ByVal art As CoverArtBox)
End Sub

And then using the following to try to write the art, but it doesn't seem to work, I see the thumbnail in windows explorer change but to a generic "media" type icon in windows, so i'm guessing what i'm saving in the data is not correct.

        If Images.Count > 0 Then
            Dim art As New libmp4v2.CoverArtBox
            Using ms As IO.MemoryStream = New IO.MemoryStream
                Images(0).Image.Save(ms, Images(0).Image.RawFormat)
                art.size = ms.Length
                art.data = ms.ToArray
                ms.Close()
            End Using
            art.type = libmp4v2.MP4TagArtworkType.MP4_ART_UNDEFINED
            libmp4v2.MP4TagsAddArtwork(tags, art)
        End If

And how would I go about writing the rating?

FYI, I only want to write tags/atoms not read.

Thanks for your help.

Comment #30

Posted on Jan 19, 2012 by Massive Cat

How about you email me directly. I can either give you the DLL or give you the DLL and the sources, whichever you prefer.

Dn

Comment #31

Posted on May 12, 2013 by Helpful Rhino

Dan, Are you willing to share the DLL and VB files with me as well?

Status: WontFix

Labels:
Type-Other Priority-High OpSys-Windows