I'm sorry folks to come complaining on such a great project.
I've been trying to consume the JSON files to use in a project of mine (XSS vulns in a webmail interface), and I've hit a brick wall, since your JSON files aren't valid JSON at all. So my Ruby script fails to parse them, and I've spent a day and a half trying to pre-parse them.
I know they are valid javascript, but it's not my fault the JSON guy (and all the JSON fanbois out there) is such an anal guy that he cannot bear single quote strings, or comments.
Anyway, if your goal is to offer interop and integration with whatever tool, you should move from valid JavaScript to valid JSON. I'd be glad to help you with the conversion (if I can pull it off), or even with more content if my current interests lead me to some.
I'm again very sorry, and feel pretty lame, to open a ticket for such a stupid issue, but I think it will help others use the knowledge in your project to secure systems.
Thanks again, Rob'
Comment #1
Posted on Jul 21, 2011 by Massive Lion
Comment deleted
Comment #2
Posted on Jul 21, 2011 by Massive Lion
Haha - nice ticket, I lol'd :) I see your point, but I'm still not planning to change the format of the source file at the moment. What would make sense in my opinion are small interface files enabling interop with PHP, Ruby or whatever you need, each written in the language the interop is needed for. The same could go for a REST API we could host on html5sec.org (thinking html5sec.org/api/php, html5sec.org/api/ruby etc.).
Please let me know if you are interested in setting up what you specifically need - I'll most probably be glad to host it here or give you the necessary commit privileges.
Cheers, .mario
Comment #3
Posted on Jul 21, 2011 by Massive Giraffe
Yeah, but the problem is that parsing "loosely valid JSON" into JSON is a FUCKING nightmare (well, to be fair, parsing JSON is a nightmare for the aforementioned anal reasons). And even more so when the things inside the single-quoted strings are invalid HTML that is supposed to break parsers. Well, I suppose you write your files directly as 'not-JSON-but-the-thing-we-all-suppose-to-be-JSON', and not from a more strictly structured source where the change would be easy. Then yes, if my work could be of some use to the community, I'd be glad to write those regexps from hell to make an adapter from your files to strict JSON format.
We'll keep in touch, I hope soon. Rob'
Comment #4
Posted on Sep 28, 2011 by Swift Giraffe
I'd just like to second this issue: the file is completely useless in its current format. It needs to be rewritten before anything can even be attempted with it. The good news is that I've documented all of these issues so they can easily be fixed.
1.) Remove the /* */ comments.
2.) Remove the "var items = " at the beginning.
3.) Swap the " and '. JSON uses double quotes.
4.) Remove the control characters. JSON considers anything <= 0x1f a control character. This includes things like 0x09 (tab characters).
5.) \xBC notation is not valid, it should be \u00BC. Same for all other "\x.." patterns.
6.) \' is not valid in JSON. These can safely be replaced with just a single quote.
7.) There are multiple places where the dictionaries have rogue commas at the end. It's always the browser section, and the IDs of these are 89, 99, 100, and 102.
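For anyone who wants to see these failures first-hand, here is a minimal sketch using Python's built-in json module. The one-line samples are made up for illustration (they are not taken from the project's file); each one reproduces a single problem from the list above, and the strict parser refuses all of them.

```python
import json

# Hypothetical one-line samples, each showing one problem from the list above.
samples = {
    "comment": '/* note */ [1]',
    "var prefix": 'var items = [1]',
    "single quotes": "['a']",
    "raw tab in string": '["a\tb"]',   # \t here is a literal tab character
    "\\x escape": '["\\xBC"]',
    "\\' escape": '["\\\'"]',
    "rogue comma": '{"a": 1,}',
}

def rejected(text):
    """Return True if the strict built-in parser refuses the text."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return True
    return False

# Maps each problem name to whether json.loads rejected the sample.
results = {name: rejected(text) for name, text in samples.items()}
```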
I'm including a small Python script which addresses all of these issues except the rogue commas. After manually fixing the rogue commas, I was able to read in the file with the built-in JSON parser. I'd like to stress that my script is not the best solution, but since I am not a committer, this is the best I can do. Hopefully the maintainers can use the script below to fix up the .json file and maintain the fixed version. Or, if the current version is valuable to someone, rename the current file to be a .js file and then use this script to create a .json file in the build process. That would let people who use Ruby, Python, Java, C++, Perl, or any other language use the real JSON file, while anyone who wants to use JS can use either one.
#
# This script will simply fix and load the json file
# (written for Python 3: str.maketrans replaces the old string.maketrans)
#
import json, re

# remove comments, this is JSON, not javascript
with open('html5security.json') as f:
    data = f.read()
data = re.sub(r'/\*.*?\*/', r'', data)

# remove the newlines so the regex will work properly
data = re.sub(r'\r?\n', '', data)

# strip everything outside the actual JSON data
get_array_only = re.compile(r'.*?(\[.*\]).*', re.MULTILINE)
data = get_array_only.sub(r'\1', data)

# swap ' for " and " for '
data = data.translate(str.maketrans("'\"", "\"'"))

# convert \xFF to \u00FF
data = re.sub(r'\\x([0-9a-fA-F]{2})', r'\\u00\1', data)

# remove the control characters
data = re.sub(r'[\x00-\x1f]+', r'', data)

# JSON doesn't allow \' (only \"); the lookbehind keeps the preceding character
data = re.sub(r"(?<!\\)\\'", r"'", data)

# Assuming the commas were fixed, we can now load the file
j = json.loads(data)
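The script above leaves the rogue commas to be fixed by hand. As a rough sketch of how that step might be automated, the helper below (strip_trailing_commas is a hypothetical name, not part of the original script) drops a comma when only whitespace separates it from a closing } or ]. It assumes the quote swap has already been done, so strings are double-quoted, and it leaves commas inside strings alone.

```python
import json

def strip_trailing_commas(text):
    """Drop commas followed only by whitespace and then '}' or ']'.

    Assumes strings are double-quoted; commas inside strings are untouched.
    """
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch in "}]":
            # Look back past whitespace for a dangling comma and remove it.
            i = len(out) - 1
            while i >= 0 and out[i].isspace():
                i -= 1
            if i >= 0 and out[i] == ",":
                del out[i]
        out.append(ch)
    return "".join(out)

# Made-up sample with two rogue commas and a comma inside a string.
fixed = strip_trailing_commas('{"a": [1, 2, ], "b": "x,]", }')
parsed = json.loads(fixed)
```

In the script it would be called on data just before json.loads(data). A character scanner is used here instead of a regex because the payload strings can themselves contain ",]" or ",}" sequences.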
Comment #5
Posted on Sep 29, 2011 by Massive Giraffe
Well, I for one admire your courage for trying to regexp your way out of this problem. I tried to in Ruby, but my (nonexistent) skills failed me. For the record, here is how I finally did it (when I noticed that JSON, unlike XML, can have Unicode chars in strings).
Since the js files are valid-js-but-not-valid-JSON, and since they actually assign variables, I just built an HTML file that loads the js, used the browser's built-in JSON object to serialize the variables, and then copy-pasted the output into files. It lacks automation, but works fine for me. Here is the barebones HTML file (works in all browsers but IE; one could replace textContent with innerText to make it work there).
*******************BEGIN HTML FILE****************************
Converter
textarea{ width:800px; height:200px; }
function convert(){
  var i = JSON.stringify(items);
  var c = JSON.stringify(categories);
  var p = JSON.stringify(payloads);

  var divItems = document.getElementById("items");
  var divCategories = document.getElementById("categories");
  var divPayloads = document.getElementById("payloads");

  var d1=document.createElement("textarea");
  d1.textContent=i;
  divItems.appendChild(d1);
  var d2=document.createElement("textarea");
  d2.textContent=c;
  divCategories.appendChild(d2);
  var d3=document.createElement("textarea");
  d3.textContent=p;
  divPayloads.appendChild(d3);
}
Click me!!1!one!
Items
Categories
Payloads
*********************END HTML FILE*******************************
PS: Notice what I did? Safely injected a string into HTML... One wonders..
Rob'
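Since the copy-paste step is manual, a quick check that the pasted text really is strict JSON may be worth having. The sketch below is only an illustration: check_converted is a hypothetical helper, and the sample string (with made-up "id" and "data" keys) stands in for whatever was copied out of one of the textareas. It parses the text with Python's built-in parser and re-serializes it with non-ASCII characters kept as-is.

```python
import json

def check_converted(text):
    """Parse text as strict JSON and re-serialize it, keeping non-ASCII as-is."""
    data = json.loads(text)  # raises ValueError if the text is not strict JSON
    return json.dumps(data, ensure_ascii=False)

# Hypothetical sample standing in for text pasted from one of the textareas.
sample = '[{"id": 1, "data": "<img src=x onerror=\\"alert(1)\\">¼"}]'
roundtrip = check_converted(sample)
```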
Comment #6
Posted on Jun 26, 2012 by Massive Lion
Format stays as it is. No further requests over the last n>6 months.
Status: WontFix
Labels:
Type-Defect
Priority-Medium