http_server uploaded file handling #14303

Closed
DartBot opened this issue Oct 22, 2013 · 10 comments
Labels: area-pkg (used for miscellaneous pkg/ packages not associated with specific area- teams)

Comments

@DartBot

DartBot commented Oct 22, 2013

This issue was originally filed by TerryMit...@gmail.com


The current http_body implementation decodes/parses HttpBodyFileUpload.content based on the Content-Type header of the part.

However, file server applications need the raw uploaded file data, because they save received files to their file system unmodified. I think we need a switch that disables decoding/parsing for all content types and simply returns a List<int> as HttpBodyFileUpload.content.
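
To illustrate the file-server use case, here is a minimal sketch of saving an upload exactly as received. It assumes the raw bytes are already available as a List<int>; `saveUpload` and the temporary directory name are hypothetical and not part of http_server.

```dart
import 'dart:io';

/// Hypothetical helper: writes the uploaded bytes to [path] without any
/// decoding or parsing, so the stored file is byte-identical to the upload.
Future<File> saveUpload(List<int> bytes, String path) {
  return File(path).writeAsBytes(bytes, flush: true);
}

Future<void> main() async {
  // A temporary directory stands in for the server's storage location.
  final dir = await Directory.systemTemp.createTemp('upload_sketch');
  final bytes = [0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF]; // arbitrary binary data
  final file = await saveUpload(bytes, '${dir.path}/raw.bin');

  // Read the file back to confirm nothing was altered on the way to disk.
  final roundTrip = await file.readAsBytes();
  print(roundTrip.length == bytes.length);
  await dir.delete(recursive: true);
}
```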

Another solution would be to simply return List<int> for all uploaded files. I am not sure how much this would affect other kinds of applications. However, when a .json file is uploaded from an HTML form, IE sends it as “Content-Type: text/plain”, while Chrome, Firefox and Safari send it as “Content-Type: application/octet-stream” (I don’t know why). Either way, .json files uploaded through an HTML form will never be parsed as JSON.

Regarding filenames with multibyte characters: although Windows uses UTF-8, I think it might be safe to keep the LATIN1 decoding. I am not familiar with other file systems. On Windows we can recover the name using UTF8.decode(LATIN1.encode(part.filename)), assuming that LATIN1.decode(bytes) simply produces a String in which each character has the same value as the corresponding byte of bytes.
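
The round trip described above can be sketched as follows. This is only an illustration: it uses the lowercase `utf8` and `latin1` constants from current dart:convert rather than the `UTF8` and `LATIN1` names used in this thread, and `recoverUtf8Filename` is a hypothetical helper.

```dart
import 'dart:convert';

/// Hypothetical helper: undoes a Latin-1 decode of bytes that were really
/// UTF-8 by re-encoding to the original bytes and decoding them as UTF-8.
String recoverUtf8Filename(String latin1Decoded) {
  return utf8.decode(latin1.encode(latin1Decoded));
}

void main() {
  const original = '日本語.txt';
  // What the browser puts on the wire when it sends the name as UTF-8.
  final wireBytes = utf8.encode(original);
  // What a parser that assumes ISO-8859-1 hands to the application.
  final misdecoded = latin1.decode(wireBytes);
  print(recoverUtf8Filename(misdecoded) == original); // true
}
```

This works because Latin-1 maps every byte value 0 to 255 to the code point with the same value, so the decode loses no information.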

@madsager
Contributor

cc @skabet.
Added Area-IO, Triaged labels.

@andersjohnsen

Set owner to @skabet.
Removed Area-IO label.
Added Area-Pkg, Library-HttpServer, Accepted labels.

@andersjohnsen

Hi Terry,

You are right, in most cases with file uploads, it's the raw binary one wants to access. What if we do the following:

  1. Always provide the raw List<int> data.
  2. Add a method to the FileUpload class, 'parsedData()', that will try to parse/decode the data depending on the mime type. We can even throw in an optional 'mimeType' argument for it, so one can override the default mime type, e.g. parse as 'text/utf-8' instead of 'application/json'.
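
A rough sketch of what these two points could look like together, written against current Dart. `FileUploadSketch` and its dispatch rules are assumptions made for illustration, not the package's implementation.

```dart
import 'dart:convert';

/// Hypothetical stand-in for the proposed FileUpload class.
class FileUploadSketch {
  FileUploadSketch(this.content, this.mimeType);

  /// Point 1: the raw bytes are always available, untouched.
  final List<int> content;

  /// The mime type taken from the part's Content-Type header.
  final String mimeType;

  /// Point 2: parse on request; [mimeType] overrides the header value.
  dynamic parsedData({String? mimeType}) {
    final type = mimeType ?? this.mimeType;
    if (type == 'application/json') return jsonDecode(utf8.decode(content));
    if (type.startsWith('text/')) return utf8.decode(content);
    return content; // unknown types fall back to the raw bytes
  }
}

void main() {
  final upload = FileUploadSketch(utf8.encode('{"a": 1}'), 'application/json');
  print(upload.parsedData()['a']); // 1
  print(upload.parsedData(mimeType: 'text/plain')); // {"a": 1}
}
```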

Regarding the filename, I think we should do a test and see what the different browsers upload. If we can hit a 90% success rate with some default encoding, that could be the way to go.

@andersjohnsen

I just tried with both Chrome and IE on Windows, and I get the following:

With <meta charset="UTF-8" />:
 - Chrome: as utf8
 - IE: as utf8

Without <meta charset="UTF-8" />:
 - Chrome: multi-bytes replaced with ?
 - IE: as utf8

I think it's fine to use utf8-decoding for filenames.

@DartBot
Author

DartBot commented Oct 30, 2013

This comment was originally written by TerryMit...@gmail.com


I confirmed it on my Windows Vista machine using the following HTML text:

001 <!DOCTYPE html>
002 <html>
003 <head>
004 <title>file_upload_test</title>
005 <meta http-equiv="content-type" content="text/html; charset=UTF-8">
006 </head>
007 <body>
008 <form action="http://localhost:8080/DumpHttpMultipart"
009 enctype="multipart/form-data"
010 accept-charset="UTF-8"
011 method="POST"> <br>
012 What is your name? <input type="text" name="submitter"> <br>
013 What files are you sending? <input type="file" name="content"> <br>
014 <input type="submit" value="Send File">
015 </form>
016 </body>
017 </html>

If line 005 or 010 exists, Chrome, Firefox and Safari send filenames with multi-byte characters as UTF-8. Otherwise, such filenames are transmitted as Shift_JIS characters (one of the most popular Japanese character encodings). IE sends them as UTF-8 regardless of whether line 005 or 010 exists.

I agree with using UTF-8 decoding for filenames (the current implementation uses ISO-8859-1 decoding). It’s common to add line 005 for such applications.

@andersjohnsen

Hi

What do you think about the following API?

/**
 * A HTTP content body produced by [HttpBodyHandler] for either [HttpRequest]
 * or [HttpClientResponse].
 */
abstract class HttpBody {
  /**
   * The actual data of the request.
   */
  List<int> get data;

  /**
   * Convert the data using mimeType.
   *
   * If mimeType is left unspecified, the Content-Type header will be used.
   */
  dynamic asMimeType({String mimeType});

  /**
   * Parse the [data] as text.
   *
   * If the headers contains a charset hint, that charset will be used.
   */
  String asText();

  /**
   * Parse the [data] as JSON.
   */
  dynamic asJSON();

  /**
   * Parse the data as either multipart/form-data or
   * application/x-www-form-urlencoded.
   *
   * The Content-Type header will be used to identify the parsing.
   */
  Map asFormPost();
}

/**
 * The [HttpBody] of a [HttpClientResponse] will be of type
 * [HttpClientResponseBody].
 */
abstract class HttpClientResponseBody implements HttpBody, HttpClientResponse {
}

/**
 * The [HttpBody] of a [HttpRequest] will be of type [HttpRequestBody].
 */
abstract class HttpRequestBody implements HttpBody, HttpRequest {
}

/**
 * A [HttpBodyFileUpload] object wraps a file upload, presenting a way for
 * extracting filename, contentType and the data of the uploaded file.
 */
abstract class HttpBodyFileUpload {
  /**
   * The filename of the uploaded file.
   */
  String get filename;

  /**
   * The [ContentType] of the uploaded file.
   */
  ContentType get contentType;

  /**
   * The content of the file.
   */
  List<int> get content;
}
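
As one possible reading of the charset hint mentioned for asText(), the sketch below resolves an encoding from a Content-Type header value and falls back to UTF-8. `decodeText` is a hypothetical helper, and the UTF-8 fallback is an assumption rather than something the proposal specifies.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper: decodes [data] with the charset named in
/// [contentTypeHeader], or UTF-8 when no usable charset is given.
String decodeText(List<int> data, String? contentTypeHeader) {
  final charset = contentTypeHeader == null
      ? null
      : ContentType.parse(contentTypeHeader).charset;
  // getByName returns null for a null or unknown name.
  final encoding = Encoding.getByName(charset) ?? utf8;
  return encoding.decode(data);
}

void main() {
  final bytes = latin1.encode('café');
  print(decodeText(bytes, 'text/plain; charset=iso-8859-1')); // café

  // With no header at all, the caller can still ask for JSON explicitly.
  final json = jsonDecode(decodeText(utf8.encode('{"ok": true}'), null));
  print(json['ok']); // true
}
```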


cc @sethladd.

@sethladd
Contributor

Thanks! I like how HttpRequestBody implements HttpRequest now. Also, I like how I can control how I get the body (json, text, etc) because sometimes a content-type is not set on the request.

@DartBot
Author

DartBot commented Oct 31, 2013

This comment was originally written by TerryMit...@gmail.com


I think this will give us more flexible POST body data handling.

@anders-sandholm
Contributor

Removed Library-HttpServer label.
Added Pkg-HttpServer label.

@DartBot added the Type-Defect and area-pkg labels on Feb 6, 2014.
@DartBot
Author

DartBot commented Jun 5, 2015

This issue has been moved to dart-archive/http_server#11.
