
phantomjs - issue #52

File download


Posted on Feb 28, 2011 by Massive Giraffe

It would be good to accept (and save) 'Content-Disposition: attachment; filename=' content.

Comment #1

Posted on Feb 28, 2011 by Quick Rabbit

This is again related to issue 41.

Comment #2

Posted on Apr 26, 2011 by Grumpy Elephant

Issue 92 has been merged into this issue.

Comment #3

Posted on Jun 23, 2011 by Massive Bear

I'm trying to implement this functionality and not making much progress. Using the attached patch, I run:

$ bin/phantomjs examples/download.js

and get this output:

WebPage instantiated
WebPage instantiated
Download complete - fail

I added a cout of "WebPage instantiated" to verify that my debug messages work as expected. I also added a cout in my downloadRequested slot, but that one was never displayed. Can someone spot what I'm doing wrong, or let me know if I'm on completely the wrong track?

Here is where I found out about the downloadRequested signal: http://doc.qt.nokia.com/latest/qwebpage.html#downloadRequested

Comment #4

Posted on Jun 23, 2011 by Massive Bear

Whoops, here is the patch file attachment without the ANSI color codes

Comment #5

Posted on Aug 16, 2011 by Happy Ox

Any progress on this issue?

Comment #6

Posted on Aug 16, 2011 by Quick Rabbit

No progress as of now.

Comment #7

Posted on Aug 16, 2011 by Happy Ox

A friend of mine (http://svay.com/) just told me a nice trick for working around this issue: use XHR within the page environment and base64 encoding to retrieve the file contents. It works rather well. For the record, you can find an example here: http://jsfiddle.net/3kUXy/
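For readers who cannot open the fiddle, here is a minimal sketch of the general XHR plus base64 idea. It is not a copy of the linked example: the names binaryStringToBase64 and fetchAsBase64 are made up for illustration, and it assumes the file URL is known and reachable from the page. The x-user-defined charset override is there so that each response byte survives in the low 8 bits of a character of responseText.

```javascript
// Base64-encode a "binary string", masking every character to its low 8 bits.
function binaryStringToBase64(bin) {
  var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  var out = '';
  for (var i = 0; i < bin.length; i += 3) {
    var has1 = i + 1 < bin.length;
    var has2 = i + 2 < bin.length;
    var b0 = bin.charCodeAt(i) & 0xff;
    var b1 = has1 ? bin.charCodeAt(i + 1) & 0xff : 0;
    var b2 = has2 ? bin.charCodeAt(i + 2) & 0xff : 0;
    out += chars.charAt(b0 >> 2);
    out += chars.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += has1 ? chars.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += has2 ? chars.charAt(b2 & 63) : '=';
  }
  return out;
}

// Hypothetical helper, meant to run in the page context.
function fetchAsBase64(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, false); // synchronous, so the caller can return the result directly
  xhr.overrideMimeType('text/plain; charset=x-user-defined');
  xhr.send(null);
  return xhr.status === 200 ? binaryStringToBase64(xhr.responseText) : null;
}
```

Both functions would have to be defined inside the function passed to page.evaluate(), since that function runs in the page and cannot see variables from the outer script. The returned base64 string can then be decoded and written to disk on the PhantomJS side.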

Comment #8

Posted on Jul 27, 2012 by Swift Dog

The URL to the file is not always known so XHR is not a general solution. For instance, if you are downloading a utility/bank/cc statement, you may have to click a link which will possibly execute some JS code and trigger another page load with a frame embedding the PDF. Or the statement comes in as an attachment.

What will it take to support the file download feature?

Requirement: Download files that come in embedded in the page/frame or as attachments. The URLs may or may not be known. Allow saving the files to the file system or "upload" them to a web server (so the server can save the files in a DB for instance).

Comment #9

Posted on Aug 10, 2012 by Happy Rabbit

I've got an early but functional version of this at

https://github.com/woodwardjd/phantomjs/tree/add_download_capabilities

Example:

var page = require('webpage').create();

page.onUnsupportedContentReceived = function(data) {
  console.log('Got a download at url: ' + data.url);
  page.saveUnsupportedContent('some.file.path', data.id);
  phantom.exit();
};

page.open('http://some.pdf.url.com/some.pdf');

I call this "early but functional" because it works where I've tested it (Linux, PDF downloads), but it likely has a small memory leak, and I'm not 100% convinced the callback mechanism I used is ideal.

Comments desired.

Comment #10

Posted on Sep 1, 2012 by Massive Monkey

I've downloaded and built the branch above, but I can't seem to get the onUnsupportedContentReceived event to fire, and calling saveUnsupportedContent throws an undefined error. Are there special build steps required to enable it?

Thanks, Robert

Comment #11

Posted on Sep 4, 2012 by Happy Rabbit

No special build steps required, as far as I know. If saveUnsupportedContent is undefined, maybe you haven't built the version in the add_download_capabilities branch (git checkout add_download_capabilities after the git clone)? Just speculating.

Comment #12

Posted on Sep 4, 2012 by Swift Wombat

I second the XHR+base64 method. It takes another 50+ lines of code to send to page.evaluate(), and I have to de-base64 the content afterward, and that's basically how CasperJS does it (as far as I can tell from their code—they do a lot of weird (unnecessary, in my book) binding with window.utils in the page context).

I used this one (first answer): http://stackoverflow.com/questions/7370943/retrieving-binary-file-content-using-javascript-base64-encode-it-and-reverse-de

It works great. Just be sure to try-catch the call to base64ArrayBuffer(), because new Uint8Array(arrayBuffer) may throw an error, and check xhr.getResponseHeader('content-type') == 'application/pdf' if you're doing PDF downloads like I was.
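Roughly, the two safeguards look like the sketch below. The name encodePdfResponse is made up for illustration, the encode argument stands in for whatever encoder you use (such as base64ArrayBuffer from the linked answer), and the code assumes the request was made with responseType set to 'arraybuffer'. The header is read with XHR's getResponseHeader().

```javascript
// Returns the encoded body, or null if the response is not a PDF or encoding fails.
function encodePdfResponse(xhr, encode) {
  var header = xhr.getResponseHeader('content-type') || '';
  // Drop parameters such as "; charset=..." and compare case-insensitively.
  var mime = header.split(';')[0].replace(/^\s+|\s+$/g, '').toLowerCase();
  if (mime !== 'application/pdf') {
    return null;
  }
  try {
    return encode(xhr.response); // assumes xhr.responseType was 'arraybuffer'
  } catch (e) {
    return null; // the encoder threw, e.g. while building the typed array
  }
}
```

Comparing only the part before the first semicolon avoids false negatives when the server appends parameters to the content type.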

Comment #13

Posted on Oct 4, 2012 by Grumpy Hippo

I need this as well. Can't use the XHR method because the inline attachments I need to scrape don't come with a URL I can hit.

Comment #14

Posted on Oct 4, 2012 by Swift Wombat

Wouldn't inline attachments be even more easily downloaded? For an image:

var content = page.evaluate(function() {
  return $('img#whatever').attr('src');
});
fs.write(yer_path, content, 'w');
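One caveat: the src attribute is just a string, so writing it to disk yields the image itself only when it is a data: URI whose payload is decoded first. For an ordinary http(s) URL it saves the address, not the image. A minimal sketch of splitting a data: URI follows; parseDataUri is a hypothetical helper name, not part of PhantomJS or jQuery.

```javascript
// Split a data: URI into its MIME type, base64 flag and raw payload.
// Returns null for anything that is not a data: URI.
function parseDataUri(src) {
  var match = /^data:([^;,]*)((?:;[^;,]*)*),(.*)$/.exec(src);
  if (!match) {
    return null;
  }
  var params = match[2].split(';');
  return {
    mime: match[1] || 'text/plain', // default media type for data: URIs
    isBase64: params.indexOf('base64') !== -1,
    payload: match[3] // still encoded; decode before writing to disk
  };
}
```

When isBase64 is true, the payload still has to be base64-decoded before it is written out in binary mode.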


Ariya, can you give some estimate of how long this feature (downloading a url) would take to implement? I'd love to get involved in PhantomJS development, but maybe this issue is a lot trickier than it sounds?

Comment #15

Posted on Oct 5, 2012 by Grumpy Hippo

Sorry, I didn't mean to write "inline". The file I need is not an image and is not part of the DOM. It gets sent as a result of a POST with the Content-Disposition header 'attachment;filename="report.csv"'
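To make the case concrete, here is a sketch of how a client could recognise such a response from the header alone. parseContentDisposition is a hypothetical helper, not a PhantomJS API. It deliberately ignores the RFC 6266 filename* form and will mis-split names that contain a semicolon inside the quotes.

```javascript
// Extract the disposition type and (optional) filename from a
// Content-Disposition header value.
function parseContentDisposition(header) {
  var parts = String(header).split(';');
  var result = {
    type: parts[0].replace(/^\s+|\s+$/g, '').toLowerCase(),
    filename: null
  };
  for (var i = 1; i < parts.length; i++) {
    // Accept both filename="quoted name" and filename=token.
    var m = /^\s*filename\s*=\s*(?:"([^"]*)"|([^\s"]+))\s*$/i.exec(parts[i]);
    if (m) {
      result.filename = m[1] !== undefined ? m[1] : m[2];
    }
  }
  return result;
}
```

A download hook would only need to check for type === 'attachment' and use filename as the suggested name for the saved file.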

Comment #16

Posted on Nov 20, 2012 by Swift Bird

Hi there. I think the base64-encoding solution can only be a stop-gap solution.

  • Downloading big files will probably exhaust memory, and base64-encoding and decoding them uses resources that would be better spent elsewhere. We therefore want the option to redirect a downloaded stream to a file
  • We may have pages where we cannot control the loading of a file that is not supported (e.g. PDF)
  • We may want to save resources that have already been loaded as part of the page (e.g. images)

I think the optimal solution would be to add functionality to the onResourceReceived hook to allow setting up a "redirection" handler, and if such a handler is set, unsupported file formats should silently be downloaded. This handler could then have another onDownloadFinished hook to resume operation once the download is done.

Comment #17

Posted on Jan 12, 2013 by Happy Horse

(No comment was entered for this change.)

Comment #18

Posted on Mar 16, 2013 by Happy Horse

Closing. This issue has been moved to GitHub: https://github.com/ariya/phantomjs/issues/10052

Status: Migrated

Labels:
Type-Enhancement Priority-Medium Milestone-FutureRelease