Export to GitHub

rolling-curl - issue #21

Better syntax for streaming file processing : huge XML files


Posted on Jul 9, 2011 by Happy Ox

This module is really wonderful... it saved the day in my application, which I would otherwise have had to move to a threading solution!


However, the call syntax is not quite optimal for what I'm doing. I am processing huge XML files that are too big to load into memory, and too big to process completely before starting the curl operations. This syntax would be ideal:

<?php
require("RollingCurl.php");

function request_callback($response, $info, $request, $callback_parameter) { ... }

$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
$rq = new RollingCurlRequest();
while ($xml = get_next_xml_element()) {
    $rq->url($xml['url']);
    $rq->callback_parameter = $xml;
    $rc->execute_until_blocked($rq); // Blocks if queue full
}
$rc->finish(); // Returns after last pending request is done
?>

Then I can maintain my streaming process, yet still feed requests into curl as fast as they will go. Also note the extra parameter that gets passed to the callback.
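For context, the get_next_xml_element() in the sketch above could be backed by PHP's XMLReader, which streams the document instead of loading it all at once. This is only an illustration: the reader argument, the <item url="..."> element shape, and the file name in the usage comment are assumptions, not part of the proposal.

```php
<?php
// Sketch: pulling elements one at a time from a huge XML file with
// XMLReader, so only the current element is ever held in memory.
// The <item url="..."> element shape is assumed for illustration.
function get_next_xml_element(XMLReader $reader) {
    while ($reader->read()) {
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'item') {
            return array('url' => $reader->getAttribute('url'));
        }
    }
    return false; // end of document; terminates the caller's while() loop
}

// Usage, matching the loop in the proposed syntax:
// $reader = new XMLReader();
// $reader->open('huge.xml');
// while ($xml = get_next_xml_element($reader)) { ... }
```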

Comment #1

Posted on Jul 11, 2011 by Happy Ox

As a step in that direction, example.php could show attaching some context data to the request object (so the callback knows what triggered the request):

$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
$i = 0;
foreach ($urls as $url) {
    $request = new RollingCurlRequest($url);
    $request->extra_data = $i++;
    $rc->add($request);
}
$rc->execute();
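On the callback side, assuming RollingCurl passes the request object as the third callback argument (as in the project's example.php), the attached context can be read back from it. The stdClass stand-in below is only for demonstration; in real use the object is the RollingCurlRequest queued above.

```php
<?php
// Sketch (not the library's code): a callback recovering the context
// that was attached to the request before $rc->add($request).
function request_callback($response, $info, $request) {
    // $request->extra_data is whatever we set at queue time
    return "item {$request->extra_data}: HTTP {$info['http_code']}";
}

// Stand-in request object for demonstration only; the real one is a
// RollingCurlRequest created in the queueing loop above.
$request = new stdClass();
$request->extra_data = 7;
echo request_callback("<html>...</html>", array('http_code' => 200), $request), "\n";
// prints "item 7: HTTP 200"
```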

Comment #2

Posted on Jul 11, 2011 by Happy Ox

Another use case for the above syntax: in your callback you see http-equiv="refresh" and wish to queue the target page for loading (the page you just fetched is probably useless).
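A minimal sketch of that pattern, assuming the callback is allowed to queue a follow-up request on the same RollingCurl instance; the refresh_target() helper and its regex are illustrative, not part of the library:

```php
<?php
// In real use: require("RollingCurl.php"); and register request_callback
// with the RollingCurl instance held in $rc.

// Illustrative helper: pull the target URL out of a
// <meta http-equiv="refresh" content="0; url=..."> tag, if present.
function refresh_target($html) {
    $re = '/http-equiv=["\']refresh["\'][^>]*content=["\'][^"\']*url=([^"\'>\s]+)/i';
    return preg_match($re, $html, $m) ? $m[1] : null;
}

function request_callback($response, $info, $request) {
    global $rc;
    if ($url = refresh_target($response)) {
        // The fetched page is probably useless; queue the real target.
        $rc->add(new RollingCurlRequest($url));
    } else {
        // ... process $response normally ...
    }
}
```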

Status: New

Labels:
Type-Defect Priority-Medium