Introduction  
Introduction to Phirehose: A PHP interface to the Twitter Streaming API
Updated Aug 2, 2011 by fenn.bai...@gmail.com
Comment by abombox@gmail.com, Jan 1, 2010

A good implementation for enqueueStatus()?

Comment by project member fenn.bai...@gmail.com, Jan 4, 2010

Regarding implementations of enqueueStatus(), it really depends on whether you need the data "live" or not. If you don't, it's pretty easy: you could just enqueue to a flat file, i.e.:

  • Use fopen() to open a file when your client is instantiated
  • Write the raw status to the file using fputs()
  • Rotate the file every N minutes (or N lines)

Then you can just run a cron type task to process the rotated files every hour or whatever.
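The flat-file approach above could be sketched roughly as follows. This is only a minimal illustration, not code from Phirehose: the class name RotatingFileQueue, the file names active.queue / rotated.*.queue and the 300-second default are all made up for the example, and it assumes one status per line.

```php
<?php
// Hypothetical helper (not part of Phirehose): appends raw statuses to a
// flat file and rotates that file every $rotateSeconds.
class RotatingFileQueue
{
    private $dir;
    private $rotateSeconds;
    private $fp = null;
    private $activePath = null;
    private $openedAt = 0;

    public function __construct($dir, $rotateSeconds = 300)
    {
        $this->dir = rtrim($dir, '/');
        $this->rotateSeconds = $rotateSeconds;
    }

    // Intended to be called from enqueueStatus($status).
    // $now is injectable so the rotation logic can be tested.
    public function enqueue($status, $now = null)
    {
        $now = ($now === null) ? time() : $now;

        // Rotate first if the active file has been open long enough.
        if ($this->fp !== null && ($now - $this->openedAt) >= $this->rotateSeconds) {
            $this->rotate($now);
        }

        // Open (or re-open) the active file in append mode.
        if ($this->fp === null) {
            $this->activePath = $this->dir . '/active.queue';
            $fp = fopen($this->activePath, 'a');
            if ($fp === false) {
                throw new RuntimeException('Cannot open ' . $this->activePath);
            }
            $this->fp = $fp;
            $this->openedAt = $now;
        }

        // One raw status per line.
        fputs($this->fp, rtrim($status) . "\n");
    }

    // Close the active file and rename it so a cron task can process it.
    // Returns the rotated path, or null if nothing was open.
    public function rotate($now = null)
    {
        if ($this->fp === null) {
            return null;
        }
        $now = ($now === null) ? time() : $now;
        fclose($this->fp);
        $this->fp = null;
        $rotated = $this->dir . '/rotated.' . date('Ymd-His', $now) . '.queue';
        rename($this->activePath, $rotated);
        return $rotated;
    }
}
```

The cron task would then only ever touch rotated.*.queue files, so it never reads a file that is still being written to.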

If you need to actually process/respond to the tweets live, you may want to look into a queueing system like RabbitMQ or something simpler like Redis (which is super easy & great).

Good luck!

Comment by dixr...@hotmail.com, Jan 15, 2010

Thanks for this code! Do you plan to continue working on it in the future?

Say I have a website that streams tweets for a bunch of Twitter accounts, with each visitor providing their own credentials. Do I have to create multiple instances of your class, with a separate connection to Twitter for each user account, to get all the tweets I need (from the people they follow)?

Thank you!

Comment by wernicke...@gmail.com, Jan 17, 2010

This is an amazing piece of software. Thank you so much for the time you've put into the development!

Comment by georgeme...@gmail.com, Jan 17, 2010

I'm having trouble keeping it active. I run it from the command line (php sample.php &) as a background process. I put this in the enqueueStatus() section just to dump it to a text file.

public function enqueueStatus($status) {
  // BEGIN FILE OPS
  $time = date("YmdH");
  if ($newTime != $time) {
    @fclose($fp2);
    $fp2 = fopen("{$time}.txt", "a");
  }
  fputs($fp2, $status);
  $newTime = $time;
  // END FILE OPS
}

It works great for about one and a half to two hours. It creates the files and dumps the data from the stream into them, but then it just stops.

Any idea why?

Comment by project member fenn.bai...@gmail.com, Jan 17, 2010

In reply to dixroue@hotmail.com:

I do plan on maintaining this code for the moment - that said, if anyone is keen on contributing/helping support the project, I'd happily share this role.

Regarding your question - No, you should only need one set of credentials (for the moment) to stream tweets for up to 200 twitter accounts (using the FILTER method and follow attribute). You can get extended access by applying for it to Twitter. Alternatively, you could possibly connect multiple times; however, I'm not sure if they ban accounts that connect too many times from the same IP (you'll need to check the Twitter Streaming API docs).

Comment by project member fenn.bai...@gmail.com, Jan 17, 2010

In reply to wernicke.marcus: Thank you for the praise, I'm glad you like it/it's been useful.

In reply to georgemedia: Hmmm, not sure why it would "just stop" - I've used the library myself a fair bit in long running processes (weeks/months) and it seems reliable.

I would perhaps put some print or log statements in your enqueueStatus() method to see if you can see what is dying. Also, redirect your daemon output to a file to ensure you're not getting PHP errors that you're not seeing.

Good luck!

Comment by samverme...@gmail.com, Jan 23, 2010

Awesome piece of code. Thank you so much! However I'm having the same problem as georgemedia. I will get tweets for about 30min or so and then it just stops. I echo the log messages and I don't see any error. How do I keep it alive?

Comment by project member fenn.bai...@gmail.com, Jan 26, 2010

Hey samvermette: Hmmm, I've been testing this myself and I can't seem to get it to stop processing status updates (at least, not reliably - I think it may have done it once).

At this point, I'm not certain whether this is a Twitter or a Phirehose problem.

When it stops working, do you still get the periodic status messages every minute summarizing the average statuses per second/etc?

As a worst case hack, you could put in some code in checkFilterPredicates() that said if it doesn't receive a status update for > 5 minutes, then reconnect or something like that, but ideally I'd like to track down if this is a problem with the library or Twitter itself.
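A rough sketch of that "no status for > 5 minutes" check might look like the following. The StallDetector class and its method names are hypothetical (they are not part of Phirehose), and what to do once a stall is detected (reconnecting, or simply exiting so something else restarts the script) is deliberately left to the caller.

```php
<?php
// Hypothetical helper (not part of Phirehose): remembers when the last
// status arrived and reports whether the stream has been silent too long.
class StallDetector
{
    private $maxIdleSeconds;
    private $lastStatusAt;

    // $now is injectable everywhere so the logic can be tested without waiting.
    public function __construct($maxIdleSeconds = 300, $now = null)
    {
        $this->maxIdleSeconds = $maxIdleSeconds;
        $this->lastStatusAt = ($now === null) ? time() : $now;
    }

    // Call this from enqueueStatus() each time a status arrives.
    public function touch($now = null)
    {
        $this->lastStatusAt = ($now === null) ? time() : $now;
    }

    // Call this from a periodic hook such as checkFilterPredicates().
    public function isStalled($now = null)
    {
        $now = ($now === null) ? time() : $now;
        return ($now - $this->lastStatusAt) > $this->maxIdleSeconds;
    }

    // Seconds since the last status, handy for log messages.
    public function idleSeconds($now = null)
    {
        $now = ($now === null) ? time() : $now;
        return $now - $this->lastStatusAt;
    }
}
```

Logging idleSeconds() alongside the periodic status messages would also help tell a silent stream apart from a dead process.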

Cheers!

Comment by Ivo.Baet...@gmail.com, Jan 29, 2010

I always get the following error:

Phirehose: Connecting to twitter stream: http://stream.twitter.com/1/statuses/filter.json with params: array ( 'delimited' => 'length', 'track' => 'blue,green,red,yellow',)
Phirehose: Failed to connect to stream: HTTP ERROR 406: Not Acceptable (No filter parameters found. Expect at least one parameter: follow track).

Comment by samverme...@gmail.com, Jan 31, 2010

@fenn.bailey: I'm sure the issue has to do with my way of running the script. I had the described issue when accessing the file's url using a regular browser. After that I ended up running it using ssh. However whenever I close the ssh connection, the script will stop receiving tweets after 30 min. How do I keep it alive? I can't have that ssh connection up 24/7.. Am I supposed to have a cron job running the script every 30min?

Comment by project member fenn.bai...@gmail.com, Feb 1, 2010

@samvermette - Aaah, I see what you (and possibly everyone else) are having a problem with.

Phirehose (or any long running daemon-like script) should always be run from the command line environment and backgrounded (ie: it is not designed to be run from within a web-server/page).

Think of it like running apache itself or your mailer-daemon - you start it when you want, then have to specifically kill the process when you want it to stop, restart or whatever.

How this is done exactly is OS specific, but an example of backgrounding a process in linux is the command:

  php5 your-script.php >> /tmp/your-script.log 2>&1 &

You can find some info on process control here: http://linux.about.com/od/nwb_guide/a/gdenwb01t66.htm

Comment by project member fenn.bai...@gmail.com, Feb 1, 2010

@Ivo.Baettig - I'm not sure what's happening there. If I connect with the same params it looks as follows:

Phirehose: Connecting to twitter stream: http://stream.twitter.com/1/statuses/filter.json with params: array (  'delimited' => 'length',  'track' => 'blue,green,red,yellow',)

And works perfectly. Perhaps you could post some sample code to the user-group?

Comment by m.j.vand...@gmail.com, Feb 7, 2010

@Ivo.Baettig, the problem can be fixed by changing arg_separator.output = "&amp;" to arg_separator.output = "&" in php.ini and restarting your webserver.

Comment by samverme...@gmail.com, Feb 9, 2010

@fenn.bailey thanks a lot for helping. The link really helped.

I'm connecting to my server through SSH. I used to run the script with php5 twitter.php &. That started the script, and it would keep running until about 30 minutes after I disconnected.

I've tried your command and the script stops running as soon as I quit Terminal. I tried using the nohup command, doing nohup php5 twitter.php &. The script kept running maybe 5 minutes after I quit Terminal. Am I missing something? Isn't nohup supposed to keep the script running even after disconnection?

Could it be something enforced by my web hosting (MediaTemple in my case)?

Comment by project member fenn.bai...@gmail.com, Feb 9, 2010

Hey @samvermette,

Yeah, it looks like you're doing everything right - It is quite possible that MediaTemple doesn't allow long-running PHP processes.

This is unfortunate, as any "live" streaming process must be long running (ie: a cron task is no good).

You may need to contact their support department for help on this (or consider another hosting provider).

Good luck!

Comment by samverme...@gmail.com, Feb 9, 2010

Thanks for clarifying. Indeed, Media Temple does limit PHP execution time to 120 seconds on all (gs) servers. http://kb.mediatemple.net/questions/1620/CGI+and+PHP+resource+limits

Comment by m.j.vand...@gmail.com, Feb 14, 2010

Please change line 437 to:

list($httpVer, $httpCode, $httpMessage) = preg_split('[\s]', trim(fgets($this->conn, 1024)), 3);

That makes it compatible with PHP 5.3.
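For anyone curious what that line does, here is a small standalone sketch of parsing an HTTP status line with preg_split(). The function name parseStatusLine and the sample input in the usage note are made up for illustration, and it uses the more conventional '/\s+/' pattern rather than the '[\s]' form quoted above. Presumably the original line used split(), which is deprecated as of PHP 5.3, but that is a guess.

```php
<?php
// Illustrative helper (not taken from Phirehose): splits an HTTP status
// line such as "HTTP/1.1 406 Not Acceptable" into its three parts.
function parseStatusLine($line)
{
    // Limit of 3 keeps multi-word reason phrases ("Not Acceptable") intact.
    $parts = preg_split('/\s+/', trim($line), 3);

    // Pad so list() never reads a missing index on a malformed line.
    $parts = array_pad($parts, 3, '');

    list($httpVer, $httpCode, $httpMessage) = $parts;

    return array(
        'version' => $httpVer,
        'code'    => (int) $httpCode,
        'message' => $httpMessage,
    );
}
```

Calling parseStatusLine("HTTP/1.1 406 Not Acceptable\r\n") gives version "HTTP/1.1", code 406 and message "Not Acceptable".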

Comment by project member fenn.bai...@gmail.com, Feb 15, 2010

@m.j.vanderveen: Well spotted, fixed (will be in next release).

Comment by 91white...@gmail.com, Mar 11, 2010

Hello,

I would like to know if it's possible to automatically run my PHP script that uses Phirehose when Apache starts/restarts?

Thank you

Comment by project member fenn.bai...@gmail.com, Mar 15, 2010

Hi @91whitez24,

This discussion on the mailing list should help you:

http://groups.google.com/group/phirehose-users/msg/c3c47b574260b2a1

Comment by garyc40%...@gtempaccount.com, Jun 9, 2010

One interesting thing.

It turns out phirehose keeps running even after the execution time limit has passed (at least on my local server as well as hosted server).

I tried running a script I wrote using phirehose in my browser (Chrome), then clicked the browser's "Stop" button, but the fetching process kept running in the background. I had to manually turn off apache to stop it. Has anyone run into the same situation?

Comment by zheng1...@gmail.com, Jul 2, 2010

Great job!!!!

Helped me a lot! Thank you!!!

Comment by todd%fis...@gtempaccount.com, Aug 13, 2010

Hi, Twitter says that Basic Auth is getting disabled on August 16, 2010. Are you planning on making an OAuth version of Phirehose?

Comment by project member fenn.bai...@gmail.com, Aug 14, 2010

Hey Todd,

Basic Auth is only being disabled on the REST API, not streaming.

At the time the last version was written, there was no OAuth access to the streaming API, but I see it now exists.

It will definitely be in the next version, cheers!

Comment by lit...@gmail.com, Aug 31, 2010

On average, how many tweets per second do you guys consume? I have a very lightweight enqueueStatus() and I am only managing 8 tweets per second with METHOD_SAMPLE.

What would you recommend to speed this up? My server stats: Linode 512 with nothing else running on the box.

thanks

Comment by jigs...@gmail.com, Nov 14, 2010

I get this error, appreciate your feedback,

Phirehose: Connecting to twitter stream: http://stream.twitter.com/1/statuses/filter.json with params: array ( 'delimited' => 'length', 'follow' => '12,13,15,16,20,87',)
Phirehose: Resolved host stream.twitter.com to 128.242.240.23
Phirehose: Connecting to 128.242.240.23
Phirehose: Connection established to 128.242.240.23
Phirehose: postdata is delimited=length&follow=12%2C13%2C15%2C16%2C20%2C87
Phirehose: HTTP failure 1 of 20 connecting to stream: HTTP ERROR 406: Not Acceptable (No filter parameters found. Expect at least one parameter: follow track locations annotations). Sleeping for 10 seconds.

-Jigs

Comment by phgorva...@gmail.com, Dec 8, 2010

How does it work? Please give me a brief introduction, with a small working PHP example, showing what should be written in enqueueStatus() and how JSON is used. I want to find tweets using the Twitter Streaming API.

Comment by alexbran...@gmail.com, Dec 29, 2010

Great piece of software...

I just started using it, but I notice that when I run the scripts from a browser, it will keep creating queue files even after stopping the page. any ideas?

Comment by mspr...@gmail.com, Jan 18, 2011

When I launch the script in the background with ./tap.php & it refuses to run until I foreground it with fg. jobs says: [1]+ Stopped ./tap.php

Any ideas? It runs perfectly fine under OS X, but not Ubuntu or Xubuntu (under virtualbox both) nor the Ubuntu install on Amazon EC. Funnily the same is true for even the simplest of scripts..

Is the only way to start it inside a screen session and then detach?


Answering my own question:

PHP CLI scripts do not run in the background under Ubuntu/Debian-based distros, as per http://ubuntuforums.org/showthread.php?t=977332 and http://bipinb.com/making-php-program-as-daemon.htm#comment-142

The solution for now:

./yourscript.php < /dev/null &

Apparently PHP wants input from STDIN (or an interactive shell?) so feed it an EOF from /dev/null and it's happy!

Also: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=286356

Comment by project member fenn.bai...@gmail.com, Jan 29, 2011

@mspreij Hmm, interesting (and well spotted/found!).

That's a weird issue that's worth being aware of.

Comment by arevans...@yahoo.com, Apr 6, 2011

This library has been working very well for me. Seeing +-50 tweets per second with +-125 track predicates.

Thank you!

I initially had to remove the "count" parameter from the requests because Twitter complained about not having the necessary role to use the count parameter.

In my app I eventually may need to move beyond the 400 public track limit. My question is whether I would experience gaps in the stream between predicate changes due to the larger number of track terms, or whether the existing connection is maintained until the new predicates have been uploaded and streaming, and then only dropped as per the recommendation in the API documentation?

Comment by lech.grz...@gmail.com, Apr 8, 2011

Great Job!

Thanks Guys.

Comment by ezequiel...@gmail.com, Apr 29, 2011

If you are doing everything well and you are still getting the 406 error code, then try setting arg_separator.output to '&':

ini_set('arg_separator.output', '&');

This is because http_build_query() (used in Phirehose) is replacing '&' with '&amp;'
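To see the effect for yourself, here is a small standalone sketch (the parameter values are just the ones from the error reports above). It also shows an alternative to touching the ini setting: http_build_query() accepts the separator as its third argument, so passing '&' explicitly sidesteps arg_separator.output entirely. Whether Phirehose itself should be changed to do that is a separate question; this only demonstrates the PHP behaviour.

```php
<?php
// Example parameters, matching the ones shown in the log output above.
$params = array(
    'delimited' => 'length',
    'track'     => 'blue,green,red,yellow',
);

// Passing the separator explicitly ignores the arg_separator.output setting.
$postData = http_build_query($params, '', '&');
echo $postData, "\n";

// What a server configured with arg_separator.output = "&amp;" would
// effectively produce: the second parameter is no longer recognised.
$broken = http_build_query($params, '', '&amp;');
echo $broken, "\n";
```

With the '&amp;' separator the request body no longer contains a parameter the API recognises as track, which would explain the "No filter parameters found" message.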

Comment by arevans...@yahoo.com, May 25, 2011

This library is the entry point into my Twitter project. http://followitt.com is the exit point.

A rock solid library. Thanks again.

Comment by gene.el...@gmail.com, Jul 15, 2011

Question. Does anyone have an example of using a queuing system like RabbitMQ? I am using Site Streams to store tweets about certain users and I need to process them right away, but I know I need to decouple the processing from the storing (drinking from the phirehose). Has anyone done this in PHP? Thanks for your help.

Comment by j...@jaxx.org, Jul 27, 2011

Hi all! Hi Gene, I started a micro-example project where you can already find an example twitter->amqp consumer: https://github.com/happyjaxx/twabbitmqnjs

It uses php-amqplib from https://github.com/tnc/php-amqplib

The final goal is a three-part engine:

  1) Consumer (PHP), doing the minimum and fastest consuming (plus testing some RabbitMQ-specific features like TTL on messages, which avoids having >2000 tweets on a durable queue, though the processor is supposed to be started right after, so it has no real use in the end)
  2) Message processing (PHP as well) to split on different topics (just as an example)
  3) Spitting everything to the client through node.js/socket.io, with the client passing parameters to filter which topics he'll receive

It's crappy code I believe, I'm more of a sys admin :-)

JaXX

Comment by chen.1...@gmail.com, Aug 8, 2011

Hello, I just found that in Phirehose.php line 315, the function stream_select() is not implemented anywhere in the library files. This will cause "CRT parameters not detected" warning msg on my test machine which is currently WIN7 64bit.

Can this problem be solved?

