Milestones
In Progress
M4
TODO
- More code comments
- Cache MessageInfo objects instead of raw IMAP replies, to speed up replay
- Reduce memory consumption
- Wiki page outlining basic design
- Total unique recipients, senders, lists.
- Refactor jwzthreading.py to not run into recursion limits
- Combine recipients/senders based on --me input
DONE
- Run on Enron corpus and upload results
- Add JS obfuscation for printed email addresses
- Tarball for downloads
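
The recursion-limit item above is about walking very deep thread trees. One common way around Python's recursion limit is to replace the recursive walk with an explicit stack. The sketch below shows that idea only; the `Container` class, its `children` attribute, and the helper names are hypothetical stand-ins, and the real jwzthreading.py structures may look different.

```python
class Container:
    """Hypothetical thread node; the real jwzthreading.py class may differ."""

    def __init__(self, message_id, children=None):
        self.message_id = message_id
        self.children = children or []


def iter_thread(root):
    """Yield (depth, node) pairs depth-first, using a stack instead of recursion."""
    stack = [(0, root)]
    while stack:
        depth, node = stack.pop()
        yield depth, node
        # Push children in reverse so they are visited in their original order.
        for child in reversed(node.children):
            stack.append((depth + 1, child))


def thread_size(root):
    """Count the nodes in a thread, however deep the reply chain is."""
    return sum(1 for _ in iter_thread(root))


def build_chain(length):
    """Build a single long reply chain (demonstration data only)."""
    root = Container("msg-0")
    node = root
    for i in range(1, length):
        child = Container("msg-%d" % i)
        node.children.append(child)
        node = child
    return root
```

Because the pending work lives in a list rather than on the call stack, `thread_size(build_chain(5000))` completes even though the chain is far deeper than the default recursion limit.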
Upcoming
M5
- Top N tables of domains for senders, recipients
- Mailbox size over time
- Support non-Gmail servers (go through all mailboxes instead of just All Mail)
- Split out sent mail, starred, etc.
- Break down by all mail vs. label
- X-mailer distribution
- Attachment extension distribution
Done
M0
Finished on 12/25/2007
- Fetch mail headers for all mail
- Fetch labels for all mail
- Record/replay support for FETCH to speed up development
- Optimize StringScanner
- Chart with messages by day of week
- Chart with messages by time of day
- Chart with messages per year
- Chart with messages per month
- Chart with messages per day
- Column layout
M1
Finished on 1/1/2008
- Table with top recipients (messages and bytes)
- Add tabs (date, size, sender, recipient)
- Table with top senders (messages and bytes)
- Table with top list-ids (messages and bytes)
- Title with total counts, date range
- Size distribution
- Table with top messages by size
- Improve SubjectSenderFormatter (max length/clipping, better from name extraction, tooltip with email address)
- Dividers between years in month drop-down
- Skip over empty stats in stat collections (e.g. months with no data)
M2
Finished on 1/21/2008
- Handle encoded names/subjects
- Linkify messages/senders/recipients to searches
- Normalize +addresses
- Remove "All Mail" from all stat titles
- Thread list stats
- Instead of using longest name for an address, use the most common
- Thread length stats
- Thread sender stats
- Construct threads from in-reply-to
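
Two of the M2 items, normalizing +addresses and picking the most common name for an address, can be illustrated with a short sketch. This is not the project's actual code: the function names are made up for the example, and it assumes that everything after the first "+" in the local part is a tag that can be dropped.

```python
from collections import Counter
from email.utils import parseaddr


def normalize_address(address):
    """Lower-case an address and drop any +tag from its local part."""
    cleaned = address.strip().lower()
    local, at, domain = cleaned.rpartition("@")
    if not at:
        # No "@" at all: nothing to normalize beyond case and whitespace.
        return cleaned
    return local.split("+", 1)[0] + "@" + domain


def most_common_names(header_values):
    """Map each normalized address to the display name seen most often."""
    names_by_address = {}
    for value in header_values:
        name, address = parseaddr(value)
        if not address:
            continue
        counter = names_by_address.setdefault(normalize_address(address), Counter())
        if name:
            counter[name] += 1
    # Fall back to the address itself when no display name was ever seen.
    return {
        address: counter.most_common(1)[0][0] if counter else address
        for address, counter in names_by_address.items()
    }
```

For example, `normalize_address("Jane+lists@Example.com")` gives `"jane@example.com"`, so mail sent to tagged variants is counted against one address.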
M3
Finished on 3/16/2008
- Table with top senders to me
- Table with top recipients from me
- Allow "me" email addresses to be specified
- Allow things to be excluded
- Filled graph of senders
- Filled graph of recipients
- Filled graph of lists
- Split up stats.py
- Add support for secure password entry (getpass module)
- Split up large threads that rely purely on subjects
- Getting started wiki page
- Better progress in output (when fetching a chunk, say how many are left)
- Link to SVN log feed
- Distribution of senders to me
- Distribution of recipients from me
Why not abstract out the IMAP functionality into a generic MailSource abstraction? A lot of people archive mail locally in mbox or Maildir, and it would be much easier and faster to run this over local storage than to go through IMAP.
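
A rough sketch of what such an abstraction might look like, using Python's standard `mailbox` module for the mbox case. The `MailSource` and `MboxSource` names, the `iter_headers` method, and the chosen header list are all assumptions made for illustration; nothing here reflects how the project is actually structured.

```python
import mailbox
from abc import ABC, abstractmethod


class MailSource(ABC):
    """Hypothetical interface: anything that can yield message headers."""

    @abstractmethod
    def iter_headers(self):
        """Yield one dict of selected headers per message."""


class MboxSource(MailSource):
    """Reads headers from a local mbox file via the standard mailbox module."""

    # Headers picked for the example; a real tool would likely want more.
    HEADERS = ("From", "To", "Date", "Subject", "Message-ID")

    def __init__(self, path):
        self.path = path

    def iter_headers(self):
        # create=False makes a missing file an error instead of creating it.
        box = mailbox.mbox(self.path, create=False)
        try:
            for message in box:
                # Missing headers come back as None.
                yield {name: message.get(name) for name in self.HEADERS}
        finally:
            box.close()
```

An IMAP-backed source would implement the same `iter_headers` method, so the stats code would not need to know where the mail came from.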
Is it not possible to get some kind of hosted, pre-installed version of this? It would be pretty cool, as I can't install Python and don't know the first thing about code. Thanks.
Can this be back-ported to Python 2.4 so that those of us with corporate machines can use it? Thanks.
A simple .exe would be amazing. I have played with Python, Cheetah, Monkey and Squirrel for hours. I have no idea what I am doing and can't get anything to work.
A simple version would spread like wildfire, I'm sure.
THANKS!!!!!
This is freaking cool... What a geek I am. Up and running in about 30 seconds on my Mac and almost done downloading now. :) Thanks.
One thing that could be very nice is to create a web app with this functionality, for example using the brand new Google App Engine: a simple interface where you enter your email, the same Python code behind it, and something like the Google Chart API to show the results (but I lack the knowledge to do it myself). Anyone?
It would be great to track information about my average response times to emails. Specifically, this could be displayed as a graph of the distribution of bucketed response times (e.g. with buckets <1h, 1-4h, 4-8h, 8-24h, 1-3d, >3d).
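
The bucketing part of this could be sketched as follows, using the bucket edges listed above. How each message gets paired with its reply (for instance through In-Reply-To headers) is left out and assumed to be done already; the function names and the input shape of `(received, replied)` datetime pairs are assumptions for the example.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime, timedelta

# Upper bound of each bucket; a delay equal to a bound falls into the next bucket.
BOUNDS = [timedelta(hours=1), timedelta(hours=4), timedelta(hours=8),
          timedelta(hours=24), timedelta(days=3)]
LABELS = ["<1h", "1-4h", "4-8h", "8-24h", "1-3d", ">3d"]


def bucket_label(delay):
    """Return the bucket label for the timedelta between a message and its reply."""
    return LABELS[bisect_right(BOUNDS, delay)]


def response_time_histogram(pairs):
    """Count replies per bucket from (received, replied) datetime pairs."""
    counts = Counter(bucket_label(replied - received) for received, replied in pairs)
    # Return every bucket in display order, including empty ones.
    return [(label, counts.get(label, 0)) for label in LABELS]
```

The result is an ordered list of `(label, count)` pairs that could be fed straight into a bar chart.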
Please, an OS X executable! I really want this to work!
Hi Mihai- this is really a great script. I ran it a year ago and it worked fine, but I think now that my Gmail box is significantly larger, I'm running into the max recursion problem that you mentioned as being an "in progress" task for M4. Has there been any more progress on that?
Here is the traceback:
01:16:12,422 Logging out
01:16:12,798 Identifying "me" messages
01:17:15,812 27905 messages are from "me"
01:17:15,812 64515 messages are to "me"
01:17:15,837 Extracting threads
Traceback (most recent call last):
(this goes on for awhile...)
RuntimeError: maximum recursion depth exceeded while calling a Python object