Milestones
In Progress
M4
TODO
- More code comments
- Cache MessageInfo objects instead of raw IMAP replies, to speed up replay
- Reduce memory consumption
- Wiki page outlining basic design
- Total unique recipients, senders, and lists
- Refactor jwzthreading.py to not run into recursion limits
- Combine recipients/senders based on --me input
DONE
- Run on Enron corpus and upload results
- Add JS obfuscation for printed email addresses
- Tarball for downloads
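For the M4 item about jwzthreading.py running into recursion limits, a common fix is to replace recursive tree walks with an explicit stack. The sketch below assumes a hypothetical `Container` class with `message_id` and `children` attributes; the real jwzthreading.py structures may be named and shaped differently.

```python
class Container:
    """Hypothetical threading node: a message ID plus its child containers."""

    def __init__(self, message_id, children=None):
        self.message_id = message_id
        self.children = list(children) if children else []


def walk_iterative(root):
    """Pre-order walk of a thread tree using an explicit stack.

    Returns (message_id, depth) pairs. No Python call frame is added per
    tree level, so deep reply chains cannot exceed sys.getrecursionlimit().
    """
    order = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        order.append((node.message_id, depth))
        # Push children in reverse so they are popped in their original order.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return order
```

Passes that must visit children before parents (for example, pruning empty containers) need a post-order variant of the same idea.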
Upcoming
M5
- Top N tables of domains for senders, recipients
- Mailbox size over time
- Support non-Gmail servers (go through all mailboxes instead of just All Mail)
- Split out sent mail, starred, etc.
- Break down by all mail vs. label
- X-Mailer distribution
- Attachment extension distribution
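The M5 "top N tables of domains" item could be computed roughly as follows. This is a sketch, not existing project code; `top_domains` is a hypothetical name, and it relies only on the standard library's `email.utils.getaddresses` to split address headers.

```python
from collections import Counter
from email.utils import getaddresses


def top_domains(header_values, n=10):
    """Return the n most common domains among addresses in raw From/To/Cc values."""
    counts = Counter()
    # Drop empty header values before parsing so every field holds real addresses.
    for _name, address in getaddresses([v for v in header_values if v]):
        if "@" in address:
            # Domains are case-insensitive, so fold them before counting.
            counts[address.rsplit("@", 1)[1].lower()] += 1
    return counts.most_common(n)
```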
Done
M0
Finished on 12/25/2007
- Fetch mail headers for all mail
- Fetch labels for all mail
- Record/replay support for FETCH to speed up development
- Optimize StringScanner
- Chart with messages by day of week
- Chart with messages by time of day
- Chart with messages per year
- Chart with messages per month
- Chart with messages per day
- Column layout
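The day-of-week and time-of-day charts in M0 reduce to counting parsed `Date` headers. A minimal sketch, assuming raw `Date` header strings as input; `day_and_hour_counts` is a hypothetical helper, and the project may bucket by a different time zone.

```python
from collections import Counter
from email.utils import parsedate_to_datetime

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def day_and_hour_counts(date_headers):
    """Count messages per weekday and per hour from raw Date header strings."""
    by_day = Counter()
    by_hour = Counter()
    for value in date_headers:
        try:
            sent = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # missing or unparseable Date header
        # weekday() is locale-independent, unlike strftime("%a").
        by_day[DAY_NAMES[sent.weekday()]] += 1
        # The hour is in the sender's own UTC offset, as written in the header.
        by_hour[sent.hour] += 1
    return by_day, by_hour
```

Converting each datetime to a single zone (for example with `sent.astimezone(...)`) before counting would chart hours in the reader's time rather than the sender's.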
M1
Finished on 1/1/2008
- Table with top recipients (messages and bytes)
- Add tabs (date, size, sender, recipient)
- Table with top senders (messages and bytes)
- Table with top List-Ids (messages and bytes)
- Title with total counts, date range
- Size distribution
- Table with top messages by size
- Improve SubjectSenderFormatter (max length/clipping, better From-name extraction, tooltip with email address)
- Dividers between years in month drop-down
- Skip over empty stats in stat collections (e.g. months with no data)
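The M1 "top senders (messages and bytes)" tables need both a count and a byte total per address. A sketch under the assumption that messages arrive as hypothetical `(sender, size_in_bytes)` pairs; `top_senders` is an illustrative name, not the project's API.

```python
import heapq
from collections import defaultdict


def top_senders(messages, n=10, by="messages"):
    """Rank senders by message count or total bytes.

    messages: iterable of (sender_address, size_in_bytes) pairs.
    Returns (sender, message_count, total_bytes) tuples, largest first.
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for sender, size in messages:
        counts[sender] += 1
        sizes[sender] += size
    if by == "messages":
        def key(s):
            return (counts[s], sizes[s])  # ties broken by bytes
    elif by == "bytes":
        def key(s):
            return (sizes[s], counts[s])  # ties broken by message count
    else:
        raise ValueError(f"unknown ranking: {by!r}")
    ranked = heapq.nlargest(n, counts, key=key)
    return [(s, counts[s], sizes[s]) for s in ranked]
```

`heapq.nlargest` avoids sorting every sender when only the top few rows are displayed.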
M2
Finished on 1/21/2008
- Handle encoded names/subjects
- Linkify messages/senders/recipients to searches
- Normalize +addresses
- Remove "All Mail" from all stat titles
- Thread list stats
- Instead of using longest name for an address, use the most common
- Thread length stats
- Thread sender stats
- Construct threads from in-reply-to
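Two M2 items, normalizing +addresses and picking the most common name for an address, could look roughly like this. It is a sketch assuming Gmail-style `+tag` subaddressing and case-insensitive local parts (common in practice, not guaranteed by the RFCs); both function names are hypothetical.

```python
from collections import Counter, defaultdict
from email.utils import parseaddr


def normalize_address(address):
    """Lowercase an address and drop any '+tag' from the local part."""
    address = address.strip().lower()
    local, at, domain = address.rpartition("@")
    if not at:
        return address  # no '@', leave it alone
    return local.split("+", 1)[0] + "@" + domain


def most_common_names(from_headers):
    """Map each normalized address to the display name seen with it most often."""
    names = defaultdict(Counter)
    for value in from_headers:
        name, address = parseaddr(value)
        if address and name:
            names[normalize_address(address)][name] += 1
    return {address: counts.most_common(1)[0][0] for address, counts in names.items()}
```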
M3
Finished on 3/16/2008
- Table with top senders to me
- Table with top recipients from me
- Allow "me" email addresses to be specified
- Allow things to be excluded
- Filled graph of senders
- Filled graph of recipients
- Filled graph of lists
- Split up stats.py
- Add support for secure password entry (getpass module)
- Split up large threads that rely purely on subjects
- Getting started wiki page
- Better progress in output (when fetching a chunk, say how many are left)
- Link to SVN log feed
- Distribution of senders to me
- Distribution of recipients from me
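Once "me" addresses can be specified, the M3 "top senders to me" and "top recipients from me" tables follow from classifying each message by its From and To/Cc headers. A minimal sketch; `me_tables` is a hypothetical name, and messages are assumed to support `.get(header)`, as `email.message.Message` and plain dicts do.

```python
from collections import Counter
from email.utils import getaddresses, parseaddr


def me_tables(messages, my_addresses):
    """Build 'top senders to me' and 'top recipients from me' counters.

    my_addresses: the user's own addresses, compared case-insensitively.
    """
    me = {address.lower() for address in my_addresses}
    senders_to_me = Counter()
    recipients_from_me = Counter()
    for msg in messages:
        sender = parseaddr(msg.get("From") or "")[1].lower()
        # Skip empty headers so getaddresses() only sees real values.
        fields = [value for value in (msg.get("To"), msg.get("Cc")) if value]
        recipients = [a.lower() for _, a in getaddresses(fields) if a]
        if sender in me:
            # Mail I sent: count everyone except my own addresses.
            recipients_from_me.update(r for r in recipients if r not in me)
        elif any(r in me for r in recipients):
            # Mail addressed to me by someone else.
            senders_to_me[sender] += 1
    return senders_to_me, recipients_from_me
```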
Why not abstract the IMAP functionality out into a generic MailSource abstraction? A lot of people archive mail locally in mbox or Maildir, and it would be much easier and faster to run this over local storage than to go through IMAP.
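One way such a MailSource abstraction could look, as a minimal sketch. `MailSource`, `LocalMailSource`, and `count_by_sender` are hypothetical names, not existing code in this project. Only the local side is shown, using Python's standard `mailbox` module, which reads both mbox files and Maildir directories; an IMAP-backed subclass would implement the same `messages()` method.

```python
import abc
import mailbox
from collections import Counter
from email.message import Message
from email.utils import parseaddr
from typing import Iterator


class MailSource(abc.ABC):
    """Hypothetical interface: anything that can yield parsed email messages."""

    @abc.abstractmethod
    def messages(self) -> Iterator[Message]:
        """Yield every message in the source once."""

    def close(self) -> None:
        """Release underlying resources (no-op by default)."""


class LocalMailSource(MailSource):
    """Reads a local mbox file or Maildir directory via the stdlib mailbox module."""

    def __init__(self, path: str, kind: str = "mbox") -> None:
        if kind == "mbox":
            self._box = mailbox.mbox(path, create=False)
        elif kind == "maildir":
            self._box = mailbox.Maildir(path, create=False)
        else:
            raise ValueError(f"unsupported mailbox kind: {kind!r}")

    def messages(self) -> Iterator[Message]:
        # Iterating a mailbox yields message objects that subclass email.message.Message.
        yield from self._box

    def close(self) -> None:
        self._box.close()


def count_by_sender(source: MailSource) -> Counter:
    """Example stats pass that depends only on the MailSource interface."""
    counts = Counter()
    for msg in source.messages():
        address = parseaddr(msg.get("From", ""))[1].lower()
        if address:
            counts[address] += 1
    return counts
```

Because the stats code only calls `messages()`, the IMAP fetch and record/replay logic could live entirely in one subclass.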
Is it not possible to get some kind of hosted, pre-installed version of this? It would be pretty cool, as I can't install Python and don't know the first thing about code. Thanks.
Can this be back-ported to Python 2.4 so that those of us with corporate machines can use it? Thanks.
A simple .exe would be amazing. I have played with Python, Cheetah, Monkey and Squirrel for hours. I have no idea what I am doing and can't get anything to work.
A simple version would spread like wildfire, I'm sure.
THANKS!!!!!
This is freaking cool... What a geek I am. Up and running in about 30 seconds on my Mac and almost done downloading now. :) Thanks.
One thing that would be very nice is a web app with this functionality, for example using the brand-new Google App Engine: a simple interface where you just enter your email account, which then runs the same Python code and shows the results using, for example, the Google Chart API. (I lack the knowledge to do it myself.) Anyone??
It would be great to track information about my average response times to emails. Specifically, this could be displayed as a graph of the distribution of bucketed response times (e.g. with buckets <1h, 1-4h, 4-8h, 8-24h, 1-3d, >3d).
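The suggested response-time distribution could be computed by matching each reply's In-Reply-To header to the arrival time of the original message and bucketing the delay. A sketch under stated assumptions: the input shapes and the names `bucket` and `response_time_histogram` are hypothetical, and all datetimes must be consistently naive or consistently timezone-aware.

```python
from bisect import bisect_right
from collections import Counter
from datetime import timedelta

# Exclusive upper bounds of each bucket; a delay equal to a bound falls in the next bucket.
BOUNDS = [timedelta(hours=1), timedelta(hours=4), timedelta(hours=8),
          timedelta(hours=24), timedelta(days=3)]
LABELS = ["<1h", "1-4h", "4-8h", "8-24h", "1-3d", ">3d"]


def bucket(delay):
    """Return the label of the bucket a response delay falls into."""
    return LABELS[bisect_right(BOUNDS, delay)]


def response_time_histogram(received, my_replies):
    """Bucket how long it took to answer each message.

    received: Message-ID -> datetime the original message arrived.
    my_replies: (In-Reply-To Message-ID, datetime sent) pairs for mail I sent.
    """
    hist = Counter()
    for parent_id, sent_at in my_replies:
        got_at = received.get(parent_id)
        if got_at is None or sent_at < got_at:
            continue  # reply to a message not in the mailbox, or clock skew
        hist[bucket(sent_at - got_at)] += 1
    # Keep every bucket, including empty ones, in display order.
    return [(label, hist[label]) for label in LABELS]
```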
Please, an OS X executable! I really want this to work!