Requirements
- A Gmail or Google Apps mail account
- Python 2.5 or later
Installation
Download a stable snapshot of the code. Alternatively, if you'd like to get the latest code from Subversion, you can do this with:
svn checkout http://mail-trends.googlecode.com/svn/trunk/ mail-trends
You will also need Cheetah (a template system) installed. You can download it and then follow the installation instructions. If you are using an apt-based Linux distribution, sudo apt-get install python-cheetah may be enough.
Running
Go into the mail-trends directory:
cd mail-trends
Run the program, replacing the username and password as appropriate (if you omit the password on the command line, you will be securely prompted for it):
python main.py \
  --server=imap.gmail.com \
  --use_ssl \
  --username=username@gmail.com \
  --me=username@gmail.com,username@somedomain.com \
  --skip_labels
You will be prompted for the password for your account (you can also use --password= to pass it as an argument). The --me argument takes a comma-separated list of the addresses whose mail should be treated as sent to or from you.
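As a rough illustration of what --me is for, here is a minimal sketch that decides whether a single message was sent from you, sent to you, or neither. It does not reproduce main.py's actual logic. The helper names split_me and classify are hypothetical, and the code targets a current Python.

```python
import email
from email.utils import getaddresses, parseaddr


def split_me(me_option):
    """Turn a --me value such as 'a@x.com,b@y.com' into a set of lowercase addresses."""
    return set(a.strip().lower() for a in me_option.split(",") if a.strip())


def classify(raw_message, me):
    """Return 'from_me', 'to_me' or 'other' for one RFC 2822 message string."""
    msg = email.message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in me:
        return "from_me"
    # Look at the To and Cc headers together; get_all returns [] if a header is absent.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    if any(addr.lower() in me for _, addr in recipients):
        return "to_me"
    return "other"
```

Addresses are lowercased on both sides, so Username@SomeDomain.com still matches username@somedomain.com from the --me list.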
The output is in HTML, placed in out/index.html. Open that in your favorite web browser.
Other options
--skip_labels does not categorize messages into labels (no stats use them yet).
--filter_out=... can be used to ignore certain messages. You can use to:username@example.com, from:username@example.com or list:listid.example.com to filter out by recipient, sender or list (you can have multiple filter clauses by separating them with commas).
You can pass --record and --replay as command-line arguments to capture and play back, respectively, message fetches from the IMAP server. This is meant to aid development by speeding up data fetches.
--max_messages=NNNN can be used to limit the number of messages that are fetched. Normally the most recent messages are selected; use --random_subset to have the messages chosen at random from all of your messages.
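How mail-trends evaluates --filter_out internally isn't shown here. The following rough sketch covers the clause syntax described above: comma-separated to:, from: and list: clauses. It assumes that a list: clause matches when its value appears in the List-Id header, and that assumption is mine. The helper names are hypothetical.

```python
import email
from email.utils import getaddresses, parseaddr


def parse_filters(spec):
    """Split 'to:a@x.com,from:b@y.com,list:id.example.com' into (kind, value) pairs."""
    clauses = []
    for clause in spec.split(","):
        kind, sep, value = clause.strip().partition(":")
        if not sep or kind not in ("to", "from", "list"):
            raise ValueError("bad filter clause: %r" % clause)
        clauses.append((kind, value.lower()))
    return clauses


def is_filtered_out(raw_message, clauses):
    """Return True if any clause matches the message, i.e. it should be ignored."""
    msg = email.message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipients = [a.lower() for _, a in
                  getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))]
    list_id = msg.get("List-Id", "").lower()
    for kind, value in clauses:
        if kind == "from" and sender == value:
            return True
        if kind == "to" and value in recipients:
            return True
        # Assumption: a list: clause matches if its value appears in List-Id.
        if kind == "list" and value in list_id:
            return True
    return False
```

Using partition() rather than split(":") keeps any later colons inside the value, and a malformed clause fails loudly instead of being silently ignored.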

I'm encountering an issue on Mac OS X 10.5.2 with the "stock" Python as supplied/patched by Apple.
The error is "ImportError: No module named util". The top line of the traceback is main.py, line 274.
Please help!
BTW, suggest you set up a Google Group for this tool. Thanks!
I also saw the "import util" issue. You can hack around it by editing the .tmpl files in mail-trends/templates/ to remove the dependency on util.
re: import util
I have python2.4 and 2.5 installed on the Debian box I tested this with; /usr/bin/python on my box points to 2.4. I changed main.py to use 2.5 and it worked fine.
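Having both 2.4 and 2.5 installed is an easy trap. One way to make it fail fast is a version check at the top of main.py. This is a hypothetical addition and not part of mail-trends. The helper name is made up.

```python
import sys

REQUIRED = (2, 5)


def check_python_version(version_info=None, required=REQUIRED):
    """Return an error message if the interpreter is older than `required`, else None."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current < required:
        return ("mail-trends needs Python %d.%d or later, but this is %d.%d; "
                "try running it with python2.5 explicitly." % (required + current))
    return None


if __name__ == "__main__":
    problem = check_python_version()
    if problem:
        sys.exit(problem)
```

Passing a string to sys.exit() prints it to stderr and exits with status 1, which is clearer than a TypeError several calls deep.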
I just ran 'sudo cp mail-trends/templates/util.py /usr/lib/python2.5/'
Any ideas on this one? I'm running Vista with the latest Python download:
RuntimeError: maximum recursion depth exceeded
Ubuntu 7.10 with only Python 2.5 installed has the same import problem. 'sudo cp mail-trends/templates/util.py /usr/lib/python2.5/' solves it.
I had a problem with Python 2.5.2 on Windows XP Pro SP2 (fully updated):
Traceback (most recent call last):
OverflowError: mktime argument out of range
I fixed it by changing the 1970 on line 18 of messageinfo.py to 1972 (1971 would probably work too) :)
But I'm still stuck on the same problem 'aviflax' has, and I don't understand the suggestion 'jonathanbetz' gave. I tried several things that should have worked, but they didn't, so if anyone has a more precise description, that would be great.
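Line 18 itself isn't quoted here, so the following explanation is an assumption. time.mktime() interprets its tuple as local time. In a timezone east of UTC, midnight on 1 January 1970 therefore falls before the epoch, and the Windows C library rejects negative timestamps. That would explain why moving to 1972 helps. A sketch of a timezone-independent alternative, assuming the tuple is meant to be UTC:

```python
import calendar

# A UTC time tuple for the Unix epoch, 1970-01-01 00:00:00 (wday 3 = Thursday, yday 1).
EPOCH_TUPLE = (1970, 1, 1, 0, 0, 0, 3, 1, 0)


def utc_timestamp(time_tuple):
    """Convert a UTC time tuple to seconds since the epoch.

    calendar.timegm() treats the tuple as UTC and is computed in pure Python,
    so unlike time.mktime() it neither shifts by the local timezone nor relies
    on the platform C library accepting dates right at 1970.
    """
    return calendar.timegm(time_tuple)
```

With this, utc_timestamp(EPOCH_TUPLE) is exactly 0 on every platform. The weekday and day-of-year fields are ignored by timegm(), so they don't need to be accurate.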
I had the "import util" problem too; all I did was copy util.py from /mail-trends/templates/ to /mail-trends/. Windows XP SP2, Python 2.5.
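Copying util.py works because Python looks for modules on sys.path, and sys.path starts with the directory of the script being run. Suppose the compiled templates simply do import util, as the error suggests. Then a less invasive option is to put templates/ on sys.path from main.py instead of copying files around. This is a sketch, and the helper name is hypothetical:

```python
import os
import sys


def add_templates_to_path(script_path):
    """Put the templates/ directory next to script_path at the front of sys.path,
    so a compiled template's "import util" can find templates/util.py."""
    base_dir = os.path.dirname(os.path.abspath(script_path))
    templates_dir = os.path.join(base_dir, "templates")
    if templates_dir not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.insert(0, templates_dir)
    return templates_dir

# Hypothetical usage near the top of main.py, before any template is imported:
#   add_templates_to_path(__file__)
```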
This message came up after processing the 8000th email message:
Traceback (most recent call last):
  File "/usr/lib64/python2.5/imaplib.py", line 986, in get_tagged_response
    get_response()
  File "/usr/lib64/python2.5/imaplib.py", line 903, in get_response
    get_line()
  File "/usr/lib64/python2.5/imaplib.py", line 996, in get_line
  File "/usr/lib64/python2.5/imaplib.py", line 1162, in readline
socket.sslerror: (8, 'EOF occurred in violation of protocol')
Working now on Mac OS 10.4.11, using the MacPython package, after applying several of the tweaks above (especially moving the "util.py" file).
This error message appears, apparently harmless:
/Users/emv/src/mail-trends/mail-trends/Cheetah/NameMapper.py:289: RuntimeWarning: Python C API version mismatch for module namemapper: This Python has API version 1013, module namemapper has version 1012.
I got this message before the program shut down:
You don't have the C version of NameMapper installed! I'm disabling Cheetah's useStackFrames option as it is painfully slow with the Python version of NameMapper. You should get a copy of Cheetah with the compiled C version of NameMapper.
Where do I find the compiled C version of NameMapper?
I got the C version of NameMapper, but the program is re-fetching all the email again. Is there a way to ask it to just generate the report?
If you are using Ubuntu, just type sudo apt-get install python-cheetah to install the Cheetah templating library.
amar.rama
Look at the text above:
You can pass in --record and --replay as command line arguments to respectively capture and play black message fetches from the IMAP server. This is meant to aid in development by speeding up data fetches.
Jeez, this is a mess under Linux, and there are some typos too. I've got this diff to fix non-SSL connections, fix library paths and untie the code from Gmail:
Index: messageinfo.py
===================================================================
--- messageinfo.py	(revision 125)
+++ messageinfo.py	(working copy)
@@ -1,6 +1,6 @@
 import email
-import email.utils
-import email.header
+import email.Utils
+import email.Header
 import imaplib
 import md5
 import time
@@ -77,7 +77,7 @@
     ccs = self.GetHeaderAll('cc')
     resent_tos = self.GetHeaderAll('resent-to')
     resent_ccs = self.GetHeaderAll('resent-cc')
-    all_recipients = email.utils.getaddresses(
+    all_recipients = email.Utils.getaddresses(
         tos + ccs + resent_tos + resent_ccs)
 
     # Cleaned up and uniquefied
@@ -98,7 +98,7 @@
       header_value = self.GetHeader(header)
       header_value = header_value.replace("\n", " ")
       header_value = header_value.replace("\r", " ")
-      name, address = email.utils.parseaddr(header_value)
+      name, address = email.Utils.parseaddr(header_value)
 
       if address:
         name, address = self._GetCleanedUpNameAddress(name, address)
@@ -118,7 +118,7 @@
 
   def _GetDecodedValue(self, value):
     try:
-      pieces = email.header.decode_header(value)
+      pieces = email.Header.decode_header(value)
       unicode_pieces = \
           [unicode(text, charset or "ascii") for text, charset in pieces]
       return u"".join(unicode_pieces)
@@ -165,4 +165,4 @@
 
   def __str__(self):
     return "%s (size: %d, date: %s)" % (
-        self.GetHeader("subject"), self.size, self.__date_string)
\ No newline at end of file
+        self.GetHeader("subject"), self.size, self.__date_string)
Index: mail.py
===================================================================
--- mail.py	(revision 125)
+++ mail.py	(working copy)
@@ -6,8 +6,8 @@
 import messageinfo
 import stringscanner
 
-MAILBOX_GMAIL_ALL_MAIL = "[Gmail]/All Mail"
-MAILBOX_GMAIL_PREFIX = "[Gmail]"
+MAILBOX_GMAIL_ALL_MAIL = "INBOX"
+MAILBOX_GMAIL_PREFIX = "INBOX"
 
 class Mail(object):
   def __init__(self, server, use_ssl, username, password,
@@ -24,7 +24,7 @@
     if record or replay:
       self.__cache = cache.FileCache()
 
-    imap_constructor = use_ssl and imaplib.IMAP4_SSL or imablib.IMAP4
+    imap_constructor = use_ssl and imaplib.IMAP4_SSL or imaplib.IMAP4
 
     logging.info("Connecting")
@@ -190,4 +190,4 @@
     return message_infos
 
   def __AssertOk(self, response):
-    assert response == "OK"
\ No newline at end of file
+    assert response == "OK"
Most of it was about email.header and email.utils needing to be email.Header and email.Utils. There was also an imablib typo.
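The imablib typo only bites on non-SSL connections. When use_ssl is true, imaplib.IMAP4_SSL is truthy, so the or branch is never evaluated and the misspelled name is never looked up. Python 2.5's conditional expression states the choice more directly. Here is a sketch, where the function name is hypothetical:

```python
import imaplib


def imap_class(use_ssl):
    # Conditional expression (Python 2.5+) instead of the "a and b or c" idiom.
    # Only the selected branch is evaluated, exactly as with and/or here, so a
    # typo in the non-SSL branch would still only surface when use_ssl is false.
    return imaplib.IMAP4_SSL if use_ssl else imaplib.IMAP4
```

Usage would be along the lines of connection = imap_class(use_ssl)(server). No connection is made until the returned class is called.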
Now I'm stuck on this:
[2008-03-26 11:48:33,422] Connecting
[2008-03-26 11:48:35,665] Logging in
[2008-03-26 11:48:36,912] Selecting mailbox 'INBOX'
[2008-03-26 11:48:38,319] Fetching message infos
[2008-03-26 11:48:38,319] Fetching message list
[2008-03-26 11:48:39,878] 6869 messages were listed
[2008-03-26 11:48:39,878] Fetching info for 100 messages (100/100)
[2008-03-26 11:48:42,297] Parsing replies
[2008-03-26 11:48:42,472] Got 100 message infos
[2008-03-26 11:48:42,472] Logging out
[2008-03-26 11:48:42,627] Identifying "me" messages
Traceback (most recent call last):
  File "main.py", line 251, in ?
    message_infos = GetMessageInfos(opts)
  File "main.py", line 102, in GetMessageInfos
    for name, address in message_info.GetRecipients():
  File "/home/mixonic/Projects/mail-trends/messageinfo.py", line 88, in GetRecipients
    name, address = self._GetCleanedUpNameAddress(name, address)
  File "/home/mixonic/Projects/mail-trends/messageinfo.py", line 147, in _GetCleanedUpNameAddress
    popular_name_pair = \
TypeError: max() takes no keyword arguments
I'm no Python programmer, but I do see max() getting a key:
popular_name_pair = \
    max(cache[address].items(), key=lambda pair: pair[1])
I'm not sure what's going on there, though, nor why such a major language feature would be different on my box than on others. Oh wait, Python 2.5.
Oh well. Maybe I can rewrite it in Ruby :-).
I tested it on another account, but it didn't work.
When I changed the language from English (UK) to English (US) it worked. Just to let you know.
Hi all, if you read about this software on MeioBit, you need to change your Gmail language from Brazilian Portuguese (pt_BR) to English (US) to make the software work. Otherwise, it won't work.
Nice. Just worked for me with OS X 10.4 and Python 2.5
~Matt
Hi, I just downloaded and played around with this on my Win XP, Python 2.5 system. I ran into this error on the first run:
OverflowError: mktime argument out of range
However, this modification eliminated the error, and then the whole thing worked.
According to the docs installed with Python 2.5, the 4th element in the tuple is the hour and can have a value in the range 0-23, so it is strange: "The time value as returned by gmtime(), localtime(), and strptime(), and accepted by asctime(), mktime() and strftime(), is a sequence of 9 integers. The return values of gmtime(), localtime(), and strptime() also offer attribute names for individual fields."
Index  Attribute  Values
0      tm_year    (for example, 1993)
1      tm_mon     range [1,12]
2      tm_mday    range [1,31]
3      tm_hour    range [0,23]
4      tm_min     range [0,59]
5      tm_sec     range [0,61]; see (1) in strftime() description
6      tm_wday    range [0,6], Monday is 0
7      tm_yday    range [1,366]
8      tm_isdst   0, 1 or -1; see below
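Each index in the table has a matching attribute on the struct_time object. A quick check using the epoch itself, 1970-01-01 00:00:00 UTC (a Thursday, so tm_wday is 3):

```python
import time

# time.gmtime(0) is the Unix epoch in UTC.
t = time.gmtime(0)

# Index i and the i-th attribute name from the table refer to the same field.
fields = ["tm_year", "tm_mon", "tm_mday", "tm_hour", "tm_min",
          "tm_sec", "tm_wday", "tm_yday", "tm_isdst"]
for index, name in enumerate(fields):
    assert t[index] == getattr(t, name)

print(tuple(t))  # (1970, 1, 1, 0, 0, 0, 3, 1, 0)
```

All nine fields are in range here. An "argument out of range" error from mktime() is therefore more likely to come from the timestamp the whole tuple converts to than from any single field.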
I also encountered the following error when setting --max_messages=10000. I guess it's Google banning my IP. Is there a workaround for this?
[2008-03-26 21:00:22,608] Fetching info for 1000 messages (6000/7128)
Traceback (most recent call last):
  File "E:\Python25\Lib\imaplib.py", line 986, in get_tagged_response
    get_response()
  File "E:\Python25\Lib\imaplib.py", line 903, in get_response
    get_line()
  File "E:\Python25\Lib\imaplib.py", line 996, in get_line
  File "E:\Python25\Lib\imaplib.py", line 1162, in readline
socket.sslerror: (8, 'EOF occurred in violation of protocol')
Hello,
My problem is a timeout problem. I tried manually, like above:
[root@fennec mail-trends]# python
Python 2.5.1 (r251:54863, Oct 30 2007, 13:54:11)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-33)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import imaplib
>>> m=imaplib.IMAP4_SSL("imap.gmail.com")
And I never get connected ...
I tested gmail with the web interface (using firefox) and it works perfectly.
Does Google block Python access, and if so, how can I work around it?
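This sketch doesn't say whether Google was blocking anyone. It only turns an endless hang into an error you can see. The assumption is that imaplib opens its socket after socket.setdefaulttimeout() has been called, which holds because the default applies to sockets created afterwards. The helper name is hypothetical.

```python
import imaplib
import socket


def connect_with_timeout(host, timeout=30, factory=imaplib.IMAP4_SSL):
    """Open an IMAP connection that raises after `timeout` seconds instead of
    hanging forever. The previous global default timeout is restored afterwards;
    the socket that was already created keeps the timeout it was given."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        return factory(host)
    finally:
        socket.setdefaulttimeout(previous)
```

The timeout also applies to later reads on that connection, so a very slow server response would raise socket.timeout as well. Python 3.9 and later also accept a timeout argument to imaplib.IMAP4_SSL directly.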
For Windows users without a compiled C version of NameMapper, you'll find it here: http://cheetahtemplate.org/download.html
I get this error:
[2008-03-27 17:17:02,109] Connecting
[2008-03-27 17:17:02,357] Logging in
[2008-03-27 17:17:02,955] Selecting mailbox '[Gmail]/All Mail'
Traceback (most recent call last):
AssertionError
Please email me if you can: ihor.polyakov@ GOOGLE
Hm... I get an error as well:
[2008-03-27 16:08:39,411] Initializing
Traceback (most recent call last):
AssertionError
Very slick. Thank you for this useful application.
This is really cool... well done
Same error:
[2008-03-27 23:41:44,948] Connecting
[2008-03-27 23:41:45,194] Logging in
[2008-03-27 23:41:45,975] Selecting mailbox '[Gmail]/All Mail'
Traceback (most recent call last):
  File "main.py", line 251, in <module>
    message_infos = GetMessageInfos(opts)
  File "main.py", line 54, in GetMessageInfos
    m.SelectAllMail()
  File "/home/tyn0r/mail-trends/mail.py", line 60, in SelectAllMail
    self.SelectMailbox(MAILBOX_GMAIL_ALL_MAIL)
  File "/home/tyn0r/mail-trends/mail.py", line 65, in SelectMailbox
    self.__AssertOk(r)
  File "/home/tyn0r/mail-trends/mail.py", line 193, in __AssertOk
    assert response == "OK"
AssertionError
I'm on Ubuntu Hardy with Cheetah installed via sudo apt-get install python-cheetah... Any ideas? Thanks
I get a "Bus error"... What's that?
Any idea?
OS: FreeBSD 4.10-RELEASE
Python version: 2.5 (port version)
Cheetah version: 2.0.1 (port version)
@r.fluttaz - I'm getting the same problem too (Mac 10.5.2)
With Python 2.5 and Cheetah installed, I applied the patch from mixo...@synitech.com, and this is a good result:
Works with GMail and Google Apps.
My OS is Debian Etch.
Cheers, Carlos Hellín.
The "maximum recursion depth" error seems to be a function of the number of email messages. The only way I've worked around it is to use --max_messages= with a number smaller than my total (~114K). I've seen this problem on both my Mac Pro (4GB RAM) and Ubuntu Linux (8GB RAM).
This seems to work for no one but its author. That's too bad; it looks really cool. I get the same issue as most people here, with AssertionErrors.
Traceback (most recent call last):
imaplib.error: [ALERT] Invalid credentials (Failure)
Please help me. Maybe I need to restart the computer after installing Python?? Thanks
Worked fine for me on an MBP running OSX 10.5.2. Looking forward to a future version that works with other IMAP setups, as I don't use gmail for my primary email. I'd chip in, but I've got no python experience yet. Keep up the good work!
Getting the message_infos error. Running 10.4 with a Fink installation and Python 2.5, and I just installed Cheetah...
I was worried about the json dependency of python-twitter, so I installed it. No change to the Python errors:
  File "main.py", line 251, in <module>
    message_infos = GetMessageInfos(opts)
  File "main.py", line 51, in GetMessageInfos
    "random_subset" in opts ...
I want to change the colors of the graphs. How can I do that? I don't like the yellow color.
Hey, thanks for writing this!
I get "RuntimeError: maximum recursion depth exceeded" when I try to run with 20000+ messages. It runs OK with 10000. I have 45000+ total, so it would be nice to get this fixed... thx
Worked fine for me on Gmail, but not on another IMAP server.
Has yet to work for me. Oh, it might download the first 50,000 messages, but it never, ever, ever finishes. Tried at least 4 times. Tried stopping all other connections. Simply does not work.
Worked well for me.. thanks!
It also worked for me!! Thank you! I had to limit it with --max_messages=8000 (I have more than 14000, and I got "maximum recursion depth exceeded" at "Extracting threads" when I tried 9000 or more messages).
Anyway, it's very nice. It would be even better for me if I could get some graphs with absolute numbers, not just relative ones.
I will keep an eye on how it evolves! Thanks again!
This looks like pure awesomeness!! Running it right now.
I got it to work in Ubuntu 7.04 (the only LiveCD I had lying around). After enabling the Universe source in Synaptic and installing both Python 2.5 and python-cheetah, I was able to run it with the following command: python2.5 main.py --server=imap.gmail.com --use_ssl --username=xzy@gmail.com --me=xzy@gmail.com --skip_labels
Make sure you put your own address in there. You will be prompted for your password. It completed successfully for 9018 messages. This project still needs a lot of work; I'm having a hard time believing that this was not tested in Ubuntu 7.04.
I get:
[2008-04-15 00:31:34,320] Connecting
[2008-04-15 00:31:34,476] Logging in
[2008-04-15 00:31:35,800] Selecting mailbox '[Gmail]/All Mail'
Traceback (most recent call last):
AssertionError
That's on Ubuntu Gutsy, and it occurred with both a fresh install of Cheetah and a reinstall from an apt repository.
To solve this problem:
  File "main.py", line 35, in GetOptsMap
    assert "username" in opts_map
AssertionError
one solution is to hard-code the program parameters. Open main.py and change lines 33 and 34:
for name, value in opts:
with:
After this, you just call python main.py from the command line and you are good to go.
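The replacement lines for lines 33 and 34 didn't survive in this thread. The following is a hypothetical illustration only. It assumes main.py builds opts with getopt.getopt(), which the for name, value in opts: loop suggests, and it uses a made-up option list. Under those assumptions, a hard-coded opts only needs the same list of (name, value) pairs:

```python
import getopt

# Hypothetical option list; main.py's real list may differ.
LONG_OPTIONS = ["server=", "use_ssl", "username=", "password=", "me=", "skip_labels"]


def parse_opts(argv):
    """Return getopt's list of (name, value) pairs; flags get an empty value."""
    opts, _args = getopt.getopt(argv, "", LONG_OPTIONS)
    return opts


# A hard-coded stand-in with the same shape, for the "for name, value in opts:" loop.
HARDCODED_OPTS = [
    ("--server", "imap.gmail.com"),
    ("--use_ssl", ""),
    ("--username", "username@gmail.com"),
    ("--me", "username@gmail.com"),
    ("--skip_labels", ""),
]

# getopt stops at the first non-option argument, so a stray "\" left over from
# the multi-line command means nothing after it is parsed:
#   parse_opts(["\\", "--username=username@gmail.com"]) returns []
```

That last comment points to one possible cause worth ruling out, though it is not a confirmed one. Pasting the multi-line command from the Running section onto a single line leaves stray backslashes behind, and so does running it in cmd.exe, where \ is not a line continuation. In either case --username may never be parsed, which would trip the assert in GetOptsMap.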
If you are using a localized version of Gmail, you should change the constant from:
MAILBOX_GMAIL_ALL_MAIL = "[Gmail]/All Mail"
to match your localized Gmail. In my case (Indonesian), it is:
MAILBOX_GMAIL_ALL_MAIL = "[Gmail]/Semua Email"
It works for me.
Maybe there should be automatic detection of the localized version?
I can confirm that the localization solution works. I couldn't get it to work for Dutch (Alle Berichten), but when I switched Gmail to English (US) before running the .py script, it worked and all of my emails were processed. Does anyone have a clue how to get the correct name for a given language?
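One way to avoid guessing the localized name is to ask the server. Servers that support the IMAP SPECIAL-USE extension (RFC 6154, which postdates this thread) can mark the all-mail folder with a \All flag in LIST output. Whether a given server includes that flag in a plain LIST is an assumption you would need to check. This sketch only parses LIST lines, and the function name is hypothetical:

```python
import re

# A LIST response line looks like: (\HasNoChildren \All) "/" "[Gmail]/All Mail"
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)$')


def find_all_mail(list_lines):
    """Return the name of the mailbox carrying the \\All flag, or None."""
    for line in list_lines:
        if isinstance(line, tuple):
            continue  # literal-encoded mailbox names are not handled in this sketch
        if isinstance(line, bytes):
            line = line.decode("ascii", "replace")
        match = LIST_LINE.match(line)
        if match is None:
            continue
        if "\\All" in match.group("flags").split():
            name = match.group("name")
            if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
                name = name[1:-1]  # strip the surrounding quotes
            return name
    return None

# Hypothetical usage with an imaplib connection:
#   typ, lines = conn.list()
#   all_mail = find_all_mail(lines)
```

The returned name has its quotes stripped. Depending on the Python version, you may need to wrap it in double quotes again before passing it to select(), because it contains a space.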
I tried to run it on 110,000 emails in my gmail account. It was using about 1GB of RAM (can something be done about this?).
After waiting for a good hour, I got the following error:
  File "mail-trends\jwzthreading.py", line 54, in len
RuntimeError: maximum recursion depth exceeded
I'm using Windows XP. I tried on my Linux box, but it only has 512MB of RAM, and the script was taking forever to run with all the swapping going on.
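The traceback points at a recursive length computation in jwzthreading.py. A deep enough container tree will therefore exhaust Python's default recursion limit of 1000 frames. Raising the limit with sys.setrecursionlimit() is the quick workaround, but it can crash the interpreter outright if the C stack runs out first. A sturdier approach is to walk the tree with an explicit stack. The Container class below is a hypothetical stand-in, not jwzthreading's actual class:

```python
class Container(object):
    """Hypothetical stand-in for a thread-tree node with a list of children."""

    def __init__(self):
        self.children = []


def count_nodes(root):
    """Count nodes with an explicit stack instead of recursion, so a deep
    reply chain cannot hit Python's recursion limit."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.children)
    return count
```

Memory use grows with the size of the stack rather than with the number of Python frames, so a chain thousands of levels deep is counted without error.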
Did anyone find a fix for the 'socket.sslerror: (8, 'EOF occurred in violation of protocol')' error?
Thanks!
Personally, I'm running on an MBP with OS X 10.5.2, and at the beginning I had some problems running main.py. I got the following error:
[2008-04-15 12:05:36,281] Selecting mailbox '[Gmail]/All Mail'
Traceback (most recent call last):
AssertionError
Then I changed the language of my mailbox from Italian to English (US) and re-ran the script: everything is just fine now. So if you use a localized version of Gmail, change your language in the preferences panel before running the script.
Artem:
I just ran into the 'socket.sslerror: (8, 'EOF occurred in violation of protocol')' when I was hacking the script to parse a local imap server.
The error is a result of a typo on line 27 of mail.py (imablib.IMAP4_SSL versus imaplib.IMAP4_SSL). Here's the diff.
I had the "assertion" error, and I tried changing Gmail from Spanish to English and it works. I want to apply the patch described by romihardiyanto, but I don't know my localized mailbox name. I'm using Spanish, but 'Gmail/Spanish Mail' doesn't work. Neither does 'Gmail/Español Mail', because 'ñ' gives an error.
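I can't confirm the Spanish folder name. The 'ñ' error, though, is consistent with IMAP's rule (RFC 3501, section 5.1.3) that non-ASCII mailbox names are sent in a "modified UTF-7" encoding rather than as raw text. Python's built-in utf-7 codec implements the standard variant, not IMAP's modified one. Here is a sketch of the encoder, where the function name is hypothetical:

```python
import base64


def encode_imap_utf7(name):
    """Encode a mailbox name in IMAP's modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = []
    pending = []  # run of non-printable-ASCII characters awaiting base64 encoding

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            # Modified base64 uses "," instead of "/" and is wrapped in "&" ... "-".
            out.append("&" + b64.replace("/", ",") + "-")
            del pending[:]

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append("&-" if ch == "&" else ch)  # a literal "&" is written as "&-"
        else:
            pending.append(ch)
    flush()
    return "".join(out)
```

Encoding "Español" this way gives "Espa&APE-ol". Mailbox names returned by the server's LIST command already come in this form, so copying a name from LIST output also avoids the problem.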
For those (jdvalentine, ihor.polyakov, r.fluttaz, chasseurmic, etc.) with the following error:
AssertionError
I encountered the same thing. I discovered that the mailbox "[Gmail]/All Mail" doesn't exist; it turns out that's because I have a @googlemail.com account rather than a @gmail.com one.
I simply edited lines 9 and 10 of mail.py:
MAILBOX_GMAIL_ALL_MAIL = "[Google Mail]/All Mail"
MAILBOX_GMAIL_PREFIX = "[Google Mail]"
And it worked a treat.
freezingkiwis, what is your Google Mail language?
See, open source really works... So many people have improved and fixed this up!
By the way, it worked for me on Ubuntu in a single go. The commands to run:
sudo easy_install-2.5 cheetah
svn checkout http://mail-trends.googlecode.com/svn/trunk/ mail-trends
and then python main.py...
I get the following error:
Traceback (most recent call last):
  File "/usr/lib/python2.5/imaplib.py", line 807, in check_bye
imaplib.abort: System Error
In reference to some of the previous comments, I've gotten mail-trends working with Python 2.4, as there's actually only one 2.5 dependency: in messageinfo.py:148 there's a max() call with a key= lambda:
popular_name_pair = \
    max(cache[address].items(), key=lambda pair: pair[1])
A simple try/except fixes things:
try:
  popular_name_pair = \
      max(cache[address].items(), key=lambda pair: pair[1])
except TypeError:
  # Python 2.4's max() has no key argument; sort by count instead.
  import operator
  popular_name_pair = \
      sorted(cache[address].items(), reverse=True,
             key=operator.itemgetter(1))[0]
Past 100 messages or so, I get the "maximum recursion depth exceeded" error. What is the solution?