One day, during the Python Tuesdays event at Calango Hacker Club, one of the participants asked how to automate reading their email on Gmail, sorting messages into those that might be important and those that are not. That question got me looking into how to do it, and the result is this post, which may help someone trying to do something similar with Python.
The email protocols
Email systems are robust because they rely on well-established protocols used by every email platform on the internet. These protocols are defined and published in documents called RFCs (Request for Comments). They resemble Python's PEPs, but cover the protocols and standards that define how communication systems work across the internet.
To send emails, we use the SMTP protocol (RFCs 821 and 5321), and to receive them, we use the IMAP (RFC 3501) or POP (RFCs 918 and 1081) protocols.
Sending emails
Simple Mail Transfer Protocol, or SMTP, is an application-layer protocol used to send emails. It operates over TCP/IP, and its communication usually takes place on port 25 or 587 for connections that start unencrypted (and can be upgraded with STARTTLS) and on port 465 for TLS/SSL encrypted connections.
Python comes with batteries included, so we don’t need to install any external package to deal with SMTP. We use a library called smtplib:
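A minimal sketch of that idea, assuming an SMTP server is listening on localhost port 25. The helper names build_raw_message and send_raw and the example.com addresses are illustrative only and do not come from any library:

```python
import smtplib


def build_raw_message(from_addr, to_addrs, subject, body):
    """Assemble a bare-bones email by hand: headers, a blank line, then the body."""
    headers = [
        f"From: {from_addr}",
        f"To: {', '.join(to_addrs)}",
        f"Subject: {subject}",
    ]
    # Email lines end with CRLF, and an empty line separates headers from body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


def send_raw(from_addr, to_addrs, subject, body, host="localhost", port=25):
    """Hand the message to an SMTP server assumed to be listening on host:port."""
    message = build_raw_message(from_addr, to_addrs, subject, body)
    # The with block closes the connection when sending is done.
    with smtplib.SMTP(host, port) as server:
        server.sendmail(from_addr, to_addrs, message)


# Hypothetical usage, which needs a local SMTP server running:
# send_raw("origin@example.com", ["destiny@example.com"], "Hello", "Hello World")
```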
The code above is a raw version of sending an email. We built the message by hand from plain text so you can see the basic structure of an email message, but Python also has a library, email, for creating more complex messages.
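To give an idea of what "more complex" can mean, here is a hedged sketch using email.message.EmailMessage from the standard library to combine a plain-text body, an HTML alternative, and an attachment. The function name build_rich_message, the addresses, and the attachment bytes are made up for the illustration:

```python
from email.message import EmailMessage


def build_rich_message(from_addr, to_addrs, subject, text, html, attachment=None):
    """Build a message with text and HTML bodies and an optional attachment."""
    message = EmailMessage()
    message["From"] = from_addr
    message["To"] = ", ".join(to_addrs)
    message["Subject"] = subject
    # The plain text is the base body; clients that cannot render HTML show it.
    message.set_content(text)
    # add_alternative turns the message into multipart/alternative.
    message.add_alternative(html, subtype="html")
    if attachment is not None:
        filename, data = attachment
        # Adding an attachment wraps everything in multipart/mixed.
        message.add_attachment(
            data, maintype="application", subtype="octet-stream", filename=filename
        )
    return message


message = build_rich_message(
    "origin@example.com",
    ["destiny@example.com"],
    "Hello",
    "Hello World",
    "<p>Hello <b>World</b></p>",
    attachment=("notes.bin", b"\x00\x01\x02"),
)
```

A message built this way can be handed to smtplib through the send_message method of a connected server object.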
We are sending the email from a local server, which can be a problem because the email network works as a chain of trust. If your local server doesn’t follow email-sending best practices, or if the chain doesn’t know your IP, your messages can be marked as spam or dropped by the receiver.
To avoid these problems and improve the odds of delivery, let’s change the code a little to use Gmail as our gateway:
import smtplib
from email.mime.text import MIMEText

# connect with Google's servers
smtp_ssl_host = 'smtp.gmail.com'
smtp_ssl_port = 465
# use username or email to log in
username = 'origin@gmail.com'
password = 'password'

from_addr = 'origin@gmail.com'
to_addrs = ['destiny@gmail.com']

# the email lib has a lot of templates
# for different message formats,
# on our case we will use MIMEText
# to send only text
message = MIMEText('Hello World')
message['subject'] = 'Hello'
message['from'] = from_addr
message['to'] = ', '.join(to_addrs)

# we'll connect using SSL
server = smtplib.SMTP_SSL(smtp_ssl_host, smtp_ssl_port)
# to interact with the server, first we log in
# and then we send the message
server.login(username, password)
server.sendmail(from_addr, to_addrs, message.as_string())
server.quit()
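If you have problems connecting to Google, you may need to enable the “Less secure app access” setting. Google has been retiring that setting, so on accounts where it is no longer offered, an app password (which requires 2-Step Verification) is the alternative.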
Google blocks access from apps that it judges not to follow its security standards. The problem is that there is no clear explanation of what these standards are, and working through them is not a trivial task for someone who is just starting out and running a first test.
Now we can use Google as our gateway to send messages, but keep in mind that Google still enforces its own sending policy, so be careful not to send spam.
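Port 587, mentioned earlier, is normally used with STARTTLS rather than SMTP_SSL: the connection starts in plain text and is upgraded to TLS before logging in. The sketch below shows that variant under a few assumptions: the function names are invented for this example, and the smtp_class parameter exists only so the function can be exercised without a real server.

```python
import smtplib
import ssl
from email.mime.text import MIMEText


def build_text_message(from_addr, to_addrs, subject, body):
    """Same MIMEText message as in the listing above, wrapped in a function."""
    message = MIMEText(body)
    message["subject"] = subject
    message["from"] = from_addr
    message["to"] = ", ".join(to_addrs)
    return message


def send_with_starttls(message, host, port, username, password, smtp_class=smtplib.SMTP):
    """Send `message` over a connection that is upgraded to TLS with STARTTLS."""
    context = ssl.create_default_context()
    # The with block closes the connection for us, even if sending fails.
    with smtp_class(host, port) as server:
        server.starttls(context=context)  # switch to an encrypted channel
        server.login(username, password)
        # send_message reads the sender and recipients from the headers.
        server.send_message(message)


# Hypothetical usage; GMAIL_USER and GMAIL_PASSWORD are assumed environment
# variable names, read with os.environ to keep credentials out of the source:
# message = build_text_message(user, ["destiny@gmail.com"], "Hello", "Hello World")
# send_with_starttls(message, "smtp.gmail.com", 587, user, password)
```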
Receiving emails
The Internet Message Access Protocol (IMAP) is used to receive emails and, like SMTP, it operates at the application layer over TCP/IP. It uses port 143 for unencrypted connections and port 993 for encrypted ones.
Another protocol that works for this task is the Post Office Protocol (POP), but IMAP is the better choice because it keeps the client and the server synchronized and can access mailboxes other than the inbox.
The process of receiving emails is more complicated than sending because you also have to search for the message and decode it:
import email
import imaplib

EMAIL = 'mymail@mail.com'
PASSWORD = 'password'
SERVER = 'imap.gmail.com'

# connect to the server and go to its inbox
mail = imaplib.IMAP4_SSL(SERVER)
mail.login(EMAIL, PASSWORD)
# we choose the inbox but you can select others
mail.select('inbox')

# we'll search using the ALL criteria to retrieve
# every message inside the inbox
# it returns a status and a list of ids
status, data = mail.search(None, 'ALL')
# the list returned is a list of bytes separated
# by white spaces in this format: [b'1 2 3', b'4 5 6']
# so, to separate it, first we create an empty list
mail_ids = []
# then we go through the list splitting its blocks
# of bytes and appending to the mail_ids list
for block in data:
    # the split function called without parameters
    # transforms the text or bytes into a list using
    # white space as the separator:
    # b'1 2 3'.split() => [b'1', b'2', b'3']
    mail_ids += block.split()

# now for every id we'll fetch the email
# to extract its content
for i in mail_ids:
    # the fetch function fetches the email given its id
    # and the format you want the message in
    status, data = mail.fetch(i, '(RFC822)')

    # the data in the '(RFC822)' format comes as a list
    # holding a tuple (header, content) followed by
    # the closing byte b')'
    for response_part in data:
        # so if it's a tuple...
        if isinstance(response_part, tuple):
            # we go for the content at its second element,
            # skipping the header at the first
            message = email.message_from_bytes(response_part[1])

            # with the content we can extract the info about
            # who sent the message and its subject
            mail_from = message['from']
            mail_subject = message['subject']

            # then for the text we have a little more work to do
            # because it can be in plain text or multipart
            # if it's not plain text we need to separate the message
            # from its attachments to get the text
            if message.is_multipart():
                mail_content = ''

                # on multipart we have the text message and
                # other things like attachments and an HTML version
                # of the message, so we loop through
                # the email payload
                for part in message.get_payload():
                    # if the content type is text/plain
                    # we extract it
                    if part.get_content_type() == 'text/plain':
                        mail_content += part.get_payload()
            else:
                # if the message isn't multipart, just extract it
                mail_content = message.get_payload()

            # and then let's show the result
            print(f'From: {mail_from}')
            print(f'Subject: {mail_subject}')
            print(f'Content: {mail_content}')
In this code, we extracted only the plain text of the email, but there is much more you can extract, such as the HTML content and the attached files. That can be a topic for a future post.
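One detail worth knowing when working with real mailboxes: subjects with accented characters usually arrive RFC 2047 encoded (something like =?utf-8?b?...?=), and get_payload() without arguments returns the body still in its transfer encoding (base64 or quoted-printable). The following is a sketch of how both could be decoded with the standard library; decode_subject and extract_plain_text are names chosen for this example, and falling back to UTF-8 when no charset is declared is an assumption.

```python
import email
from email.header import decode_header, make_header


def decode_subject(message):
    """Turn an RFC 2047 encoded subject into readable text."""
    raw = message["subject"]
    return str(make_header(decode_header(raw))) if raw else ""


def extract_plain_text(message):
    """Collect every text/plain part, undoing base64 or quoted-printable."""
    chunks = []
    # walk() visits nested multipart containers as well as leaf parts.
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # skip text files sent as attachments
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        chunks.append(payload.decode(charset, errors="replace"))
    return "".join(chunks)
```

In the loop above, these helpers could stand in for message['subject'] and for the is_multipart branch, respectively.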
BONUS: Automate the cleaning of bloated mailboxes
Recently, my mailbox got bloated, and I had to clean it up to free some space, but the sheer number of messages overloaded the Gmail interface.
So, to solve that, and to automate this tedious manual task, I wrote this:
import imaplib

# if your mailbox is very full, use this variable
# to raise the limit on the response size
imaplib._MAXLINE = 1000000

EMAIL = 'mymail@gmail.com'
PASSWORD = 'password'
SERVER = 'imap.gmail.com'

mail = imaplib.IMAP4_SSL(SERVER)
mail.login(EMAIL, PASSWORD)
# select the box you want to clean
mail.select('bloated_box')

status, search_data = mail.search(None, 'ALL')

mail_ids = []

for block in search_data:
    mail_ids += block.split()

# define the range for the operation
start = mail_ids[0].decode()
end = mail_ids[-1].decode()

# move the emails to the trash
# this step is Gmail specific because
# it doesn't allow deleting messages
# outside the trash
mail.store(f'{start}:{end}'.encode(), '+X-GM-LABELS', '\\Trash')

# access the Gmail trash
mail.select('[Gmail]/Trash')
# mark the emails to be deleted
mail.store("1:*", '+FLAGS', '\\Deleted')

# permanently remove the emails
mail.expunge()

# close the mailbox
mail.close()
# close the connection
mail.logout()
If the number of messages is too large, you can loop over them and remove them in smaller chunks.
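A possible shape for that loop is sketched below. It is a variation on the script above, not the script itself: it assumes the ids are UIDs (obtained with mail.uid('SEARCH', None, 'ALL') instead of mail.search), since UIDs stay stable while messages are being moved. The function names and the chunk size of 500 are arbitrary choices for the example.

```python
def chunk_message_sets(mail_ids, chunk_size=500):
    """Split a list of ids (bytes, as returned by search) into IMAP message sets."""
    for start in range(0, len(mail_ids), chunk_size):
        chunk = mail_ids[start:start + chunk_size]
        # [b'1', b'2', b'3'] -> '1,2,3', a set that the STORE command accepts
        yield b",".join(chunk).decode()


def trash_in_chunks(mail, mail_ids, chunk_size=500):
    """Label messages as trash one chunk at a time.

    `mail` is assumed to be a logged-in imaplib connection with the bloated
    mailbox already selected, and `mail_ids` a list of UIDs.
    """
    for message_set in chunk_message_sets(mail_ids, chunk_size):
        mail.uid("STORE", message_set, "+X-GM-LABELS", "(\\Trash)")
```

After the loop finishes, the trash can be emptied exactly as in the script above.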
For more elaborate searches using the search command, see RFC 3501, which defines the IMAP protocol and describes in detail all the possible search parameters, along with the other commands.
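As a small illustration of what such searches can look like, the sketch below composes a criteria string from a few of the search keys defined in RFC 3501 (FROM, SUBJECT, SINCE, BEFORE, UNSEEN). The helper names are invented for this example, and it assumes the values contain no double quotes, since no escaping is done.

```python
from datetime import date

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day):
    """Format a date the way IMAP SEARCH expects it, e.g. 05-Mar-2021."""
    # Month names are spelled out here to avoid locale-dependent formatting.
    return f"{day.day:02d}-{_MONTHS[day.month - 1]}-{day.year}"


def build_search_criteria(sender=None, subject=None, since=None, before=None, unseen=False):
    """Compose a parenthesized criteria string; all the given keys must match."""
    parts = []
    if sender:
        parts.append(f'FROM "{sender}"')
    if subject:
        parts.append(f'SUBJECT "{subject}"')
    if since:
        parts.append(f"SINCE {imap_date(since)}")
    if before:
        parts.append(f"BEFORE {imap_date(before)}")
    if unseen:
        parts.append("UNSEEN")
    # With no filters at all, fall back to the ALL key used earlier.
    return "(" + " ".join(parts or ["ALL"]) + ")"


criteria = build_search_criteria(sender="newsletter@example.com", before=date(2020, 1, 1))
# With a connected mailbox this would be used as:
# status, data = mail.search(None, criteria)
```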