I run a community social network site with around 15,000 users. It is built around forums that users can take part in via web or email, so we generate a lot of messages. It's community supported, with no advertising and no spam: email traffic is 100% our own users messaging each other (i.e. neither unsolicited nor commercial). It's been running for several years, but a recent rewrite brought massive performance improvements, including to email send rates, along with enormously improved standards compliance. Since then we've had problems delivering to all hotmail/outlook/office365 domains.
Old system:
- No bounce handling - system was sending over 4 million bouncing messages per month!
- Ignored remote receiver limits - messages would retry continuously for weeks
- Often massive message sizes, up to 25MB each!
- A bug meant some messages were sent in duplicate, causing up to 250,000 excess messages
- No SPF compliance - every message was sent with a forged from address!
- No DKIM
- No TLS
- Forward and reverse DNS did not match
- Limited double-opt-in - users could receive messages at unverified addresses, but not post.
Despite this terrible spammer-like behaviour, we had little or no trouble delivering to hotmail.
New system:
- Bounce handling - every bounce is handled immediately and addresses removed from lists as appropriate - our real bounce rate is now close to 0.
- Full deferral support, well behaved queuing of retries with exponential back-off.
- Small messages, no attachments, linked images only from sender domain, all served over HTTPS (with HSTS & HPKP).
- Full SPF compliance - no forgery of from addresses.
- Every message is DKIM-signed.
- Full, verifiable TLS, inbound and outbound
- Forward and reverse DNS match
- Full double-opt-in
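To be concrete about what I mean by exponential back-off, here is a minimal Python sketch of the idea. It is not our actual mail server configuration: the function name `retry_delay`, the one-minute base and the six-hour cap are all assumptions chosen for illustration.

```python
from datetime import timedelta


def retry_delay(attempt: int,
                base: timedelta = timedelta(minutes=1),
                cap: timedelta = timedelta(hours=6)) -> timedelta:
    """Delay before retry number `attempt` (1-based).

    The delay doubles on every attempt (base * 2^(attempt-1)) and is
    capped so a long-deferred message is still retried periodically.
    The base and cap values are illustrative assumptions.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Bound the exponent so very large attempt numbers cannot overflow timedelta.
    exponent = min(attempt - 1, 32)
    return min(base * (2 ** exponent), cap)


# First five delays with the assumed defaults: 1, 2, 4, 8 and 16 minutes.
schedule = [retry_delay(n) for n in range(1, 6)]
```

The point of the cap is that a deferred message keeps being retried at a steady, low rate instead of either hammering the receiver or being left for days between attempts.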
So, you'd have thought that such a significant improvement in sending behaviour would improve deliverability, but as far as outlook/hotmail is concerned, the opposite is true. To start with we received rate-limit blocks (no deferrals, just complete blocks), so we dropped our mail server's overall concurrency (across ALL domains) to 1, but it made no difference. I understand that rate limits are necessary, but they should be implemented with deferrals, not blocks, and we simply do have a fairly large number of messages to send.
Because we now handle bounces properly, these blocks have essentially resulted in unsubscribing every user across all of these MS mail services. Our traffic to hotmail has dropped to near 0, and we've seen a massive drop in activity, with much complaining from our users.
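To illustrate the distinction between a deferral, a block and a genuine bad-address bounce, here is a minimal Python sketch of how SMTP replies could be classified so that only a bad address leads to an unsubscribe. This is not our production bounce handler: `Action`, `classify_reply` and the `BAD_ADDRESS` list are hypothetical names, and the list is an illustrative assumption rather than an exhaustive one. The only thing it relies on is the standard split between 4xx (temporary) and 5xx (permanent) replies, plus the enhanced status codes from RFC 3463, where X.1.X concerns addressing and X.7.X concerns security or policy.

```python
import re
from enum import Enum


class Action(Enum):
    RETRY = "retry"              # temporary failure: keep the message queued
    UNSUBSCRIBE = "unsubscribe"  # the recipient address itself is bad
    HOLD = "hold"                # rejected for another reason: keep the subscriber


# Three-digit reply code, optionally followed by an RFC 3463 enhanced status code.
_REPLY = re.compile(r"^(\d{3})(?:[ -]([245]\.\d{1,3}\.\d{1,3})(?=\s|$))?")

# Enhanced codes saying the destination address is invalid (RFC 3463, X.1.X).
# Assumption: this short list is illustrative, not exhaustive.
BAD_ADDRESS = {"5.1.1", "5.1.2", "5.1.3", "5.1.6"}


def classify_reply(reply: str) -> Action:
    """Map the first line of an SMTP failure reply to a queue action."""
    match = _REPLY.match(reply.strip())
    if match is None:
        raise ValueError(f"not an SMTP reply: {reply!r}")
    code, enhanced = int(match.group(1)), match.group(2)
    if 400 <= code <= 499:
        return Action.RETRY
    if 500 <= code <= 599:
        # Only remove the subscriber when the reply blames the address.
        # Policy or rate-limit rejections (e.g. 5.7.x) and replies with no
        # enhanced code are held for review instead (a conservative assumption).
        if enhanced in BAD_ADDRESS:
            return Action.UNSUBSCRIBE
        return Action.HOLD
    raise ValueError(f"reply is not a failure: {reply!r}")
```

With made-up reply strings (not taken from our logs), `classify_reply("421 4.7.0 Try again later")` gives `Action.RETRY`, `classify_reply("550 5.1.1 User unknown")` gives `Action.UNSUBSCRIBE`, and `classify_reply("550 5.7.1 Rejected by policy")` gives `Action.HOLD`.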
We have had some deferrals from yahoo, but nothing major, and no issues with other ISPs. However, because of its popularity, hotmail etc. accounts for a fair proportion of our users.
What we are seeing now is straightforward black-holing - messages are accepted, but are never delivered to mailboxes, even if the sender's address is in my address book. Here's a (redacted) mail server log from a message I sent myself today, showing a perfectly good delivery that simply never showed up in my mailbox (or my junk mailbox):
Aug 18 00:02:01 [INFO] [-] [core] [outbound] Created transaction: 295909F6-3510-49D1-BB62-1BE01D8984A7
Aug 18 00:02:01 [DEBUG] [outbound] running send_email hooks
Aug 18 00:02:01 [DEBUG] [outbound] Sending mail: 1471478521810_0_20414_4326.cham01.example.com
Aug 18 00:02:01 [DEBUG] [outbound] running get_mx hooks
Aug 18 00:02:01 [INFO] [outbound] Looking up AAAA records for: mx2.hotmail.com
Aug 18 00:02:01 [ERROR] [outbound] DNS lookup of mx2.hotmail.com failed: Error: queryAaaa ENODATA mx2.hotmail.com
Aug 18 00:02:01 [INFO] [outbound] Looking up A records for: mx2.hotmail.com
Aug 18 00:02:01 [INFO] [outbound] Attempting to deliver to: 65.55.92.184:25 (0) (27)
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 220 SNT004-MC4F11.hotmail.com Sending unsolicited commercial or bulk e-mail to Microsoft's computer network is prohibited. Other restrictions are found at http://privacy.microsoft.com/en-us/anti-spam.mspx. Wed, 17 Aug 2016 17:02:02 -0700 \r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] C: EHLO example.com
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-SNT004-MC4F11.hotmail.com (3.22.0.9) Hello [92.243.0.0]\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-SIZE 36909875\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-PIPELINING\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-8bitmime\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-BINARYMIME\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-CHUNKING\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-STARTTLS\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-AUTH LOGIN\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-AUTH=LOGIN\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250 OK\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] C: STARTTLS
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 220 SMTP server ready\r\n
Aug 18 00:02:02 [INFO] [outbound] secured: cipher=ECDHE-RSA-AES256-SHA384 version=TLSv1/SSLv3 verified=true cn="*.hotmail.com" organization="undefined" issuer="Microsoft Corporation" expires="Apr 14 00:34:04 2018 GMT" fingerprint=A1:AD:33:1F:DD:42:DF:3E:33:89:C9:A6:39:DC:90:6C:EE:45:B8:4B
Aug 18 00:02:02 [PROTOCOL] [outbound] C: EHLO example.com
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-SNT004-MC4F11.hotmail.com (3.22.0.9) Hello [92.243.0.0]\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-SIZE 36909875\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-PIPELINING\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-8bitmime\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-BINARYMIME\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-CHUNKING\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-AUTH LOGIN\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250-AUTH=LOGIN\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] S: 250 OK\r\n
Aug 18 00:02:02 [PROTOCOL] [outbound] C: MAIL FROM:
There should be enough identifiers in there for someone with sufficient karma to track this down.
How can I fix this? It seems utterly ridiculous that we are being penalised so violently for improving our sending behaviour.
Thanks for listening!
Update: I have also submitted an E-form request to outlook support about this.