lostinspace2011 Apprentice
Joined: 09 Sep 2005 Posts: 230
|
Posted: Wed Oct 20, 2010 2:54 pm Post subject: Postfix conversation with .. timed out while sending message |
|
|
My mail server has been running for some time without any issues. My setup uses Postfix, amavisd, SpamAssassin and ClamAV (clamd) as described in http://www.gentoo.org/doc/en/mailfilter-guide.xml. However, for the past two days I have been struggling with messages getting stuck in the outgoing queue.
sendmail -bp shows the following for some (most) of my outgoing messages.
Quote: |
conversation with mx1.bt.mail.yahoo.com[212.82.111.207] timed out while sending message body
...
lost connection with mx292.emailfiltering.com[194.116.199.31] while sending end of data -- message may be sent more than once
...
(host nk11b01-smtp-mx004.mac.com[17.148.17.44] said: 421 4.4.2 Timeout while waiting for command. (in reply to end of DATA command))
...
-- 407 Kbytes in 6 Requests
|
Most of the messages are very small (<10K) and one is larger (>350K), so I don't think this problem is linked to the size of the message.
I have tried to disable amavisd by commenting out the following in /etc/postfix/main.cf
Code: | #content_filter = smtp-amavis:[localhost]:10024 |
However, this did not resolve the problem either. I can see outbound traffic, and some of the messages still seem to be received at their destination. However, those messages are still shown in the outgoing queue.
I have already tried sending messages via telnet directly to the same destination address and this works, so I am led to rule out a network problem. I also tried using telnet to send via my own server on ports 25, 10024 (amavisd) and 10025 (smtp-delivery), but the message got stuck every time. I tried this with the firewall on and off, with the same results.
Any other suggestions on what I can do to debug and analyse the cause of this problem?
Thanks in advance
Alex |
Anarcho Advocate
Joined: 06 Jun 2004 Posts: 2970 Location: Germany
|
Posted: Wed Oct 20, 2010 3:21 pm Post subject: |
|
|
There are suggestions on the net to lower the MTU to 1400 or even 1000 and try again.
If this helps, maybe your server doesn't get the ICMP messages for path MTU discovery?
See also: http://www.postfix.org/faq.html#timeouts |
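To make the MTU suggestion concrete, here is a minimal sketch of the arithmetic behind it. It assumes plain IPv4 and TCP headers of 20 bytes each with no options; the function name is only illustrative.

```python
# Header sizes assumed here: IPv4 and TCP without any options.
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_mss(mtu):
    """Largest TCP payload that fits in one packet for a given IPv4 MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


# MTU values mentioned in this thread.
for mtu in (1500, 1492, 1400, 1000):
    print(f"MTU {mtu}: MSS {tcp_mss(mtu)} bytes")
```

If this model applies, even a message under 10K is larger than one segment, so it would still be sent in full-size packets. That might explain why small mails can stall in the DATA phase while a short interactive telnet session gets through.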
lostinspace2011 Apprentice
Joined: 09 Sep 2005 Posts: 230
|
Posted: Wed Oct 20, 2010 4:35 pm Post subject: Fixed.. but not sure why |
|
|
I decided to leave this issue for now and do something else for an hour or two. When I got back, all messages had been sent. I then re-enabled the amavis integration and tried sending new messages. All went through without any issue. I am not sure what caused this, and I really don't understand why it resolved itself. Maybe it was just some strange network congestion which only affected certain email recipients. Oh well. Still, I will keep an eye on this one for a couple of days.
Thanks for your help and the link
Alex |
lostinspace2011 Apprentice
Joined: 09 Sep 2005 Posts: 230
|
Posted: Sat Oct 23, 2010 4:25 pm Post subject: Problem persists |
|
|
It would appear that after some time the problem came back. I haven't been able to figure out exactly what is causing it yet, so I am still looking for any other suggestions on what can be done about this.
Could it be that one of the timeout settings in Postfix is too small?
postconf | grep timeout
Code: |
smtp_connect_timeout = 30s
smtp_data_done_timeout = 600s
smtp_data_init_timeout = 120s
smtp_data_xfer_timeout = 180s
smtp_helo_timeout = 300s
smtp_mail_timeout = 300s
smtp_quit_timeout = 300s
smtp_rcpt_timeout = 300s
smtp_rset_timeout = 20s
smtp_starttls_timeout = 300s
smtp_tls_session_cache_timeout = 3600s
smtp_xforward_timeout = 300s
smtpd_policy_service_timeout = 100s
smtpd_proxy_timeout = 100s
smtpd_starttls_timeout = 300s
smtpd_timeout = 300s
smtpd_tls_session_cache_timeout = 3600s
|
Thanks in advance
Alex |
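One way to look at that question is to line up the queue messages from the first post with the client timeouts that cover the same step. The sketch below parses `postconf`-style lines into seconds. The phase-to-parameter mapping is an assumption based on my reading of postconf(5), and the helper names are made up for illustration.

```python
import re

# Postfix time suffixes: seconds, minutes, hours, days, weeks.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

# Assumed mapping from a step of the SMTP DATA phase to the client
# timeout that covers it (my reading of postconf(5), not verified here).
DATA_PHASE_TIMEOUTS = {
    "sending DATA command": "smtp_data_init_timeout",
    "sending message body": "smtp_data_xfer_timeout",
    "sending end of data": "smtp_data_done_timeout",
}


def parse_timeouts(postconf_output):
    """Turn 'name = 300s' lines into a {name: seconds} dictionary."""
    timeouts = {}
    for line in postconf_output.splitlines():
        match = re.fullmatch(r"\s*(\w+)\s*=\s*(\d+)([smhdw])\s*", line)
        if match:
            name, value, unit = match.groups()
            timeouts[name] = int(value) * UNIT_SECONDS[unit]
    return timeouts


# Three of the values posted above.
SAMPLE = """\
smtp_data_done_timeout = 600s
smtp_data_init_timeout = 120s
smtp_data_xfer_timeout = 180s
"""

timeouts = parse_timeouts(SAMPLE)
for phase, name in DATA_PHASE_TIMEOUTS.items():
    print(f"{phase}: {name} = {timeouts[name]} seconds")
```

These values are already measured in minutes, so if packets are being dropped somewhere on the path, raising them would probably only make each attempt fail later.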
lostinspace2011 Apprentice
Joined: 09 Sep 2005 Posts: 230
|
Posted: Sun Oct 24, 2010 11:29 am Post subject: Small progress |
|
|
It would appear that the messages are timing out after 5 minutes (300 seconds).
Quote: |
messages:Oct 24 11:03:53 bumblebee amavis[5533]: (05533-05) sending SMTP response: "250 2.0.0 Ok, id=05533-05, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as E1EFB6283CC"
messages:Oct 24 11:03:53 bumblebee postfix/smtp[8654]: 169A66283C0: to=<REMOVED@iinet.net.au>, relay=localhost[127.0.0.1]:10024, delay=2.9, delays=0.07/0.01/0.02/2.8, dsn=2.0.0, status=sent (250 2.0.0 Ok, id=05533-05, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as E1EFB6283CC)
messages:Oct 24 11:08:55 bumblebee postfix/smtp[8660]: E1EFB6283CC: to=<REMOVED@iinet.net.au>, relay=as-av.iinet.net.au[203.0.178.180]:25, delay=302, delays=0.03/0.02/0.58/301, dsn=4.4.2, status=deferred (lost connection with as-av.iinet.net.au[203.0.178.180] while sending end of data -- message may be sent more than once)
|
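The `delays=a/b/c/d` field in these log lines breaks the total delay into stages. As far as I know the four numbers are: time before the queue manager, time in the queue manager, connection setup, and message transmission. A small sketch, with stage names I picked myself, that pulls the field apart:

```python
import re

# Stage names are my own labels for the four delays= components.
DELAY_STAGES = (
    "before_queue_manager",
    "in_queue_manager",
    "connection_setup",
    "transmission",
)


def parse_delays(log_line):
    """Extract the delays=a/b/c/d field from a Postfix delivery log line."""
    match = re.search(r"delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)", log_line)
    if match is None:
        raise ValueError("no delays= field found")
    return dict(zip(DELAY_STAGES, map(float, match.groups())))


# Shortened copy of the deferred delivery logged above.
LINE = "E1EFB6283CC: delay=302, delays=0.03/0.02/0.58/301, dsn=4.4.2, status=deferred"

delays = parse_delays(LINE)
slowest = max(delays, key=delays.get)
print(slowest, delays[slowest])
```

For the deferred message nearly all of the 302 seconds fall into the transmission stage: the connection came up in well under a second and then stalled once the message data was being sent.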
However, this message is rather small:
Quote: |
E1EFB6283CC 10631 Sun Oct 24 11:03:53 REMOVED @ sender.xyz (lost connection with as-av.iinet.net.au[203.0.178.180] while sending end of data -- message may be sent more than once)
REMOVED @ iinet.net.au
|
I'd say it must be some sort of network issue, but it affects most of my messages, not all of them. I managed to send an 18 MB email to a test account and receive it back, so I am not sure it is network related. I did some ping measurements to this particular server with the following results:
Quote: | bumblebee ~ # ping 203.0.178.180
PING 203.0.178.180 (203.0.178.180) 56(84) bytes of data.
64 bytes from 203.0.178.180: icmp_req=1 ttl=60 time=20.4 ms
64 bytes from 203.0.178.180: icmp_req=2 ttl=60 time=19.3 ms
64 bytes from 203.0.178.180: icmp_req=3 ttl=60 time=19.0 ms
64 bytes from 203.0.178.180: icmp_req=4 ttl=60 time=18.2 ms
64 bytes from 203.0.178.180: icmp_req=5 ttl=60 time=21.1 ms
64 bytes from 203.0.178.180: icmp_req=6 ttl=60 time=18.9 ms
64 bytes from 203.0.178.180: icmp_req=7 ttl=60 time=19.9 ms
^C
--- 203.0.178.180 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6026ms
rtt min/avg/max/mdev = 18.278/19.606/21.192/0.935 ms
|
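Note that the ping above uses the default 56-byte payload (84 bytes on the wire), so it says little about full-size packets. Below is a rough sketch of the payload sizes that would fill a given MTU, assuming a 20-byte IPv4 header and an 8-byte ICMP header. The `-M do` (do not fragment) and `-s` options are those of the iputils ping on Linux; other ping implementations may differ.

```python
# Header sizes assumed here: IPv4 without options, ICMP echo header.
IPV4_HEADER = 20
ICMP_HEADER = 8


def ping_payload_for_mtu(mtu):
    """ICMP payload size that produces an IP packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# The default ping above: 56 bytes of payload give 84 bytes on the wire.
print(56 + IPV4_HEADER + ICMP_HEADER)

# Candidate probes for the MTU values discussed in this thread.
for mtu in (1500, 1492, 1400):
    size = ping_payload_for_mtu(mtu)
    print(f"ping -M do -s {size} 203.0.178.180  # fills MTU {mtu}")
```

If the largest probe that gets a reply is smaller than the MTU configured on the router, that would point towards a path MTU problem rather than a Postfix one.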
I eventually changed the MTU on my ADSL router to 1400 and issued sendmail -q to retry message delivery. Shortly afterwards all messages had been delivered.
I then disabled path MTU discovery using the following command and set the MTU back to 1492.
Code: | sysctl -w net.ipv4.ip_no_pmtu_disc=1 |
Then I resent some test messages, all of which were delivered. Upon further testing other messages got stuck again and were only delivered after changing the MTU to 1400.
Thanks in advance
Alex |
lostinspace2011 Apprentice
Joined: 09 Sep 2005 Posts: 230
|
Posted: Mon Oct 25, 2010 1:01 pm Post subject: More progress |
|
|
Using MTU of 1492 on my router I get:
Code: |
bumblebee log # tracepath -n web47.justhost.com
1: 192.168.0.3 0.223ms pmtu 1500
1: 192.168.0.1 0.976ms
1: 192.168.0.1 0.883ms
2: 192.168.0.1 0.966ms pmtu 1492
2: 203.215.5.244 36.195ms
3: 203.215.4.18 80.277ms
4: 203.215.20.68 91.846ms asymm 5
5: 114.31.193.237 93.607ms asymm 6
6: 114.31.193.237 96.809ms
7: 208.178.246.85 296.737ms asymm 8
8: 208.178.246.85 368.962ms
9: 69.31.111.94 313.895ms asymm 13
10: 69.31.111.94 315.777ms asymm 13
11: 99.198.126.118 317.966ms asymm 14
12: no reply
13: no reply
|
And using an MTU of 1400 on my router I get:
Code: |
bumblebee log # tracepath -n web47.justhost.com
1: 192.168.0.3 0.233ms pmtu 1500
1: 192.168.0.1 0.997ms
1: 192.168.0.1 0.902ms
2: 192.168.0.1 0.983ms pmtu 1400
2: 203.215.5.244 34.550ms
3: 203.215.4.36 35.384ms
4: 203.215.20.6 88.938ms
5: 203.215.20.148 92.369ms asymm 4
6: 114.31.199.58 293.905ms
7: 114.31.199.58 296.088ms asymm 6
8: 69.31.110.229 314.788ms asymm 12
9: 69.31.110.229 316.066ms asymm 12
10: 99.198.126.118 314.154ms asymm 14
11: 99.198.126.118 315.451ms asymm 14
12: no reply
13: no reply
|
Not quite sure what this means though. Will be consulting the man pages.
Last edited by lostinspace2011 on Mon Oct 25, 2010 3:34 pm; edited 1 time in total |
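As a hedged reading of the two traces: tracepath seems to start from the local interface MTU (1500) and lower its estimate whenever a hop reports a smaller one, which here is the router at 192.168.0.1 in both runs. The sketch below extracts the last `pmtu` value from such output. It assumes the output format shown above, and the function name is invented.

```python
import re


def path_mtu(tracepath_output):
    """Return the last pmtu value reported by tracepath, or None if absent."""
    values = [int(v) for v in re.findall(r"\bpmtu (\d+)", tracepath_output)]
    return values[-1] if values else None


# First lines of the trace taken with the router MTU set to 1492.
SAMPLE = """\
 1:  192.168.0.3      0.223ms pmtu 1500
 1:  192.168.0.1      0.976ms
 2:  192.168.0.1      0.966ms pmtu 1492
 2:  203.215.5.244   36.195ms
"""

print(path_mtu(SAMPLE))
```

Both traces end in "no reply" before reaching the destination, so they only show that the router reports its own MTU correctly. They do not show whether a hop further along silently drops large packets.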
lostinspace2011 Apprentice
Joined: 09 Sep 2005 Posts: 230
|
Posted: Mon Oct 25, 2010 3:28 pm Post subject: Confusion prevails |
|
|
After playing with various MTU values on both the router and the server (trying 1400 on both), then various permutations of PPPoA and PPPoE, messages were still getting stuck. At this point, setting my server's MTU back to 1500 and the router's to 1400 and issuing sendmail -q did not deliver the messages. In this case a reboot and some patience resolved the problem.
It is still very confusing, and there seem to be several factors contributing.
Alex |