Limits on address expansion
19 answers

Hi,
I am adding something to an exim config that could result in a large number
(20,000) email addresses. It is a secondary college and the ability to
mail to a class or all students is wanted. The expansion will be done
through class-list lookup in a mysql database. Several other complications
of course (who can do it, don't auto-create new cyrus accounts, etc.).
Is this something that exim (on a reasonably powerful Linux box) could handle?
I can't see anything in the documentation that talks about this.
Most expansions would be much smaller (fewer than 100), but 'all' is quite big,
though I don't expect it to happen often.
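A minimal sketch of the kind of class-list expansion described above, not a tested configuration. It assumes a hypothetical MySQL table `class_members(class_name, address)`, placeholder credentials, and that class names are local parts in the usual `+local_domains` list; the router name `class_lists` is also made up. It relies on Exim's mysql lookup returning one line per result row, which the redirect router then treats as a list of addresses (worth confirming against the spec for your Exim version). The router would need to sit before the routers that handle ordinary local users, and restricting who may use it ("who can do it") would need an extra precondition such as `senders`, which is not shown.

```
# --- Main configuration section (placeholder credentials) ---
# Format: host/database/user/password
hide mysql_servers = localhost/school/exim/secret

# --- Routers section ---
# Hypothetical schema: class_members(class_name, address), one row per member.
# Expands <class>@<local domain> into one address per returned row.
class_lists:
  driver = redirect
  domains = +local_domains
  data = ${lookup mysql{SELECT address FROM class_members \
           WHERE class_name='${quote_mysql:$local_part}'}}
```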
No.1
On Fri, 2007-02-09 at 14:28 +0000, Alain Williams wrote:
> I am adding something to an exim config that could result in a large number
> (20,000) email addresses. It is a secondary college and the ability to
> mail to a class or all students is wanted. The expansion will be done
> through class-list lookup in a mysql database. Several other complications
> of course (who can do it, don't auto-create new cyrus accounts, etc.).
> Is this something that exim (on a reasonably powerful Linux box) could handle?
> I can't see anything in the documentation that talks about this.
Exim will process the recipients of the message serially, so it may take
quite a while for the last student to get his copy. But Exim certainly
won't have a problem with it, nor make a problem for its host.
No.2
On Sat, 10 Feb 2007 04:21:29 +0100, Kjetil Torgrim Homme
<kjetilho (AT) ifi (DOT) uio.no> wrote:
>On Fri, 2007-02-09 at 14:28 +0000, Alain Williams wrote:
>>I am adding something to an exim config that could result in a large number
>>(20,000) email addresses. It is a secondary college and the ability to
>>mail to a class or all students is wanted. The expansion will be done
>>through class-list lookup in a mysql database. Several other complications
>>of course (who can do it, don't auto-create new cyrus accounts, etc.).
>>
>>Is this something that exim (on a reasonably powerful Linux box) could handle?
>>I can't see anything in the documentation that talks about this.
>
>Exim will process the recipients of the message serially, so it may take
>quite a while for the last student to get his copy.
I was just bitten by this, and a postfix user of course immediately
claimed that postfix does not have a message-level lock and will
deliver to multiple recipients of a single message simultaneously.
This might be off-topic here, but can somebody say whether this is
true?
Greetings
Marc
No.3
On Saturday 10 February 2007 04:21, Kjetil Torgrim Homme wrote:
> On Fri, 2007-02-09 at 14:28 +0000, Alain Williams wrote:
>> I am adding something to an exim config that could result in a large
>> number (20,000) email addresses. It is a secondary college and the
>> ability to mail to a class or all students is wanted. The expansion will
>> be done through class-list lookup in a mysql database. Several other
>> complications of course (who can do it, don't auto-create new cyrus
>> accounts, etc.).
>> Is this something that exim (on a reasonably powerful Linux box) could
>> handle? I can't see anything in the documentation that talks about this.
> Exim will process the recipients of the message serially, so it may take
> quite a while for the last student to get his copy. But Exim certainly
> won't have a problem with it, nor make a problem for its host.
Exim will route all recipients of the message serially before actually
delivering (transporting), so it may take a long while for any recipient to
get their copy, depending on how many recipients are remote. After routing,
local deliveries are performed serially, while up to remote_max_parallel
remote deliveries are performed concurrently.
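As a concrete illustration of the setting Magnus mentions: `remote_max_parallel` is a main-section option, so raising it is a one-line change. The value 20 below is an arbitrary example, not a recommendation; a suitable number depends on the host and on the sites being delivered to.

```
# --- Main configuration section ---
# Let a single delivery process run up to 20 remote (smtp) deliveries
# at the same time, instead of the default of 2 discussed in this thread.
# The value 20 is only an example.
remote_max_parallel = 20
```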
No.4
On Sat, 10 Feb 2007, Magnus Holmgren wrote:
>> Exim will process the recipients of the message serially, so it may take
>> quite a while for the last student to get his copy. But Exim certainly
>> won't have a problem with it, nor make a problem for its host.
> Exim will route all recipients of the message serially before actually
> delivering (transporting), so it may take a long while for any recipient to
> get their copy, depending on how many recipients are remote. After routing,
> local deliveries are performed serially, while up to remote_max_parallel
> remote deliveries are performed concurrently.
Yes, that's right. One of the reasons for doing all the routing first is
that you can then discover how many of the recipients are routed to the
same servers - and send just a single copy for them. So if all your
20000 recipients are at hotmail, you don't send 20000 copies of the
message.
Another reason for doing all the routing first is that sometimes, the
result of routing causes a rewrite of one or more of the headers. This
can happen if the configuration allows non-FQDN domains. This is not
something that I recommend, but it is something that certain sites used
to do - and some still do (there was a very recent query about it). For
example, if a message is sent from x@localname1 to y@localname2,
containing
    From: x@localname1
    To: y@localname2
then the DNS lookups for localname1 and localname2 can be configured to
expand them to localname1.something.tld and localname2.something.tld
respectively. Exim will then rewrite the From: and the To:
appropriately. This is particularly important if there are off-site
recipients, because you should only send FQDNs off-site.
In order for this to work, all the routing must be done before any of
the deliveries. In practice, Exim does not hold up delivery if some of
the addresses cannot be routed (DNS unreachable, for example). This is a
compromise: not a perfect solution, but a "good enough" one.
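One way to get the domain qualification Philip describes is through options on the dnslookup router. This is a hedged sketch rather than a recommendation (he advises against relying on non-FQDN domains at all): `something.tld` is the placeholder from his example, and the remaining lines only approximate a typical dnslookup router.

```
dnslookup:
  driver = dnslookup
  domains = ! +local_domains
  transport = remote_smtp
  # If "localname2" fails to resolve, also try "localname2.something.tld"
  # ("something.tld" is a placeholder)
  widen_domains = something.tld
  # Rewrite header addresses to the widened name (true is the default)
  rewrite_headers = true
  no_more
```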
Of course, in principle, the routing could be done in parallel while
still doing all the routing before any delivering. However, I do tend to
go for the "keep it simple" solution where I can. In fact, the original
releases of Exim also did all deliveries serially, but that was made
more complicated so that multiple remote deliveries could proceed in
parallel, as Magnus says.
Parallel routing might also risk losing the
same_domain_copy_routing optimization.
In practice, people who are delivering to thousands of recipients
usually use some kind of mailing list software, and such applications
can be configured to limit the number of recipients per message to a
hundred or so. If you do that, you get the benefits of parallel routing
from the multiple messages.
No.5
On Sat, 10 Feb 2007 17:05:45 +0100, Magnus Holmgren
<holmgren (AT) lysator (DOT) liu.se> wrote:
>After routing,
>local deliveries are performed serially, while up to remote_max_parallel
>remote deliveries are performed concurrently.
How does the concurrent delivery to remote hosts take place? A second
queue runner is not going to pick up the message as it is locked.
Greetings
Marc
No.6
On Wed, 14 Feb 2007, Marc Haber wrote:
> How does the concurrent delivery to remote hosts take place?
The delivery process forks more than one subprocess. See section 1.13 in
the manual ("Delivery in detail").
> A second queue runner is not going to pick up the message as it is
> locked.
Not relevant.
No.7
On Wed, 14 Feb 2007, Philip Hazel wrote:
> The delivery process forks more than one subprocess. See section 1.13 in
> the manual ("Delivery in detail").
I meant 3.13.
No.8
On Wed, 14 Feb 2007 15:52:47 +0000 (GMT), Philip Hazel
<ph10 (AT) hermes (DOT) cam.ac.uk> wrote:
>On Wed, 14 Feb 2007, Marc Haber wrote:
>>How does the concurrent delivery to remote hosts take place?
>
>The delivery process forks more than one subprocess. See section 1.13 in
>the manual ("Delivery in detail").
I see. IMO, the default is way too low; it makes exim look bad when
compared with postfix.
Greetings
Marc
No.9
On 14/02/07, Marc Haber <mh+exim-users (AT) zugschlus (DOT) de> wrote:
> On Wed, 14 Feb 2007 15:52:47 +0000 (GMT), Philip Hazel
> <ph10 (AT) hermes (DOT) cam.ac.uk> wrote:
>> On Wed, 14 Feb 2007, Marc Haber wrote:
>>>How does the concurrent delivery to remote hosts take place?
>>
>>The delivery process forks more than one subprocess. See section 1.13 in
>>the manual ("Delivery in detail").
>>
> I see. IMO, the default is way too low; it makes exim look bad when
> compared with postfix.
With respect, Exim looking good when compared with Postfix should not
be an objective. Being good with respect to its own design principles
should.
Peter
No.10
On Wed, 14 Feb 2007, Marc Haber wrote:
>>The delivery process forks more than one subprocess. See section 1.13 in
>>the manual ("Delivery in detail").
>I see. IMO, the default is way too low; it makes exim look bad when
>compared with postfix.
There is no sensible default that will suit all installations and all
requirements. The current default of 2 was set a long time ago -
changing it would affect existing sites when they upgrade. It is
perfectly reasonable for the vast majority of personal email, where
messages rarely have more than 2 recipients.
Sites that are doing predominantly mailing-list deliveries (with lots of
recipients per message) have normally got to look carefully at their
configurations in any case.
A thought: I wonder, from your previous mention of queue runners, if you
think that the queue runner is the *only* way messages are delivered? If
so, you haven't quite understood the way Exim (as normally configured)
works. If 20 messages arrive at once, all 20 will immediately be
delivered, using up to 40 simultaneous outgoing connections (with
remote_max_parallel at the default of 2). The queue runners work only on
messages that have previously had a temporary delivery error.
I said "as normally configured", because it is possible to make Exim
behave otherwise. If you set queue_only=true, then messages do indeed
sit on the queue and are delivered one at a time by a queue runner
(though you can have multiple queue runners).
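For reference, the non-default mode Philip describes looks roughly like this in the main section. The queue runners themselves are started from the command line, for example by the daemon's `-q` interval option, or by `exim -qq` for the two-stage run Marc mentions; the numbers here are illustrative only.

```
# --- Main configuration section ---
# Do not attempt delivery when a message arrives; leave it on the queue
# for a queue runner to pick up later
queue_only = true
# Allow the daemon to run up to 10 queue-runner processes at once
# (10 is only an example value)
queue_run_max = 10
```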
No.11
On Thu, 15 Feb 2007 09:37:23 +0000 (GMT), Philip Hazel
<ph10 (AT) hermes (DOT) cam.ac.uk> wrote:
>A thought: I wonder, from your previous mention of queue runners, if you
>think that the queue runner is the *only* way messages are delivered? If
>so, you haven't quite understood the way Exim (as normally configured)
>works. If 20 messages arrive at once, all 20 will immediately be
>delivered, using up to 40 simultaneous outgoing connections (with
>remote_max_parallel at the default of 2). The queue runners work only on
>messages that have previously had a temporary delivery error.
The system in question is running with queue_only = true to take
advantage of the two-stage queue running process invoked by exim -qq.
Most of my setups use real mailing list software which batches up the
outgoing mail into many messages with a few hundred recipients each.
The current case, where an alias is used to expand a single address
into a list of 20K recipients, is a "first" for me and I was
dumbfounded by exim not having delivered to the first thousand
recipients after an hour.
Unfortunately, the system owner decided to replace exim with postfix
before I found out about remote_max_parallel. Postfix delivers to all
20K recipients in like five minutes.
Greetings
Marc
No.12
On Thu, 15 Feb 2007, Marc Haber wrote:
> The system in question is running with queue_only = true to take
> advantage of the two-stage queue running process invoked by exim -qq.
That is an unusual configuration, for which the default settings were
not designed. Not only is remote_max_parallel relevant, but
same_domain_copy_routing is probably wanted here. You would find that if
you grepped the spec for "mailing list".
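A sketch of where `same_domain_copy_routing` goes: it is an option of the dnslookup router (Marc notes that Debian's packaged configuration already sets it). The other lines approximate a typical dnslookup router and may differ from any particular installation.

```
dnslookup:
  driver = dnslookup
  domains = ! +local_domains
  transport = remote_smtp
  ignore_target_hosts = 0.0.0.0 : 127.0.0.0/8
  # Copy this address's routing result to the message's other, not yet
  # routed addresses in the same domain, instead of routing each one
  # independently
  same_domain_copy_routing = yes
  no_more
```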
> Most of my setups use real mailing list software which batches up the
> outgoing mail into many messages with a few hundred recipients each.
Indeed, that is sensible.
> The current case, where an alias is used to expand a single address
> into a list of 20K recipients, is a "first" for me and I was
> dumbfounded by exim not having delivered to the first thousand
> recipients after an hour.
You are lucky it works at all - not Exim, which should manage 20K
recipients - but if you try to send a single copy somewhere with more
than a few hundred recipients, it may get rejected. RFC 2821 says:
   recipients buffer
      The minimum total number of recipients that must be buffered is
      100 recipients. Rejection of messages (for excessive recipients)
      with fewer than 100 RCPT commands is a violation of this
      specification.
More than 100 recipients in the same domain is therefore risky. Another
point is that, with 20K recipients, Exim will use a lot of main memory
(but that may not be an issue in these days of multi-gigabyte memories).
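On the Exim side, the number of recipients sent with each copy to a server is governed by the smtp transport's `max_rcpt` option, whose documented default of 100 matches the RFC minimum quoted above. Setting it explicitly, as in this sketch, simply documents the intent; `remote_smtp` is the conventional transport name and may differ in other configurations.

```
remote_smtp:
  driver = smtp
  # Send at most 100 RCPT commands per SMTP transaction; recipients beyond
  # that are sent in further transactions, possibly in parallel if
  # remote_max_parallel allows (100 is also the default)
  max_rcpt = 100
```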
> Unfortunately, the system owner decided to replace exim with postfix
> before I found out about remote_max_parallel. Postfix delivers to all
> 20K recipients in like five minutes.
So Postfix is clearly optimized well by default for this application.
No.13
On Thu, Feb 15, 2007 at 09:37:23AM +0000, Philip Hazel wrote:
> A thought: I wonder, from your previous mention of queue runners, if you
> think that the queue runner is the *only* way messages are delivered? If
> so, you haven't quite understood the way Exim (as normally configured)
> works. If 20 messages arrive at once, all 20 will immediately be
> delivered, using up to 40 simultaneous outgoing connections (with
> remote_max_parallel at the default of 2). The queue runners work only on
> messages that have previously had a temporary delivery error.
Is there any reason why local delivery is single threaded? With modern
multi-CPU/core machines it might be nice to have a local_max_parallel
option, especially where address expansion results in lots of local addresses.
This is my situation -- 4 CPU box & local delivery is to cyrus which will have 'wait'
time while it is speaking to active directory; so a certain amount of parallel
local delivery (in addition to batch_max) would probably make sense.
PS: in answer to the question that kicked this thread off, my current worst-case
expansion, some 8,500 local addresses, works like a dream. I say 'current' because
I think that this may rise.
No.14
On Thu, 15 Feb 2007, Alain Williams wrote:
> Is there any reason why local delivery is single threaded? With modern
> multi-CPU/core machines it might be nice to have a local_max_parallel
> option, especially where address expansion results in lots of local addresses.
The original implementation of Exim was single threaded for all
deliveries; first the local ones (each in a separate subprocess, for
security, which was waited for before continuing), then the remote ones
(in the main delivery process, which changed from root to exim before
doing this).
Then remote_max_parallel was added - in Exim 3 it forked separate
delivery processes only if there was more than one remote recipient. In
Exim 4, it always forks, so the main delivery process no longer has to
change its uid.
The motivation for parallelizing remote deliveries was that a single
delivery can take a long time, so even with as few as two remote
recipients (to different servers), parallelism is a gain. My feeling at
the time was that local deliveries are quick, so I didn't bother to
change them. As far as I know, nobody else has questioned this.
Multiple local deliveries into normal file mailboxes will often be
hitting the same file system - whether this matters or not, I don't
know.
> This is my situation -- 4 CPU box & local delivery is to cyrus which
> will have 'wait' time while it is speaking to active directory; so a
> certain amount of parallel local delivery (in addition to batch_max)
> would probably make sense.
How are you delivering to cyrus? Using the lmtp transport? I suppose
that is the one local transport that could cause substantial delays.
(I guess pipe could too, if the pipe fills up)
No.15
On Thu, Feb 15, 2007 at 03:18:33PM +0000, Philip Hazel wrote:
>> This is my situation -- 4 CPU box & local delivery is to cyrus which
>> will have 'wait' time while it is speaking to active directory; so a
>> certain amount of parallel local delivery (in addition to batch_max)
>> would probably make sense.
> How are you delivering to cyrus? Using the lmtp transport? I suppose
> that is the one local transport that could cause substantial delays.
> (I guess pipe could too, if the pipe fills up)
Cyrus, yes via lmtp. I have only tested it so far with 6 deliveries, and
it all completed in less than 1 second. I suspect that any delay would come
from cyrus/sasl talking to M$ Active Directory (which is on another machine,
of course). I do NOT know if there will be a problem. The largest 'common'
delivery that I expect is about 200; it probably doesn't matter if the 'all'
(of several thousand) takes some time.
I will post results here (along with hardware description) when I know them.
No.16
On Thu, 15 Feb 2007 12:18:12 +0000 (GMT), Philip Hazel
<ph10 (AT) hermes (DOT) cam.ac.uk> wrote:
>On Thu, 15 Feb 2007, Marc Haber wrote:
>>The system in question is running with queue_only = true to take
>>advantage of the two-stage queue running process invoked by exim -qq.
>
>That is an unusual configuration, for which the default settings were
>not designed. Not only is remote_max_parallel relevant, but
>same_domain_copy_routing is probably wanted here. You would find that if
>you grepped the spec for "mailing list".
same_domain_copy_routing is set by default on Debian systems, and Debian
was, incidentally, what the system at hand was running.
I actually grepped the spec for "mailing list" and did not find any
mention of remote_max_parallel within a hundred lines of
"mailing list".
>>The current case, where an alias is used to expand a single address
>>into a list of 20K recipients, is a "first" for me and I was
>>dumbfounded by exim not having delivered to the first thousand
>>recipients after an hour.
>
>You are lucky it works at all - not Exim, which should manage 20K
>recipients - but if you try to send a single copy somewhere with more
>than a few hundred recipients, it may get rejected.
I am aware of that. However, the method of address list delivery was
forced upon the local admin, and he didn't have the time to find a
lightweight tool that would do the splitting in a sensible
manner. Can anybody recommend one?
>More than 100 recipients in the same domain is therefore risky.
I know that. The end customer didn't seem to care.
>Another
>point is that, with 20K recipients, Exim will use a lot of main memory
>(but that may not be an issue in these days of multi-gigabyte memories).
That's a non-issue.
>>Unfortunately, the system owner decided to replace exim with postfix
>>before I found out about remote_max_parallel. Postfix delivers to all
>>20K recipients in like five minutes.
>
>So Postfix is clearly optimized well by default for this application.
Yes. Without having looked at postfix any closer, I suspect that
postfix does not use a per-message lock as exim does.
Greetings
Marc
No.17
On Thu, 15 Feb 2007, Alain Williams wrote:
> Cyrus, yes via lmtp.
Just to be absolutely certain, you are using the lmtp transport, as
opposed to the smtp transport in LMTP mode?
> I will post results here (along with hardware description) when I know them.
I'm sure that will be interesting.
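To make Philip's distinction concrete: the lmtp transport (as in Alain's configuration at the end of the thread) talks to a local socket or command, whereas the smtp transport can speak LMTP over TCP. A hedged sketch of the latter, assuming a hypothetical Cyrus LMTP listener on 127.0.0.1 port 24 and a router that supplies no host list of its own; the transport name `cyrus_lmtp_tcp` is made up.

```
cyrus_lmtp_tcp:
  driver = smtp
  # Speak LMTP instead of SMTP on the connection
  protocol = lmtp
  # Hypothetical listener; used because the router supplies no hosts
  hosts = 127.0.0.1
  port = 24
  # Without this, delivery to the local host itself would be deferred
  allow_localhost
```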
No.18
On Thu, 15 Feb 2007, Marc Haber wrote:
> I actually grepped the spec for "mailing list" and did not find any
> mention of remote_max_parallel within a hundred lines of
> "mailing list".
Ah, sorry, it was same_domain_copy_routing that I was referring to.
No.19
On Thu, Feb 15, 2007 at 04:45:43PM +0000, Philip Hazel wrote:
> On Thu, 15 Feb 2007, Alain Williams wrote:
>> Cyrus, yes via lmtp.
> Just to be absolutely certain, you are using the lmtp transport, as
> opposed to the smtp transport in LMTP mode?
Yes:
local_delivery_cyrus:
  driver = lmtp
  socket = CYRUS_LMTP_SOCKET
  batch_max = 20
  user = cyrus
  group = mail
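Alain's transport refers to a macro, so the configuration also needs a macro definition and a router that selects the transport. A minimal sketch: the socket path and the router name `cyrus_users` are hypothetical, and the real path is wherever the local Cyrus installation listens.

```
# --- Near the top of the configuration file, before the macro is used ---
# Macro names must start with an upper-case letter; the path is a placeholder
CYRUS_LMTP_SOCKET = /var/run/cyrus/socket/lmtp

# --- Routers section ---
# Hand addresses in the local domains to the Cyrus LMTP transport
cyrus_users:
  driver = accept
  domains = +local_domains
  transport = local_delivery_cyrus
```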