  • IMAP Draft: Cluster

    21 answers - 3883 bytes

    Cluster
    IMAP is quite resource intensive, which requires sharing the load
    between multiple hosts. Another goal is to be fault tolerant and to have
    a fail-over strategy. This does not mean aiming for 99.999% availability.
    This could be done using a cluster-capable RDBMS. In the best case this would mean
    that you can set up your IMAP servers as if they accessed one single database.
    Partition of Mailboxes
    The first option is to distribute the mailboxes among different
    repositories. E.g. one DB host for user mailboxes, one for shared
    mailboxes and one for news. You could even put the user mailboxes on
    different hosts.
    One limitation this introduces is that quotas cannot span multiple
    repositories. There are also performance issues when copying or moving
    messages between different repositories.
    Although the setup is quite simple, it will always take some time to
    administer, e.g. when migrating mailboxes from one repository to
    another.
    Multiple IMAP Servers
    Setting up multiple IMAP servers should also not be too complicated when
    using a standard RDBMS as the back-end. Requests could be distributed among the
    servers randomly using DNS or other load-balancing
    techniques.
    The only thing the repository implementation on the IMAP server has to
    provide is caching uids, flags, and message numbers for the session
    lifetime and delivering events when it detects changes on the database.
    The client should cope with a failing message retrieval in the case it
    has been deleted by another session.
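
    A minimal sketch of such a per-session cache, assuming the repository can
    hand over a fresh uid-to-flags snapshot from the database. The class name
    SessionMailboxCache and the event strings are hypothetical placeholders,
    not James classes or real IMAP responses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical per-session cache: uid -> flags as last seen by this session.
class SessionMailboxCache {
    private final TreeMap<Long, Set<String>> known = new TreeMap<>();

    // Compares the cached state with a fresh database snapshot, returns the
    // events to deliver, and adopts the snapshot as the new cached state.
    List<String> refresh(Map<Long, Set<String>> fromDatabase) {
        List<String> events = new ArrayList<>();
        // Uids that were cached but are gone from the database were expunged.
        for (Long uid : known.keySet()) {
            if (!fromDatabase.containsKey(uid)) {
                events.add("EXPUNGED uid=" + uid);
            }
        }
        // New uids and changed flags.
        for (Map.Entry<Long, Set<String>> entry : fromDatabase.entrySet()) {
            Set<String> cachedFlags = known.get(entry.getKey());
            if (cachedFlags == null) {
                events.add("ADDED uid=" + entry.getKey());
            } else if (!cachedFlags.equals(entry.getValue())) {
                events.add("FLAGS uid=" + entry.getKey());
            }
        }
        known.clear();
        for (Map.Entry<Long, Set<String>> entry : fromDatabase.entrySet()) {
            known.put(entry.getKey(), new HashSet<>(entry.getValue()));
        }
        return events;
    }

    // Message number = 1-based position of the uid in ascending uid order,
    // or -1 if the uid is not part of the cached state.
    int messageNumber(long uid) {
        return known.containsKey(uid) ? known.headMap(uid, true).size() : -1;
    }
}
```

    Each session would call refresh() at the points where it is allowed to
    report changes to its client. How the snapshot is loaded (polling or
    notifications from the database) is left open here.
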
    If the mailboxes are distributed well enough, multiple IMAP
    servers accessing multiple RDBMS could deal with a high
    volume of traffic and a high volume of stored messages.
    Fail-over
    An RDBMS like MySQL supports master/slave replication. When the master fails,
    a slave can take over its job. IMAP servers could be externally
    redirected to the new master, e.g. by dynamic DNS. But maybe we should
    add the possibility of changing the target server for a repository
    online.
    A question I have not yet answered is the following situation: the master
    crashes but its hard disks are okay, and the slave is a few seconds behind
    in receiving log data. If I promoted the slave to master now, I
    would lose statements that were already written to the master's
    hard disk.
    Write To Master, Read From Slave
    This is suggested in the MySQL replication FAQ. An unanswered question
    for now is: the slave may be a few seconds behind. When I write to the
    master and afterwards read from the slave, I may not find what I've just
    written. That would be quite confusing for the client. First, it
    would require checking how far behind the slave is. If it is more than x
    seconds behind, I should read from the master directly. Second, I would have
    to cache what I've written to the master to be able to give the right
    responses to the client.
    A realistic and not too complicated approach would be the following.
    Fortunately, IMAP does not allow changing message or header content.
    When I want to edit a message, e.g. in the Drafts folder, the old one is
    deleted and the edited one gets a new uid. So it would be no problem to list
    messages and retrieve flags from the master and load message and header
    content from the slave(s). If retrieving the message from the slave
    fails because it is out of sync (quite unlikely), the IMAP server could
    fall back to reading it from the master.
    Because listing messages and retrieving flags is quite inexpensive and
    retrieving message content could be quite expensive, this solution could
    scale very well.
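
    The fall-back described above could look roughly like the following
    sketch. ReplicaFirstReader and its two lookup functions are hypothetical.
    In a real setup they would wrap JDBC queries against the slave and the
    master. The approach relies on the fact that content for a given uid never
    changes, so a hit on the slave can never be stale.

```java
import java.util.Optional;
import java.util.function.LongFunction;

// Hypothetical reader: message content is loaded from a slave first and
// from the master only if the slave does not have the message yet.
class ReplicaFirstReader {
    private final LongFunction<Optional<String>> master;
    private final LongFunction<Optional<String>> slave;

    ReplicaFirstReader(LongFunction<Optional<String>> master,
                       LongFunction<Optional<String>> slave) {
        this.master = master;
        this.slave = slave;
    }

    // Returns the content for the uid, or empty if neither host has it
    // (for example because another session deleted the message).
    Optional<String> loadContent(long uid) {
        Optional<String> fromSlave = slave.apply(uid);
        if (fromSlave.isPresent()) {
            return fromSlave;
        }
        // The slave is probably a few seconds behind: ask the master.
        return master.apply(uid);
    }
}
```

    Listing messages and reading flags would bypass this class and always go
    to the master, as proposed above.
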
    To unsubscribe, e-mail: server-dev-unsubscribe (AT) james (DOT) apache.org
    For additional commands, e-mail: server-dev-help (AT) james (DOT) apache.org
  • No.1 | | 1262 bytes | |

    I am not going too much into the topic itself, just a few more general
    questions at first.

    Joachim Draeger wrote:
    Cluster

    Imap is quite resource intensive.

    Is it? Why?
    How does IMAP resource consumption compare to already existing James
    parts like POP3 and UserRepositories?

    I would have guessed that the resource consumption does not
    depend on the protocol in the first place, but on the number of
    users, the number of transmitted emails and the size of those emails.

    This requires to share the load
    between multiple hosts. Another goal is being fault tolerant and to have
    a fail-over strategy. This means not having 99.999% availability. This
    could be done using a cluster capable RDBMS. At the best this would mean
    that you can setup your IMAP servers like accessing one single database.

    Are you still talking about James at this point or some specialized IMAP
    server apart from James?

    Clustering at DB level would mean not supporting dbfile, file and other
    types of repositories.

    Bernd

  • No.2 | | 4182 bytes | |

    Hi Bernd,

    Imap is quite resource intensive.

    Is it? Why?
    How does IMAP resource consumption compare to already existing James
    parts like POP3 and UserRepositories?

    The normal SMTP/POP life cycle is: 1. deliver a message, 2. retrieve it via
    POP3, store it on the user's hard disk, delete it from the server. The user will sort
    and archive his messages on his own computer.
    With IMAP all messages stay on the server. The client is not required to
    cache anything. The user will browse through the mailboxes, fetch headers
    and content, perform searches, and copy messages.
    With POP3, mailboxes with a size of a GB are very unlikely.
    IMAP is designed to keep a TCP connection open all the time, to be
    informed when new mail arrives. Clients even open multiple
    concurrent connections when browsing through mailboxes.
    In the worst case, when nobody is ill or on holiday, you have at least
    one open connection per user all the time.

    I would have guessed think that the resource consumption is not
    depending on the protocol in the first place, but on the number of
    users, the number of transmitted emails and the size of those emails.

    Yes, but with IMAP the user deals with the same message several
    times.
    Of course it's not the protocol itself but the way it is used.

    This requires to share the load
    between multiple hosts. Another goal is being fault tolerant and to have
    a fail-over strategy. This means not having 99.999% availability. This
    could be done using a cluster capable RDBMS. At the best this would mean
    that you can setup your IMAP servers like accessing one single database.

    Are you still talking about James at this point or some specialized IMAP
    server apart from James?

    For accessing a DB cluster you don't need any specialization. Any
    application that uses JDBC should be able to do it quite out of the box.

    Clustering at DB level would mean to not support dbfile, file and other
    types of repositories.

    Clustering at DB level would just mean following some rules in the
    implementation. DBMS are designed to be accessed simultaneously from
    multiple hosts.
    dbfile: hmm, good point. I have just given it some thought. There are also
    various solutions for accessing a network file system without a single point
    of failure. And it promises the best scalability for really high
    volume: 100 GB in a file system is no problem, 100 GB in a DB might be a
    nightmare. :-)
    The simultaneous access could be managed through a central DB.

    Reading a message:

    1. acquire a read-only lock to the row representing the message
    2. perform the read from network file system
    3. release the lock

    Concurrent read operations would be no problem. Even setting flags while
    another session is reading is possible when they are stored in a different
    table. This includes the \Deleted flag. Expunging deleted messages
    will be blocked until everyone has finished reading.
    At the moment, locking in the JDBCMailRepository is done by a local
    HashTable and has some weak points in its implementation.
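
    The lock/read/release steps above rely on row locks in a central DB. As an
    illustration of the same rule (many readers, expunge only when nobody is
    reading), here is a single-JVM sketch that swaps the DB row lock for
    java.util.concurrent's ReentrantReadWriteLock. It does not work across
    hosts, and MessageLocks is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// In-process stand-in for "one lockable row per message".
class MessageLocks {
    private final ConcurrentHashMap<Long, ReentrantReadWriteLock> locks =
            new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(long uid) {
        return locks.computeIfAbsent(uid, key -> new ReentrantReadWriteLock());
    }

    // Steps 1-3: acquire the shared lock, read from the store, release.
    <T> T read(long uid, Supplier<T> readFromStore) {
        ReentrantReadWriteLock lock = lockFor(uid);
        lock.readLock().lock();
        try {
            return readFromStore.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Expunge needs the exclusive lock. This variant gives up immediately
    // (returns false) while anyone is reading; a blocking variant would call
    // writeLock().lock() instead of tryLock().
    boolean tryExpunge(long uid, Runnable deleteFromStore) {
        ReentrantReadWriteLock lock = lockFor(uid);
        if (!lock.writeLock().tryLock()) {
            return false;
        }
        try {
            deleteFromStore.run();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

    Setting flags does not go through these locks at all, matching the idea of
    keeping flags in a different table. The sketch never removes lock objects
    for expunged uids, which a real implementation would have to handle.
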

    For a purely file-system-based repository I don't see a chance of being
    IMAP capable and clusterable at the same time.
    There has to be one instance that assigns the uids.
    Even without uids and IMAP capability, accessing a
    file-based store from multiple hosts always has drawbacks.
    There is no solution for reliable locking. Even Maildir has its weak
    points under heavy access. Trying to delete a file that another process is
    reading means starting a timer and multiple retries. If someone else is
    modifying flags, the file name changes and you have to list the whole
    directory again to see if it is still there.

    Earlier Maildir proposals had scary ideas such as being able to
    deliver only one email per second and process to a mailbox, retrying every 2
    seconds for up to 24 hours.

    Joachim

  • No.3 | | 1582 bytes | |

    In a worst case, when nobody is ill or on holidays, you have at least
    one open connection per user all the time.
    How would one handle these very long-lasting connections in the JVM and in code?
    Also, doesn't this require a totally different testing approach, too?

    >Clustering at DB level would mean to not support dbfile, file and other
    >types of repositories.


    Clustering at DB level would just mean following some rules in the
    implementation. DBMS are designed to be accessed simultaneously from
    multiple hosts.
    dbfile mmh, good point. I just made a few thoughts. There are also
    various solutions to access a network file-system without a single point
    of failure.
    What about JCR (e.g. the Jackrabbit implementation)? Could
    it solve some of the problems, or is it too far from being
    usable as a "file system" for IMAP?

    100 GB in a DB might be a
    nightmare. :-)
    Especially for Java based DBs :).

    Former Maildir proposals had such scary ideas like only being able to
    deliver one email per second and process to a mailbox, retrying every 2
    seconds up to 24 hours.
    What are the minimal performance requirements for JAMES's first IMAP implementation?
    Doesn't the IMAP protocol impose at least some "hints" regarding performance and timings?

    Thanks in advance,

    Ahmed.

  • No.4 | | 4093 bytes | |

    Hi Ahmed,

    On Thursday, 06.07.2006, at 10:47 +0200, Ahmed Mohombe wrote:
    In a worst case, when nobody is ill or on holidays, you have at least
    one open connection per user all the time.
    How would one handle in the JVM and code these very long lasting connections?

    Why? Are there issues?
    I'm not aware of any problems with many sockets handled by many threads
    running for a long time, apart from the unavoidable network, CPU and memory
    consumption.
    We should try to keep the data held by the sessions small. If there are
    really many users, it should be possible to run an optimistic setup: not
    all connected users will start the most expensive operations at the same
    time. So maybe we need a strategy for the worst case:
    before everything gets so slow that nobody can do anything, activate a
    dynamic connection limit.
    In this case the IMAP server should run on a separate machine so as not to
    affect SMTP/spooling under high load.

    Is that the direction of your question?

    Actually, there is an inactivity limit of 30 minutes defined in the RFC,
    and clients should be able to deal with the fact that they might be
    thrown out from time to time.

    Also, doesn't this require too a totally different testing approach?

    IMO, for low-level tests, not at all. For session-based tests we have to
    continue making the optimistic assumption that if it works once it
    should work every time.
    Of course, long-running, fine-tuned stress tests will be very important. I'm
    very curious to see how the first alpha release performs with the
    to-be-written imap-postage module!
    Until then I can only make vague guesses about how many
    concurrent users are possible. 10? 100? 1000?

    >Clustering at DB level would mean to not support dbfile, file and other
    >types of repositories.


    Clustering at DB level would just mean following some rules in the
    implementation. DBMS are designed to be accessed simultaneously from
    multiple hosts.
    dbfile mmh, good point. I just made a few thoughts. There are also
    various solutions to access a network file-system without a single point
    of failure.
    What about JCR? (e.g. the Jackrabbit implementation). Could
    it solve some of the problems, or it is too far away from making
    it usable as a "file system" for IMAP?

    Well, I don't have any knowledge about JCR. I tried to quickly browse
    some docs/FAQs.
    But it seems that, to be used reliably, Jackrabbit uses a JDBC
    back-end, too. And I guess it wasn't made to store lots of emails. The
    overhead of implementing this might be greater than using JDBC directly,
    which also gives us the ability to tune performance.
    How do you think JCR might help?

    What are the minimal performance requirements for JAMES's first IMAP implementation?

    As I said above, I can only make assumptions. But it is an important
    question. The first goal is to have something stable.
    I think the CPU/memory usage for pure protocol handling is not much
    higher than in POP3. The critical part is moving the data.
    Apart from directly accessing MIME parts or performing searches, it
    should be comparable to an SMTP/POP3 session.
    Has anyone tried how many simultaneous POP3 sessions are realistic while
    keeping an acceptable throughput?

    Doesn't the IMAP protocol impose at least some "hints" regarding performance and timings?

    Not really. Maybe clients have timeouts. Apart from that there are no
    timing issues. The client has to deal with the fact that an answer to a
    command may take a long time. At the time IMAP was designed, the internet was
    probably a very, very slow business. :-)
    So the benchmark will be what the user will accept.

    Thanks in advance,

    I really appreciate any input.

    Joachim

  • No.5 | | 4296 bytes | |

    On Thursday, 06.07.2006, at 10:47 +0200, Ahmed Mohombe wrote:
    In a worst case, when nobody is ill or on holidays, you have at least
    one open connection per user all the time.
    >How would one handle in the JVM and code these very long lasting connections?


    Why? Are there issues?
    I'm not aware of any problems with many sockets treated by many threads
    running a long time apart from the not avoidable network, cpu and memory
    consumption.
    Well, if too many resources are "occupied" for a long time and not released,
    the JVM won't even get the chance to do GC on those object trees.
    The problem is also that those resources are limited (at least for most
    people).

    We should try to keep the data hold by the sessions small. If there are
    really many users it should be possible to run an optimistic setup: not
    all connected users will start the most expensive operations at the same
    time. So maybe it is needed to have a strategy for the worst case:
    Before everything gets so slow that nobody can do anything activate a
    dynamic connection limit.
    Exactly :).

    In this case the IMAP server should run on a separate machine not to
    influence smtp/spooling on high loads.
    Having a second machine is already too high a system requirement. Many
    small companies that have their server at an ISP would gladly use IMAP,
    but only on that one machine.
    Many providers also do not allow messing with MX records or other things, so
    many small companies get one machine, one IP address and a few domain names.
    Even if the load is not too high, all services should somehow run on
    that single machine without taking all the resources away from other
    processes during load spikes.

    Is that the direction of your question?
    This would be the purpose: to guarantee somehow that at least those who
    are in (or do not exceed a limit) get decent handling even if more
    users try to access the service.

    Clustering at DB level would mean to not support dbfile, file and other
    types of repositories.
    Clustering at DB level would just mean following some rules in the
    implementation. DBMS are designed to be accessed simultaneously from
    multiple hosts.
    dbfile mmh, good point. I just made a few thoughts. There are also
    various solutions to access a network file-system without a single point
    of failure.
    >What about JCR? (e.g. the Jackrabbit implementation). Could
    >it solve some of the problems, or it is too far away from making
    >it usable as a "file system" for IMAP?


    Well, I don't have any knowledge about JCR. I tried to quickly browse
    some docs/faqs.
    But it seems to be able to use it reliable Jackrabbit uses a JDBC
    back-end, too. And I guess it wasn't made to store lots of emails. The
    overhead of implementing this might be greater than using JDBC directly.
    This gives us the ability to tune performance.
    How do you think JCR might help?
    Well, JCR is a standard and can have many types of back-ends (not just JDBC).
    It is used for CMSes and DMSes (e.g. Magnolia) and has the features needed for
    reliability, scalability, etc.
    I asked about JCR since IMAP, from a user perspective, looks more like a DMS. In fact,
    many small companies use the Exchange server (and some plug-ins) with IMAP in exactly this
    way: to easily access all the documents from everywhere, all the time.

    >What are the minimal performance requirements for JAMES's first IMAP implementation?


    As i said above I can only make assumptions. But it is an important
    question. The first goal is to have something stable.
    I think the cpu/memory usage for pure protocol handling is not that
    higher than in POP3. The critical part is moving the data.
    Apart from directly accessing mime-parts or performing searches it
    should compare to a smtp/pop3 session.
    This sounds good.

    Ahmed.

  • No.6 | | 4763 bytes | |

    On Thursday, 06.07.2006, at 15:39 +0200, Ahmed Mohombe wrote:
    On Thursday, 06.07.2006, at 10:47 +0200, Ahmed Mohombe wrote:
    In a worst case, when nobody is ill or on holidays, you have at least
    one open connection per user all the time.
    >How would one handle in the JVM and code these very long lasting connections?


    Why? Are there issues?
    I'm not aware of any problems with many sockets treated by many threads
    running a long time apart from the not avoidable network, cpu and memory
    consumption.
    Well, if too many resources are for long time "occupied" and not released,
    the JVM won't even get the change to do GC on those object trees.

    Is it like Windows 95, which unavoidably crashed after 21 days? ;-)
    Or like Windows 98, which had to be rebooted to get back full
    performance? ;-)
    Well, that would rule out using Java on servers at all. You mean if I set a
    variable that was bound for 1 week to null, it won't ever be gc'ed?
    A socket that has been open for 24h can't be closed?
    Well, I know many servers, just like James, are fighting with memory
    leaks and class loader issues. But those problems are mostly self-made
    and probably seldom related to JVM bugs.
    Could you give a concrete example?

    The problem is also that those resources are limited (at least for most
    of the people).

    Memory and CPU, of course. They will limit the possible number of
    threads and sockets. I don't think we will hit limits just because
    of the number of threads running or the number of sockets open.

    In this case the IMAP server should run on a separate machine not to
    influence smtp/spooling on high loads.
    Having a second machine is already a too high system requirement. Many
    small companies that have their server to an ISP would gladly use IMAP
    but only on that one machine.

    Using one machine will always mean that you have to limit performance to
    play it safe and keep reserves. This will first be done by a mandatory
    connection limit. Once we have benchmark results, we can see how a
    dynamic connection limit could make sense.

    But the best performance will be achieved with a dedicated machine.

    Is that the direction of your question?
    This would be the purpose: to guarantee somehow that at least those who
    are in (or do not overpass a limit) get decent handing even if more
    users try to access the service.

    n users should be able to get at least one idle connection, n/x
    expensive operations can be run simultaneously, managed by a scheduler.
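
    A minimal sketch of the "n/x expensive operations" rule, assuming a plain
    counting semaphore is enough as a first scheduler. ExpensiveOperationGate,
    maxUsers and divisor are hypothetical names for n and x. Idle connections
    are not affected; only operations wrapped in run() compete for slots.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical gate: at most n/x expensive operations run at the same time.
class ExpensiveOperationGate {
    private final Semaphore slots;

    ExpensiveOperationGate(int maxUsers, int divisor) {
        // Always allow at least one expensive operation; "true" makes the
        // semaphore fair, so waiting sessions are served in arrival order.
        this.slots = new Semaphore(Math.max(1, maxUsers / divisor), true);
    }

    // Waits for a free slot, runs the operation, and frees the slot again
    // even if the operation throws.
    <T> T run(Supplier<T> operation) {
        slots.acquireUninterruptibly();
        try {
            return operation.get();
        } finally {
            slots.release();
        }
    }

    int freeSlots() {
        return slots.availablePermits();
    }
}
```

    With maxUsers = 100 and divisor = 10, ten searches or large fetches could
    run concurrently while further ones wait their turn.
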

    I'm really happy about such proposals. Even if they are not
    first-priority features, we can keep them in mind while doing
    design/implementation not to make decisions that would block them.

    Clustering at DB level would mean to not support dbfile, file and other
    types of repositories.
    Clustering at DB level would just mean following some rules in the
    implementation. DBMS are designed to be accessed simultaneously from
    multiple hosts.
    dbfile mmh, good point. I just made a few thoughts. There are also
    various solutions to access a network file-system without a single point
    of failure.
    >What about JCR? (e.g. the Jackrabbit implementation). Could
    >it solve some of the problems, or it is too far away from making
    >it usable as a "file system" for IMAP?


    Well, I don't have any knowledge about JCR. I tried to quickly browse
    some docs/faqs.
    But it seems to be able to use it reliable Jackrabbit uses a JDBC
    back-end, too. And I guess it wasn't made to store lots of emails. The
    overhead of implementing this might be greater than using JDBC directly.
    This gives us the ability to tune performance.
    How do you think JCR might help?
    Well, JCR is a standard and can have many types of back-ends (not just JDBC).

    Yes, but the question is which back-ends are already available for
    production and what their features are. E.g. the Jackrabbit site says the
    file-based back-end does not support transactions and might be
    inconsistent after an unclean JVM shutdown.

    I asked about JCR since IMAP from a user perspective looks more like a DMS. In fact,
    many small companies use the Exchange server(and some plug-ins) with IMAP exactly in this
    way: to access easily all the documents from everywhere and all the time.

    Yeah, another example why imap is so resource intensive. :-)

    Joachim

  • No.7 | | 6478 bytes | |

    In a worst case, when nobody is ill or on holidays, you have at least
    one open connection per user all the time.
    How would one handle in the JVM and code these very long lasting connections?
    Why? Are there issues?
    I'm not aware of any problems with many sockets treated by many threads
    running a long time apart from the not avoidable network, cpu and memory
    consumption.
    >Well, if too many resources are for long time "occupied" and not released,
    >the JVM won't even get the change to do GC on those object trees.

    Is it like Windows 95, which unavoidably crashed after 21 days? ;-)
    Or like Windows 98, which had to be rebooted to get back full
    performance? ;-)
    Well, almost, but not that bad :).

    Well that would deny using Java on servers at all. You mean if I set a
    variable that was bound for 1 week to null it won't ever by gc'ed?
    A socket that has been opened for 24h can't be closed?
    Well I know many servers, just like James, are fighting with memory
    leaks and class loader issues. But those problems are mostly self-made
    and maybe seldom related to JVM Bugs
    Exactly. This is not the fault of the JVM, nor of Java, but of the applications.
    However, the reality is that complex software *has* bugs. This is a matter
    of fact, and it is even more visible in long-running processes.
    Nowadays every piece of software depends on so many frameworks/libraries that it is
    almost impossible to eliminate all the bugs, and in many cases new bugs can come
    in through "innocent updates" of some libs.

    Could you give a concrete example?
    Let's take Tomcat. It's a wonderful example. It is of very high quality,
    yet many versions have some sort of memory leak or other bugs that show up
    in services running for months. Restarting the server every weekend
    or every month is common practice in most companies, and seems to
    be much cheaper than searching for the real bugs (which in many cases are
    in the custom web application, or in the combination of web application + Tomcat).
    I have had many cases where, after simply upgrading to a new Tomcat version, the web applications
    no longer showed memory leaks (but unfortunately the new Tomcat version
    brought other new bugs).

    AFAIK Mailets are supposed to be used with IMAP too, so the situation
    might be a little similar to Tomcat :).


    >The problem is also that those resources are limited (at least for most
    >of the people).


    Memory and cpu, of course. They will limit the possible number of
    threads and sockets. I don't think we will get into limits just because
    of number of threads running or number of sockets opened.

    In this case the IMAP server should run on a separate machine not to
    influence smtp/spooling on high loads.
    >Having a second machine is already a too high system requirement. Many
    >small companies that have their server to an ISP would gladly use IMAP
    >but only on that one machine.


    Using one machine will always mean that you have to limit performance to
    play safe and have reserves. This will first be done by an obligatory
    connection limit. After having benchmarks results we could see how a
    dynamic connection limit could make sense.
    This sounds good.

    But the best performance will be achieved with a dedicated machine.
    I know, but this is what most users won't have, nor do they have it now.
    A big problem with J2EE in general is that such applications are fantastic
    and very scalable, but only upward (very seldom downward). This is why PHP is
    such a big success.
    IMHO this downward scalability should not be forgotten in the case of JAMES.

    Clustering at DB level would mean to not support dbfile, file and other
    types of repositories.
    Clustering at DB level would just mean following some rules in the
    implementation. DBMS are designed to be accessed simultaneously from
    multiple hosts.
    dbfile mmh, good point. I just made a few thoughts. There are also
    various solutions to access a network file-system without a single point
    of failure.
    What about JCR? (e.g. the Jackrabbit implementation). Could
    it solve some of the problems, or it is too far away from making
    it usable as a "file system" for IMAP?
    Well, I don't have any knowledge about JCR. I tried to quickly browse
    some docs/faqs.
    But it seems to be able to use it reliable Jackrabbit uses a JDBC
    back-end, too. And I guess it wasn't made to store lots of emails. The
    overhead of implementing this might be greater than using JDBC directly.
    This gives us the ability to tune performance.
    How do you think JCR might help?
    >Well, JCR is a standard and can have many types of back-ends (not just JDBC).


    Yes, but the questions is which back-ends are already there for
    production and what are the features. E.g. jackrabbit site says,
    file-based back-end does not support transactions and might by
    inconsistent after an unclean JVM shutdown.
    Yes, I know. The Jackrabbit implementation is not perfect, but IMHO when evaluating
    whether to use JCR or not, it is the standard that should matter, as that is ready (the implementation
    can be improved until it fully implements the standard).
    I'm not an expert in JCR or IMAP, that's why I asked.
    Considering that Jackrabbit is also an Apache project, maybe some of its authors could
    better answer some questions.
    IMHO it is worth asking, since if JCR is usable/compatible with IMAP then it could greatly shorten the
    time to release of JAMES IMAP.

    >I asked about JCR since IMAP from a user perspective looks more like a DMS. In fact,
    >many small companies use the Exchange server(and some plug-ins) with IMAP exactly in this
    >way: to access easily all the documents from everywhere and all the time.


    Yeah, another example why imap is so resource intensive. :-)
    I really don't know how one could prevent such misuse. It is the typical "hammer" syndrome :).

    Ahmed.

  • No.8 | | 2794 bytes | |

    Joachim Draeger wrote:
    Hi Bernd,

    Imap is quite resource intensive.
    >>
    >>Is it? Why?
    >>How does IMAP resource consumption compare to already existing James
    >>parts like POP3 and UserRepositories?


    The normal SMTP/POP life cycle is: 1. deliver a message, 2. retrieve it via
    POP3, store it on the user's hard disk, delete it from the server. The user will sort
    and archive his messages on his own computer.
    With IMAP all messages stay on the server. The client is not required to
    cache anything. The user will browse through the mailboxes, fetch headers
    and content, perform searches, and copy messages.
    With POP3, mailboxes with a size of a GB are very unlikely.
    IMAP is designed to keep a TCP connection open all the time, to be
    informed when new mail arrives. Clients even open multiple
    concurrent connections when browsing through mailboxes.
    In the worst case, when nobody is ill or on holiday, you have at least
    one open connection per user all the time.

    OK, thank you for explaining (although I am not completely with you on
    the POP3 mailbox size case, since I am keeping all my POP3 mails on the
    server, too ;-) ).

    >>I would have guessed think that the resource consumption is not
    >>depending on the protocol in the first place, but on the number of
    >>users, the number of transmitted emails and the size of those emails.


    Yes, but with IMAP the user deals with the same message several
    times.
    Of course it's not the protocol itself but the way it is used.


    This requires to share the load
    between multiple hosts. Another goal is being fault tolerant and to have
    a fail-over strategy. This means not having 99.999% availability. This
    could be done using a cluster capable RDBMS. At the best this would mean
    that you can setup your IMAP servers like accessing one single database.
    >>
    >>Are you still talking about James at this point or some specialized IMAP
    >>server apart from James?


    For accessing a DB cluster you don't need any specialization. Any
    application that uses JDBC should be able to do it quite out of the box.

    Keeping the whole of James in mind, I think it is more worthwhile to
    think about clustering James instead of clustering IMAP servers using
    RDBMSs.

    Bernd

  • No.9 | | 1259 bytes | |

    Joachim Draeger wrote:
    Hi Ahmed,

    On Thursday, 06.07.2006, at 10:47 +0200, Ahmed Mohombe wrote:

    In a worst case, when nobody is ill or on holidays, you have at least
    one open connection per user all the time.
    >>
    >>How would one handle in the JVM and code these very long lasting connections?


    Similar to short lasting connections, I guess :-)

    Actually there is an inactivity limit of 30 minutes defined in the RFC
    and clients should be able to deal with the fact that they might be
    thrown out from time to time.

    But any command restarts this timeout. A simple NOOP is sufficient. This
    means that the server is not allowed to explicitly terminate the
    connection as long as the client keeps pinging it, regardless of the fact
    that no actual mail traffic is happening.
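
A minimal sketch of the timeout rule described above, assuming the server keeps one timer per session. The class and method names are hypothetical and not part of James; the 30-minute value is the limit quoted above, which the IMAP RFC treats as a minimum for the autologout timer.

```java
import java.time.Duration;
import java.time.Instant;

/** Tracks the inactivity deadline of one IMAP session (illustrative only). */
final class InactivityTimer {
    // The 30-minute inactivity limit mentioned in the thread.
    static final Duration LIMIT = Duration.ofMinutes(30);

    private Instant lastCommand;

    InactivityTimer(Instant sessionStart) {
        this.lastCommand = sessionStart;
    }

    /** Call for every command received, including a bare NOOP. */
    void onCommand(Instant now) {
        lastCommand = now;
    }

    /** True once the client has been silent for the full limit. */
    boolean mayDisconnect(Instant now) {
        return !now.isBefore(lastCommand.plus(LIMIT));
    }
}
```

Passing the current time in as a parameter keeps the sketch easy to test; a client that sends NOOP every 29 minutes never reaches the limit.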

    Maybe such idle connections could be handled by a special thread and if
    any real work is starting on a connection, this one is handed over to a
    worker thread?

    Bernd

  • No.10 | | 2245 bytes | |

    On Saturday, 08.07.2006, at 12:56 +0200, Bernd Fondermann wrote:

    For accessing a DB cluster you don't need any specialization. Any
    application that uses JDBC should be able to do it quite out of the box.

    Keeping the whole of James in mind, I think it is more worthwhile to
    think about clustering James instead of clustering IMAP servers using
    RDBMSs.

    Well, I was talking about a cluster-able Message Repository and what
    has to be considered when using it with IMAP.
    Of course a cluster-able Message Repository implies that it has to be
    accessed by the whole of James.
    For example, you could implement a MailRepository that relies on RDBMS
    locking instead of locking locally. A central UserRepository should be no
    problem. Then you could have multiple instances of James without having
    to touch any further existing code. As a first step you could measure
    the queue length and CPU utilization to do load balancing.
    The only blocking thing is the local HashMap locking in
    JDBCMailRepository. (Of course there could be other things, because I
    have limited experience with James.)

    The weak point would be that local queues are still running on the
    individual servers. Nobody can foresee how work-intensive a single
    queue entry will be.
    The solution would be a shared queue for all James instances. This would
    require a locking-algorithm to determine which server should pick up
    which entry. But that should be possible using RDBMS.
    James instances could be added and removed on the fly.
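
A minimal sketch of the "which server picks up which entry" rule, assuming each queue entry may be claimed exactly once. It uses an in-memory map as a stand-in for the shared database; in a real setup the same atomic claim would be a conditional update against the shared RDBMS. All names are hypothetical and not part of James.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a shared queue table with an "owner" column. */
final class SharedQueueClaims {
    // Queue entry id -> name of the James instance that claimed it.
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /**
     * Atomically claims an entry for a server.
     * Returns true only for the first server that asks for this entry.
     */
    boolean tryClaim(String entryId, String serverName) {
        return owners.putIfAbsent(entryId, serverName) == null;
    }

    /** Returns the server that owns the entry, or null if it is unclaimed. */
    String ownerOf(String entryId) {
        return owners.get(entryId);
    }
}
```

Because a server only processes entries it has claimed successfully, instances can be added or removed without coordinating with each other directly.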

    I don't see a reason why James has to be cluster agnostic apart from
    that. Using RDBMS together with the dbfile strategy should be most
    effective.

    Fail-safe, clustered file-systems and RDBMSs are there, open source
    solutions exist, we could rely on that without reinventing the wheel.

    Joachim

  • No.11 | | 1628 bytes | |

    On Saturday, 08.07.2006, at 13:08 +0200, Bernd Fondermann wrote:

    Actually there is an inactivity limit of 30 minutes defined in the RFC
    and clients should be able to deal with the fact that they might be
    thrown out from time to time.

    But any command restarts this timeout. A simple NOOP is sufficient. This
    means that the server is not allowed to explicitly terminate the
    connection as long as the client keeps pinging it, regardless of the fact
    that no actual mail traffic is happening.

    Maybe such idle connections could be handled by a special thread and if
    any real work is starting on a connection, this one is handed over to a
    worker thread?

    To allow many idle connections but limit the maximal possible server
    load, I have the idea of a central scheduler in mind.
    The scheduler keeps track of all running threads.
    If a session thread wants to run an expensive command it has to ask the
    scheduler first.
    If there are too many working threads, the scheduler will queue the
    session threads and let them sleep until a working thread has
    finished.
    The client will notice that as some kind of staccato. :-)
    Avoiding the "everybody wants to work at the same time" deadlock, in
    which everything gets so slow that no one can do anything, would
    require a complicated algorithm that manages a dynamic limit on the
    total number of connections.
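
A minimal sketch of such a scheduler, assuming a fixed limit on concurrently running expensive commands. It uses a counting semaphore rather than the dynamic limit mentioned above, and the class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/** Lets at most a fixed number of expensive commands run at the same time. */
final class CommandScheduler {
    private final Semaphore permits;

    CommandScheduler(int maxConcurrentExpensive) {
        // Fair mode: sleeping session threads are woken in arrival order.
        this.permits = new Semaphore(maxConcurrentExpensive, true);
    }

    /** Runs the command; the calling session thread sleeps while all slots are taken. */
    <T> T runExpensive(Callable<T> command) throws Exception {
        permits.acquire();
        try {
            return command.call();
        } finally {
            permits.release(); // free the slot even if the command failed
        }
    }

    /** Number of expensive commands that could start right now. */
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

Cheap commands would bypass the scheduler entirely, so an idle session costs a sleeping thread but no permit.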

    Joachim

  • No.12 | | 1601 bytes | |

    On Mon, 2006-07-10 at 14:57 +0200, Joachim Draeger wrote:

    Maybe such idle connections could be handled by a special thread and if
    any real work is starting on a connection, this one is handed over to a
    worker thread?

    To allow many idle connections but limit the maximal possible server
    load, I have the idea of a central scheduler in mind.
    The scheduler keeps track of all running threads.
    If a session thread wants to run an expensive command it has to ask the
    scheduler first.

    The thread per connection model simply doesn't scale to the level that
    would be needed for a decent IMAP server. Fortunately, this is a
    problem that has already been solved by SEDA [1]. The Apache MINA
    project [2] implements the good ideas of SEDA plus adds a number of
    other good ideas providing a framework that is very easy to learn and
    use. MINA also performs and scales very well. MINA is also an Apache
    project which means that any needed support is just around the corner. At
    ApacheCon there was talk about using MINA to implement the protocols
    used by JAMES [3]. Since I wrote a MINA based SMTP handler [4], I would
    be very interested in seeing JAMES move to MINA based protocol handlers.
    I think this would be a good move for the 3.0 release of JAMES.
    -Mike

    [1] http://www.eecs.harvard.edu/~mdw/proj/seda/
    [2]
    [3]

    [4] http://hausmail.safehaus.org/

  • No.13 | | 1997 bytes | |

    On Monday, 10.07.2006, at 21:05 -0600, Mike Heath wrote:
    On Mon, 2006-07-10 at 14:57 +0200, Joachim Draeger wrote:

    Maybe such idle connections could be handled by a special thread and if
    any real work is starting on a connection, this one is handed over to a
    worker thread?

    To allow many idle connections but limit the maximal possible server
    load, I have the idea of a central scheduler in mind.
    The scheduler keeps track of all running threads.
    If a session thread wants to run an expensive command it has to ask the
    scheduler first.

    The thread per connection model simply doesn't scale to the level that
    would be needed for a decent IMAP server. Fortunately, this is a
    problem that has already been solved by SEDA [1]. The Apache MINA
    project [2] implements the good ideas of SEDA plus adds a number of
    other good ideas providing a framework that is very easy to learn and
    use. MINA also performs and scales very well. MINA is also an Apache
    project which means that any needed support is just around the corner. At
    ApacheCon there was talk about using MINA to implement the protocols
    used by JAMES [3]. Since I wrote a MINA based SMTP handler [4], I would
    be very interested in seeing JAMES move to MINA based protocol handlers.
    I think this would be a good move for the 3.0 release of JAMES.
    -Mike

    [1] http://www.eecs.harvard.edu/~mdw/proj/seda/
    [2]
    [3]

    [4] http://hausmail.safehaus.org/

    Hi Mike,

    I already moved this to the 3.0 roadmap ;-)

    I had a look at your work last week. It seems to be very
    interesting. I'm not sure yet whether we should move to MINA before or
    after the new SMTP handler API is finished.

    What do the others think?

    BTW, I have never used MINA, but the talk at ApacheCon was very interesting.

    bye
    Norman

  • No.14 | | 1809 bytes | |

    On Monday, 10.07.2006, at 21:05 -0600, Mike Heath wrote:
    On Mon, 2006-07-10 at 14:57 +0200, Joachim Draeger wrote:

    Maybe such idle connections could be handled by a special thread and if
    any real work is starting on a connection, this one is handed over to a
    worker thread?

    To allow many idle connections but limit the maximal possible server
    load, I have the idea of a central scheduler in mind.
    The scheduler keeps track of all running threads.
    If a session thread wants to run an expensive command it has to ask the
    scheduler first.

    The thread per connection model simply doesn't scale to the level that
    would be needed for a decent IMAP server.

    Why? There are reasons to limit the maximal number of simultaneously
    running threads. But what are the drawbacks of having one thread per
    connection?

    Fortunately, this is a
    problem that has already been solved by SEDA [1]. The Apache MINA
    project [2] implements the good ideas of SEDA plus adds a number of
    other good ideas providing a framework that is very easy to learn and
    use. MINA also performs and scales very well.

    I have just begun to read the MINA docs to get an overview. How does it work
    internally? How does MINA know there is data waiting on a connection?
    Is it possible to let Java fire events in a new thread when data has
    arrived, or are you polling every connection for new data?
    I had a quick look at a SEDA doc. Does MINA support prioritization?
    E.g., NOOP is a cheap command; LIST and APPEND are expensive ones.
    Maybe between ProtocolAcceptor and ProtocolHandler?
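
MINA's internals are not described in this thread, but the JDK mechanism that avoids polling every connection is a java.nio Selector: one thread blocks until any registered channel has data. The sketch below uses two in-process pipes as stand-ins for client connections; the class name and the "idle"/"busy" labels are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

/** One thread waits on several channels and is woken only by the one with data. */
final class ReadinessDemo {
    static String readFromReadyChannel() throws IOException {
        Pipe idle = Pipe.open(); // stand-in for a connection that stays silent
        Pipe busy = Pipe.open(); // stand-in for a connection that sends a command
        try (Selector selector = Selector.open()) {
            idle.source().configureBlocking(false);
            busy.source().configureBlocking(false);
            idle.source().register(selector, SelectionKey.OP_READ, "idle");
            busy.source().register(selector, SelectionKey.OP_READ, "busy");

            // Simulate a client sending a command on one connection.
            busy.sink().write(ByteBuffer.wrap("NOOP".getBytes(StandardCharsets.US_ASCII)));

            // Blocks (up to 5 s) until at least one channel is readable.
            selector.select(5000);

            StringBuilder result = new StringBuilder();
            for (SelectionKey key : selector.selectedKeys()) {
                ByteBuffer buffer = ByteBuffer.allocate(64);
                ((Pipe.SourceChannel) key.channel()).read(buffer);
                buffer.flip();
                result.append(key.attachment()).append(':')
                      .append(StandardCharsets.US_ASCII.decode(buffer));
            }
            return result.toString();
        } finally {
            idle.source().close();
            idle.sink().close();
            busy.source().close();
            busy.sink().close();
        }
    }
}
```

In a server, the loop over the selected keys is the natural place to hand the ready connection to a worker thread, and possibly to order the work by command cost.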

    Joachim

  • No.15 | | 841 bytes | |

    Joachim Draeger wrote:
    >The thread per connection model simply doesn't scale to the level that
    >would be needed for a decent IMAP server.


    Why? There are reasons to limit the maximal number of simultaneously
    running threads. But what are the drawbacks of having one thread per
    connection?

    I think that many of the answers you are looking for are better
    described in papers found here:
    http://www.eecs.harvard.edu/~mdw/proj/seda/#papers

    IMAP is a perfect case for SEDA because often there are a lot of idle
    connections, and without SEDA you need to keep idle threads allocated.

    Stefano

  • No.16 | | 336 bytes | |

    On Jul 14, 2006, at 10:08 AM, Joachim Draeger wrote:
    How much does it cost to keep an idle thread allocated? Is it only
    using memory or using a lot of memory or does the JVM need cpu time to
    deal with them?
    Do you recommend using SEDA instead of MINA?

    MINA would let you implement a SEDA-style architecture.
    -pete
  • No.17 | | 1545 bytes | |

    On Friday, 14.07.2006, at 15:40 +0200, Stefano Bagnara wrote:

    >The thread per connection model simply doesn't scale to the level that
    >would be needed for a decent IMAP server.


    Why? There are reasons to limit the maximal number of simultaneously
    running threads. But what are the drawbacks of having one thread per
    connection?

    I think that many of the answers you are looking for are better
    described in papers found here:
    http://www.eecs.harvard.edu/~mdw/proj/seda/#papers

    Oh dear, it would take me a hundred years for sure to go through all of
    that :-), although it looks very interesting.
    It's quite theoretical and I guess my question is more JVM related. When
    I was looking for the maximal number of threads Java is able to run I
    found the answer that it is only limited by memory.
    I always try to challenge propositions. Is it even a problem to have a
    sleeping thread per connection?

    IMAP is a perfect case for SEDA because often there are a lot of idle
    connections, and without SEDA you need to keep idle threads allocated.

    How much does it cost to keep an idle thread allocated? Is it only using
    memory or using a lot of memory or does the JVM need cpu time to deal
    with them?
    Do you recommend using SEDA instead of MINA?

    Joachim

  • No.19 | | 1264 bytes | |

    Joachim Draeger wrote:
    >IMAP is a perfect case for SEDA because often there are a lot of idle
    >connections, and without SEDA you need to keep idle threads allocated.


    How much does it cost to keep an idle thread allocated? Is it only using
    memory or using a lot of memory or does the JVM need cpu time to deal
    with them?

    IIRC each JVM thread needs 512K of memory for the stack inside the JVM;
    some applications, like Resin, bump this value to 2MB at startup (to
    avoid OOM under deep calls, I guess). (-Xss is the JVM parameter to tune it.)
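
As a back-of-the-envelope illustration of these figures (an estimate, not a measurement), the stack space set aside for many threads can be computed as below. The class and method names are made up, and on most platforms this space is reserved address space that is only partly backed by physical memory.

```java
/** Rough estimate of the stack space reserved by one-thread-per-connection. */
final class ThreadStackEstimate {
    /**
     * @param threads        number of threads (one per connection in this model)
     * @param stackKilobytes per-thread stack size in KB, e.g. the value given to -Xss
     * @return total reserved stack space in whole megabytes
     */
    static long reservedMegabytes(int threads, int stackKilobytes) {
        return (long) threads * stackKilobytes / 1024;
    }
}
```

For example, `reservedMegabytes(800, 512)` gives 400 and `reservedMegabytes(800, 2048)` gives 1600, so the stack setting alone can decide whether a few hundred idle connections fit into memory.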

    Furthermore, each JVM implements threads differently on the various OSes.
    Again IIRC, JVM 1.4 under Linux uses Linux processes for threads, and this
    means more memory and resources for the machine (the default stack for a
    Linux process should be 4-32K).

    I don't know the JVM threading model well enough to say how much the
    number of threads impacts performance.

    Do you recommend using SEDA instead of MINA?

    At a glance MINA "is" SEDA for JAVA.

    Stefano

  • No.20 | | 1721 bytes | |

    On 2006-07-14 at 16:08, Joachim Draeger wrote:
    I was looking for the maximal number of threads Java is able to run I
    found the answer that it is only limited by memory
    I always try to challenge propositions. Is it even a problem to have a
    sleeping thread per connection?

    Several years ago, several jobs far away, I had some experience with the
    non-blocking socket handling vs. one-thread-per-connection strategy
    issue. Please consider the following to see my experience in context:
    a, it was a Linux machine with a kernel from the 2.4 series (so every
    Java thread was handled as a process by the kernel)
    b, the best JVM of that time was the IBM 1.3.1 (java.nio was only a
    dream, and a weak promise)
    c, the application was a simple chat server: minimal input, lots of
    output.

    With the old one-thread-per-connection strategy we had:
    a, after 600-700 connections the system became very unstable and
    unresponsive, the load was between 10 and 20, and the VM had grown to
    more than 512 MB in RAM.
    b, after ~800 connections the VM was unable to create any additional
    threads, and silently died.

    After I rewrote it in a non-blocking fashion, the same machine and VM could
    serve 1200-1300 connections, in a much more responsive manner.

    I know that kernels from the 2.6 series are able to handle many more
    threads, but every context switch has a cost, and if you have
    hundreds or thousands of threads, it can (and will) cause some pain.

    BR,
    Zsombor

  • No.21 | | 1849 bytes | |

    On Monday, 17.07.2006, at 03:44 +0200, Zsombor wrote:
    On 2006-07-14 at 16:08, Joachim Draeger wrote:
    I was looking for the maximal number of threads Java is able to run I
    found the answer that it is only limited by memory
    I always try to challenge propositions. Is it even a problem to have a
    sleeping thread per connection?

    Several years ago, several jobs far away, I had some experience with the
    non-blocking socket handling vs. one-thread-per-connection strategy
    issue. Please consider the following to see my experience in context:
    a, it was a Linux machine with a kernel from the 2.4 series (so every
    Java thread was handled as a process by the kernel)
    b, the best JVM of that time was the IBM 1.3.1 (java.nio was only a
    dream, and a weak promise)
    c, the application was a simple chat server: minimal input, lots of
    output.

    With the old one-thread-per-connection strategy we had:
    a, after 600-700 connections the system became very unstable and
    unresponsive, the load was between 10 and 20, and the VM had grown to
    more than 512 MB in RAM.
    b, after ~800 connections the VM was unable to create any additional
    threads, and silently died.

    After I rewrote it in a non-blocking fashion, the same machine and VM could
    serve 1200-1300 connections, in a much more responsive manner.

    I know that kernels from the 2.6 series are able to handle many more
    threads, but every context switch has a cost, and if you have
    hundreds or thousands of threads, it can (and will) cause some pain.

    BR,
    Zsombor

    BTW, we want to switch to MINA for connection handling, so this should
    be no problem then.

    bye
    Norman
