Fri, May 19, 2006 at 12:49:19PM +0300, Teemu Takanen wrote:
This is a valid concern. However at least I find it practically impossible
to get PF firewall machine under any practical CPU/memory load before
packet loss starts to happen because of state expiration sweeps.
So yes, this might be expensive, but it still improves practical
performance in my opinnion.
For any setup which has a high throughput with an average number of states
this would add a noticable overhead. For bigger state tables, but still
long running connections, it can even trash the caches a lot. That makes
it IM less useful for the default code.
The sawchain effect causes jitter and lost forwarded packets when
connection initiation rate through a PF firewall is high. This causes
serious problems with voice applications where lost packets can be heard
easily. The only real solution is to smooth the expiration load.
Exactly. Two important parts here: the expiration itself is not the
problem, but the large time the state table is locked. When the locking
is made more fine-grained, the peak latency is reduced and with that the
likeliness of package drops.
However, a bigger problem is that at least the BSD 3.8 code can't
keep the backup firewall state table synchronized with the master firewall
because of this effect, even though CPU is 80% idle.
This is not really surprising, since the current behaviour results in
large flushes of updates, which can in themselve content the connection
to the backup firewall.
>Replace this with a number of linked list and one of them
>every second / half second whatever. This adds very low additional
>overhead (one or two pointers per state, a few more instructions for
>creation and removal of a state), but the runtime costs are not changed
>at all.
I though of keeping one LRU TAILQ for each timeout value. The
heads of these few lists could be checked for expiration on every
processed packet. This structure would be rather cheap and cache
friendly to keep. However I forgot that every rule can have separate
timeout values which makes the number of lists too huge to check at every
frame.
Don't try to reinvent the wheel. The code doesn't use per-state timeouts
for a reason. The assumption is that states normally don't timeout, e.g.
at least one packet modifies the state before the timeout. This means
that scanning all states is cheaper than individual timers.
The 3.9 code seems to split the sweep into smaller pieces much like you
propose, but even that does not really solve the actual problem, just
moves it to a little bit higher state creation frequency.
The code I have doesn't indicate that. It has a limit to the number of
state entries processed in a single run, but it doesn't split the table
into individual pieces to distribute the load over multiple timer calls.
Joerg