Importing PaX features to NetBSD
29 answers - 6128 bytes -

Hi,
There's a Linux hardening project called PaX. A few years ago it
introduced some new interesting approaches to security, that over time
proved themselves to be of help when facing mostly unknown threats. The
idea behind this approach is that you design the security of the system
to prevent the exploit, and not the vulnerability.
The PaX project can be found on the web at
http://pax.grsecurity.net
And detailed documentation about the variety of features it contains,
including some performance benchmarks, can be found at
I decided to download the paxtest tool, used to check if the various
features of PaX are working or not. What this tool does is run a few
tests to see if techniques that are commonly used in exploits are
working or not. More on this inline below, in square brackets.
The PaX test-suite can be downloaded from
Extract it and type "gmake netbsd" to build the paxtest tool, then use
it as "./paxtest blackhat".
The attack techniques this test suite checks for are
1. If anonymous mappings, or the bss/data/stack/heap sections of a
binary, are executable by default. If they are, it means the most
basic buffer overflow exploits will work.
2. If it's possible to make previously executable mappings writable,
or previously writable mappings executable. While this is standard
operation, it's a technique used to bypass W^X and the like, and
can be seen in several modern exploits.
3. Amount of randomization in mappings each time a program starts.
The more predictable the mappings are, the easier it is to perform
attacks that require jumping to a known address - like ret-to-libc
for example, used also to bypass W^X.
Note: While SSP may also help in preventing these ret-to-libc
attacks, this is another layer that should be considered, where the
two can provide more coverage against this type of attack.
4. Return-to-lib attacks via string manipulation routines. If shared
libraries are mapped where the address of functions in them have
a NULL byte it means that the attacker can't (easily) use a
return-to-lib attack to get to them, because the input used to
overflow must be ASCII armored.
5. Same as above, only this time the overflow is made in a routine
that isn't affected by a NULL byte. SSP should be able to stop
these attacks.
6. Checks if code in the data/bss segments of a shared library can be
executed.
7. Checks if it's possible to map text segments as writable. Another
form of the attack in #2.
The goal, of course, is to pass all tests in this suite.
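As a rough illustration of item 4, the check below tells whether an address contains a zero byte and therefore cannot be copied in full by strcpy()-style routines, which stop at the first NUL. The function name and the width parameter are made up for this sketch; paxtest itself may test this differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if any of the low `width` bytes of
 * `addr` is 0x00. `width` is the pointer size in bytes (4 or 8) and is
 * assumed to be at most 8.
 */
bool addr_has_nul_byte(uint64_t addr, size_t width)
{
    for (size_t i = 0; i < width; i++) {
        if (((addr >> (8 * i)) & 0xff) == 0)
            return true;   /* strcpy() would stop copying here */
    }
    return false;
}
```

With 32-bit pointers, any address below 16MB (for example 0x00123456) has a zero top byte, so a library function mapped there is "ASCII armored" against string-based overflows, while 0x40123456 is not.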
Here's output from a (slightly modified version of) paxtest run on my
NetBSD machine:
phyre:paxtest-0.9.7-pre4 {319} ./paxtest blackhat
PaXtest - Copyright(c) 2003,2004 by Peter Busser <peter (AT) adamantix (DOT) org>
Released under the GNU Public Licence version 2 or later
Writing output to paxtest.log
It may take a while for the tests to complete
Test results:
PaXtest - Copyright(c) 2003,2004 by Peter Busser <peter (AT) adamantix (DOT) org>
Released under the GNU Public Licence version 2 or later
Mode: blackhat
NetBSD phyre.bsd.org.il 3.99.14 NetBSD 3.99.14 (GENERIC) #14: Sun Dec 11
23:00:39 IST 2005
elad@
amd64
Executable anonymous mapping : Killed
Executable bss : Killed
Executable data : Killed
Executable heap : Killed
Executable stack : Killed
[ #1. That's good. On architectures where there's no NX bit,
these would be vulnerable too. i386, for example. ]
Executable anonymous mapping (mprotect) : Vulnerable
Executable bss (mprotect) : Vulnerable
Executable data (mprotect) : Vulnerable
Executable heap (mprotect) : Vulnerable
Executable shared library bss (mprotect) : Vulnerable
Executable shared library data (mprotect): Vulnerable
Executable stack (mprotect) : Vulnerable
[ #2. While to not be vulnerable to this attack we need to
change semantics of mprotect(), doing this based on a sysctl
knob might be a good idea.
The performance cost of adding this protection is ~zero,
because it's a matter of testing for flags. ]
Anonymous mapping randomisation test : No randomisation
Heap randomisation test (ET_EXEC) : No randomisation
Main executable randomisation (ET_EXEC) : No randomisation
Shared library randomisation test : No randomisation
Stack randomisation test (SEGMEXEC) : No randomisation
Stack randomisation test (PAGEEXEC) : No randomisation
[ #3. Randomization hooks could fix these. ]
Return to function (strcpy) : paxtest: return address
contains a NULL byte.
Return to function (strcpy, RANDEXEC) : paxtest: return address
contains a NULL byte.
[ #4. That's good. Is this on purpose? :) (no) ]
Return to function (memcpy) : Vulnerable
Return to function (memcpy, RANDEXEC) : Vulnerable
[ #5. SSP fixes this one, as well as #4. ]
Executable shared library bss : Killed
Executable shared library data : Killed
[ #6. That's good, but again, not on all architectures. ]
Writable text segments : Vulnerable
[ #7. Same note as #2. ]
phyre:paxtest-0.9.7-pre4 {320}
As you can see, NetBSD is not doing too well in these tests. While there
could certainly be a performance/functionality penalty in fixing some of
these -- namely, the randomization and mprotect() ones -- I feel we
should leave the choice to the users to decide (whether to enable or
not).
What I would like to discuss is the possibility of importing two
important features from PaX -- ASLR and MPROTECT.
I've CC'd this mail to the PaX author, who knows about this stuff a lot
more than I do (and is in general a very cool dude :) so he could give
his comments during the discussion; please CC him on replies as well.
Thanks,
-e.
No.1 | | 2848 bytes |
| 
Elad Efrat wrote:
The attack techniques this test suite checks for are
1. If anonymous mappings, or the bss/data/stack/heap sections of a
binary, are executable by default. If they are, it means the most
basic buffer overflow exploits will work.
Some architectures (powerpc) have ABIs that mandate executable bss/data.
There is no way to turn that off.
2. If it's possible to make previously executable mappings writable,
or previously writable mappings executable. While this is standard
operation, it's a technique used to bypass W^X and the like, and
can be seen in several modern exploits.
If you have textrel relocations, you have to allow this. I think the
right way is to add a PROT_LOCKED that mprotect can understand. If
this is supplied, the protection on such pages can't be downgraded.
munmap would still be allowed (it might be advisable to make munmap
change the region to PROT_NONE but leave it in place so it can't be
remapped.)
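A minimal user-space model of the lock-flag idea described above, purely as a sketch: the constants, struct, and function names are hypothetical and none of this is an existing NetBSD interface. Once a region's protection has been set with the lock flag, later protection changes are refused, and unmapping leaves an inaccessible placeholder behind.

```c
#include <stdbool.h>

/* Model-only constants; MODEL_PROT_LOCKED is a hypothetical flag. */
enum {
    MODEL_PROT_NONE   = 0x0,
    MODEL_PROT_READ   = 0x1,
    MODEL_PROT_WRITE  = 0x2,
    MODEL_PROT_EXEC   = 0x4,
    MODEL_PROT_LOCKED = 0x100
};

struct model_region {
    int  prot;    /* current R/W/X bits */
    bool locked;  /* set once the lock flag has been supplied */
};

/* Returns 0 on success, -1 if the region's protection is locked. */
int model_mprotect(struct model_region *r, int prot)
{
    if (r->locked)
        return -1;
    r->prot = prot & (MODEL_PROT_READ | MODEL_PROT_WRITE | MODEL_PROT_EXEC);
    r->locked = (prot & MODEL_PROT_LOCKED) != 0;
    return 0;
}

/* Unmapping is always allowed, but the range stays reserved and inaccessible. */
void model_munmap(struct model_region *r)
{
    r->prot = MODEL_PROT_NONE;
    r->locked = true;
}
```

In this model a loader would relocate text while it is writable, then call model_mprotect() with read, exec, and the lock flag; any later attempt to make the region writable fails.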
3. Amount of randomization in mappings each time a program starts.
The more predictable the mappings are, the easier it is to perform
attacks that require jumping to a known address - like ret-to-libc
for example, used also to bypass W^X.
The program itself is linked at a base address. The libraries and stacks
could vary, but on machines with virtual caches that might be very painful.
Note: While SSP may also help in preventing these ret-to-libc
attacks, this is another layer that should be considered, where the
two can provide more coverage against this type of attack.
4. Return-to-lib attacks via string manipulation routines. If shared
libraries are mapped where the address of functions in them have
a NULL byte it means that the attacker can't (easily) use a
return-to-lib attack to get to them, because the input used to
overflow must be ASCII armored.
So you are saying that most libraries should be mapped at < 16MB? That's
just not practical for most architectures. I suppose that libraries with
<= 64KB of text can just be mapped on a 16MB boundary. On LP64 you can map
libraries into the first 16MB of each 4GB region. Of course, you'll fragment
your memory space that way.
5. Same as above, only this time the overflow is made in a routine
that isn't affected by a NULL byte. SSP should be able to stop
these attacks.
6. Checks if code in the data/bss segments of a shared library can be
executed.
See above for ABI requirements.
7. Checks if it's possible to map text segments as writable. Another
form of the attack in #2.
The goal, of course, is to pass all tests in this suite.
I don't think that's really practical for most platforms.
No.2 | | 451 bytes |
| 
On Sun, 18 Dec 2005, Elad Efrat wrote:
What I would like to discuss is the possibility of importing two
important features from PaX -- ASLR and MPROTECT.
Questions that come to mind are:
* under what license is the code to-be-imported (seeing GPL in the output
of that test), and
* what is the performance impact when this is enabled?
I guess some more benchmarks would be useful to establish that.
- Hubert
No.3 | | 2488 bytes |
| 
Matt Thomas wrote:
Some architectures (powerpc) have ABIs that mandate executable bss/data.
There is no way to turn that off.
True, but two things:
1. How many of our users run i386, amd64, sparc, sparc64? how many of
them run powerpc?
2. We are already providing this feature where we can, so it's not
this that is up for discussion. :)
If you have textrel relocations, you have to allow this.
True. However, PaX is aware of text relocations, and addresses them.
I think the
right way is to add a PROT_LOCKED that mprotect can understand. If
this is supplied, the protection on such pages can't be downgraded.
munmap would still be allowed (it might be advisable to make munmap
change the region to PROT_NONE but leave it in place so it can't be
remapped.)
We should look at the two solutions and decide which fits best; or
maybe combine them.
The program itself is linked at a base address. The libraries and stacks
could vary, but on machines with virtual caches that might be very painful.
Like I said, let's leave the choice to the user. I'm sure there are
plenty of people with 3 GHz machines that will be happy to have this
feature at the given cost.
So you are saying that most libraries should be mapped at < 16MB?
No, I'm saying that SSP (in gcc 4.1) should address both problems in
a more elegant manner than a NULL byte in the return address. This is
also why I didn't ask this feature to be discussed in the mail.
That's
just not practical for most architectures. I suppose that libraries with
<= 64KB of text can just be mapped on a 16MB boundary. On LP64 you can map
libraries into the first 16MB of each 4GB region. Of course, you'll fragment
your memory space that way.
It's also a very ugly solution. It's not up for discussion. :)
I don't think that's really practical for most platforms.
This mail has two intentions:
1. Present the paxtest tool and provide some explanation about its
tests.
2. Ask to discuss the importing of two features from PaX: ASLR
(Address Space Layout Randomization) and MPROTECT (the mprotect()
restrictions).
ASLR is practical. It may add a performance hit, but like I already
said, let's leave that choice to the user. MPROTECT is also practical
but needs special care.
-e.
No.4 | | 1380 bytes |
| 
Hubert Feyrer wrote:
* under what license is the code to-be-imported (seeing GPL in the output
of that test), and
There is no licensing problem.
* what is the performance impact when this is enabled?
I guess some more benchmarks would be useful to establish that.
I haven't done any native benchmarks, but I can point you to (some)
benchmarks already done, available at
, although I'm not
sure these are fair wrt/to the features in question.
Some information that might help figuring what would be the costs of
ASLR and MPROTECT:
ASLR calculates 3 random values on execution and saves these as offsets
to be used when a random value is needed. How expensive are 3
arc4random() calls in the context of an entire sys_execve()?
MPROTECT, when not talking about text relocations, is very simple and
a matter of playing with flags. There's some more work involved when
handling textrels, but we have both the PaX way and Matt's way to look
into and decide what would be best for us.
For these two features, the last case of handling textrels *may* present
the only visible performance hit.
However, like I already mentioned these are security features, and we
should let the user do the trade-off if desired. We can always have
these turned off by default.
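To make the "matter of playing with flags" remark concrete, here is a simplified sketch of the kind of check an mprotect() restriction could perform. It is an assumption-laden model with invented names, not the PaX code, and it ignores the text relocation case entirely.

```c
#include <stdbool.h>

/* Model-only protection bits (hypothetical names). */
enum { P_READ = 0x1, P_WRITE = 0x2, P_EXEC = 0x4 };

/*
 * Returns true if changing a mapping from protection `cur` to `req`
 * is acceptable under the simplified policy:
 *   - never writable and executable at the same time,
 *   - a non-executable mapping may not become executable,
 *   - an executable mapping may not become writable.
 */
bool mprotect_change_allowed(int cur, int req)
{
    if ((req & P_WRITE) && (req & P_EXEC))
        return false;
    if ((req & P_EXEC) && !(cur & P_EXEC))
        return false;
    if ((req & P_WRITE) && (cur & P_EXEC))
        return false;
    return true;
}
```

The whole policy is a handful of bit tests per call, which is why the cost is expected to be close to zero when textrels are not involved.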
-e.
No.5 | | 706 bytes |
| 
On Sun, Dec 18, 2005 at 12:04:35PM +0200, Elad Efrat wrote:
Matt Thomas wrote:
Some architectures (powerpc) have ABIs that mandate executable bss/data.
There is no way to turn that off.
True, but two things:
1. How many of our users run i386, amd64, sparc, sparc64? how many of
them run powerpc?
In general we don't count the number of users of a given arch before
looking at a feature and its costs. Things need to fit into a cross
architecture framework that doesn't negatively impact other archs.
If you want that, there are other *BSD distributions aimed at "best
performance for the x86 arch" or other such goals.
James
No.6 | | 1926 bytes |
| 
On Sun, Dec 18, 2005 at 12:12:50PM +0200, Elad Efrat wrote:
Hubert Feyrer wrote:
* under what license is the code to-be-imported (seeing GPL in the output
of that test), and
There is no licensing problem.
* what is the performance impact when this is enabled?
I guess some more benchmarks would be useful to establish that.
I haven't done any native benchmarks, but I can point you to (some)
benchmarks already done, available at
, although I'm not
sure these are fair wrt/to the features in question.
If they weren't done on NetBSD systems they likely aren't useful.
Some information that might help figuring what would be the costs of
ASLR and MPROTECT:
ASLR calculates 3 random values on execution and saves these as offsets
to be used when a random value is needed. How expensive are 3
arc4random() calls in the context of an entire sys_execve()?
Who knows overall? Do it and measure it.
MPROTECT, when not talking about text relocations, is very simple and
a matter of playing with flags. There's some more work involved when
handling textrels, but we have both the PaX way and Matt's way to look
into and decide what would be best for us.
In both cases they need measurements. It's adding overhead so quantifying it
is important.
For these two features, the last case of handling textrels *may* present
the only visible performance hit.
However, like I already mentioned these are security features, and we
should let the user do the trade-off if desired. We can always have
these turned off by default.
If they're in the OS as options it's a responsibility of ours to be able
to also tell the user the impact these will have on their performance. This
means doing some measurements and seeing what their impact is.
James
No.7 | | 831 bytes |
| 
Hi,
Even though you ignored #2 that says we already done this
James Chacon wrote:
In general we don't count the number of users of a given arch before
looking at a feature and its costs. Things need to fit into a cross
architecture framework that doesn't negatively impact other archs.
Some things, like the NX bit for example, are MD. Do you think we should
not provide a security feature because we can't provide it for all
architectures?
If you want that, there are other *BSD distributions aimed at "best
performance for the x86 arch" or other such goals
Again: it should be the end-user's choice what security features to
use, and it should be NetBSD's goal to take full advantage of the
features offered to us by the hardware.
-e.
No.8 | | 532 bytes |
| 
On Sun, Dec 18, 2005 at 04:27:09AM -0600, James Chacon wrote:
Things need to fit into a cross
architecture framework that doesn't negatively impact other archs.
The mprotect changes are arch dependent, and as gimpy pointed out would
violate ABI on some archs. I don't see how implementing them (which would
be MD anyway) would cause any negative impact on other archs.
If some archs can not do it, fine (as Elad mentioned, we do the same for
non-executable stack already).
Martin
No.9 | | 403 bytes |
| 
Do you know any serious benchmark results especially for ASLR on any OS?
I really have no idea what magnitude we are talking about here (and it's
certainly arch dependent).
I understand you are saying (a) make it optional and (b) let the user
decide if he is willing to take the hit - but I think knowing the effects
a bit better will help discussing this.
Martin
No.10 | | 587 bytes |
| 
On Dec 18, 2005, at 4:03 AM, Martin Husemann wrote:
On Sun, Dec 18, 2005 at 04:27:09AM -0600, James Chacon wrote:
>Things need to fit into a cross
>architecture framework that doesn't negatively impact other archs.
>
The mprotect changes are arch dependent, and as gimpy pointed out
would
violate ABI on some archs. I don't see how implementing them (which
would
be MD anyway) would cause any negative impact on other archs.
Yah, these would have to be implemented using new pmap hooks.
-- thorpej
No.11 | | 549 bytes |
| 
On Sun, Dec 18, 2005 at 12:12:50PM +0200, Elad Efrat wrote:
ASLR calculates 3 random values on execution and saves these as offsets
to be used when a random value is needed. How expensive are 3
arc4random() calls in the context of an entire sys_execve()?
I think you're missing the point. I've repeatedly asked about the
performance impact of these changes on machines with virtually-addressed
caches -- and AFAICT you haven't responded.
Would you please respond to that question?
Thor
No.12 | | 608 bytes |
| 
Jason Thorpe wrote:
On Dec 18, 2005, at 4:03 AM, Martin Husemann wrote:
>The mprotect changes are arch dependent, and as gimpy pointed out would
>violate ABI on some archs. I don't see how implementing them (which
>would
>be MD anyway) would cause any negative impact on other archs.
Yah, these would have to be implemented using new pmap hooks.
Why would these require pmap changes? They (should) be pure
mmap/mprotect changes. Do you want to implement them in pmap because
they are MD, or is there a technical need?
-e.
No.13 | | 3322 bytes |
| 
On 18 Dec 2005 at 12:04, Elad Efrat wrote:
Matt Thomas wrote:
Some architectures (powerpc) have ABIs that mandate executable bss/data.
There is no way to turn that off.
True, but two things:
actually, half-true. the ABI in question i guess is that of SysV, and
it has some silliness in it (blrl in GOT[-1] that's not even marked as
executable). the bigger problem is that it's runtime generated
(like many other archs), and that's the security problem. recognizing
this, some folks at Red Hat earlier this year had developed a better
mechanism they call secureplt, it's available for ppc and alpha already,
and similar can (and should) be implemented on other archs as well
(an item that's been also on my todo list for too long). see [1]-[4]
for more info.
If you have textrel relocations, you have to allow this.
True. However, PaX is aware of text relocations, and addresses them.
if you have textrels, it's best to fix them. see [5] why they're bad,
especially for security (basically, they need the privilege for runtime
code generation).
I think the
right way is to add a PROT_LOCKED that mprotect can understand. If
this is supplied, the protection on such pages can't be downgraded.
munmap would still be allowed (it might be advisable to make munmap
change the region to PROT_NONE but leave it in place so it can't be
remapped.)
a better (more generic) way would be to extend mmap/mprotect to be
able to specify what becomes maxprot, in addition to prot, so that
after textrels (or any kind of runtime code generation) are done,
userland can revoke maxprot bits, effectively sealing the protection
on those pages. this requires userland changes (in ld.so at least),
so i didn't implement it in linux this way, instead some heuristics
is used in the kernel to do the same (this is just FYI, you can do
it the proper way in NetBSD as you control userland as well).
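The maxprot idea above can be modelled in a few lines. This is only a user-space sketch with hypothetical names; how prot and maxprot would really be passed through mmap/mprotect is exactly the detail left open in the message.

```c
/* Model-only protection bits (hypothetical names). */
enum { MP_READ = 0x1, MP_WRITE = 0x2, MP_EXEC = 0x4 };

struct vm_entry_model {
    int prot;     /* current protection */
    int maxprot;  /* upper bound for any later protection change */
};

/* Returns 0 on success, -1 if `prot` asks for bits outside maxprot. */
int model_set_prot(struct vm_entry_model *e, int prot)
{
    if (prot & ~e->maxprot)
        return -1;
    e->prot = prot;
    return 0;
}

/* Permanently removes `bits` from maxprot (and from prot), sealing the entry. */
void model_revoke_maxprot(struct vm_entry_model *e, int bits)
{
    e->maxprot &= ~bits;
    e->prot &= ~bits;
}
```

In this model, ld.so would map text with a maxprot that includes write, apply the textrels while the pages are writable, switch back to read and exec, and then revoke the write bit from maxprot so the pages can never be made writable again.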
The program itself is linked at a base address.
FYI, it has always been possible to create executables in the ET_DYN
ELF format and have them load at random addresses, i've done this more
than 4 years ago under linux, and for something like 2 years now Red
Hat added official toolchain support for it. they call it PIE (as in,
Position Independent Executable), and it's available in gcc 3.3+ and
binutils 2.15+. of course your kernel's ELF loader has to support
ET_DYN executables as well, which afaik, it doesn't at the moment.
the libraries and stacks
could vary, but on machines with virtual caches that might be very painful.
since this virtually indexed cache problem came up more than once, i'd
like to ask what exactly you're referring to. i assume that it's something
to do with address space switches and avoiding cache flushes (someone also
mentioned TLB, but i don't see how that figures in here), but i need some
insights as to what you see as a problem. be as technical as you can be
(pointers to code are appreciated, i'm not too familiar with UVM and pmaps).
more comments in other followups.
[1]
[2]
[3]
[4]
[5]
No.14 | | 827 bytes |
| 
Elad Efrat wrote:
Jason Thorpe wrote:
>On Dec 18, 2005, at 4:03 AM, Martin Husemann wrote:
The mprotect changes are arch dependent, and as gimpy pointed out would
violate ABI on some archs. I don't see how implementing them (which
would
be MD anyway) would cause any negative impact on other archs.
>>
>>
>>Yah, these would have to be implemented using new pmap hooks.
Why would these require pmap changes? They (should) be pure
mmap/mprotect changes. Do you want to implement them in pmap because
they are MD, or is there a technical need?
My proposal doesn't need pmap hooks. However, VM placement does (see
PMAP_PREFER() for instance) for virtual caches.
No.15 | | 1028 bytes |
| 
On 18 Dec 2005 at 23:05, matthew green wrote:
ASLR calculates 3 random values on execution and saves these as offsets
to be used when a random value is needed. How expensive are 3
arc4random() calls in the context of an entire sys_execve()?
you fail to understand the performance issue here. when, eg, libc is
not mapped at the same address as other processes, the performance hit
is in the range of 30-40% on some platforms. it's not about start up
it is about the MMU being constantly trashed.
(while waiting for more details on this VI cache issue), i'd like to
point out that the way i implemented ASLR on linux is that the generic
(arch independent) kernel code uses per-arch defined constants to derive
the amount and position of randomization that is to be applied to the
given memory regions. on archs that can't use randomization you'd simply
set these constants to 0 (or whatever that disables it) and be done
with it, a win-win situation.
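The per-arch constants approach described above might look roughly like this. The struct, field names, and values are assumptions made for illustration and are not taken from the PaX source.

```c
#include <stdint.h>

/* Hypothetical per-arch randomization parameters for one memory region. */
struct aslr_params {
    unsigned bits;   /* number of random bits to use; 0 disables randomization */
    unsigned shift;  /* position of the lowest randomized bit (assumed < 64) */
};

/* Derives the offset to apply to a region from a raw random value. */
uint64_t aslr_delta(const struct aslr_params *p, uint64_t rnd)
{
    uint64_t mask;

    if (p->bits == 0)
        return 0;  /* arch opted out: mappings stay where they always were */
    mask = (p->bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << p->bits) - 1);
    return (rnd & mask) << p->shift;
}
```

The generic code calls aslr_delta() unconditionally; an architecture that cannot afford randomization simply defines bits as 0 and always gets a zero offset.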
No.16 | | 4041 bytes |
| 
pageexec (AT) freemail (DOT) hu wrote:
On 18 Dec 2005 at 12:04, Elad Efrat wrote:
>>Matt Thomas wrote:
>>
>>
Some architectures (powerpc) have ABIs that mandate executable bss/data.
There is no way to turn that off.
>>
>>True, but two things:
actually, half-true. the ABI in question i guess is that of SysV, and
it has some silliness in it (blrl in GOT[-1] that's not even marked as
executable). the bigger problem is that it's runtime generated
(like many other archs), and that's the security problem. recognizing
this, some folks at Red Hat earlier this year had developed a better
mechanism they call secureplt, it's available for ppc and alpha already,
and similar can (and should) be implemented on other archs as well
(an item that's been also on my todo list for too long). see [1]-[4]
for more info.
I know about secureplt. But that doesn't help for backwards compatibility.
If you have textrel relocations, you have to allow this.
>>
>>True. However, PaX is aware of text relocations, and addresses them.
if you have textrels, it's best to fix them. see [5] why they're bad,
especially for security (basically, they need the privilege for runtime
code generation).
I know why they are bad. But they
I think the
right way is to add a PROT_LOCKED that mprotect can understand. If
this is supplied, the protection on such pages can't be downgraded.
munmap would still be allowed (it might be advisable to make munmap
change the region to PROT_NONE but leave it in place so it can't be
remapped.)
a better (more generic) way would be to extend mmap/mprotect to be
able to specify what becomes maxprot, in addition to prot, so that
after textrels (or any kind of runtime code generation) are done,
userland can revoke maxprot bits, effectively sealing the protection
on those pages. this requires userland changes (in ld.so at least),
so i didn't implement it in linux this way, instead some heuristics
is used in the kernel to do the same (this is just FYI, you can do
it the proper way in NetBSD as you control userland as well).
That's an ABI change (if you are talking extra arguments).
it reduces down to the same thing as my proposal.
The program itself is linked at a base address.
FYI, it has always been possible to create executables in the ET_DYN
ELF format and have them load at random addresses, i've done this more
than 4 years ago under linux, and for something like 2 years now Red
Hat added official toolchain support for it. they call it PIE (as in,
Position Independent Executable), and it's available in gcc 3.3+ and
binutils 2.15+. of course your kernel's ELF loader has to support
ET_DYN executables as well, which afaik, it doesn't at the moment.
PIE? Ewww. :) PIE was primarily intended for small embedded systems.
the libraries and stacks
could vary, but on machines with virtual caches that might be very painful.
since this virtually indexed cache problem came up more than once, i'd
like to ask what exactly you're referring to. i assume that it's something
to do with address space switches and avoiding cache flushes (someone also
mentioned TLB, but i don't see how that figures in here), but i need some
insights as to what you see as a problem. be as technical as you can be
(pointers to code are appreciated, i'm not too familiar with UVM and pmaps).
Some machines require all mappings of a physical page be to compatible virtual
addresses where the low N (16-24) bits have the same value. That restricts where
you can map a page in the virtual address space.
No.17 | | 641 bytes |
| 
On Sun, Dec 18, 2005 at 12:12:50PM +0200, Elad Efrat wrote:
Some information that might help figuring what would be the costs of
ASLR and MPROTECT:
ASLR calculates 3 random values on execution and saves these as offsets
to be used when a random value is needed. How expensive are 3
arc4random() calls in the context of an entire sys_execve()?
Wouldn't it prevent future optimizations of the dynamic linker, which
might require constant and known addresses of dynamic libraries? I think
IRIX does that (don't know how RelCache was designed, maybe it applies
there too).
Pavel Cahyna
No.18 | | 2514 bytes |
| 
On 18 Dec 2005 at 13:49, Matt Thomas wrote:
I know about secureplt. But that doesn't help for backwards compatibility.
secureplt doesn't as it wasn't designed for that. however i'm not sure
if it's impossible to come up with a way that can be used on non-recompiled
binaries as well. some years ago when i looked at sparc, i could think of
ways to avoid runtime plt stub generation for binaries that weren't designed
for it, maybe it's possible for ppc and others as well. in any case, this
is just a tangent on how you deploy non-executable pages, not about whether
we (well, you) need/want it or not. if you want to keep ppc use executable
bss/data/heap/whatever, so be it, that should surely not affect other archs
that can do with the non-executable variant.
if you have textrels, it's best to fix them. see [5] why they're bad,
especially for security (basically, they need the privilege for runtime
code generation).
I know why they are bad. But they
i think something's missing here?
That's an ABI change (if you are talking extra arguments).
it reduces down to the same thing as my proposal.
i was thinking of extra flags passed in along with prot or flags,
similar to your proposal, but the difference is that i'd allow more
fine grained control over the protection bits, feels a bit more
unix-like to me, but it's a detail i'll let you guys work out ;-).
PIE? Ewww. :) PIE was primarily intended for small embedded systems.
i think you're mixing it up with something else, PIE was explicitly
created to address the main executable randomization problem [1]:
"This option creates something between a shared library and normal
executable, which can be used for security exposed binaries so that their
base address can be randomized (either a constant address different on
each box through prelink -R (support for PIEs in prelink will be comming),
or totally random address)."
Some machines require all mappings of a physical page be to compatible virtual
addresses where the low N (16-24) bits have the same value. That restricts where
you can map a page in the virtual address space.
oh, this problem, now i see it. and it's not a problem at all, as it
is already taken into account (speaking of PaX/linux), randomization
simply does not affect these bits.
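A small sketch of why randomizing only the bits above the low N keeps mappings compatible on such machines. The function names and the choice of N are assumptions for illustration, not code from PaX or from any pmap.

```c
#include <stdint.h>

/* The low `alias_bits` bits of an address (assumed < 64), i.e. its cache "color". */
uint64_t cache_color(uint64_t addr, unsigned alias_bits)
{
    return addr & ((UINT64_C(1) << alias_bits) - 1);
}

/*
 * Moves `base` by a random amount that is a multiple of 2^alias_bits,
 * so the low alias_bits bits of the result equal those of `base`.
 * rand_bits (assumed < 64) bounds how many random bits are used.
 */
uint64_t randomize_keep_color(uint64_t base, uint64_t rnd,
                              unsigned rand_bits, unsigned alias_bits)
{
    uint64_t delta = (rnd & ((UINT64_C(1) << rand_bits) - 1)) << alias_bits;
    return base + delta;
}
```

For example, with N = 16 a library based at 0x40012340 can land at 0x40A62340 in one process and somewhere else in another, but the low 16 bits stay 0x2340 in every case.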
[1]
No.19 | | 342 bytes |
| 
Pavel Cahyna wrote:
Wouldn't it prevent future optimizations of the dynamic linker, which
might require constant and known addresses of dynamic libraries? I think
IRIX does that (don't know how RelCache was designed, maybe it applies
there too).
Let's leave this decision for the end-user to make.
-e.
No.20 | | 845 bytes |
| 
On Mon, Dec 19, 2005 at 12:23:17AM +0200, Elad Efrat wrote:
Pavel Cahyna wrote:
Wouldn't it prevent future optimizations of the dynamic linker, which
might require constant and known addresses of dynamic libraries? I think
IRIX does that (don't know how RelCache was designed, maybe it applies
there too).
Let's leave this decision for the end-user to make.
Fine. If you implement this, can you please make the decision controllable
per-process, rather than per-system? E.g. with some proc.<pid>.xxx
sysctl. Because if any such optimization appears, it will make sense to
enable randomization for processes where exec time is not a bottleneck and
are exposed to attacks (like sshd, bind, or setuid executables) but disable
it for other processes.
Pavel Cahyna
No.21 | | 1008 bytes |
| 
On Sun, Dec 18, 2005 at 10:46:57PM +0100, pageexec (AT) freemail (DOT) hu wrote:
On 18 Dec 2005 at 23:05, matthew green wrote:
ASLR calculates 3 random values on execution and saves these as offsets
to be used when a random value is needed. How expensive are 3
arc4random() calls in the context of an entire sys_execve()?
you fail to understand the performance issue here. when, eg, libc is
not mapped at the same address as other processes, the performance hit
is in the range of 30-40% on some platforms. it's not about start up
it is about the MMU being constantly trashed.
(while waiting for more details on this VI cache issue)
There are several issues. The most obvious one, it seems to me, is that
this is going to repeatedly flush and reload libc when it ought to stay
resident in the cache, since it will be at a different virtual address in
each process. If that's not the case, I'd like to know why it's not the
case.
No.22 | | 602 bytes |
| 
Pavel Cahyna wrote:
Fine. If you implement this, can you please make the decision controllable
per-process, rather than per-system? E.g. with some proc.<pid>.xxx
sysctl. Because if any such optimization appears, it will make sense to
enable randomization for processes where exec time is not a bottleneck and
are exposed to attacks (like sshd, bind, or setuid executables) but disable
it for other processes.
Sure. PaX already does something similar using its own ELF program
header to store related flags; I'll look into doing the same for
NetBSD.
-e.
No.23 | | 458 bytes |
| 
On 18 Dec 2005 at 13:09, Martin Husemann wrote:
Do you know any serious benchmark results especially for ASLR on any OS?
I really have no idea what magnitude we are talking about here (and it's
certainly arch dependent).
on linux/i386/amd64 and for a kernel compilation (say 20 minutes of
gcc/ld/as traffic) it's basically in the noise, hard to produce
reliable numbers that show a definitive impact one way or another.
No.24 | | 864 bytes |
| 
On 18 Dec 2005 at 23:20, Pavel Cahyna wrote:
Wouldn't it prevent future optimizations of the dynamic linker, which
might require constant and known addresses of dynamic libraries? I think
IRIX does that (don't know how RelCache was designed, maybe it applies
there too).
you're right, randomization is in direct conflict with prelinking
(as it's called in the linux world), so you can have only one or
the other, not both (actually, with some extra logic you could
use a mix but it still wouldn't get back the full benefit of
prelinking). on the other hand, solaris has -Bdirect which is
another (and randomization compatible) way of speeding up runtime
linking, you might want to explore that path instead. there was
also a recent proposal for binutils to include -Bdirect [1].
[1]
No.25 | | 803 bytes |
| 
18 Dec 2005 at 17:38, Thor Lancelot Simon wrote:
There are several issues. The most obvious one, it seems to me, is that
this is going to repeatedly flush and reload libc when it ought to stay
resident in the cache, since it will be at a different virtual address in
each process. If that's not the case, I'd like to know why it's not the
case.
let's turn the question around: what makes you think that a VIVT
cache does not need to be flushed on a context switch? it's an
instant local root if it's not (i modify my libc, incoming suid
app happily executes my code). and as i said before, if an arch
cannot have randomization (e.g., transmeta/CMS is known to suffer),
then it won't, still nothing is lost for the rest.
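The argument in this reply can be illustrated with a toy model. It assumes a virtually indexed, virtually tagged cache with no address-space tags that is flushed on every context switch; real hardware differs in many ways, and the addresses and sizes below are made up.

```python
def run(schedule):
    """Count hits for a list of (pid, virtual_address) accesses.

    Toy VIVT cache: lines are keyed by virtual address only, so the
    model empties the cache whenever the running process changes.
    """
    cache = set()
    current = None
    hits = 0
    for pid, vaddr in schedule:
        if pid != current:
            cache.clear()  # flush on context switch
            current = pid
        if vaddr in cache:
            hits += 1
        else:
            cache.add(vaddr)
    return hits


def make_schedule(libc_base_by_pid, rounds=3, touches=4, lines=8, line_size=32):
    """Each process, in turn, walks the same libc lines several times."""
    out = []
    for _ in range(rounds):
        for pid, base in libc_base_by_pid.items():
            for _ in range(touches):
                out.extend((pid, base + i * line_size) for i in range(lines))
    return out


if __name__ == "__main__":
    same = make_schedule({1: 0x40000000, 2: 0x40000000})
    rand = make_schedule({1: 0x40000000, 2: 0x4A3F0000})
    print(run(same), run(rand))
```

Under these assumptions the hit count is the same whether or not both processes map libc at the same address, because nothing survives the flush either way.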
No.26 | | 568 bytes |
| 
Mon, Dec 19, 2005 at 12:23:17AM +0200, Elad Efrat wrote:
Pavel Cahyna wrote:
Wouldn't it prevent future optimizations of the dynamic linker, which
might require constant and known addresses of dynamic libraries? I think
IRIX does that (don't know how RelCache was designed, maybe it applies
there too).
Let's leave this decision for the end-user to make.
Since when does the end user design linker optimizations? This is important
to people doing future development work on the system as well.
James
No.27 | | 769 bytes |
| 
Sun, Dec 18, 2005 at 11:44:07PM +0100, pageexec (AT) freemail (DOT) hu wrote:
18 Dec 2005 at 13:09, Martin Husemann wrote:
Do you know any serious benchmark results especially for ASLR on any OS?
I really have no idea what magnitude we are talking about here (and it's
certainly arch dependent).
on linux/i386/amd64 and for a kernel compilation (say 20 minutes of
gcc/ld/as traffic) it's basically in the noise, hard to produce
reliable numbers that show a definitive impact one way or another.
I believe he asked for benchmarks. "Compiling up a kernel" is not a serious
benchmark. Has this had a real benchmark suite run across a system with
it not compiled into the system vs running on a system?
James
No.28 | | 1617 bytes |
| 
18 Dec 2005 at 17:42, James Chacon wrote:
on linux/i386/amd64 and for a kernel compilation (say 20 minutes of
gcc/ld/as traffic) it's basically in the noise, hard to produce
reliable numbers that show a definitive impact one way or another.
I believe he asked for benchmarks. "Compiling up a kernel" is not a serious
benchmark.
is it not? do you know what a benchmark is? [1]
a benchmark measures the things you're interested in. in this case,
you want to know the performance impact of randomized address
space layouts. that means your benchmark has to create address
spaces (the more the better) and execute code in them (the more
address manipulation and memory accesses the better). tell me
what satisfies both better than compiling thousands of files
(all cached after the first run, to eliminate disk i/o by the
way)? you get lots of address space creation (that's where the
randomization related kernel changes play a role), and lots of
userland pointer activity (if you know what gcc does internally).
if you want a different macrobenchmark, feel free to suggest one
(as long as it's open source). and in any case, the question was
'order of magnitude' of performance impact, not impact with ppm
precision. you'll get the latter once you have implemented
this code on NetBSD.
Has this had a real benchmark suite run across a system with
it not compiled into the system vs running on a system?
define 'real' (and open source) and i'll give you numbers.
[1]
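The "address space creation" half of such a benchmark can be approximated with a small script that execs many short-lived processes and reports the mean time per exec. This is a rough sketch, not a serious benchmark suite: the function name and the default command are assumptions, and it does not exercise the userland pointer activity mentioned above.

```python
import subprocess
import sys
import time


def time_execs(n=200, argv=None):
    """Return mean wall-clock seconds per fork+exec of a trivial program."""
    argv = argv or [sys.executable, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(argv, check=True)  # one new address space per run
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print(f"{time_execs() * 1e3:.3f} ms per exec")
```

Running it on the same machine with randomization enabled and disabled, several times each, would give a first idea of the order of magnitude.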
No.29 | | 1026 bytes |
| 
pageexec (AT) freemail (DOT) hu wrote:
18 Dec 2005 at 13:49, Matt Thomas wrote:
>>PIE? Ewww. :) PIE was primarily intended for small embedded systems.
i think you're mixing it up with something else, PIE was explicitly
created to address the main executable randomization problem [1]:
"This option creates something between a shared library and normal
executable, which can be used for security exposed binaries so that their
base address can be randomized (either a constant address different on
each box through prelink -R (support for PIEs in prelink will be coming),
or totally random address)."
PIE also forces a portion of .text to be nonshared (any relative relocations
that could be fixed in a based image will no longer be shared among multiple
processes). It will increase the complexity of program loading which is
already very complex.
Are all programs built/linked as PIE, or just a subset?