Interface to change NFS exports

Hi everybody,
while adding NFS support to tmpfs, I found that the current way to
change NFS export information is, how could I say it... very ugly.
It seems to be a band-aid over what was used in the past to mount
FFS systems, which makes it confusing and difficult to extend.
(Don't we aim for clean design? ;-)
First of all, the mountd(8) code expects the mount argument structures
of all file systems to start with the following fields:
char *fspec;
struct export_args *export;
This is because it defines a big union of all known file systems,
assuming that all of them have these two fields at the very
beginning. Furthermore, mountd(8) must know about all
NFS-aware file systems, needing patching every time you add
a new one. (There are XXX marks in the code saying that this
should be improved.)
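To make the layout assumption concrete, here is a minimal C sketch. The structure names (ufs_args_sketch, msdosfs_args_sketch, union mountargs_sketch) and the single ex_flags field are hypothetical simplifications, not the real NetBSD definitions; the point is only that mountd can write fspec and export through any union member because every member shares the same leading fields.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the real export structure. */
struct export_args {
	int ex_flags;
};

/* Every per-fs argument structure must start with these two fields. */
struct ufs_args_sketch {
	char *fspec;
	struct export_args *export;
};

struct msdosfs_args_sketch {
	char *fspec;
	struct export_args *export;
	int uid;		/* fs-specific fields follow the common prefix */
	int gid;
};

/* mountd-style union: one member per known NFS-aware file system. */
union mountargs_sketch {
	struct ufs_args_sketch ua;
	struct msdosfs_args_sketch da;
};

/*
 * Because all members share the same initial fields, mountd can fill
 * in the export data through one member without knowing which file
 * system it is really talking to.
 */
static void
set_export(union mountargs_sketch *args, struct export_args *ex)
{
	args->ua.fspec = NULL;	/* NULL fspec + MNT_UPDATE means "update exports" */
	args->ua.export = ex;
}
```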
Then we get to the kernel part. The MNT_UPDATE flag is
abused, together with the fspec field, to change the export
information. A file system has to assume that, if MNT_UPDATE
is given and fspec is NULL, then it has to update the export
data.
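A simplified sketch of what each file system's mount hook has to do today to honor this convention. MNT_UPDATE_SKETCH and the structure layouts are hypothetical stand-ins; a real mount hook would also copy the arguments in from userland and handle genuine remounts.

```c
#include <errno.h>
#include <stddef.h>

#define MNT_UPDATE_SKETCH 0x1	/* hypothetical stand-in for MNT_UPDATE */

struct export_args { int ex_flags; };

struct fs_args_sketch {
	char *fspec;
	struct export_args *export;
};

/* The overloaded path every NFS-aware file system must implement. */
static int
fs_mount_sketch(int flags, const struct fs_args_sketch *args,
    struct export_args *cur_export)
{
	if (flags & MNT_UPDATE_SKETCH) {
		if (args->fspec == NULL) {
			/* Not a remount at all: only replace the export data. */
			if (args->export == NULL)
				return EINVAL;
			*cur_export = *args->export;
			return 0;
		}
		/* ... a genuine update of mount options would go here ... */
		return 0;
	}
	/* ... an initial mount would go here ... */
	return 0;
}
```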
Updating NFS export information is common to all NFS-aware
file systems, so I believe this can be abstracted into the VFS
layer to simplify the code in each file system.
What I have done is the following:
- Define two new mount flags, MNT_GETEXPORT and MNT_SETEXPORT.
These change the semantics of the data parameter passed to
mount(2), making it receive a struct export_args structure instead
of the file system's custom one. The purpose of each of them is
clear from their name.
- Change the mount syscall routine to recognize these two flags
and handle them completely on its own (the underlying fs never
sees them). It only needs to fetch a structure from userland and
parse it, or vice versa.
- As a side effect, as the information is now available in the generic
mount point structure, define a vfs_stdcheckexp routine that can
be used in most file systems (haven't checked yet if this will be
useful in all cases), thus removing redundancy from them all.
- Simplify mountd(8) a lot by removing all fs-specific details from it,
using struct export_args exclusively as a way to communicate
between userland and the kernel.
- Remove NFS-specific bits from the mount_* utilities, as this can
now be done as a default operation during the initial mount.
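The mount-syscall part of the list can be sketched in C as follows, under stated assumptions: the flag values, struct mount_sketch and its fs_mount hook are hypothetical, and memcpy stands in for the copyin()/copyout() a real kernel would use. The point is that export requests are answered from the generic mount structure and never reach the file system.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical flag values; real ones would live in <sys/mount.h>. */
#define MNT_GETEXPORT_SKETCH 0x100
#define MNT_SETEXPORT_SKETCH 0x200

struct export_args { int ex_flags; int ex_root; };

struct mount_sketch {
	struct export_args mnt_export;	/* export data in the generic mount */
	int (*fs_mount)(struct mount_sketch *, void *);	/* fs-specific hook */
};

static int
sys_mount_sketch(struct mount_sketch *mp, int flags, void *data)
{
	int getset = flags & (MNT_GETEXPORT_SKETCH | MNT_SETEXPORT_SKETCH);

	if (getset == (MNT_GETEXPORT_SKETCH | MNT_SETEXPORT_SKETCH))
		return EINVAL;		/* asking for both makes no sense */
	if (getset == MNT_GETEXPORT_SKETCH) {
		/* In the kernel this would be copyout(). */
		memcpy(data, &mp->mnt_export, sizeof(mp->mnt_export));
		return 0;
	}
	if (getset == MNT_SETEXPORT_SKETCH) {
		/* In the kernel this would be copyin(). */
		memcpy(&mp->mnt_export, data, sizeof(mp->mnt_export));
		return 0;
	}
	/* Every other request still goes to the file system. */
	if (mp->fs_mount == NULL)
		return EOPNOTSUPP;
	return mp->fs_mount(mp, data);
}
```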
I don't know if this is the best way to do this (and the patch is
not yet "clean"[*]), so comments will be appreciated. Even so,
I feel this is far better than the current approach.
As a proof of concept, I've converted ffs and tmpfs to follow these
ideas. Note how the code is simpler overall. The patch can be
found here:
[*] I especially dislike the new mnt_nfs boolean flag,
but couldn't find a better place for it.
Please comment.
Thank you,
No.1
Julio M. Merino Vidal wrote:
>Hi everybody,
>
>while adding NFS support to tmpfs, I found that the current way to
>change NFS export information is how could I say it very ugly.
>It seems to be a bandaid over what was used in the past to mount
>FFS systems, which makes it confusing and difficult to extend.
>(Don't we aim for clean design? ;-)
Hi Julio,
I agree with most of these changes, but I'd like to see an additional
change: overloading the mount system call with export functionality is
just wrong. There should be a separate exportfs system call.
- Frank
No.2
Is the notion of 'exporting' NFS-specific still? While I'm not aware
of another NFS-like protocol in wide use (afs and coda don't serve
local filesystems), it seems the notion of controlling what parts of
the filesystem tree can be served over NFS-like protocols is more
general than just NFS. Then, one would perhaps want to control
exports separately for separate protocols.
That said, I don't object to what you are doing now, which seems
orthogonal to the multiprotocol export issue.
No.3
On Sun, 11 Sep 2005 10:16:09 -0400, Greg Troxel wrote:
>Is the notion of 'exporting' NFS-specific still? While I'm not aware
>of another NFS-like protocol in wide use (afs and coda don't serve
>local filesystems), it seems the notion of controlling what parts of
>the filesystem tree can be served over NFS-like protocols is more
>general than just NFS. Then, one would perhaps want to control
>exports separately for separate protocols.
Is there any other NFS-like protocol except NFS? That is, one which would
be stateless and would use filehandles to access files?
Bye,
Pavel
No.4
[ originally sent to tech-net@ by mistake ]
On Sun, 11 Sep 2005 10:16:09 -0400, Greg Troxel wrote:
>Is the notion of 'exporting' NFS-specific still? While I'm not aware
>of another NFS-like protocol in wide use (afs and coda don't serve
>local filesystems), it seems the notion of controlling what parts of
>the filesystem tree can be served over NFS-like protocols is more
>general than just NFS. Then, one would perhaps want to control
>exports separately for separate protocols.
>That said, I don't object to what you are doing now, which seems
>orthogonal to the multiprotocol export issue.
Is there any other NFS-like protocol except NFS? That is, one which would
be stateless and would use filehandles to access files?
Bye,
Pavel
No.5
On Sun, Sep 11, 2005 at 01:20:58PM +0200, Julio M. Merino Vidal wrote:
>Hi everybody,
>while adding NFS support to tmpfs, I found that the current way to
>change NFS export information is, how could I say it... very ugly.
>It seems to be a band-aid over what was used in the past to mount
>FFS systems, which makes it confusing and difficult to extend.
>(Don't we aim for clean design? ;-)
>[...]
Hi,
while you're at it, could you look at fixing a very long-standing
problem? A /etc/rc.d/mountd reload isn't atomic: there is a window
in which no filesystems are exported at all, and if a request comes in
at this time, nfsd replies with "permission denied".
At first glance, we would need to keep two export lists in the kernel and switch
from one to the other, much like what IPF does with its filters.
I'm not asking you to implement this, but as you're planning to change the
interface, please think about it in the new one :)
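One way to read the IPF comparison is double buffering: mountd fills an inactive list while nfsd keeps consulting the active one, and a single flip makes the new list live. This is a hypothetical user-space sketch (export_table_sketch and its helpers are invented names, and an export list is reduced to a set of client addresses); a kernel version would protect the flip with a lock and wait for readers of the old list before reusing it.

```c
#include <stddef.h>

/* Hypothetical, simplified export list: a set of allowed client addresses. */
struct export_list_sketch {
	size_t n;
	unsigned int addrs[8];
};

/* Two lists: one in use by nfsd, one being prepared by mountd. */
struct export_table_sketch {
	struct export_list_sketch lists[2];
	int active;		/* index of the list nfsd consults */
};

/* nfsd side: always sees a complete list, never an empty window. */
static int
export_allowed(const struct export_table_sketch *t, unsigned int addr)
{
	const struct export_list_sketch *l = &t->lists[t->active];

	for (size_t i = 0; i < l->n; i++)
		if (l->addrs[i] == addr)
			return 1;
	return 0;
}

/* mountd side: clear and return the list that is not in use. */
static struct export_list_sketch *
export_inactive(struct export_table_sketch *t)
{
	struct export_list_sketch *l = &t->lists[!t->active];

	l->n = 0;
	return l;
}

/* Make the freshly built list live in one step. */
static void
export_switch(struct export_table_sketch *t)
{
	/* In a kernel this flip would be done under a lock. */
	t->active = !t->active;
}
```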
No.6
On Mon, Sep 12, 2005 at 01:08:07AM +0200, Manuel Bouyer wrote:
>On Sun, Sep 11, 2005 at 01:20:58PM +0200, Julio M. Merino Vidal wrote:
>>Hi everybody,
>>while adding NFS support to tmpfs, I found that the current way to
>>change NFS export information is, how could I say it... very ugly.
>>It seems to be a band-aid over what was used in the past to mount
>>FFS systems, which makes it confusing and difficult to extend.
>>(Don't we aim for clean design? ;-)
>>[...]
>Hi,
>while you're at it, could you look at fixing a very long-standing
>problem? A /etc/rc.d/mountd reload isn't atomic: there is a window
>in which no filesystems are exported at all, and if a request comes in
>at this time, nfsd replies with "permission denied".
>At first glance, we would need to keep two export lists in the kernel and switch
>from one to the other, much like what IPF does with its filters.
Not necessarily.
I think part of the problem is how mountd does things, though to be
honest, I have avoided looking at the code. :-) I think if mountd were
changed to build up state and then apply it, we could achieve an atomic update
without multiple lists in the kernel.
>I'm not asking you to implement this, but as you're planning to change the
>interface, please think about it in the new one :)
I think all that would be needed would be a way to upload
multiple export entries at once. That way we can say, "here, this is the
new export list."
I agree that all Julio would need to do now is think about how we add
multiple entries at once, and we'd be prepared for this in the future.
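A minimal sketch of the "here, this is the new export list" idea, assuming a hypothetical per-mount table and a simplified entry type (network and netmask only). The whole batch is validated before anything is replaced, so a bad entry leaves the old list intact rather than half-updated.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EXPORT_MAX_SKETCH 16

/* Hypothetical per-entry record; the real struct export_args is richer. */
struct export_entry_sketch {
	unsigned int net;	/* network address */
	unsigned int mask;	/* netmask */
	int flags;
};

struct mount_exports_sketch {
	size_t n;
	struct export_entry_sketch e[EXPORT_MAX_SKETCH];
};

/* Replace the mount point's whole export list in one all-or-nothing call. */
static int
exports_replace(struct mount_exports_sketch *mp,
    const struct export_entry_sketch *ents, size_t n)
{
	if (n > EXPORT_MAX_SKETCH)
		return E2BIG;
	/* Validate every entry first... */
	for (size_t i = 0; i < n; i++)
		if ((ents[i].net & ~ents[i].mask) != 0)
			return EINVAL;	/* host bits set: reject the batch */
	/* ...and only then install the new list. */
	if (n > 0)
		memcpy(mp->e, ents, n * sizeof(*ents));
	mp->n = n;
	return 0;
}
```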
Take care,
Bill
No.7
>while you're at it, could you look at fixing a very long-standing
>problem? A /etc/rc.d/mountd reload isn't atomic: there is a window
>in which no filesystems are exported at all, and if a request comes in
>at this time, nfsd replies with "permission denied".
FYI, the OpenBSD folks seem to be working on it.
YAMAMOTO Takashi
No.8
On Sun, Sep 11, 2005 at 08:48:14PM -0600, Greg wrote:
>>>I'm not asking you to implement this, but as you're planning to change the
>>>interface, please think about it in the new one :)
>>I think all that would be needed would be a way to upload
>>multiple export entries at once. That way we can say, "here, this is the
>>new export list."
>>I agree that all Julio would need to do now is think about how we add
>>multiple entries at once, and we'd be prepared for this in the future.
>If we're making a "shopping list" of changes we'd like to see here ;)
>When checking to see if an NFS export is allowed, Solaris appears
>to do a lookup of the IP address at the time the mount request is made,
>rather than building a table of IP addresses for the hosts at the
>time mountd is run (as NetBSD does). Ignoring the fact that Dynamic
>DNS may be evil, this means that Solaris behaves much better with
>hosts that happen to be down (and have lost their lease) when mountd
>is restarted, than does NetBSD. (NetBSD gets incredibly unhappy
>because it can't find an IP address for the host at the time mountd
>is run, and so then refuses to run mountd, shutting all hosts out,
>not just the one that might be temporarily off-line. This is
>arguably a security feature, but, well, if you're running NFS, you
>may have Security Issues anyway :-} )
>But IMO it'd be way cool if NetBSD could do the same as Solaris and
>delay the lookup of the IP address until the point where the mount
>request is made.
I think Solaris works in a different way than NetBSD does (at least it used to):
there is no check done in the kernel at the NFS level, only by mountd
when a client requests a filehandle at mount time. This means that once you
know a filehandle (and you could find one by trying random values), you
can access a filesystem on the server, even if your IP is not allowed.
Now, it would be possible to allow dynamic names with an export list in the
kernel; this just means that mountd would have to install a new export list
in the kernel each time a new name->IP translation is discovered.
No.9
On 9/11/05, Frank van der Linden <fvdl (AT) netbsd (DOT) org> wrote:
>Julio M. Merino Vidal wrote:
>>Hi everybody,
>>
>>while adding NFS support to tmpfs, I found that the current way to
>>change NFS export information is, how could I say it... very ugly.
>>It seems to be a band-aid over what was used in the past to mount
>>FFS systems, which makes it confusing and difficult to extend.
>>(Don't we aim for clean design? ;-)
>>
>>
>Hi Julio,
>I agree with most of these changes, but I'd like to see an additional
>change: overloading the mount system call with export functionality is
>just wrong. There should be a separate exportfs system call.
We all agree that a new system call is needed. Some also want this
new interface not only to manage NFS exports but also to allow changing
other settings of a mount point. I think this is a good idea too.
Given these comments, I've started the implementation of a fsctl(2)
function call, with the following signature:
int fsctl(const char *path, enum fsctl_command command, void *data);
At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
based on the given path. (Minor question: can an enum be used as a
system call argument, or would an integer be better? If not, why not?)
The problem with this interface is that it doesn't let you change
multiple mount points atomically, as some others have suggested.
I also agree that having this feature could be nice.
In order to allow this, I think the following could make more sense
(none of the stuff below is implemented, so it may contain errors):
int fsctl(enum fsctl_command command, struct fsctl_data *data,
size_t ndata);
where 'data' is a map between paths and structures, as in:
struct fsctl_data {
const char fd_path[MAXPATHLEN];
void *fd_data;
};
Given this interface, each 'command' can provide its own structure
for the fd_data parameter. E.g., the FSCTL_EXPORT_NFS_{GET,SET}
calls could return or receive, respectively, a struct export_args.
Also, each command can operate on multiple mount points atomically.
In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
to this new system call, as well as other stuff like the quota
management.
Do you think this is correct and flexible enough for the current and
future purposes?
Thanks,
No.10
On Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:
>On 9/11/05, Frank van der Linden <fvdl (AT) netbsd (DOT) org> wrote:
>>I agree with most of these changes, but I'd like to see an additional
>>change: overloading the mount system call with export functionality is
>>just wrong. There should be a separate exportfs system call.
>We all agree that a new system call is needed. Some also want this
>new interface not only to manage NFS exports but also to allow changing
>other settings of a mount point. I think this is a good idea too.
>Given these comments, I've started the implementation of a fsctl(2)
>function call, with the following signature:
>int fsctl(const char *path, enum fsctl_command command, void *data);
>At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
>FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
>based on the given path. (Minor question: can an enum be used as a
>system call argument, or would an integer be better? If not, why not?)
Use an integer and steal the implementation from ioctl() and fcntl().
We could decide to implement all of this as fcntl()s on a file in the file
system (which is what HP-UX does AFAICT, though they have a special fsctl()
call).
Actually, why _can't_ we just use fcntl() to do it? Then we don't have to
add a new system call.
>The problem with this interface is that it doesn't let you change
>multiple mount points atomically, as some others have suggested.
>I also agree that having this feature could be nice.
I don't think we want to change multiple mount points at once. I think
what we really want to do is change multiple exports on one mount point at
once. Thus we want multiple data payloads to one destination, not multiple
destinations.
>In order to allow this, I think the following could make more sense
>(none of the stuff below is implemented, so it may contain errors):
>int fsctl(enum fsctl_command command, struct fsctl_data *data,
>size_t ndata);
>where 'data' is a map between paths and structures, as in:
>struct fsctl_data {
>const char fd_path[MAXPATHLEN];
>void *fd_data;
>};
>Given this interface, each 'command' can provide its own structure
>for the fd_data parameter. E.g., the FSCTL_EXPORT_NFS_{GET,SET}
>calls could return or receive, respectively, a struct export_args.
>Also, each command can operate on multiple mount points atomically.
I think our lives will be much simpler if we don't do this. If we try to
change multiple mount points at once, we could get into all sorts of
update and locking issues. I think one mount point per call will work
quite well and will be very solid.
>In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
>to this new system call, as well as other stuff like the quota
>management.
Sounds good.
>Do you think this is correct and flexible enough for the current and
>future purposes?
As above, I like the idea, but I want to change only one mount point at
once.
No.11
On Mon, Sep 12, 2005 at 11:00:35AM +0200, Manuel Bouyer wrote:
>On Sun, Sep 11, 2005 at 08:48:14PM -0600, Greg wrote:
>>If we're making a "shopping list" of changes we'd like to see here ;)
>>When checking to see if an NFS export is allowed, Solaris appears
>>to do a lookup of the IP address at the time the mount request is made,
>>rather than building a table of IP addresses for the hosts at the
>>time mountd is run (as NetBSD does). Ignoring the fact that Dynamic
>>DNS may be evil, this means that Solaris behaves much better with
>>hosts that happen to be down (and have lost their lease) when mountd
>>is restarted, than does NetBSD. (NetBSD gets incredibly unhappy
>>because it can't find an IP address for the host at the time mountd
>>is run, and so then refuses to run mountd, shutting all hosts out,
>>not just the one that might be temporarily off-line. This is
>>arguably a security feature, but, well, if you're running NFS, you
>>may have Security Issues anyway :-} )
>>But IMO it'd be way cool if NetBSD could do the same as Solaris and
>>delay the lookup of the IP address until the point where the mount
>>request is made.
>I think Solaris works in a different way than NetBSD does (at least it used to):
>there is no check done in the kernel at the NFS level, only by mountd
>when a client requests a filehandle at mount time. This means that once you
>know a filehandle (and you could find one by trying random values), you
>can access a filesystem on the server, even if your IP is not allowed.
>Now, it would be possible to allow dynamic names with an export list in the
>kernel; this just means that mountd would have to install a new export list
>in the kernel each time a new name->IP translation is discovered.
I think that would be fine. I think if we teach mountd about new and old
export lists, then we may eventually also add a "delete this, add this"
update operation.
Take care,
Bill
No.12
On 9/12/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
>On Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:
>>On 9/11/05, Frank van der Linden <fvdl (AT) netbsd (DOT) org> wrote:
>>>I agree with most of these changes, but I'd like to see an additional
>>>change: overloading the mount system call with export functionality is
>>>just wrong. There should be a separate exportfs system call.
>>We all agree that a new system call is needed. Some also want this
>>new interface not only to manage NFS exports but also to allow changing
>>other settings of a mount point. I think this is a good idea too.
>>Given these comments, I've started the implementation of a fsctl(2)
>>function call, with the following signature:
>>int fsctl(const char *path, enum fsctl_command command, void *data);
>>At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
>>FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
>>based on the given path. (Minor question: can an enum be used as a
>>system call argument, or would an integer be better? If not, why not?)
>Use an integer and steal the implementation from ioctl() and fcntl().
OK, but is there anything wrong with using an enum? (Such as the size of
the type used to represent it changing, or something like that?)
>We could decide to implement all of this as fcntl()s on a file in the file
>system (which is what HP-UX does AFAICT, though they have a special fsctl()
>call).
>Actually, why _can't_ we just use fcntl() to do it? Then we don't have to
>add a new system call.
I think we can, but IMHO, it looks weird. Why do I have to open a file
when what I really want to do is get or set the properties of a mount
point? I think that passing a path and letting the kernel do what it needs
looks better (as in statvfs(2)). (We could discuss later whether the call should
require the path to refer exactly to a mount point or allow any file within it.)
Also, if we used fcntl, we'd have two "authentication" levels in the
trace. First, open the file using the corresponding permissions, flags,
etc. Then, do the fcntl, which will fail in most cases if the user is not
root.
>>The problem with this interface is that it doesn't let you change
>>multiple mount points atomically, as some others have suggested.
>>I also agree that having this feature could be nice.
>I don't think we want to change multiple mount points at once. I think
>what we really want to do is change multiple exports on one mount point at
>once. Thus we want multiple data payloads to one destination, not multiple
>destinations.
Ah, OK! I misunderstood the initial suggestion. Although it seemed
nice to me at first, I can't think of a good reason why it would be
useful (I mean, the ability to change multiple paths at once).
>>Given this interface, each 'command' can provide its own structure
>>for the fd_data parameter. E.g., the FSCTL_EXPORT_NFS_{GET,SET}
>>calls could return or receive, respectively, a struct export_args.
>>Also, each command can operate on multiple mount points atomically.
>I think our lives will be much simpler if we don't do this. If we try to
>change multiple mount points at once, we could get into all sorts of
>update and locking issues. I think one mount point per call will work
>quite well and will be very solid.
True. I was trying to work around the locking issues by sorting the
mount points, avoiding duplicates and locking them in order (which I think
should work). But this won't be needed at all.
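The lock-ordering approach described here can be sketched as follows, assuming a hypothetical struct mount_sketch whose locked field stands in for a real lock. Sorting by address gives every caller the same global order, which is what keeps two callers from deadlocking, and dropping duplicates avoids locking the same mount twice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mount; only the lock matters here. */
struct mount_sketch {
	int locked;
};

/* Order mount points by address: any fixed global order prevents deadlock. */
static int
mount_cmp(const void *a, const void *b)
{
	uintptr_t pa = (uintptr_t)*(struct mount_sketch *const *)a;
	uintptr_t pb = (uintptr_t)*(struct mount_sketch *const *)b;

	return (pa > pb) - (pa < pb);
}

/* Sort, drop duplicates, then lock; returns the number of distinct mounts. */
static size_t
lock_mounts_in_order(struct mount_sketch **mps, size_t n)
{
	size_t out = 0;

	qsort(mps, n, sizeof(*mps), mount_cmp);
	for (size_t i = 0; i < n; i++) {
		if (out > 0 && mps[out - 1] == mps[i])
			continue;	/* same mount point listed twice */
		mps[out++] = mps[i];
	}
	for (size_t i = 0; i < out; i++)
		mps[i]->locked = 1;	/* a real kernel would take a lock here */
	return out;
}
```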
No.13
Modulo the "new system call" that's fallen out of this
thread, I did exactly what you originally reported
you'd done (cleaning up the exporting code, centralizing
export checks, etc) some years back. I've been meaning
to try to get the company for whom I'd done that work
to release it, but... well... they went out of business,
and I didn't want to do the legwork to free the intellectual
property, as it were.
Anyhoo, some wis-dumb I'd like to pass along: you may want to keep the
export-checking mechanism itself pluggable, so you can swap in/out
that part of the code that decides whether any given filehandle can
be referenced by a client. This lets you drop in Solaris-style
exporting (on a per-directory tree, per CIDR block) alongside some
Kerberos-based authorization mechanism alongside the traditional
BSD mechanism of exporting (per mountpoint, per CIDR block).
My implementation did the obvious and hung another vtbl-like
vector of function pointers off each mount point. From memory,
the entry points were things like "check this file handle for
client access," "here's a chunk of data, figure out how to
parse it into your special rules for allowing client access,"
and so on.
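A hypothetical sketch of such a per-mount vector of function pointers, in the spirit of the description above. The names (export_ops_sketch, eo_check, eo_load) and the network/mask policy are invented for illustration, and for brevity the check takes a client address rather than a file handle.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct mount_sketch;

/* Hypothetical "export policy" vector hung off each mount point. */
struct export_ops_sketch {
	const char *name;
	/* Decide whether a client may access files on this mount. */
	int (*eo_check)(const struct mount_sketch *, unsigned int client_addr);
	/* "Here's a chunk of data, parse it into your rules." */
	int (*eo_load)(struct mount_sketch *, const void *data, size_t len);
};

struct mount_sketch {
	const struct export_ops_sketch *mnt_exportops;
	unsigned int net, mask;		/* state private to the BSD-style policy */
};

/* Traditional BSD-style policy: one network/mask per mount point. */
static int
bsd_check(const struct mount_sketch *mp, unsigned int addr)
{
	return (addr & mp->mask) == mp->net;
}

static int
bsd_load(struct mount_sketch *mp, const void *data, size_t len)
{
	unsigned int v[2];

	if (len != sizeof(v))
		return EINVAL;
	memcpy(v, data, sizeof(v));
	mp->net = v[0];
	mp->mask = v[1];
	return 0;
}

static const struct export_ops_sketch bsd_export_ops = {
	"bsd", bsd_check, bsd_load
};

/* Generic code never needs to know which policy is plugged in. */
static int
export_check(const struct mount_sketch *mp, unsigned int addr)
{
	return mp->mnt_exportops->eo_check(mp, addr);
}
```

Swapping in a Solaris-style or Kerberos-based policy would then mean pointing mnt_exportops at a different table, chosen for example by a mount option.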
Mind you, I never did actually *implement* an auth mech other
than one like Solaris's, but my thinking was that a mount
option would pick which mechanism to use. In the new world,
I guess that'd be just another call to the proposed fsctl().
If you're already grotting around in that part of the code,
I think it'd be easy enough to make what you've done be
pluggable, so that different sets of exporting rules could
be used. I only mention it as something to consider. Too,
I guess I also claim an existence proof that it can be done,
and that doing it was once thought to be sensible by folks
who've been where you are now.
Chris <jepeway (AT) blasted-heath (DOT) com>.
No.14
In article <6b2d1e1905091212206c59c5ea (AT) mail (DOT) gmail.com> you write:
>On 9/12/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
>>On Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:
>>>We all agree that a new system call is needed. Some also want this
>>>new interface not only to manage NFS exports but also to allow changing
>>>other settings of a mount point. I think this is a good idea too.
>>>
>>>Given these comments, I've started the implementation of a fsctl(2)
>>>function call, with the following signature:
>>>
>>>int fsctl(const char *path, enum fsctl_command command, void *data);
>>>
>>>At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
>>>FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
>>>based on the given path. (Minor question: can an enum be used as a
>>>system call argument, or would an integer be better? If not, why not?)
>>
>>Use an integer and steal the implementation from ioctl() and fcntl().
>
>OK, but is there anything wrong with using an enum? (Such as the size of
>the type used to represent it changing, or something like that?)
Precisely that. The ARM EABI, for instance (not that NetBSD follows it
yet), requires that enums be stored in the smallest integer type that will
hold them. This kind of thing makes them dangerous to use in any interface,
since they're liable to change size when new values are introduced.
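A minimal sketch of the integer alternative suggested earlier in the thread: the command travels as a fixed-width integer, so adding commands later cannot change the argument's size. The command names and the uint32_t choice are assumptions for illustration only.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical command numbers, defined as plain integer constants. */
#define FSCTL_CMD_GET_SKETCH	1u
#define FSCTL_CMD_SET_SKETCH	2u

/*
 * A fixed-width type: its size is part of the ABI and stays the same no
 * matter how many commands are defined. An enum's size, by contrast, is
 * chosen by the compiler/ABI (e.g. it may shrink under -fshort-enums).
 */
typedef uint32_t fsctl_cmd_sketch_t;

_Static_assert(sizeof(fsctl_cmd_sketch_t) == 4,
    "the command argument must stay 32 bits");

static int
fsctl_dispatch_sketch(fsctl_cmd_sketch_t cmd)
{
	switch (cmd) {
	case FSCTL_CMD_GET_SKETCH:
	case FSCTL_CMD_SET_SKETCH:
		return 0;
	default:
		return EINVAL;	/* unknown numbers are rejected, not misread */
	}
}
```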
No.15
On Mon, Sep 12, 2005 at 09:20:13PM +0200, Julio M. Merino Vidal wrote:
>On 9/12/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
>>On Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:
>>>Given these comments, I've started the implementation of a fsctl(2)
>>>function call, with the following signature:
>>>int fsctl(const char *path, enum fsctl_command command, void *data);
>>>At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
>>>FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
>>>based on the given path. (Minor question: can an enum be used as a
>>>system call argument, or would an integer be better? If not, why not?)
>>Use an integer and steal the implementation from ioctl() and fcntl().
>OK, but is there anything wrong with using an enum? (Such as the size of
>the type used to represent it changing, or something like that?)
Yes. I think the problem with an enum is that you have to expose the enum
to all places where you use the prototype, and once exposed, I don't think
you can change the enum. The compiler has to know how big the enum is, so
it has to know all the elements to know the largest. Thus it turns into a
mess.
With an int, we hide all the details.
>>We could decide to implement all of this as fcntl()s on a file in the file
>>system (which is what HP-UX does AFAICT, though they have a special fsctl()
>>call).
>>Actually, why _can't_ we just use fcntl() to do it? Then we don't have to
>>add a new system call.
>I think we can, but IMHO, it looks weird. Why do I have to open a file
>when what I really want to do is get or set the properties of a mount
>point? I think that passing a path and letting the kernel do what it needs
>looks better (as in statvfs(2)). (We could discuss later whether the call should
>require the path to refer exactly to a mount point or allow any file within it.)
I can think of two reasons. The first is that we already have ioctl and fcntl,
both of which have the ability to handle arbitrary operations. We're now
proposing a third. Do we really need a third such call?
Second, will we be doing one operation, as in statvfs() where we get info,
or will we be doing a sequence of operations? If it's the latter, the
fcntl() approach is a little easier as we only do path resolution once.
Granted, I don't think we will be doing any fs operations which need to
happen in the fast-path, so it doesn't make a TON of difference, but it
can be a bit easier to just do the lookup once.
HP-UX's fsctl(2) call uses a file descriptor, so we have prior art here.
I'm not saying we MUST do this, I just want us to explore it.
It could be that we want length and command separate, in which case we do
want something other than fcntl().
>Also, if we used fcntl, we'd have two "authentication" levels in the
>trace. First, open the file using the corresponding permissions, flags,
>etc. Then, do the fcntl, which will fail in most cases if the user is not
>root.
So?
>>>The problem with this interface is that it doesn't let you change
>>>multiple mount points atomically, as some others have suggested.
>>>I also agree that having this feature could be nice.
>>I don't think we want to change multiple mount points at once. I think
>>what we really want to do is change multiple exports on one mount point at
>>once. Thus we want multiple data payloads to one destination, not multiple
>>destinations.
>Ah, OK! I misunderstood the initial suggestion. Although it seemed
>nice to me at first, I can't think of a good reason why it would be
>useful (I mean, the ability to change multiple paths at once).
I can think of places where it would be "useful," but as someone who
programs both userland and kernel, none of them are worth the effort the
kernel would need to put out to make the change.
>>I think our lives will be much simpler if we don't do this. If we try to
>>change multiple mount points at once, we could get into all sorts of
>>update and locking issues. I think one mount point per call will work
>>quite well and will be very solid.
>True. I was trying to work around the locking issues by sorting the
>mount points, avoiding duplicates and locking them in order (which I think
>should work). But this won't be needed at all.
Take care,
Bill
No.16
On 9/13/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
>On Mon, Sep 12, 2005 at 09:20:13PM +0200, Julio M. Merino Vidal wrote:
>I can think of two reasons. The first is that we already have ioctl and fcntl,
>both of which have the ability to handle arbitrary operations. We're now
>proposing a third. Do we really need a third such call?
>Second, will we be doing one operation, as in statvfs() where we get info,
>or will we be doing a sequence of operations? If it's the latter, the
>fcntl() approach is a little easier as we only do path resolution once.
>Granted, I don't think we will be doing any fs operations which need to
>happen in the fast-path, so it doesn't make a TON of difference, but it
>can be a bit easier to just do the lookup once.
>HP-UX's fsctl(2) call uses a file descriptor, so we have prior art here.
>I'm not saying we MUST do this, I just want us to explore it.
>It could be that we want length and command separate, in which case we do
>want something other than fcntl().
I've just "explored" the fcntl(2) route and it seems suitable for the task.
In fact, it is somewhat easier to add the code because the "framework"
to copyin/copyout parameters of different sizes is already there. Also,
there is some unused support for file-system-specific calls, which I've
used to implement this.
I'm not yet very fond of the idea of mixing "file descriptor control
operations" with "general file system operations", but given that fcntl
is already overloaded with calls that do not operate on a single file
descriptor (F_CLOSEM, F_MAXFD), we could add the functionality there.
I've put an updated patch in place of the other one:
Would be nice if anyone could answer the "XXX-questions" in there (or,
of course, any other stuff that may be incorrect) ;-)
Thanks for all the comments!
No.17
Sep 12, 2005, at 2:05 AM, Julio M. Merino Vidal wrote:
We all agree in that a new system call is needed. Some also want this
new interface to not only manage NFS exports but also to allow
changing
other settings from a mount point. I think this is a good idea too.
Given these comments, I've started the implementation of a fsctl(2)
function call, with the following signature:
int fsctl(const char *path, enum fsctl_command command, void
*data);
I haven't read the HP-UX fsctl(2) manual page, but I'll point out
that S X 10.4 also has a fsctl(2) system call (although I don't see
a manual page for it).
The 10.4 fsctl(2) basically has ioctl(2) semantics (including the
size field and direction bits in the command argument), and the
signature looks like this:
intfsctl(const char *path, u_long cmd, void *data, int options);
"options" is a flags word that currently has one option --
FSPT_NFLLW, which means "don't follow symbolic links". That flag
is used in several VFS syscalls in 10.4.
In 10.4, all fsctl(2) commands are currently file system-specific,
but that doesn't mean we can't have generic ones that either all file
systems implement or that are handled at the VFS layer (in general, I
would like to see us move a LT more stuff out of individual file
systems and into the VFS layer).
At the moment, command can be one of FSCTL_EXPRT_NFS_GET or
FSCTL_EXPRT_NFS_SET, to query or set NFS export lists respectively
based on the given path. (Minor question: can an enum be used as a
system call argument, or should I better use an integer? If not,
why?)
Use an ioctl-style command argument :-) It has the nice property of
handling versioning for you, if the size of the argument were to
change for some reason.
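thorpej's point about versioning can be seen in a small sketch of BSD-style command encoding, where the argument size is packed into the command word itself. The `FSC_*` macros imitate the layout of BSD `<sys/ioccom.h>` (`_IOW` and friends) under hypothetical names so they cannot clash with system headers; both argument structures and both command names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Layout modeled on BSD <sys/ioccom.h>: direction bits at the top,
 * a 13-bit argument size, an 8-bit group and an 8-bit number. */
#define FSC_PARM_MASK  0x1fffu
#define FSC_PARM_SHIFT 16
#define FSC_OUT        0x40000000u   /* kernel copies data out */
#define FSC_IN         0x80000000u   /* kernel copies data in */
#define FSC_IOC(dir, grp, num, len) \
    ((uint32_t)(dir) | \
     (((uint32_t)(len) & FSC_PARM_MASK) << FSC_PARM_SHIFT) | \
     ((uint32_t)(grp) << 8) | (uint32_t)(num))
#define FSC_IOR(grp, num, t) FSC_IOC(FSC_OUT, grp, num, sizeof(t))
#define FSC_IOW(grp, num, t) FSC_IOC(FSC_IN, grp, num, sizeof(t))
#define FSC_LEN(cmd)         (((cmd) >> FSC_PARM_SHIFT) & FSC_PARM_MASK)

/* Two hypothetical versions of the same argument structure. */
struct nfs_export_v1 { uint32_t flags; uint32_t nhosts; };
struct nfs_export_v2 { uint32_t flags; uint32_t nhosts; uint64_t anon_uid; };

/* Same group and number, different sizes, hence different commands. */
#define FSCTL_EXPORT_SET_V1 FSC_IOW('E', 1, struct nfs_export_v1)
#define FSCTL_EXPORT_SET_V2 FSC_IOW('E', 1, struct nfs_export_v2)

/* The handler learns which layout the caller was compiled against
 * from the command alone, without a separate version field. */
int
export_set_version(uint32_t cmd)
{
    if (cmd == FSCTL_EXPORT_SET_V1)
        return 1;
    if (cmd == FSCTL_EXPORT_SET_V2)
        return 2;
    return -1;   /* unknown command or size: reject */
}
```

Because the size is part of the command, a binary built against the old structure keeps sending a distinct value, so the handler can either keep supporting it or reject it cleanly instead of misreading a differently sized buffer.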
The problem with this interface is that it doesn't let you change
multiple mount points atomically, as some others have suggested.
I also agree that having this feature could be nice.
I don't see the value of changing multiple mount points atomically;
the most important thing is that an individual mount point's export list is
updated atomically.
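Per-mount atomicity of the kind described above can be had by building the complete new list off to the side and publishing it with a single pointer swap under a lock, so a concurrent check sees either the old list or the new one, never a half-updated mix. This is a userland sketch using pthreads; `struct export_list`, `struct mount_exports`, `exports_replace`, and `exports_allows` are hypothetical names, and host matching is reduced to exact string comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-mount export list: client names allowed to mount. */
struct export_list {
    size_t n;
    char **hosts;
};

struct mount_exports {
    pthread_mutex_t lock;
    struct export_list *cur;   /* never modified once published */
};

static void
export_list_free(struct export_list *el)
{
    if (el == NULL)
        return;
    for (size_t i = 0; i < el->n; i++)
        free(el->hosts[i]);
    free(el->hosts);
    free(el);
}

/* Build the whole replacement list first, then publish it with a single
 * pointer swap under the lock: readers see the old list or the new one. */
int
exports_replace(struct mount_exports *me, const char *const *hosts, size_t n)
{
    struct export_list *nl, *old;

    if ((nl = calloc(1, sizeof(*nl))) == NULL)
        return -1;
    if ((nl->hosts = calloc(n > 0 ? n : 1, sizeof(char *))) == NULL) {
        free(nl);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if ((nl->hosts[i] = strdup(hosts[i])) == NULL) {
            nl->n = i;             /* free only what was copied */
            export_list_free(nl);
            return -1;             /* the old list stays in effect */
        }
    }
    nl->n = n;

    pthread_mutex_lock(&me->lock);
    old = me->cur;
    me->cur = nl;
    pthread_mutex_unlock(&me->lock);

    export_list_free(old);         /* readers only touch lists under the lock */
    return 0;
}

/* Check a client against whatever list is current. */
int
exports_allows(struct mount_exports *me, const char *host)
{
    int ok = 0;

    pthread_mutex_lock(&me->lock);
    if (me->cur != NULL)
        for (size_t i = 0; i < me->cur->n && !ok; i++)
            ok = strcmp(me->cur->hosts[i], host) == 0;
    pthread_mutex_unlock(&me->lock);
    return ok;
}
```

The failure path matters for the same reason: if building the new list fails halfway, the mount keeps its old exports rather than a truncated set.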
In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
to this new system call, as well as other stuff like the quota
management.
I don't see anything wrong with keeping MNT_UPDATE as-is. Its
semantics are "update the mount", i.e. change from r/w to r/o or
whatever. MNT_GETARGS... well, I have other opinions on that as
well; I would rather we had string-based mount arguments than
the binary blobs we have now.
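As an illustration of what string-based mount arguments could look like on the consuming side, here is a sketch assuming a comma-separated `key` / `key=value` format like the one mount(8) `-o` options already use. `mntopt_get` is a hypothetical helper, not an existing kernel or libc function, and the `export=` key is invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Look up `key` in an option string such as "ro,nosuid,export=10.0.0.0/8".
 * Returns 1 and copies the value (empty for a bare flag) into val,
 * truncating to vallen - 1 characters; returns 0 if the key is absent. */
int
mntopt_get(const char *opts, const char *key, char *val, size_t vallen)
{
    size_t klen = strlen(key);
    const char *p = opts;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end != NULL ? (size_t)(end - p) : strlen(p);

        /* Match "key" exactly, or "key=..." (so "r" does not match "ro"). */
        if (len >= klen && strncmp(p, key, klen) == 0 &&
            (len == klen || p[klen] == '=')) {
            size_t skip = (len == klen) ? klen : klen + 1;
            size_t vlen = len - skip;

            if (vallen > 0) {
                if (vlen >= vallen)
                    vlen = vallen - 1;
                memcpy(val, p + skip, vlen);
                val[vlen] = '\0';
            }
            return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}
```

Unlike a binary blob, an unknown key in such a string can simply be ignored or reported, and adding a new key does not change any structure layout.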
Do you think this is correct and flexible enough for the current and
future purposes?
I think I would like to have an fsctl(2), sure. But going back to
the original discussion about NFS exports, I think that we should
switch to a model where the export list is not maintained by the
kernel, but rather ONLY by mountd(8). I believe someone else
mentioned this as what is done by Solaris.
In this model, the kernel would make an upcall to mountd(8), which
would either approve or deny, and the kernel would cache the result.
Updating the export list then becomes a matter of simply flushing the
kernel's "export cache".
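The upcall model can be sketched as a decision cache in front of a callback: a miss asks the daemon, a hit is answered locally, and replacing the export list is just a flush. This is a single-threaded userland sketch; the function pointer stands in for whatever kernel-to-mountd(8) channel would really be used, and all names (`export_check`, `export_cache_flush`, `sample_upcall`) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS 16

/* Hypothetical upcall type: in the proposed model this would be a round
 * trip to mountd(8); here it is an ordinary function pointer. */
typedef int (*export_upcall_fn)(const char *fsid, const char *client);

struct cache_entry {
    int valid;
    char fsid[32];
    char client[64];
    int allowed;
};

struct export_cache {
    struct cache_entry slot[CACHE_SLOTS];
    export_upcall_fn upcall;
    unsigned upcalls;          /* number of round trips, for illustration */
};

/* djb2-style hash over "fsid|client", reduced to a slot index. */
static unsigned
slot_of(const char *fsid, const char *client)
{
    unsigned h = 5381;

    for (const char *s = fsid; *s != '\0'; s++)
        h = h * 33 + (unsigned char)*s;
    h = h * 33 + '|';
    for (const char *s = client; *s != '\0'; s++)
        h = h * 33 + (unsigned char)*s;
    return h % CACHE_SLOTS;
}

/* Answer from the cache if possible, otherwise ask the daemon and
 * remember its decision (direct-mapped: a collision simply evicts). */
int
export_check(struct export_cache *c, const char *fsid, const char *client)
{
    struct cache_entry *e = &c->slot[slot_of(fsid, client)];
    int allowed;

    if (e->valid && strcmp(e->fsid, fsid) == 0 &&
        strcmp(e->client, client) == 0)
        return e->allowed;

    allowed = c->upcall(fsid, client);
    c->upcalls++;
    /* Only cache keys that fit, so a truncated key can never match. */
    if (strlen(fsid) < sizeof(e->fsid) && strlen(client) < sizeof(e->client)) {
        strcpy(e->fsid, fsid);
        strcpy(e->client, client);
        e->allowed = allowed;
        e->valid = 1;
    }
    return allowed;
}

/* Updating the export list reduces to forgetting every cached decision. */
void
export_cache_flush(struct export_cache *c)
{
    memset(c->slot, 0, sizeof(c->slot));
}

/* Stand-in for mountd's policy: allow a single client. */
int
sample_upcall(const char *fsid, const char *client)
{
    (void)fsid;
    return strcmp(client, "trusted.example") == 0;
}
```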
-- thorpej
No.18 | | 2086 bytes |
9/14/05, Jason Thorpe <thorpej (AT) shagadelic (DOT) org> wrote:
I don't see anything wrong with keeping MNT_UPDATE as-is. Its
semantics are "update the mount", i.e. change from r/w to r/o or
whatever. MNT_GETARGS... well, I have other opinions on that as
well; I would rather we had string-based mount arguments than
the binary blobs we have now.
But these are all different than mounting a file system, thus IMHO,
mount(2) is the wrong place to handle them. I really find the current
approach of flags to change behavior weird. (Especially having to
execute completely different operations inside the vfs_mount hook,
where one could use independent and smaller hooks.)
Basically, I see three operations over a mount point, all of which
should be handled independently:
1) Mount it, using mount(2).
2) Change any of its properties, through the new fsctl(2) or whatever.
3) Unmount it, using unmount(2).
Do you think this is correct and flexible enough for the current and
future purposes?
I think I would like to have an fsctl(2), sure.
I'm interested in what you think about adding these features in
fcntl(2). (Note that the current implementation of fcntl(2) seems to
have been designed leaving room for file system specific operations,
which is what we want.) Any comments?
But going back to
the original discussion about NFS exports, I think that we should
switch to a model where the export list is not maintained by the
kernel, but rather ONLY by mountd(8). I believe someone else
mentioned this as what is done by Solaris.
In this model, the kernel would make an upcall to mountd(8), which
would either approve or deny, and the kernel would cache the result.
Updating the export list then becomes a matter of simply flushing the
kernel's "export cache".
That indeed sounds nice but it's a bigger project than what I intended
at first (clean up existing stuff). Maybe we can leave this for a later
step?
Thanks,
No.19 | | 1460 bytes |
Sep 14, 2005, at 3:49 AM, Julio M. Merino Vidal wrote:
But these are all different than mounting a file system, thus IMHO,
mount(2) is the wrong place to handle them. I really find the current
approach of flags to change behavior weird. (Especially having to
execute completely different operations inside the vfs_mount hook,
where one could use independent and smaller hooks.)
I'm ambivalent on the MNT_UPDATE thing, really. MNT_UPDATE does have
"replace all previous mount options with these new ones" semantics,
so it seems sort of "natural" to leave it where it is but I don't
really have a strong feeling either way.
I'm interested in what you think about adding these features in
fcntl(2). (Note that the current implementation of fcntl(2) seems to
have been designed leaving room for file system specific operations,
which is what we want.) Any comments?
I don't think it should be in fcntl(2). fcntl(2) operates on
individual files / directories. fsctl(2) operates on the file system
instance.
That indeed sounds nice but it's a bigger project than what I intended
at first (clean up existing stuff). Maybe we can leave this for a later
step?
Why change it twice? It seems like it's actually less work to do the
heavy-lifting-in-mountd scheme, because it doesn't require you to
implement fsctl(2).
-- thorpej
No.20 | | 2579 bytes |
9/14/05, Jason Thorpe <thorpej (AT) shagadelic (DOT) org> wrote:
Sep 14, 2005, at 3:49 AM, Julio M. Merino Vidal wrote:
But these are all different than mounting a file system, thus IMHO,
mount(2) is the wrong place to handle them. I really find the current
approach of flags to change behavior weird. (Especially having to
execute completely different operations inside the vfs_mount hook,
where one could use independent and smaller hooks.)
I'm ambivalent on the MNT_UPDATE thing, really. MNT_UPDATE does have
"replace all previous mount options with these new ones" semantics,
so it seems sort of "natural" to leave it where it is but I don't
really have a strong feeling either way.
I'm interested in what you think about adding these features in
fcntl(2). (Note that the current implementation of fcntl(2) seems to
have been designed leaving room for file system specific operations,
which is what we want.) Any comments?
I don't think it should be in fcntl(2). fcntl(2) operates on
individual files / directories. fsctl(2) operates on the file system
instance.
But we already have functionality in fcntl that does not operate on
individual files/directories (F_CLOSEM, F_MAXFD or all the LCFN*
commands in lfs). I'm not saying this is right -- and IMVHO, it's
not -- but it's already there.
Also, I think I've just found an "advantage" of using fcntl; I don't know
if it's really a good thing or not (or even if it's useful). It is this:
you could lock the file from userland and then do a set of fs-specific
operations without interference.
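The locking idea would build on POSIX record locks, which fcntl(2) already provides through `F_SETLKW`. A minimal sketch, with `set_whole_file_lock` as a hypothetical helper; note that these locks are advisory, so they only keep out other processes that also take the lock, and nothing here ties the lock to any fs-specific operation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Lock (F_WRLCK / F_RDLCK) or unlock (F_UNLCK) the whole file behind fd,
 * blocking until the lock is available. Returns 0, or -1 with errno set. */
int
set_whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 = through end of file, however it grows */
    return fcntl(fd, F_SETLKW, &fl);
}
```

A cooperating tool would take the lock, issue its sequence of fs-specific calls, and then release it; closing any descriptor for the file also drops the process's locks on it.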
That indeed sounds nice but it's a bigger project than what I intended
at first (clean up existing stuff). Maybe we can leave this for a later
step?
Why change it twice? It seems like it's actually less work to do the
heavy-lifting-in-mountd scheme, because it doesn't require you to
implement fsctl(2).
The thing is that I have no idea how to do this at the moment.
Do we have any example in the kernel of how to communicate
with a userland process so that the latter has to return a value to
the former? Maybe the userfs SoC project has a decent way to
do this? What happens if the kernel requests a mount export entry
from mountd and mountd crashes or hangs?
And even in this case, wouldn't we still need fsctl(2) to tell the kernel
to clear the export cache you mentioned?
Thanks,
No.21 | | 4099 bytes |
Wed, Sep 14, 2005 at 10:12:44PM +0200, Julio M. Merino Vidal wrote:
But we already have functionality in fcntl that does not operate on
individual files/directories (F_CLOSEM, F_MAXFD or all the LCFN*
commands in lfs). I'm not saying this is right -- and IMVHO, it's
not -- but it's already there.
(taking a little side-trip to talk about fcntl())
more precisely, fcntl() operates on file *descriptors*; ioctl() operates
on files. there should not even need to be a VOP for fcntl() since
manipulating a file descriptor should not affect the file that the
descriptor points to, but I see that we started using this as mechanism
for LFS to do arbitrary stuff in the kernel at some point (replacing the
LFS-specific syscalls). we really should have used ioctl() for LFS instead.
as I recall, VOP_FCNTL() was originally added by bill, I think as a mechanism
to control an HSM-type layered file system. (at least, I think it was for
a control channel, I'm sure he'll correct me if I'm misremembering.)
I believe it was recommended at the time that he use ioctl() instead of
fcntl() for this purpose, but he added the fs-specific fcntl() stuff anyway,
for reasons that I don't quite remember but that I recall seemed bogus.
in short, I don't think having file-system-specific stuff in an interface
that's intended to control file descriptors makes much sense. we certainly
shouldn't move further in that direction, and it would be good to eventually
replace our existing use of that mechanism with something else instead,
either ioctl() or possibly this fsctl() thing.
one mechanism that has been used before in commercial products to get
the effect of an fsctl() without adding a syscall is to just use ioctl()
on the root directory of a file system. this was mostly for fs-specific
fs operations, though, and it doesn't seem very good to put fs-neutral
operations into the ioctl() morass as well.
so what was the original question again, just where to put the NFS export
control stuff?
the NFS export control info is not really controlling the file system being
exported, but rather it's controlling the behaviour of the NFS server.
the NFS server is somewhat unique, it's not a device and it's not a
file system, so none of the interfaces for talking to devices or files
or file systems really seems appropriate. perhaps creating a /dev/nfsd
pseudo-device and using ioctls on that would be the cleanest way to wedge
it into the existing API model. on the other hand, we already have an
"nfssvc" syscall, so we can add other NFS server control stuff there.
I'm with jason on wanting the mountargs stuff to become string-based.
was there any more to the original question? I've lost track.
-Chuck
No.22 | | 574 bytes |
Fri, Sep 16, 2005 at 07:35:22 -0700, Chuck Silvers wrote:
more precisely, fcntl() operates on file *descriptors*; ioctl()
operates on files.
[...]
in short, I don't think having file-system-specific stuff in an
interface that's intended to control file descriptors makes much
sense. we certainly shouldn't move further in that direction, and
it would be good to eventually replace our existing use of that
mechanism with something else instead, either ioctl() or possibly
this fsctl() thing.
I totally agree.
SY, Uwe
No.23 | | 1250 bytes |
9/16/05, Chuck Silvers <chuq (AT) chuq (DOT) com> wrote:
so what was the original question again, just where to put the NFS export
control stuff?
the NFS export control info is not really controlling the file system being
exported, but rather it's controlling the behaviour of the NFS server.
the NFS server is somewhat unique, it's not a device and it's not a
file system, so none of the interfaces for talking to devices or files
or file systems really seems appropriate. perhaps creating a /dev/nfsd
pseudo-device and using ioctls on that would be the cleanest way to wedge
it into the existing API model. on the other hand, we already have an
"nfssvc" syscall, so we can add other NFS server control stuff there.
Thanks for the long explanation. I wasn't aware of this nfssvc system call,
but I'll certainly look at it. It sounds more reasonable to add the export
control there than in a new system call, and possibly make it conditional on
NFSSERVER; I like the idea.
I'm with jason on wanting the mountargs stuff to become string-based.
That too, but it's a completely different thing than what I'm working on now ;-)
Kind regards,
No.24 | | 5019 bytes |
Fri, Sep 16, 2005 at 07:35:22AM -0700, Chuck Silvers wrote:
(taking a little side-trip to talk about fcntl())
more precisely, fcntl() operates on file *descriptors*; ioctl() operates
on files. there should not even need to be a VOP for fcntl() since
That's not fully correct. ioctl() operates on the devices underlying a
file. To quote the man page:
The ioctl() function manipulates the underlying device parameters of
special files.
And that's why we felt the need for a fcntl() VOP.
Also, we have extended fcntl() to operate on more than just the passed-in
file descriptor. Yes, the F_CLOSEM and F_MAXFD operations have to do with
_other_ file descriptors, but they are an example of not operating on just
the passed-in descriptor. So we've got (IMHO reasonable) prior art for
having fcntl() do more than operate exclusively on the passed-in fd.
manipulating a file descriptor should not affect the file that the
descriptor points to, but I see that we started using this as mechanism
for LFS to do arbitrary stuff in the kernel at some point (replacing the
LFS-specific syscalls). we really should have used ioctl() for LFS instead.
I disagree. While an fsctl() call may be a better fit, I do not think an
ioctl() ever will be a clean match.
as I recall, VOP_FCNTL() was originally added by bill, I think as a mechanism
to control an HSM-type layered file system. (at least, I think it was for
a control channel, I'm sure he'll correct me if I'm misremembering.)
I believe it was recommended at the time that he use ioctl() instead of
fcntl() for this purpose, but he added the fs-specific fcntl() stuff anyway,
for reasons that I don't quite remember but that I recall seemed bogus.
Well, as above, using an ioctl() for this would be even more bogus. :-)
The question was between overloading fcntl() and adding a new system call.
While I certainly objected to ioctl(), my feelings were not as strong
between a new system call and extending fcntl(), though fcntl() seemed
cleaner and more general-purpose. It already had the desired parameter
structure (file indicator, operation, data), so it seemed reasonable.
The problem with using ioctl() for this is that it goes to different places for
regular files, device nodes, and pipes (see ffs_vnodeop_entries,
ffs_specop_entries, and ffs_fifoop_entries). It has to; that's its point.
However, what we needed at the time was a way to send a control request to
the file system holding the file, not to the file itself. We needed
a call that would semantically not branch out the way the vop_ioctl_desc
operators do.
Also, at the time, it was felt fcntl() used in this way could help
implement ACL operations. ACLs need to operate at exactly the same
semantic level as the call our HSM needed; for a pipe or device node, you
want to operate on the underlying inode, not the device or pipe. I admit
that our ACL implementation may be taking a different approach, so I'm not
sure how strong this motivation will turn out to be.
in short, I don't think having file-system-specific stuff in an interface
that's intended to control file descriptors makes much sense. we certainly
shouldn't move further in that direction, and it would be good to eventually
replace our existing use of that mechanism with something else instead,
either ioctl() or possibly this fsctl() thing.
We decide what the different interfaces are intended to do, so we can
fully decide we are happy with fcntl() doing what it does.
If we are going to stick to existing interface definitions exclusively,
then ioctl() is "control device" and it is as wrong for doing these things
as is fcntl(). :-)
The problem is that we have now described operations that take place on
one of three different semantic levels. You can want to issue operations
on the internals of a file (ioctl() operating on the device backing a
device node), operations on the inode/vnode (what fcntl() is doing now),
and operations on the file system containing a node (what fsctl() would
do). While it may be a bit of an overload to do fsctl() work in fcntl()
(if we wanted to save the system call), we at least would be cleanly
talking to the file system we wanted to manipulate.
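The distinction between these levels comes down to where a request is routed. The toy model below uses hypothetical types that only borrow kernel-style names (they are not NetBSD's definitions) to show ioctl() fanning out by node type, the way the ffs vnode, spec and fifo operation vectors split it, while a file-system-level control call lands on the same mount whether the node is a regular file, a device node or a FIFO.

```c
#include <assert.h>

/* Toy types: these mirror kernel names but are NOT the kernel's definitions. */
enum vtype { VREG, VCHR, VFIFO };
enum ioctl_target { TO_FILE_OPS, TO_DEVICE, TO_FIFO };

struct mount { const char *fstype; };
struct vnode { enum vtype v_type; struct mount *v_mount; };

/* ioctl(): which code runs depends on the type of the node. */
enum ioctl_target
route_ioctl(const struct vnode *vp)
{
    switch (vp->v_type) {
    case VCHR:
        return TO_DEVICE;
    case VFIFO:
        return TO_FIFO;
    default:
        return TO_FILE_OPS;
    }
}

/* fs-level control (fs-specific fcntl() today, fsctl() perhaps later):
 * always reaches the file system holding the node, whatever its type. */
const struct mount *
route_fs_control(const struct vnode *vp)
{
    return vp->v_mount;
}
```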
one mechanism that has been used before in commercial products to get
the effect of an fsctl() without adding a syscall is to just use ioctl()
on the root directory of a file system. this was mostly for fs-specific
fs operations, though, and it doesn't seem very good to put fs-neutral
operations into the ioctl() morass as well.
I agree that'd be gross.
Take care,
Bill
No.25 | | 745 bytes |
Why change it twice? It seems like it's actually less work to do the
heavy-lifting-in-mountd scheme, because it doesn't require you to
implement fsctl(2).
The thing is that I have no idea about how to do this at the moment.
Do we have any example in the kernel of how to communicate
with a userland process so that the latter has to return a value to
the former? Maybe the userfs SoC project has a decent way to
do this? What happens if the kernel requests a mount export entry
from mountd and mountd crashes or hangs?
a blocking system call to wait for requests should be enough.
we have a similar structure in nfsd for kerberos.
check rick's nfsv4 code as well.
YAMAMOTO Takashi
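The suggestion of a daemon that sits in a blocking call waiting for requests can be modeled in userland with a mutex and a condition variable. The sketch also shows one possible answer to the crash question: the requester gives up after a timeout and fails closed. Everything here is hypothetical; `struct upcall_box`, `upcall_ask`, and `upcall_serve_one` stand in for a kernel queue and a system call, and only one request can be outstanding at a time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

/* Hypothetical single-slot mailbox: the requester stands in for the
 * kernel, the server side for mountd(8). One request at a time. */
struct upcall_box {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    int pending;        /* a request is waiting for an answer */
    int answered;       /* the daemon has replied to it */
    int answer;         /* 1 = allow, 0 = deny */
    char client[64];
};

#define UPCALL_BOX_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0, "" }

/* Post a request and wait up to timeout_ms for the reply. If the daemon
 * is dead or hung, fail closed and deny. */
int
upcall_ask(struct upcall_box *b, const char *client, long timeout_ms)
{
    struct timespec ts;
    int result;

    clock_gettime(CLOCK_REALTIME, &ts);   /* default clock for cond waits */
    ts.tv_sec += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&b->lock);
    strncpy(b->client, client, sizeof(b->client) - 1);
    b->client[sizeof(b->client) - 1] = '\0';
    b->pending = 1;
    b->answered = 0;
    pthread_cond_broadcast(&b->cv);       /* wake a waiting daemon */
    while (!b->answered) {
        if (pthread_cond_timedwait(&b->cv, &b->lock, &ts) != 0)
            break;                        /* timed out (or failed) */
    }
    result = b->answered ? b->answer : 0;
    b->pending = 0;                       /* withdraw the request either way */
    b->answered = 0;
    pthread_mutex_unlock(&b->lock);
    return result;
}

/* Daemon side: block until a request is pending, decide, and reply. */
void
upcall_serve_one(struct upcall_box *b, int (*decide)(const char *client))
{
    pthread_mutex_lock(&b->lock);
    while (!b->pending || b->answered)
        pthread_cond_wait(&b->cv, &b->lock);
    b->answer = decide(b->client);
    b->answered = 1;
    pthread_cond_broadcast(&b->cv);       /* wake the requester */
    pthread_mutex_unlock(&b->lock);
}

/* Example policy and thread body for trying the sketch out. */
static int
allow_trusted(const char *client)
{
    return strcmp(client, "trusted.example") == 0;
}

void *
sample_daemon(void *arg)
{
    upcall_serve_one(arg, allow_trusted);
    return NULL;
}
```

Failing closed on timeout keeps a crashed mountd(8) from silently widening access; a real design would presumably also avoid caching such timeout denials as if they were real decisions.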