BSD

  • Interface to change NFS exports


    Hi everybody,
    while adding NFS support to tmpfs, I found that the current way to
    change NFS export information is, how could I say it, very ugly.
    It seems to be a band-aid over what was used in the past to mount
    FFS systems, which makes it confusing and difficult to extend.
    (Don't we aim for clean design? ;-)
    First of all, the mountd(8) code expects every file system's
    mount arguments structure to start with the following fields:
    char *fspec;
    struct export_args *export;
    This is because it defines a big union of all known file systems,
    assuming that all of them have these two fields at the very
    beginning. Furthermore, mountd(8) must know about all
    NFS-aware file systems, needing patching every time you add
    a new one. (There are XXX marks in the code saying that this
    should be improved.)
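    A small C sketch of the layout assumption described above. The structure names (fsa_args, fsb_args, any_args) and the single field given to struct export_args are hypothetical stand-ins, not NetBSD's real definitions. It only illustrates that generic code can fill in fspec and export through one union member because every member starts with the same two fields, and that each new file system has to be added to the union by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified export description; the real
 * struct export_args carries flags, credentials and address lists. */
struct export_args {
	int ex_flags;
};

/* Two hypothetical per-file-system argument structures. Nothing
 * enforces it, but both MUST begin with these two fields, in order. */
struct fsa_args {
	char *fspec;
	struct export_args *export;
	int fsa_private;		/* fs-specific options follow */
};

struct fsb_args {
	char *fspec;
	struct export_args *export;
	long fsb_private;
};

/* mountd-style union: every NFS-aware file system must be listed
 * here by hand, which is what the XXX marks complain about. */
union any_args {
	struct fsa_args a;
	struct fsb_args b;
};

/* Fill in the common prefix through one member. C allows reading a
 * common initial sequence back through any other member of the union. */
static void
set_export_only(union any_args *args, struct export_args *exp)
{
	args->a.fspec = NULL;		/* NULL fspec: export-only request */
	args->a.export = exp;
}
```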
    Then we get to the kernel part. The MNT_UPDATE flag is
    abused, together with the fspec field, to change the export
    information. A file system has to assume that, if MNT_UPDATE
    is given and fspec is NULL, then it has to update the export
    data.
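    The overloaded convention can be sketched as the decision each file system's mount routine has to repeat. SKETCH_MNT_UPDATE and its value are hypothetical (the real MNT_UPDATE lives in <sys/mount.h>), and the three-way classification is only an illustration of the rule stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value; the real MNT_UPDATE is in <sys/mount.h>. */
#define SKETCH_MNT_UPDATE	0x1

struct export_args { int ex_flags; };

struct fs_args {
	char *fspec;
	struct export_args *export;
};

enum mount_action { DO_NEW_MOUNT, DO_UPDATE_OTHER, DO_UPDATE_EXPORTS };

/* The check every NFS-aware file system duplicates today. */
static enum mount_action
classify_mount(int flags, const struct fs_args *args)
{
	if ((flags & SKETCH_MNT_UPDATE) == 0)
		return DO_NEW_MOUNT;
	if (args->fspec == NULL)
		return DO_UPDATE_EXPORTS;	/* fspec NULL: exports only */
	return DO_UPDATE_OTHER;			/* e.g. other mount options */
}
```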
    Updating NFS export information is common to all NFS-aware
    file systems, so I believe this can be abstracted into the VFS
    layer to simplify the code in each file system.
    What I have done is the following:
    - Define two new mount flags, MNT_GETEXPRT and MNT_SETEXPRT.
    These change the semantics of the data parameter passed to
    mount(2), making it receive a struct export_args structure instead
    of the file system's custom one. The purpose of each of them is
    clear from their name.
    - Change the mount syscall routine to recognize these two flags
    and handle them completely on its own (the underlying fs never
    sees them). It only needs to fetch a structure from userland and
    parse it or vice versa.
    - As a side effect, as the information is now available in the generic
    mount point structure, define a vfs_stdcheckexp routine that can
    be used in most file systems (haven't checked yet if this will be
    useful in all cases), thus removing redundancy from them all.
    - Simplify mountd(8) a lot by removing all fs-specific details from it,
    using struct export_args exclusively as a way to communicate
    between userland and the kernel.
    - Remove NFS-specific bits from the mount_* utilities, as this can
    now be done as a default operation during the initial mount.
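    A minimal sketch of the proposed dispatch, assuming hypothetical flag values (only the names MNT_GETEXPRT and MNT_SETEXPRT come from the proposal) and using memcpy as a stand-in for copyin/copyout. The generic entry point serves export requests from the mount structure itself, so the file-system-specific routine never sees them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values; only the flag names come from the proposal. */
#define SKETCH_MNT_GETEXPRT	0x100
#define SKETCH_MNT_SETEXPRT	0x200

struct export_args { int ex_flags; };

/* Simplified generic mount point: export data lives here. */
struct sketch_mount {
	struct export_args mnt_export;
	int mnt_fs_calls;	/* counts calls into fs-specific code */
};

/* Stand-in for a file system's own mount routine. */
static int
fs_mount(struct sketch_mount *mp, void *data)
{
	(void)data;
	mp->mnt_fs_calls++;
	return 0;
}

/* Generic mount entry point; memcpy stands in for copyin/copyout. */
static int
generic_mount(struct sketch_mount *mp, int flags, void *data)
{
	if ((flags & SKETCH_MNT_GETEXPRT) && (flags & SKETCH_MNT_SETEXPRT))
		return EINVAL;
	if (flags & SKETCH_MNT_GETEXPRT) {
		memcpy(data, &mp->mnt_export, sizeof(mp->mnt_export));
		return 0;
	}
	if (flags & SKETCH_MNT_SETEXPRT) {
		memcpy(&mp->mnt_export, data, sizeof(mp->mnt_export));
		return 0;
	}
	return fs_mount(mp, data);	/* ordinary mount: fs handles it */
}
```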
    I don't know if this is the best way to do this (and the patch is
    not yet "clean"[*]), so comments will be appreciated. Even so,
    I feel this is far better than the current approach.
    As a proof of concept, I've converted ffs and tmpfs to follow these
    ideas. Note how the code is simpler overall. The patch can be
    found here:
    [*] I especially don't like the new mnt_nfs boolean flag much,
    but couldn't find a better place for it.
    Please comment.
    Thank you,
  • No.1

    Julio M. Merino Vidal wrote:

    >Hi everybody,
    >
    >while adding NFS support to tmpfs, I found that the current way to
    >change NFS export information is, how could I say it, very ugly.
    >It seems to be a band-aid over what was used in the past to mount
    >FFS systems, which makes it confusing and difficult to extend.
    >(Don't we aim for clean design? ;-)


    Hi Julio,

    I agree with most of these changes, but I'd like to see an additional
    change: overloading the mount system call with export functionality is
    just wrong. There should be a separate exportfs system call.
    - Frank
  • No.2

    Is the notion of 'exporting' NFS-specific still? While I'm not aware
    of another NFS-like protocol in wide use (afs and coda don't serve
    local filesystems), it seems the notion of controlling what parts of
    the filesystem tree can be served over NFS-like protocols is more
    general than just NFS. Then, one would perhaps want to control
    exports separately for separate protocols.

    That said, I don't object to what you are doing now, which seems
    orthogonal to the multiprotocol export issue.
  • No.3

    Sun, 11 Sep 2005 10:16:09 -0400, Greg Troxel wrote:

    Is the notion of 'exporting' NFS-specific still? While I'm not aware
    of another NFS-like protocol in wide use (afs and coda don't serve
    local filesystems), it seems the notion of controlling what parts of
    the filesystem tree can be served over NFS-like protocols is more
    general than just NFS. Then, one would perhaps want to control
    exports separately for separate protocols.

    Is there any NFS-like protocol besides NFS? That is, one that would
    be stateless and would use filehandles to access files?

    Bye
    Pavel
  • No.4

    [ originally sent to tech-net@ by mistake ]

    Sun, 11 Sep 2005 10:16:09 -0400, Greg Troxel wrote:

    Is the notion of 'exporting' NFS-specific still? While I'm not aware
    of another NFS-like protocol in wide use (afs and coda don't serve
    local filesystems), it seems the notion of controlling what parts of
    the filesystem tree can be served over NFS-like protocols is more
    general than just NFS. Then, one would perhaps want to control
    exports separately for separate protocols.

    That said, I don't object to what you are doing now, which seems
    orthogonal to the multiprotocol export issue.

    Is there any NFS-like protocol besides NFS? That is, one that would
    be stateless and would use filehandles to access files?

    Bye
    Pavel
  • No.5

    Sun, Sep 11, 2005 at 01:20:58PM +0200, Julio M. Merino Vidal wrote:
    Hi everybody,

    while adding NFS support to tmpfs, I found that the current way to
    change NFS export information is, how could I say it, very ugly.
    It seems to be a band-aid over what was used in the past to mount
    FFS systems, which makes it confusing and difficult to extend.
    (Don't we aim for clean design? ;-)

    []

    Hi,
    while you're at it, could you look at fixing a very long outstanding
    problem? A /etc/rc.d/mountd reload isn't atomic; there is a window
    in which no filesystems are exported at all, and if a request comes in
    at this time, nfsd replies with a "permission denied".

    At first glance, we would need to keep two export lists in the kernel and switch
    from one to the other, much like what IPF does with the filters.

    I'm not asking you to implement this, but as you're planning to change the interface,
    please think about it in the new one :)
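    The "two export lists, then switch" idea can be sketched in userland C, with C11 atomics standing in for whatever synchronization the kernel would really use. The export_list type (a plain array of client IDs) and both function names are hypothetical. The point is that a reader checking a request always sees either the whole old list or the whole new one, never an empty window.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical export list: just an array of allowed client IDs. */
struct export_list {
	size_t n;
	const unsigned *clients;
};

/* The list the request path consults; static storage starts as NULL. */
static _Atomic(struct export_list *) active_exports;

static int
client_allowed(unsigned client)
{
	struct export_list *l = atomic_load(&active_exports);

	if (l == NULL)
		return 0;
	for (size_t i = 0; i < l->n; i++)
		if (l->clients[i] == client)
			return 1;
	return 0;
}

/* Build the next list off to the side, then publish it in one step.
 * The old list is returned; freeing it safely would need a reference
 * count or similar in a real kernel, which this sketch omits. */
static struct export_list *
install_exports(struct export_list *next)
{
	return atomic_exchange(&active_exports, next);
}
```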
  • No.6

    Mon, Sep 12, 2005 at 01:08:07AM +0200, Manuel Bouyer wrote:
    Sun, Sep 11, 2005 at 01:20:58PM +0200, Julio M. Merino Vidal wrote:
    Hi everybody,

    while adding NFS support to tmpfs, I found that the current way to
    change NFS export information is, how could I say it, very ugly.
    It seems to be a band-aid over what was used in the past to mount
    FFS systems, which makes it confusing and difficult to extend.
    (Don't we aim for clean design? ;-)

    []

    Hi,
    while you're at it, could you look at fixing a very long outstanding
    problem? A /etc/rc.d/mountd reload isn't atomic; there is a window
    in which no filesystems are exported at all, and if a request comes in
    at this time, nfsd replies with a "permission denied".

    At first glance, we would need to keep two export lists in the kernel and switch
    from one to the other, much like what IPF does with the filters.

    Not necessarily.

    I think part of the problem is how mountd does things, though to be
    honest, I have avoided looking at the code. :-) I think if mountd were
    changed to build up state then apply it, we could achieve an atomic update
    w/o multiple lists in the kernel.

    I'm not asking you to implement this, but as you're planning to change the
    interface, please think about it in the new one :)

    I think all that would be needed would be for there to be a way to upload
    multiple export entries at once. That way we can say, "here, this is the
    new export list."

    I agree that all Julio would need to do now is think about how we add
    multiple entries at once, and we'd be prepared for this in the future.
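    A sketch of "here, this is the new export list" as a single all-or-nothing call. The export_entry and export_table types and set_export_table() are hypothetical. Every entry is validated and copied before the old table is touched, so one bad entry leaves the previous exports fully in effect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical export entry and table; not real NetBSD structures. */
struct export_entry {
	uint32_t net;	/* network address, host byte order */
	uint32_t mask;	/* netmask */
	int ro;		/* nonzero: read-only */
};

struct export_table {
	size_t n;
	struct export_entry *e;
};

/* Invalid if the address has bits set outside its mask. */
static int
entry_valid(const struct export_entry *x)
{
	return (x->net & ~x->mask) == 0;
}

static int
set_export_table(struct export_table *t, const struct export_entry *in,
    size_t n)
{
	struct export_entry *copy;
	size_t i;

	/* Phase 1: validate everything; nothing has changed yet. */
	for (i = 0; i < n; i++)
		if (!entry_valid(&in[i]))
			return -1;

	/* Phase 2: build the replacement off to the side. */
	copy = malloc(n > 0 ? n * sizeof(*copy) : 1);
	if (copy == NULL)
		return -1;
	if (n > 0)
		memcpy(copy, in, n * sizeof(*copy));

	/* Phase 3: replace the old table in one step. */
	free(t->e);
	t->e = copy;
	t->n = n;
	return 0;
}
```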

    Take care,

    Bill

  • No.7

    while you're at it, could you look at fixing a very long outstanding
    problem? A /etc/rc.d/mountd reload isn't atomic; there is a window
    in which no filesystems are exported at all, and if a request comes in
    at this time, nfsd replies with a "permission denied".

    FYI, the OpenBSD folks seem to be working on it.

    YAMAMOTO Takashi
  • No.8

    Sun, Sep 11, 2005 at 08:48:14PM -0600, Greg wrote:
    I'm not asking you to implement this, but as you're planning to change the
    interface, please think about it in the new one :)

    I think all that would be needed would be for there to be a way to upload
    multiple export entries at once. That way we can say, "here, this is the
    new export list."

    I agree that all Julio would need to do now is think about how we add
    multiple entries at once, and we'd be prepared for this in the future.

    If we're making a "shopping list" of changes we'd like to see here ;)

    When checking to see if an NFS export is allowed, Solaris appears
    to do a lookup of the IP address at the time the mount request is made,
    rather than building a table of IP addresses for the hosts at the
    time mountd is run (as NetBSD does). Ignoring the fact that Dynamic
    DNS may be evil, this means that Solaris behaves much better with
    hosts that happen to be down (and have lost their lease) when mountd
    is restarted, than does NetBSD. (NetBSD gets incredibly unhappy
    because it can't find an IP address for the host at the time mountd
    is run, and so then refuses to run mountd, shutting all hosts out,
    not just the one that might be temporarily off-line. This is
    arguably a security feature, but, well, if you're running NFS, you
    may have Security Issues anyway :-} )

    But IMO it'd be way cool if NetBSD could do the same as Solaris and
    delay the lookup of the IP address until the point where the mount
    request is made.

    I think Solaris works differently from NetBSD (at least it used to):
    there is no check done in the kernel at the NFS level, only by mountd
    when a client requests a filehandle at mount time. This means that once you
    know a filehandle (and you could find one by trying random values), you
    can access a filesystem on the server, even if your IP is not allowed.

    Now it would be possible to allow dynamic names with an export list in the
    kernel, this just means that mountd would have to install a new export list
    in the kernel each time a new name->ip translation is discovered.
  • No.9

    9/11/05, Frank van der Linden <fvdl (AT) netbsd (DOT) org> wrote:
    Julio M. Merino Vidal wrote:

    >Hi everybody,
    >
    >while adding NFS support to tmpfs, I found that the current way to
    >change NFS export information is, how could I say it, very ugly.
    >It seems to be a band-aid over what was used in the past to mount
    >FFS systems, which makes it confusing and difficult to extend.
    >(Don't we aim for clean design? ;-)
    >
    >

    Hi Julio,

    I agree with most of these changes, but I'd like to see an additional
    change: overloading the mount system call with export functionality is
    just wrong. There should be a separate exportfs system call.

    We all agree that a new system call is needed. Some also want this
    new interface to not only manage NFS exports but also to allow changing
    other settings from a mount point. I think this is a good idea too.

    Given these comments, I've started the implementation of a fsctl(2)
    function call, with the following signature:

    int fsctl(const char *path, enum fsctl_command command, void *data);

    At the moment, command can be one of FSCTL_EXPRT_NFS_GET or
    FSCTL_EXPRT_NFS_SET, to query or set NFS export lists respectively
    based on the given path. (Minor question: can an enum be used as a
    system call argument, or should I better use an integer? If not, why?)

    The problem with this interface is that it doesn't let you change
    multiple mount points atomically, as some others have suggested.
    I also agree that having this feature could be nice.

    In order to allow this, I think the following could make more sense
    (none of the stuff below is implemented, so it may contain errors):

    int fsctl(enum fsctl_command command, struct fsctl_data *data,
    size_t ndata);

    where 'data' is a map between paths and structures, as in:

    struct fsctl_data {
        const char fd_path[MAXPATHLEN];
        void *fd_data;
    };

    Given this interface, each 'command' can provide its own structure
    for the fd_data parameter. E.g., the FSCTL_EXPRT_NFS_{GET,SET}
    calls could return or receive, respectively, a struct export_args.
    Also, each command can operate on multiple mount points atomically.

    In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
    to this new system call, as well as other stuff like the quota
    management.

    Do you think this is correct and flexible enough for the current and
    future purposes?

    Thanks,
  • No.10

    Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:
    9/11/05, Frank van der Linden <fvdl (AT) netbsd (DOT) org> wrote:

    I agree with most of these changes, but I'd like to see an additional
    change: overloading the mount system call with export functionality is
    just wrong. There should be a separate exportfs system call.

    We all agree that a new system call is needed. Some also want this
    new interface to not only manage NFS exports but also to allow changing
    other settings from a mount point. I think this is a good idea too.

    Given these comments, I've started the implementation of a fsctl(2)
    function call, with the following signature:

    int fsctl(const char *path, enum fsctl_command command, void *data);

    At the moment, command can be one of FSCTL_EXPRT_NFS_GET or
    FSCTL_EXPRT_NFS_SET, to query or set NFS export lists respectively
    based on the given path. (Minor question: can an enum be used as a
    system call argument, or should I better use an integer? If not, why?)

    Use an integer and steal the implementation from ioctl() and fcntl().
    Or we could decide to implement all of this as fcntl()s on a file in the file
    system (which is what HPUX does AFAICT, though they have a special fsctl()
    call).
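    What "steal the implementation from ioctl()" could mean in practice: the command word itself carries the transfer direction and argument size, so generic code can copy the argument in or out before dispatching. The layout below is loosely modelled on the BSD <sys/ioccom.h> encoding, but every macro and command name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command word, modelled on the BSD ioctl layout: bits 31-29
 * direction, 28-16 argument size, 15-8 group, 7-0 number.
 * All names here are hypothetical. */
#define SK_PARM_MASK	0x1fffu
#define SK_VOID		0x20000000u	/* no argument */
#define SK_OUT		0x40000000u	/* kernel copies data out */
#define SK_IN		0x80000000u	/* kernel copies data in */

#define SK_CMD(dir, grp, num, len) \
	((uint32_t)(dir) | (((uint32_t)(len) & SK_PARM_MASK) << 16) | \
	 ((uint32_t)(grp) << 8) | (uint32_t)(num))
#define SK_LEN(cmd)	(((cmd) >> 16) & SK_PARM_MASK)

struct export_args { int ex_flags; };

/* The argument size travels inside the command value. */
#define SK_EXPORT_GET	SK_CMD(SK_OUT, 'e', 1, sizeof(struct export_args))
#define SK_EXPORT_SET	SK_CMD(SK_IN, 'e', 2, sizeof(struct export_args))

/* How many bytes the generic layer copies in before dispatching. */
static size_t
bytes_to_copy_in(uint32_t cmd)
{
	return (cmd & SK_IN) ? SK_LEN(cmd) : 0;
}
```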

    Actually, why _can't_ we just use fcntl() to do it? Then we don't have to
    add a new system call.

    The problem with this interface is that it doesn't let you change
    multiple mount points atomically, as some others have suggested.
    I also agree that having this feature could be nice.

    I don't think we want to change multiple mount points at once. I think
    what we really want to do is change multiple exports on one mount point at
    once. Thus we want multiple data payloads to one destination, not multiple
    destinations.

    In order to allow this, I think the following could make more sense
    (none of the stuff below is implemented, so it may contain errors):

    int fsctl(enum fsctl_command command, struct fsctl_data *data,
    size_t ndata);

    where 'data' is a map between paths and structures, as in:

    struct fsctl_data {
        const char fd_path[MAXPATHLEN];
        void *fd_data;
    };

    Given this interface, each 'command' can provide its own structure
    for the fd_data parameter. E.g., the FSCTL_EXPRT_NFS_{GET,SET}
    calls could return or receive, respectively, a struct export_args.
    Also, each command can operate on multiple mount points atomically.

    I think our lives will be much simpler if we don't do this. If we try to
    change multiple mount points at once, we could get into all sorts of
    update and locking issues. I think one mount point per call will work
    quite well and will be very solid.

    In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
    to this new system call, as well as other stuff like the quota
    management.

    Sounds good.

    Do you think this is correct and flexible enough for the current and
    future purposes?

    As above, I like the idea but I want to change only one mount point at
    once.

  • No.11

    Mon, Sep 12, 2005 at 11:00:35AM +0200, Manuel Bouyer wrote:
    Sun, Sep 11, 2005 at 08:48:14PM -0600, Greg wrote:

    If we're making a "shopping list" of changes we'd like to see here ;)

    When checking to see if an NFS export is allowed, Solaris appears
    to do a lookup of the IP address at the time the mount request is made,
    rather than building a table of IP addresses for the hosts at the
    time mountd is run (as NetBSD does). Ignoring the fact that Dynamic
    DNS may be evil, this means that Solaris behaves much better with
    hosts that happen to be down (and have lost their lease) when mountd
    is restarted, than does NetBSD. (NetBSD gets incredibly unhappy
    because it can't find an IP address for the host at the time mountd
    is run, and so then refuses to run mountd, shutting all hosts out,
    not just the one that might be temporarily off-line. This is
    arguably a security feature, but, well, if you're running NFS, you
    may have Security Issues anyway :-} )

    But IMO it'd be way cool if NetBSD could do the same as Solaris and
    delay the lookup of the IP address until the point where the mount
    request is made.

    I think Solaris works differently from NetBSD (at least it used to):
    there is no check done in the kernel at the NFS level, only by mountd
    when a client requests a filehandle at mount time. This means that once you
    know a filehandle (and you could find one by trying random values), you
    can access a filesystem on the server, even if your IP is not allowed.

    Now it would be possible to allow dynamic names with an export list in the
    kernel, this just means that mountd would have to install a new export list
    in the kernel each time a new name->ip translation is discovered.

    I think that would be fine. I think if we teach mountd about new and old
    export lists, then we may eventually also add a "delete this, add this"
    update operation.
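    A "delete this, add this" update could be derived by mountd from its old and new export lists. This sketch assumes both lists are sorted and duplicate-free, and uses hypothetical 32-bit client addresses as entries; a single merge walk yields the entries to remove and the entries to add, leaving unchanged ones alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* old and next must be sorted ascending with no duplicates. del must
 * have room for n_old entries and add room for n_next entries. */
static void
export_diff(const uint32_t *old, size_t n_old,
    const uint32_t *next, size_t n_next,
    uint32_t *del, size_t *n_del, uint32_t *add, size_t *n_add)
{
	size_t i = 0, j = 0;

	*n_del = *n_add = 0;
	while (i < n_old && j < n_next) {
		if (old[i] == next[j]) {
			i++;			/* unchanged: no operation */
			j++;
		} else if (old[i] < next[j]) {
			del[(*n_del)++] = old[i++];	/* gone from new list */
		} else {
			add[(*n_add)++] = next[j++];	/* new in new list */
		}
	}
	while (i < n_old)
		del[(*n_del)++] = old[i++];
	while (j < n_next)
		add[(*n_add)++] = next[j++];
}
```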

    Take care,

    Bill

  • No.12

    9/12/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
    Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:
    9/11/05, Frank van der Linden <fvdl (AT) netbsd (DOT) org> wrote:

    I agree with most of these changes, but I'd like to see an additional
    change: overloading the mount system call with export functionality is
    just wrong. There should be a separate exportfs system call.

    We all agree that a new system call is needed. Some also want this
    new interface to not only manage NFS exports but also to allow changing
    other settings from a mount point. I think this is a good idea too.

    Given these comments, I've started the implementation of a fsctl(2)
    function call, with the following signature:

    int fsctl(const char *path, enum fsctl_command command, void *data);

    At the moment, command can be one of FSCTL_EXPRT_NFS_GET or
    FSCTL_EXPRT_NFS_SET, to query or set NFS export lists respectively
    based on the given path. (Minor question: can an enum be used as a
    system call argument, or should I better use an integer? If not, why?)

    Use an integer and steal the implementation from ioctl() and fcntl().

    OK, but is there anything wrong with using an enum? (Such as the type
    used to represent it changing, or something like that?)

    Or we could decide to implement all of this as fcntl()s on a file in the file
    system (which is what HPUX does AFAICT, though they have a special fsctl()
    call).

    Actually, why _can't_ we just use fcntl() to do it? Then we don't have to
    add a new system call.

    I think we can, but IMHO, it looks weird. Why do I have to open a file
    when what I really want to do is get or set the properties of a mount
    point? I think that passing a path and letting the kernel do what it needs
    looks better (as in statvfs(2)). (We'd later discuss if the call should
    enforce the path to refer exactly to a mount point or any file within it.)

    Also, if we used fcntl, we'd have two "authentication" levels in the
    trace. First, open the file using the corresponding permissions, flags,
    etc. Then, do the fcntl, which will fail in most cases if the user is not
    root.

    The problem with this interface is that it doesn't let you change
    multiple mount points atomically, as some others have suggested.
    I also agree that having this feature could be nice.

    I don't think we want to change multiple mount points at once. I think
    what we really want to do is change multiple exports on one mount point at
    once. Thus we want multiple data payloads to one destination, not multiple
    destinations.

    Ah, OK! I misunderstood the initial suggestion. Although it seemed
    nice to me at first, I can't think of a good reason why it could be
    useful (I mean, the ability to change multiple paths at once).

    Given this interface, each 'command' can provide its own structure
    for the fd_data parameter. E.g., the FSCTL_EXPRT_NFS_{GET,SET}
    calls could return or receive, respectively, a struct export_args.
    Also, each command can operate on multiple mount points atomically.

    I think our lives will be much simpler if we don't do this. If we try to
    change multiple mount points at once, we could get into all sorts of
    update and locking issues. I think one mount point per call will work
    quite well and will be very solid.

    True. I was trying to work around the locking issues by sorting the
    mount points, avoiding duplicates and locking them in order (which I think
    should work). But this won't be needed at all.
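    The lock-ordering workaround Julio mentions (sort, drop duplicates, lock in order) can be sketched as follows. This assumes paths as the sort key to keep the example self-contained; a kernel would more plausibly order by the mount structure itself, since different paths can lead to the same mount point. lock_order() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int
cmp_path(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the requested paths and remove duplicates in place; returns
 * the number of unique paths. Locking in this global order means two
 * callers can never wait on each other in a cycle, and removing
 * duplicates avoids locking the same mount point twice. */
static size_t
lock_order(const char **paths, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	qsort(paths, n, sizeof(*paths), cmp_path);
	for (size_t i = 1; i < n; i++)
		if (strcmp(paths[i], paths[out]) != 0)
			paths[++out] = paths[i];
	return out + 1;
}
```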
  • No.13

    Modulo the "new system call" that's fallen out of this
    thread, I did exactly what you originally reported
    you'd done (cleaning up the exporting code, centralizing
    export checks, etc) some years back. I've been meaning
    to try to get the company for whom I'd done that work
    to release it, but, well, they went out of business,
    and I didn't want to do the legwork to free the intellectual
    property, as it were.

    Anyhoo, some wis-dumb I'd like to pass along: you may want to keep the
    export-checking mechanism itself pluggable, so you can swap in/out
    that part of the code that decides whether any given filehandle can
    be referenced by a client. This lets you drop in Solaris-style
    exporting (on a per-directory tree, per CIDR block) alongside some
    Kerberos-based authorization mechanism alongside the traditional
    BSD mechanism of exporting (per mountpoint, per CIDR block).

    My implementation did the obvious and hung another vtbl-like
    vector of function pointers off each mount point. From memory,
    the entry points were things like "check this file handle for
    client access," "here's a chunk of data, figure out how to
    parse it into your special rules for allowing client access,"
    and so on.
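    A sketch of the "vtbl-like vector of function pointers off each mount point" described above. The export_ops table, its two entry points, and the single CIDR policy are hypothetical illustrations of the idea, not Chris's actual code. The generic check never knows which policy is plugged in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mount table of export-policy entry points. */
struct export_ops {
	const char *eo_name;
	/* Turn an opaque blob of rules into policy-private state. */
	int (*eo_load)(const void **state, const void *rules, size_t len);
	/* Nonzero if this client may use a handle under directory fh_dir. */
	int (*eo_check)(const void *state, uint32_t client, uint32_t fh_dir);
};

struct sketch_mount {
	const struct export_ops *mnt_exops;	/* chosen policy, or NULL */
	const void *mnt_exstate;		/* that policy's state */
};

/* One example policy: traditional per-mount-point CIDR block. */
struct cidr_rule { uint32_t net, mask; };

static int
cidr_load(const void **state, const void *rules, size_t len)
{
	if (len != sizeof(struct cidr_rule))
		return -1;
	*state = rules;		/* sketch only: caller keeps rules alive */
	return 0;
}

static int
cidr_check(const void *state, uint32_t client, uint32_t fh_dir)
{
	const struct cidr_rule *r = state;

	(void)fh_dir;		/* per-mount policy ignores the directory */
	return (client & r->mask) == r->net;
}

static const struct export_ops cidr_ops = { "cidr", cidr_load, cidr_check };

/* Generic check used by the NFS server: policy-agnostic. */
static int
export_allowed(const struct sketch_mount *mp, uint32_t client,
    uint32_t fh_dir)
{
	return mp->mnt_exops != NULL &&
	    mp->mnt_exops->eo_check(mp->mnt_exstate, client, fh_dir);
}
```

    A per-directory (Solaris-style) or Kerberos-based policy would be another export_ops instance; only the table pointer stored on the mount point changes.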

    Mind you, I never did actually *implement* an auth mech other
    than one like Solaris's, but my thinking was that a mount
    option would pick which mechanism to use. In the new world,
    I guess that'd be just another call to the proposed fsctl().

    If you're already grotting around in that part of the code,
    I think it'd be easy enough to make what you've done be
    pluggable, so that different sets of exporting rules could
    be used. I only mention it as something to consider. Too,
    I guess I also claim an existence proof that it can be done,
    and that doing it was once thought to be sensible by folks
    who've been where you are now.

    Chris <jepeway (AT) blasted-heath (DOT) com>.
  • No.14

    In article <6b2d1e1905091212206c59c5ea (AT) mail (DOT) gmail.com> you write:
    9/12/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
    >Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:
    >We all agree that a new system call is needed. Some also want this
    >new interface to not only manage NFS exports but also to allow changing
    >other settings from a mount point. I think this is a good idea too.
    >>

    >Given these comments, I've started the implementation of a fsctl(2)
    >function call, with the following signature:
    >>

    >int fsctl(const char *path, enum fsctl_command command, void *data);
    >>

    >At the moment, command can be one of FSCTL_EXPRT_NFS_GET or
    >FSCTL_EXPRT_NFS_SET, to query or set NFS export lists respectively
    >based on the given path. (Minor question: can an enum be used as a
    >system call argument, or should I better use an integer? If not, why?)
    >>

    >Use an integer and steal the implementation from ioctl() and fcntl().
    >
    >OK, but is there anything wrong with using an enum? (Such as the type
    >used to represent it changing, or something like that?)


    Precisely that. The ARM EABI, for instance (not that NetBSD follows it
    yet), requires that enums be stored in the smallest integer type that will
    hold them. This kind of thing makes them dangerous to use in any interface,
    since they're liable to change size when new values are introduced.
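    The usual way around this is to give the command argument an explicit fixed width and define the commands as constants. The command names below come from the thread; the fsctl_cmd_t typedef and the numeric values are hypothetical. The argument stays 32 bits no matter how many commands are added, and an unknown value is simply rejected at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width command type: its size never depends on how many
 * commands exist or on an ABI's enum-sizing rules. */
typedef uint32_t fsctl_cmd_t;

#define FSCTL_EXPRT_NFS_GET	((fsctl_cmd_t)1)	/* hypothetical value */
#define FSCTL_EXPRT_NFS_SET	((fsctl_cmd_t)2)	/* hypothetical value */
/* New commands can be appended later without changing the ABI. */

_Static_assert(sizeof(fsctl_cmd_t) == 4, "fsctl command must be 32 bits");

/* Any 32-bit value can arrive from userland, so validate it. */
static int
is_known_cmd(fsctl_cmd_t cmd)
{
	switch (cmd) {
	case FSCTL_EXPRT_NFS_GET:
	case FSCTL_EXPRT_NFS_SET:
		return 1;
	default:
		return 0;
	}
}
```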
  • No.15

    Mon, Sep 12, 2005 at 09:20:13PM +0200, Julio M. Merino Vidal wrote:
    9/12/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
    Mon, Sep 12, 2005 at 11:05:21AM +0200, Julio M. Merino Vidal wrote:

    Given these comments, I've started the implementation of a fsctl(2)
    function call, with the following signature:

    int fsctl(const char *path, enum fsctl_command command, void *data);

    At the moment, command can be one of FSCTL_EXPRT_NFS_GET or
    FSCTL_EXPRT_NFS_SET, to query or set NFS export lists respectively
    based on the given path. (Minor question: can an enum be used as a
    system call argument, or should I better use an integer? If not, why?)

    Use an integer and steal the implementation from ioctl() and fcntl().

    OK, but is there anything wrong with using an enum? (Such as the type
    used to represent it changing, or something like that?)

    Yes. I think the problem with an enum is that you have to expose the enum
    to all places where you use the prototype, and once exposed, I don't think
    you can change the enum. The compiler has to know how big the enum is, so
    it has to know all the elements to know the largest. Thus it turns into a
    mess.

    With an int, we hide all the details.

    Or we could decide to implement all of this as fcntl()s on a file in the file
    system (which is what HPUX does AFAICT, though they have a special fsctl()
    call).

    Actually, why _can't_ we just use fcntl() to do it? Then we don't have to
    add a new system call.

    I think we can, but IMHO, it looks weird. Why do I have to open a file
    when what I really want to do is get or set the properties of a mount
    point? I think that passing a path and letting the kernel do what it needs
    looks better (as in statvfs(2)). (We'd later discuss if the call should
    enforce the path to refer exactly to a mount point or any file within it.)

    I can think of two reasons. The first is we already have ioctl and fcntl,
    both of which have the ability to handle arbitrary operations. We're now
    proposing a third. Do we really need a third such call?

    Second, will we be doing one operation, as in statvfs() where we get info,
    or will we be doing a sequence of operations? If it's the latter, the
    fcntl() approach is a little easier as we only do path resolution once.

    Granted, I don't think we will be doing any fs operations which need to
    happen in the fast-path, so it doesn't make a TON of difference, but it
    can be a bit easier to just do the lookup once.
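    The path-versus-descriptor trade-off can be illustrated with the existing POSIX pair statvfs(3) and fstatvfs(3), used here purely as an analogy for a path-based fsctl() versus an fcntl() on an open descriptor. Both helper functions are hypothetical. The path form resolves the path on every call; the descriptor form resolves it once in open() and then issues any number of operations.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Path-based style: each operation resolves the path again. */
static int
two_queries_by_path(const char *path, struct statvfs *a, struct statvfs *b)
{
	if (statvfs(path, a) == -1)	/* lookup #1 */
		return -1;
	return statvfs(path, b);	/* lookup #2 */
}

/* Descriptor-based style: one lookup in open(), then any number of
 * operations on the descriptor, as an fcntl()-based interface would. */
static int
two_queries_by_fd(const char *path, struct statvfs *a, struct statvfs *b)
{
	int fd, rv;

	if ((fd = open(path, O_RDONLY)) == -1)	/* the only lookup */
		return -1;
	rv = fstatvfs(fd, a);
	if (rv == 0)
		rv = fstatvfs(fd, b);
	close(fd);
	return rv;
}
```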

    HP-UX's fsctl(2) call uses a file descriptor, so we have prior art here.

    I'm not saying we MUST do this, I just want us to explore it.

    It could be we want length and command separate, so we do want something
    other than fcntl().

    Also, if we used fcntl, we'd have two "authentication" levels in the
    trace. First, open the file using the corresponding permissions, flags,
    etc. Then, do the fcntl, which will fail in most cases if the user is not
    root.

    So?

    The problem with this interface is that it doesn't let you change
    multiple mount points atomically, as some others have suggested.
    I also agree that having this feature could be nice.

    I don't think we want to change multiple mount points at once. I think
    what we really want to do is change multiple exports on one mount point at
    once. Thus we want multiple data payloads to one destination, not multiple
    destinations.

    Ah, OK! I misunderstood the initial suggestion. Although it seemed
    nice to me at first, I can't think of a good reason why it could be
    useful (I mean, the ability to change multiple paths at once).

    I can think of places where it would be "useful," but as someone who
    programs both userland and kernel, none of them are worth the effort the
    kernel would need to put out to make the change.

    I think our lives will be much simpler if we don't do this. If we try to
    change multiple mount points at once, we could get into all sorts of
    update and locking issues. I think one mount point per call will work
    quite well and will be very solid.

    True. I was trying to work around the locking issues by sorting the
    mount points, avoiding duplicates and locking them in order (which I think
    should work). But this won't be needed at all.

    Take care,

    Bill

  • No.16

    9/13/05, Bill Studenmund <wrstuden (AT) netbsd (DOT) org> wrote:
    Mon, Sep 12, 2005 at 09:20:13PM +0200, Julio M. Merino Vidal wrote:
    I can think of two reasons. The first is we already have ioctl and fcntl,
    both of which have the ability to handle arbitrary operations. We're now
    proposing a third. Do we really need a third such call?

    Second, will we be doing one operation, as in statvfs() where we get info,
    or will we be doing a sequence of operations? If it's the latter, the
    fcntl() approach is a little easier as we only do path resolution once.

    Granted, I don't think we will be doing any fs operations which need to
    happen in the fast-path, so it doesn't make a TON of difference, but it
    can be a bit easier to just do the lookup once.

    HP-UX's fsctl(2) call uses a file descriptor, so we have prior art here.

    I'm not saying we MUST do this, I just want us to explore it.

    It could be we want length and command separate, so we do want something
    other than fcntl().

    I've just "explored" the fcntl(2) route and it seems suitable for the task.
    In fact, it is somewhat easier to add the code because the "framework"
    to copyin/copyout parameters of different sizes is already there. Also,
    there is some unused support for file system specific calls, which I've
    used to implement this.

    I'm not yet very fond of the idea of mixing "file descriptor control
    operations" with "general file system operations", but given that fcntl
    is already overloaded with calls that do not operate on a single file
    descriptor (F_CLOSEM, F_MAXFD), we could add the functionality there.

    I've put an updated patch in place of the other one:

    Would be nice if anyone could answer the "XXX-questions" in there (or,
    of course, any other stuff that may be incorrect) ;-)

    Thanks for all the comments!
  • No.17

    On Sep 12, 2005, at 2:05 AM, Julio M. Merino Vidal wrote:

    We all agree that a new system call is needed. Some also want this
    new interface not only to manage NFS exports but also to allow
    changing other settings of a mount point. I think this is a good idea too.

    Given these comments, I've started the implementation of a fsctl(2)
    function call, with the following signature:

    int fsctl(const char *path, enum fsctl_command command, void *data);

    I haven't read the HP-UX fsctl(2) manual page, but I'll point out
    that OS X 10.4 also has a fsctl(2) system call (although I don't see
    a manual page for it).

    The 10.4 fsctl(2) basically has ioctl(2) semantics (including the
    size field and direction bits in the command argument), and the
    signature looks like this:

    int fsctl(const char *path, u_long cmd, void *data, int options);

    "options" is a flags word that currently has one option --
    FSOPT_NOFOLLOW, which means "don't follow symbolic links". That flag
    is used in several VFS syscalls in 10.4.

    In 10.4, all fsctl(2) commands are currently file system-specific,
    but that doesn't mean we can't have generic ones that either all file
    systems implement or that are handled at the VFS layer (in general, I
    would like to see us move a LOT more stuff out of individual file
    systems and into the VFS layer).
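    A sketch of how one fsctl entry point might separate generic commands, handled once in the VFS layer, from file-system-specific ones passed down to a per-fs hook. Every name here (the command values, struct fs_instance, vfs_fsctl, example_fs_ctl) is hypothetical and not an existing NetBSD interface; errors are returned as errno values, kernel style.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical command space: values below FSCTL_FS_PRIVATE_BASE are
 * generic and handled in the VFS layer; the rest go to the fs. */
enum {
    FSCTL_GENERIC_GET_EXPORT = 1,
    FSCTL_GENERIC_SET_EXPORT = 2,
    FSCTL_FS_PRIVATE_BASE = 0x1000
};

struct fs_instance;
typedef int (*fs_ctl_hook)(struct fs_instance *, unsigned long, void *);

/* Hypothetical per-mount state. */
struct fs_instance {
    const char *name;
    int export_flags;       /* generic state, managed by the VFS layer */
    fs_ctl_hook fs_ctl;     /* NULL if the fs has no private commands */
};

/* Single entry point; returns 0 or an errno value. */
int vfs_fsctl(struct fs_instance *fs, unsigned long cmd, void *data)
{
    switch (cmd) {
    case FSCTL_GENERIC_GET_EXPORT:
        *(int *)data = fs->export_flags;
        return 0;
    case FSCTL_GENERIC_SET_EXPORT:
        fs->export_flags = *(const int *)data;
        return 0;
    default:
        if (cmd >= FSCTL_FS_PRIVATE_BASE && fs->fs_ctl != NULL)
            return fs->fs_ctl(fs, cmd, data);
        return ENOTTY;      /* command not understood */
    }
}

/* Example private hook for a hypothetical file system. */
int example_fs_ctl(struct fs_instance *fs, unsigned long cmd, void *data)
{
    (void)fs;
    (void)data;
    return cmd == FSCTL_FS_PRIVATE_BASE + 1 ? 0 : ENOTTY;
}
```

    The point of the split is that export handling lives in exactly one place, so a new NFS-aware file system gets it without writing any code.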

    At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
    FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
    based on the given path. (Minor question: can an enum be used as a
    system call argument, or would an integer be better? If not,
    why?)

    Use an ioctl-style command argument :-) It has the nice property of
    handling versioning for you, if the size of the argument were to
    change for some reason.
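    To show the versioning property, a sketch that re-implements the BSD ioctl command layout under its own XIOC* names (modeled on <sys/ioccom.h>, not copied from it). The two export_args structures and the FSCTL_*_EXPORT_* commands are hypothetical. Because sizeof(argument) is folded into the command word, growing the structure yields a different command number, and the kernel knows how many bytes to copy.

```c
#include <stdint.h>

/* Layout modeled on BSD <sys/ioccom.h>, renamed to avoid clashes:
 *   bits 31..29  direction (void / out / in)
 *   bits 28..16  size of the argument (13 bits)
 *   bits 15..8   group character
 *   bits  7..0   command number                                  */
#define XIOC_VOID      0x20000000UL
#define XIOC_OUT       0x40000000UL
#define XIOC_IN        0x80000000UL
#define XIOCPARM_MASK  0x1fffUL
#define XIOC(inout, group, num, len)                      \
    ((unsigned long)(inout) |                             \
     (((unsigned long)(len) & XIOCPARM_MASK) << 16) |     \
     ((unsigned long)(group) << 8) | (unsigned long)(num))
#define XIOR(g, n, t)      XIOC(XIOC_OUT, (g), (n), sizeof(t))
#define XIOW(g, n, t)      XIOC(XIOC_IN, (g), (n), sizeof(t))
#define XIOCPARM_LEN(cmd)  (((cmd) >> 16) & XIOCPARM_MASK)

/* Hypothetical argument, version 1 and a later, larger version 2. */
struct export_args_v1 { uint32_t flags; uint32_t anon_uid; };
struct export_args_v2 { uint32_t flags; uint32_t anon_uid; uint32_t anon_gid; };

/* Same group and number, but the encoded sizes differ. */
#define FSCTL_SET_EXPORT_V1  XIOW('E', 1, struct export_args_v1)
#define FSCTL_SET_EXPORT_V2  XIOW('E', 1, struct export_args_v2)
#define FSCTL_GET_EXPORT_V2  XIOR('E', 2, struct export_args_v2)

/* How many bytes the kernel side should copy in or out. */
unsigned long fsctl_arg_size(unsigned long cmd)
{
    return XIOCPARM_LEN(cmd);
}
```

    An old binary built against version 1 keeps sending the old command number, so the kernel can either keep a compatibility case for it or reject it cleanly instead of misreading a short buffer.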

    The problem with this interface is that it doesn't let you change
    multiple mount points atomically, as some others have suggested.
    I also agree that having this feature could be nice.

    I don't see the value of changing multiple mount points atomically.
    What is most important is that an individual mount point's export list
    is updated atomically.
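    A sketch of a per-mount-point atomic update, assuming a hypothetical mutex-protected export list: the replacement is built entirely outside the lock and published with a single pointer swap, so a lookup sees either the old list or the new one, never a mix. Clients are matched by exact string here, not by network and mask.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical export list: client specifications as plain strings. */
struct export_list {
    size_t count;
    char **clients;
};

/* Hypothetical per-mount state; `current` is only touched under `lock`. */
struct mount_exports {
    pthread_mutex_t lock;
    struct export_list *current;
};

static void export_list_free(struct export_list *el)
{
    if (el == NULL)
        return;
    for (size_t i = 0; i < el->count; i++)
        free(el->clients[i]);
    free(el->clients);
    free(el);
}

static char *copy_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    return p ? memcpy(p, s, len) : NULL;
}

/* Build the replacement outside the lock, then publish it with one
 * pointer swap while holding the lock. Returns 0 or -1. */
int exports_replace(struct mount_exports *me, const char *const *clients, size_t n)
{
    struct export_list *nl, *old;

    if ((nl = calloc(1, sizeof(*nl))) == NULL)
        return -1;
    if ((nl->clients = calloc(n ? n : 1, sizeof(char *))) == NULL) {
        free(nl);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if ((nl->clients[i] = copy_str(clients[i])) == NULL) {
            export_list_free(nl);   /* frees the i entries copied so far */
            return -1;
        }
        nl->count++;
    }

    pthread_mutex_lock(&me->lock);
    old = me->current;
    me->current = nl;
    pthread_mutex_unlock(&me->lock);

    /* Safe: readers only dereference the list while holding the lock. */
    export_list_free(old);
    return 0;
}

/* Exact-string match only; real exports match networks and masks. */
int exports_allows(struct mount_exports *me, const char *client)
{
    int ok = 0;

    pthread_mutex_lock(&me->lock);
    if (me->current != NULL)
        for (size_t i = 0; i < me->current->count && !ok; i++)
            ok = strcmp(me->current->clients[i], client) == 0;
    pthread_mutex_unlock(&me->lock);
    return ok;
}
```

    If lookups were made lock-free, freeing the old list right after the swap would no longer be safe; it would have to be deferred until no reader can still hold the old pointer.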

    In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
    to this new system call, as well as other stuff like the quota
    management.

    I don't see anything wrong with keeping MNT_UPDATE as-is. Its
    semantics are "update the mount", i.e. change from r/w to r/o or
    whatever. MNT_GETARGS... well, I have other opinions on that as
    well: I would rather we had string-based mount arguments, rather
    than the binary blobs we have now.

    Do you think this is correct and flexible enough for the current and
    future purposes?

    I think I would like to have an fsctl(2), sure. But going back to
    the original discussion about NFS exports, I think that we should
    switch to a model where the export list is not maintained by the
    kernel, but rather ONLY by mountd(8). I believe someone else
    mentioned this as what is done by Solaris.

    In this model, the kernel would make an upcall to mountd(8), which
    would either approve or deny, and the kernel would cache the result.
    Updating the export list then becomes a matter of simply flushing the
    kernel's "export cache".
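    A userland sketch of the upcall-and-cache model: on a cache miss, the "kernel" calls a function pointer standing in for the upcall to mountd(8), stores the verdict, and an export-list change needs nothing more than a flush. The cache layout, the hashing and example_mountd_policy are assumptions for the example; a real implementation would also need locking and a policy for an unresponsive mountd.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXPORT_CACHE_SLOTS 64

enum export_verdict { EXPORT_DENY = 0, EXPORT_ALLOW = 1 };

/* Stand-in for the kernel -> mountd(8) upcall (hypothetical). */
typedef enum export_verdict (*export_upcall_fn)(uint64_t fsid, uint32_t client_ip);

struct export_cache_entry {
    int valid;
    uint64_t fsid;
    uint32_t client_ip;
    enum export_verdict verdict;
};

struct export_cache {
    struct export_cache_entry slot[EXPORT_CACHE_SLOTS];
    export_upcall_fn upcall;
    unsigned upcalls;           /* trips to "userland", for illustration */
};

/* Direct-mapped: a colliding entry simply evicts the old one. */
static size_t slot_index(uint64_t fsid, uint32_t ip)
{
    return (size_t)((fsid * 31u + ip) % EXPORT_CACHE_SLOTS);
}

enum export_verdict export_check(struct export_cache *c, uint64_t fsid, uint32_t ip)
{
    struct export_cache_entry *e = &c->slot[slot_index(fsid, ip)];

    if (e->valid && e->fsid == fsid && e->client_ip == ip)
        return e->verdict;                  /* hit: no upcall */
    c->upcalls++;
    e->verdict = c->upcall(fsid, ip);       /* miss: ask the daemon */
    e->fsid = fsid;
    e->client_ip = ip;
    e->valid = 1;
    return e->verdict;
}

/* An export-list change only has to invalidate the cache; later
 * requests repopulate it through fresh upcalls. */
void export_cache_flush(struct export_cache *c)
{
    memset(c->slot, 0, sizeof(c->slot));
}

/* Toy policy standing in for mountd: allow 10.0.0.0/8 only. */
enum export_verdict example_mountd_policy(uint64_t fsid, uint32_t ip)
{
    (void)fsid;
    return (ip >> 24) == 10 ? EXPORT_ALLOW : EXPORT_DENY;
}
```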
    -- thorpej
  • No.18

    On 9/14/05, Jason Thorpe <thorpej (AT) shagadelic (DOT) org> wrote:

    I don't see anything wrong with keeping MNT_UPDATE as-is. Its
    semantics are "update the mount", i.e. change from r/w to r/o or
    whatever. MNT_GETARGS... well, I have other opinions on that as
    well: I would rather we had string-based mount arguments, rather
    than the binary blobs we have now.

    But these are all different from mounting a file system, thus IMHO,
    mount(2) is the wrong place to handle them. I really find the current
    approach of flags to change behavior weird. (Especially having to
    execute completely different operations inside the vfs_mount hook,
    where one could use independent and smaller hooks.)

    Basically, I see three operations over a mount point, all of which
    should be handled independently:
    1) Mount it, using mount(2).
    2) Change any of its properties, through the new fsctl(2) or whatever.
    3) Unmount it, using unmount(2).
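    The three-way split could look roughly like this: one hook per operation, rather than one mount hook that branches on flags such as MNT_UPDATE. struct fs_ops, the do_* wrappers and the toy file system are hypothetical and are not NetBSD's struct vfsops.

```c
#include <errno.h>
#include <stddef.h>

struct mount;   /* opaque here */

/* Hypothetical operations table: one hook per operation. */
struct fs_ops {
    int (*fs_mount)(struct mount *mp, const void *args);             /* 1 */
    int (*fs_ctl)(struct mount *mp, unsigned long cmd, void *data);  /* 2 */
    int (*fs_unmount)(struct mount *mp, int flags);                  /* 3 */
};

/* Each generic entry point forwards to exactly one hook; an
 * unsupported operation is simply a NULL hook. */
int do_mount(const struct fs_ops *ops, struct mount *mp, const void *args)
{
    return ops->fs_mount ? ops->fs_mount(mp, args) : EOPNOTSUPP;
}

int do_fsctl(const struct fs_ops *ops, struct mount *mp, unsigned long cmd, void *data)
{
    return ops->fs_ctl ? ops->fs_ctl(mp, cmd, data) : EOPNOTSUPP;
}

int do_unmount(const struct fs_ops *ops, struct mount *mp, int flags)
{
    return ops->fs_unmount ? ops->fs_unmount(mp, flags) : EOPNOTSUPP;
}

/* Toy file system: supports mount and unmount, no control operations. */
static int toy_state;   /* 0 = unmounted, 1 = mounted */

static int toy_mount(struct mount *mp, const void *args)
{
    (void)mp;
    (void)args;
    toy_state = 1;
    return 0;
}

static int toy_unmount(struct mount *mp, int flags)
{
    (void)mp;
    (void)flags;
    toy_state = 0;
    return 0;
}

const struct fs_ops toy_ops = { toy_mount, NULL, toy_unmount };
```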

    Do you think this is correct and flexible enough for the current and
    future purposes?

    I think I would like to have an fsctl(2), sure.

    I'm interested in what you think about adding these features in
    fcntl(2). (Note that the current implementation of fcntl(2) seems to
    have been designed leaving room for file system specific operations,
    which is what we want.) Any comments?

    But going back to
    the original discussion about NFS exports, I think that we should
    switch to a model where the export list is not maintained by the
    kernel, but rather ONLY by mountd(8). I believe someone else
    mentioned this as what is done by Solaris.

    In this model, the kernel would make an upcall to mountd(8), which
    would either approve or deny, and the kernel would cache the result.
    Updating the export list then becomes a matter of simply flushing the
    kernel's "export cache".

    That indeed sounds nice but it's a bigger project than what I intended
    at first (clean up existing stuff). Maybe we can leave this for a later
    step?

    Thanks,
  • No.19

    On Sep 14, 2005, at 3:49 AM, Julio M. Merino Vidal wrote:

    But these are all different from mounting a file system, thus IMHO,
    mount(2) is the wrong place to handle them. I really find the current
    approach of flags to change behavior weird. (Especially having to
    execute completely different operations inside the vfs_mount hook,
    where one could use independent and smaller hooks.)

    I'm ambivalent on the MNT_UPDATE thing, really. MNT_UPDATE does have
    "replace all previous mount options with these new ones" semantics,
    so it seems sort of "natural" to leave it where it is but I don't
    really have a strong feeling either way.

    I'm interested in what you think about adding these features in
    fcntl(2). (Note that the current implementation of fcntl(2) seems to
    have been designed leaving room for file system specific operations,
    which is what we want.) Any comments?

    I don't think it should be in fcntl(2). fcntl(2) operates on
    individual files / directories. fsctl(2) operates on the file system
    instance.

    That indeed sounds nice but it's a bigger project than what I intended
    at first (clean up existing stuff). Maybe we can leave this for a later
    step?

    Why change it twice? It seems like it's actually less work to do the
    heavy-lifting-in-mountd scheme, because it doesn't require you to
    implement fsctl(2).
    -- thorpej
  • No.20

    On 9/14/05, Jason Thorpe <thorpej (AT) shagadelic (DOT) org> wrote:

    On Sep 14, 2005, at 3:49 AM, Julio M. Merino Vidal wrote:

    But these are all different from mounting a file system, thus IMHO,
    mount(2) is the wrong place to handle them. I really find the current
    approach of flags to change behavior weird. (Especially having to
    execute completely different operations inside the vfs_mount hook,
    where one could use independent and smaller hooks.)

    I'm ambivalent on the MNT_UPDATE thing, really. MNT_UPDATE does have
    "replace all previous mount options with these new ones" semantics,
    so it seems sort of "natural" to leave it where it is but I don't
    really have a strong feeling either way.

    I'm interested in what you think about adding these features in
    fcntl(2). (Note that the current implementation of fcntl(2) seems to
    have been designed leaving room for file system specific operations,
    which is what we want.) Any comments?

    I don't think it should be in fcntl(2). fcntl(2) operates on
    individual files / directories. fsctl(2) operates on the file system
    instance.

    But we already have functionality in fcntl that does not operate on
    individual files/directories (F_CLOSEM, F_MAXFD or all the LFCN*
    commands in lfs). I'm not saying this is right -- and IMVHO, it's
    not -- but it's already there.

    Also I think I've just found an "advantage" of using fcntl; dunno if
    it's really a good thing or not (or even if it's useful). It is this:
    you could lock the file from userland and then do a set of fs-specific
    operations without interference.
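    For reference, the locking described here would use ordinary POSIX record locks through fcntl(). A minimal sketch; with_file_locked and write_marker are made-up names, and the batch callback stands in for the hypothetical fs-specific requests. Note that these locks are advisory: they serialize only processes that also take the lock, and do nothing to stop others.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Take an advisory write lock on the whole file, run a batch of
 * operations, then drop the lock. Returns the batch's result, or -1
 * if the lock could not be taken. fd must be open for writing. */
int with_file_locked(int fd, int (*batch)(int fd, void *arg), void *arg)
{
    struct flock fl;
    int rv;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                        /* 0 = through end of file */
    if (fcntl(fd, F_SETLKW, &fl) == -1)  /* blocks until granted */
        return -1;

    rv = batch(fd, arg);

    fl.l_type = F_UNLCK;
    (void)fcntl(fd, F_SETLK, &fl);
    return rv;
}

/* Example batch: write a marker string while the lock is held. */
int write_marker(int fd, void *arg)
{
    const char *s = arg;
    return (int)write(fd, s, strlen(s));
}
```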

    That indeed sounds nice but it's a bigger project than what I intended
    at first (clean up existing stuff). Maybe we can leave this for a later
    step?

    Why change it twice? It seems like it's actually less work to do the
    heavy-lifting-in-mountd scheme, because it doesn't require you to
    implement fsctl(2).

    The thing is that I have no idea how to do this at the moment.
    Do we have any example in the kernel of how to communicate
    with a userland process so that the latter returns a value to
    the former? Maybe the userfs SoC project has a decent way to
    do this? What happens if the kernel requests a mount export entry
    from mountd and mountd crashes/locks up?

    And even in this case, wouldn't we still need fsctl(2) to tell the kernel
    to clear its export list cache you mentioned?

    Thanks,
  • No.21

    On Wed, Sep 14, 2005 at 10:12:44PM +0200, Julio M. Merino Vidal wrote:
    On 9/14/05, Jason Thorpe <thorpej (AT) shagadelic (DOT) org> wrote:

    On Sep 14, 2005, at 3:49 AM, Julio M. Merino Vidal wrote:

    But these are all different from mounting a file system, thus IMHO,
    mount(2) is the wrong place to handle them. I really find the current
    approach of flags to change behavior weird. (Especially having to
    execute completely different operations inside the vfs_mount hook,
    where one could use independent and smaller hooks.)

    I'm ambivalent on the MNT_UPDATE thing, really. MNT_UPDATE does have
    "replace all previous mount options with these new ones" semantics,
    so it seems sort of "natural" to leave it where it is but I don't
    really have a strong feeling either way.

    I'm interested in what you think about adding these features in
    fcntl(2). (Note that the current implementation of fcntl(2) seems to
    have been designed leaving room for file system specific operations,
    which is what we want.) Any comments?

    I don't think it should be in fcntl(2). fcntl(2) operates on
    individual files / directories. fsctl(2) operates on the file system
    instance.

    But we already have functionality in fcntl that does not operate on
    individual files/directories (F_CLOSEM, F_MAXFD or all the LFCN*
    commands in lfs). I'm not saying this is right -- and IMVHO, it's
    not -- but it's already there.

    (taking a little side-trip to talk about fcntl())

    more precisely, fcntl() operates on file *descriptors*; ioctl() operates
    on files. there should not even need to be a VOP for fcntl() since
    manipulating a file descriptor should not affect the file that the
    descriptor points to, but I see that we started using this as mechanism
    for LFS to do arbitrary stuff in the kernel at some point (replacing the
    LFS-specific syscalls). we really should have used ioctl() for LFS instead.

    as I recall, VOP_FCNTL() was originally added by bill, I think as a mechanism
    to control an HSM-type layered file system. (at least, I think it was for
    a control channel, I'm sure he'll correct me if I'm misremembering.)
    I believe it was recommended at the time that he use ioctl() instead of
    fcntl() for this purpose, but he added the fs-specific fcntl() stuff anyway,
    for reasons that I don't quite remember but that I recall seemed bogus.

    in short, I don't think having file-system-specific stuff in an interface
    that's intended to control file descriptors makes much sense. we certainly
    shouldn't move further in that direction, and it would be good to eventually
    replace our existing use of that mechanism with something else instead,
    either ioctl() or possibly this fsctl() thing.

    one mechanism that has been used before in commercial products to get
    the effect of an fsctl() without adding a syscall is to just use ioctl()
    on the root directory of a file system. this was mostly for fs-specific
    fs operations, though, and it doesn't seem very good to put fs-neutral
    operations into the ioctl() morass as well.

    so what was the original question again, just where to put the NFS export
    control stuff?

    the NFS export control info is not really controlling the file system being
    exported, but rather it's controlling the behaviour of the NFS server.
    the NFS server is somewhat unique, it's not a device and it's not a
    file system, so none of the interfaces for talking to devices or files
    or file systems really seems appropriate. perhaps creating a /dev/nfsd
    pseudo-device and using ioctls on that would be the cleanest way to wedge
    it into the existing API model. on the other hand, we already have an
    "nfssvc" syscall, so we can add other NFS server control stuff there.

    I'm with jason on wanting the mountargs stuff to become string-based.

    was there any more to the original question? I've lost track.
    -Chuck
  • No.22

    On Fri, Sep 16, 2005 at 07:35:22 -0700, Chuck Silvers wrote:

    more precisely, fcntl() operates on file *descriptors*; ioctl()
    operates on files.
    [...]
    in short, I don't think having file-system-specific stuff in an
    interface that's intended to control file descriptors makes much
    sense. we certainly shouldn't move further in that direction, and
    it would be good to eventually replace our existing use of that
    mechanism with something else instead, either ioctl() or possibly
    this fsctl() thing.

    I totally agree.

    SY, Uwe
  • No.23

    On 9/16/05, Chuck Silvers <chuq (AT) chuq (DOT) com> wrote:
    so what was the original question again, just where to put the NFS export
    control stuff?

    the NFS export control info is not really controlling the file system being
    exported, but rather it's controlling the behaviour of the NFS server.
    the NFS server is somewhat unique, it's not a device and it's not a
    file system, so none of the interfaces for talking to devices or files
    or file systems really seems appropriate. perhaps creating a /dev/nfsd
    pseudo-device and using ioctls on that would be the cleanest way to wedge
    it into the existing API model. on the other hand, we already have an
    "nfssvc" syscall, so we can add other NFS server control stuff there.

    Thanks for the long explanation. I wasn't aware of this nfssvc system call,
    but I'll certainly look at it. It sounds more reasonable to add the export
    control there than in a new system call, and possibly make it conditional on
    NFSSERVER; I like the idea.

    I'm with jason on wanting the mountargs stuff to become string-based.

    That too, but it's a completely different thing than what I'm working on now ;-)

    Kind regards,
  • No.24

    On Fri, Sep 16, 2005 at 07:35:22AM -0700, Chuck Silvers wrote:

    (taking a little side-trip to talk about fcntl())

    more precisely, fcntl() operates on file *descriptors*; ioctl() operates
    on files. there should not even need to be a VOP for fcntl() since

    That's not fully correct. ioctl() operates on the devices underlying a
    file. To quote the man page:

    The ioctl() function manipulates the underlying device parameters of
    special files.

    And that's why we felt the need for a fcntl() VOP.

    Also, we have extended fcntl() to operate on more than just the passed-in
    file descriptor. Yes, the F_CLOSEM and F_MAXFD operations have to do with
    _other_ file descriptors, but they are an example of not operating on just
    the passed-in descriptor. So we've got (IMHO reasonable) prior art for
    having fcntl() do more than operate exclusively on the passed-in fd.

    manipulating a file descriptor should not affect the file that the
    descriptor points to, but I see that we started using this as mechanism
    for LFS to do arbitrary stuff in the kernel at some point (replacing the
    LFS-specific syscalls). we really should have used ioctl() for LFS instead.

    I disagree. While an fsctl() call may be a better fit, I do not think an
    ioctl() ever will be a clean match.

    as I recall, VOP_FCNTL() was originally added by bill, I think as a mechanism
    to control an HSM-type layered file system. (at least, I think it was for
    a control channel, I'm sure he'll correct me if I'm misremembering.)
    I believe it was recommended at the time that he use ioctl() instead of
    fcntl() for this purpose, but he added the fs-specific fcntl() stuff anyway,
    for reasons that I don't quite remember but that I recall seemed bogus.

    Well, as above, using an ioctl() for this would be even more bogus. :-)

    The question was between overloading fcntl() and adding a new system call.
    While I certainly objected to ioctl(), my feelings were not as strong
    between a new system call and extending fcntl(), though fcntl() seemed
    cleaner and more general-purpose. It already had the desired parameter
    structure (file indicator, operation, data), so it seemed reasonable.

    The problem with using ioctl() for this is that it goes to different places for
    regular files, device nodes, and pipes (see ffs_vnodeop_entries,
    ffs_specop_entries, and ffs_fifoop_entries). It has to; that's its point.

    However, what we needed at the time was a way to send a control request to
    the file system holding the file, not to the file itself. We needed
    a call that would semantically not branch out the way the vop_ioctl_desc
    operators do.
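    A much-simplified model of the distinction drawn here: ioctl() is routed through an operations vector chosen by node type (as FFS does with separate vnodeop, specop and fifoop tables), while a file-system control request always reaches the file system holding the inode. All types and names below are hypothetical and far simpler than the real vnode interface.

```c
#include <stddef.h>

/* Much-simplified, hypothetical model of vnode dispatch. */
enum vtype_model { VREG_M, VCHR_M, VFIFO_M };
enum handler_model { H_FILESYSTEM, H_DEVICE, H_FIFO };

struct fs_model { const char *name; };
struct vnode_model;

struct vnodeops_model {
    enum handler_model (*ioctl)(struct vnode_model *vp);
};

struct vnode_model {
    enum vtype_model type;
    const struct vnodeops_model *ops;   /* chosen by node type */
    struct fs_model *fs;                /* file system holding the inode */
};

static enum handler_model reg_ioctl(struct vnode_model *vp)  { (void)vp; return H_FILESYSTEM; }
static enum handler_model spec_ioctl(struct vnode_model *vp) { (void)vp; return H_DEVICE; }
static enum handler_model fifo_ioctl(struct vnode_model *vp) { (void)vp; return H_FIFO; }

/* Analogues of the per-type operation tables. */
static const struct vnodeops_model reg_ops  = { reg_ioctl };
static const struct vnodeops_model spec_ops = { spec_ioctl };
static const struct vnodeops_model fifo_ops = { fifo_ioctl };

void model_vnode_init(struct vnode_model *vp, enum vtype_model t, struct fs_model *fs)
{
    vp->type = t;
    vp->fs = fs;
    vp->ops = t == VCHR_M ? &spec_ops : t == VFIFO_M ? &fifo_ops : &reg_ops;
}

/* ioctl(): where the request goes depends on the node type. */
enum handler_model model_ioctl(struct vnode_model *vp)
{
    return vp->ops->ioctl(vp);
}

/* File-system control: always the file system holding the inode. */
struct fs_model *model_fs_control(struct vnode_model *vp)
{
    return vp->fs;
}
```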

    Also, at the time, it was felt fcntl() used in this way could help
    implement ACL operations. ACLs need to operate at exactly the same
    semantic level as the call our HSM needed; for a pipe or device node, you
    want to operate on the underlying inode, not the device or pipe. I admit
    that our ACL implementation may be taking a different approach, so I'm not
    sure how strong this motivation will turn out to be.

    in short, I don't think having file-system-specific stuff in an interface
    that's intended to control file descriptors makes much sense. we certainly
    shouldn't move further in that direction, and it would be good to eventually
    replace our existing use of that mechanism with something else instead,
    either ioctl() or possibly this fsctl() thing.

    We decide what the different interfaces are intended to do, so we can
    fully decide we are happy with fcntl() doing what it does.

    If we are going to stick to existing interface definitions exclusively,
    then ioctl() is "control device" and it is as wrong for doing these things
    as is fcntl(). :-)

    The problem is that we have now described operations that take place on
    one of three different semantic levels. You can want to issue operations
    on the internals of a file (ioctl() operating on the device backing a
    device node), operations on the inode/vnode (what fcntl() is doing now),
    and operations on the file system containing a node (what fsctl() would
    do). While it may be a bit of an overload to do fsctl() work in fcntl()
    (if we wanted to save the system call), we at least would be cleanly
    talking to the file system we wanted to manipulate.

    one mechanism that has been used before in commercial products to get
    the effect of an fsctl() without adding a syscall is to just use ioctl()
    on the root directory of a file system. this was mostly for fs-specific
    fs operations, though, and it doesn't seem very good to put fs-neutral
    operations into the ioctl() morass as well.

    I agree that'd be gross.

    Take care,

    Bill

  • No.25

    Why change it twice? It seems like it's actually less work to do the
    heavy-lifting-in-mountd scheme, because it doesn't require you to
    implement fsctl(2).

    The thing is that I have no idea how to do this at the moment.
    Do we have any example in the kernel of how to communicate
    with a userland process so that the latter returns a value to
    the former? Maybe the userfs SoC project has a decent way to
    do this? What happens if the kernel requests a mount export entry
    from mountd and mountd crashes/locks up?

    a blocking system call to wait for requests should be enough.
    we have similar structure in nfsd for kerberos.
    check rick's nfsv4 code as well.

    YAMAMOTO Takashi
