BSD

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • RFC: addition of B_MANAGED and B_NESTED to buf.h and vfs_bio.c

    25 answers - 2196 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Dear folks,
    i'd like to add two flags to buf.h. They add new functionality to the
    buffer structure without changing current behaviour.
    B_MANAGED
    marks that its buffer contents memory is managed by the caller and should
    not be put on a list for reuse or freed. Normally useable by say a
    filingsystem that has a private memory buffer for say a descriptor or disc
    structure thats not kept in a buffer and wants to write it out without the
    need to manipulate the VM or to copy stuff around.
    Managed buffers are claimed and released directly from/to the bufpool to
    not destroy valueable data. Managed buffers can be brelse()'d.
    B_NESTED
    marks the buffer as a nested buffer. Nested buffers are currently used in
    genfs and lfs (get/put pages) though implemented as two special case
    generated buffers with call-backs.
    A nested buffer is basicly describing the action of the origional buffer
    over a piece of its buffer space. A series of nested buffers created from
    the master buffer describe together the action. Nested buffers can have
    their own call-back and variables too extending the idea used in genfs and
    lfs.
    Buffer nesting is transparant. A callee doesn't need to know anything of
    the mechanism and can easily nest it again if it wants too.
    biowait/biodone/brelse work as expected.
    When a nested buffer is biodone()'d its master buffer will be updated. When
    the b_resid value of the master buffer then reaches zero, biodone() will be
    called on the master buffer.
    Nested buffers are claimed and released directly from/to the bufpool to not
    destroy valueable data. Nested buffers can be brelse()'d.
    I've appended the patch. Note that this code has been tested for normal
    operations but that the managed and nested buffers haven't actually been
    used yet. I'd rather have feedback first before i change UDF to use the
    buffer types. Genfs and lfs could prolly best be done by their authors.
    With regards,
    Reinoud
    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)
    +
    =6R6f
    PGP SIGNATURE
  • No.1 | | 9187 bytes | |

    Tue, Jan 03, 2006 at 05:34:27PM +0100, Reinoud Zandijk wrote:
    Dear folks,

    i'd like to add two flags to buf.h. They add new functionality to the
    buffer structure without changing current behaviour.

    B_MANAGED
    marks that its buffer contents memory is managed by the caller and should
    not be put on a list for reuse or freed. Normally useable by say a
    filingsystem that has a private memory buffer for say a descriptor or disc
    structure thats not kept in a buffer and wants to write it out without the
    need to manipulate the VM or to copy stuff around.

    If I understand what you're describing, B_EXTERN may be a better term.

    Managed buffers are claimed and released directly from/to the bufpool to
    not destroy valueable data. Managed buffers can be brelse()'d.

    How is this different from how LFS uses B_LCKED to manage buffers itself?

    B_NESTED
    marks the buffer as a nested buffer. Nested buffers are currently used in
    genfs and lfs (get/put pages) though implemented as two special case
    generated buffers with call-backs.

    A nested buffer is basicly describing the action of the origional buffer
    over a piece of its buffer space. A series of nested buffers created from
    the master buffer describe together the action. Nested buffers can have
    their own call-back and variables too extending the idea used in genfs and
    lfs.

    Buffer nesting is transparant. A callee doesn't need to know anything of
    the mechanism and can easily nest it again if it wants too.
    biowait/biodone/brelse work as expected.

    When a nested buffer is biodone()'d its master buffer will be updated. When
    the b_resid value of the master buffer then reaches zero, biodone() will be
    called on the master buffer.

    Nested buffers are claimed and released directly from/to the bufpool to not
    destroy valueable data. Nested buffers can be brelse()'d.

    Why do we need a flag? Why not just continue using the callback method? We
    already have B_CALL, which lets us do anything. Why do we need a special
    flag?

    If multiple call paths are doing the same buffer batching operations, then
    I can certainly see adding a buffer cache callback that all these uses can
    share.

    I've appended the patch. Note that this code has been tested for normal
    operations but that the managed and nested buffers haven't actually been
    used yet. I'd rather have feedback first before i change UDF to use the
    buffer types. Genfs and lfs could prolly best be done by their authors.

    While it means more work, I think you should change both of them over to
    the new scheme. As it is, you're adding code to handle a special case,
    UDF. If, however, your patch shows us adding common code and then removing
    duplicates in both genfs and lfs, then it is VERY CLEAR what the
    advantages of the change are.

    With regards,
    Reinoud

    Index: kern/vfs_bio.c

    RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
    retrieving revision 1.148
    diff -u -r1.148 vfs_bio.c
    kern/vfs_bio.c24 Dec 2005 19:12:23 -00001.148
    kern/vfs_bio.c3 Jan 2006 16:05:47 -0000
    @@ -904,6 +904,16 @@
    wakeup(bp);
    }

    +/*
    + * If it's a managed buffer, only recycle struct buf since it doesn't
    + * own its data block and should not be cached or reused in the normal
    + * way.
    + */
    +if (ISSET(bp->b_flags, B_MANAGED)) {
    +bp->b_bufsize = 0;
    +goto clearup;
    +};
    +

    Yeah, something like B_EXTERN sounds right. Though there should be a
    release callback. Just because it's not buffer cache data doesn't mean
    there shouldn't be any cleanup. :-)

    /*
    * Determine which queue the buffer should be on, then put it there.
    */
    @@ -990,6 +1000,7 @@
    CLR(bp->b_flags, B_AGE|B_ASYNC|B_BUSY|B_NCACHE);
    SET(bp->b_flags, B_CACHE);

    +clearup:
    /* Allow disk interrupts. */
    simple_unlock(&bp->b_interlock);
    simple_unlock(&bqueue_slock);
    @@ -1003,6 +1014,85 @@
    }

    /*
    + * Get a new managed buffer. Memory assigned to it is NT owned by the buffer
    + * cache but by the caller, typically a filingsystem.

    file system.

    + */
    +struct buf *
    +getmanagedbuf(int blkno, caddr_t base, int size)
    +{
    +int s;
    +struct buf *bp;
    +
    +KASSERT(base != NULL);
    +KASSERT(size 0);
    +
    +s = splbio();
    +simple_lock(&bqueue_slock);
    +
    +/* please don't destroy valueable data */
    +bp = pool_get(&bufpool, PR_WAITK);
    +
    +/* Initialise buffer */
    +memset(bp, 0, sizeof(struct buf));
    +BUF_INIT(bp);
    +bp->b_flags |= B_MANAGED;
    +
    +bp->b_blkno = blkno;
    +bp->b_data = base;
    +bp->b_bufsize = bp->b_bcount = bp->b_resid = size;
    +
    +simple_unlock(&bqueue_slock);
    +splx(s);
    +
    +return bp;
    +}
    +
    +/*
    + * Get a new buffer nested into the specified buffer at the given offset and
    + * length. N read/write actions ought to be caried out on the master buffer
    + * anymore only on the nested buffers as they effectively split up the master
    + * buffer's action.
    + *
    + * Bug alert: make sure all nested buffers cover the complete mbp->resid
    + * space. If space is to be skipped, mbp->resid needs to be accounted for or
    + * the biodone on mbp won't be called!
    + */
    +struct buf *
    +nestbuf(struct buf *mbp, int blkno, caddr_t base, int size)
    +{
    +int s;
    +struct buf *bp;
    +
    +KASSERT(base != NULL);
    +KASSERT(size 0);
    +KASSERT((mpb==NULL) || (mpb && mbp->b_count < offset+size));
    +
    +s = splbio();
    +simple_lock(&bqueue_slock);
    +
    +/* please don't destroy valueable data */
    +bp = pool_get(&bufpool, PR_WAITK);
    +
    +/*
    + * Adjust relevant information from the master buffer. set nested info
    + * and clear callback info and softdep info.
    + */
    +memcpy(bp, mbp, sizeof(struct buf));
    +bp->b_flags &= ~B_CALL;
    +bp->b_flags |= B_MANAGED | B_NESTED;
    +bp->b_masterbuf = mbp;
    +
    +bp->b_blkno = blkno;
    +bp->b_data = base;
    +bp->b_bufsize = bp->b_bcount = bp->b_resid = size;
    +
    +/* avoid confusion for softdep? */
    +LIST_INIT(&bp->b_dep);
    +
    +return bp;
    +}
    +
    +/*
    * Determine if a block is in the cache.
    * Just look on what would be its hash chain. If it's there, return
    * a pointer to it, unless it's marked invalid. If it's marked invalid,
    @@ -1384,6 +1474,8 @@
    biodone(struct buf *bp)
    {
    int s = splbio();
    +int isnested;
    +struct buf *mbp;

    simple_lock(&bp->b_interlock);
    if (ISSET(bp->b_flags, B_DNE))
    @@ -1397,6 +1489,18 @@
    if (!ISSET(bp->b_flags, B_READ))/* wake up reader */
    vwakeup(bp);

    +/* Propagate status to master buffer if its a nested buffer */
    +isnested = ISSET(bp->b_flags, B_NESTED);
    +mbp = bp->b_masterbuf;
    +if (isnested) {
    +KASSERT(mbp);
    +if (bp->b_flags & B_ERRR) {
    +mbp->b_flags |= B_ERRR;
    +mbp->b_error = bp->b_error;
    +};
    +mbp->b_resid -= bp->b_bcount;
    +};
    +
    /*
    * If necessary, call out. Unlock the buffer before calling
    * iodone() as the buffer isn't valid any more when it return.
    @@ -1417,6 +1521,12 @@
    }

    splx(s);
    +
    +/* call biodone on master buffer if its completed by this op */
    +if (isnested) {
    +if (mbp->b_resid == 0)
    +biodone(mbp);
    +};
    }

    /*
    Index: sys/buf.h

    RCS file: /cvsroot/src/sys/sys/buf.h,v
    retrieving revision 1.84
    diff -u -r1.84 buf.h
    sys/buf.h11 Dec 2005 12:25:20 -00001.84
    sys/buf.h3 Jan 2006 16:05:47 -0000
    @@ -145,6 +145,7 @@
    structvnode *b_vp;/* File vnode. */
    struct workhead b_dep;/* List of filesystem dependencies. */
    void*b_saveaddr;/* b_addr for physio. */
    +struct buf *b_masterbuf;/* If nested, points to masterbuf */

    /*
    * private data for owner.
    @@ -213,11 +214,14 @@
    #defineB_WRITE0x00000000/* Write buffer (pseudo flag). */
    #defineB_XXX0x02000000/* Debugging flag. */
    #defineB_VFLUSH0x04000000/* Buffer is being synced. */
    +#define B_MANAGED0x08000000/* Buffer's memory is not its own */
    +#define B_NESTED0x10000000/* Buffer is not standalone */

    #define BUF_FLAGBITS \
    "\20\1AGE\3ASYNC\4BAD\5BUSY\6SCANNED\7CALL\10DELWRI " \
    "\11DIRTY\12DNE\13EINTR\14ERRR\15GATHERED\16INVAL\1 7LCKED\20NCACHE" \
    - "\22CACHE\23PHYS\24RAW\25READ\26TAPE\30WANTED\32XXX \33VFLUSH"
    + "\22CACHE\23PHYS\24RAW\25READ\26TAPE\30WANTED\32XXX \33VFLUSH\34MANAGED" \
    + "\35NESTED"

    /*
    @@ -289,6 +293,8 @@
    struct buf *geteblk(int);
    struct buf *getnewbuf(int, int, int);
    struct buf *incore(struct vnode *, daddr_t);
    +struct buf *getmanagedbuf(int, caddr_t, int);
    +struct buf *nestbuf(struct buf *, int, caddr_t, int);

    voidminphys(struct buf *);
    intphysio(void (*)(struct buf *), struct buf *, dev_t, int,

    Take care,

    Bill

    PGP SIGNATURE
    Version: GnuPG v1.2.3 (NetBSD)

    ZM1SMrxxfyiunr3Hk=
    =8YfN
    PGP SIGNATURE
  • No.2 | | 435 bytes | |

    Tue, Jan 03, 2006 at 05:34:27PM +0100, Reinoud Zandijk wrote:
    Dear folks,

    i'd like to add two flags to buf.h. They add new functionality to the
    buffer structure without changing current behaviour.

    B_MANAGED
    marks that its buffer contents memory is managed by the caller and should
    not be put on a list for reuse or freed.

    Isn't this precisely what B_LCKED is used for?

    Thor
  • No.3 | | 5625 bytes | |

    Tue, Jan 03, 2006 at 12:05:16PM -0800, Bill Studenmund wrote:
    B_MANAGED
    marks that its buffer contents memory is managed by the caller and should
    not be put on a list for reuse or freed. Normally useable by say a
    filingsystem that has a private memory buffer for say a descriptor or disc
    structure thats not kept in a buffer and wants to write it out without the
    need to manipulate the VM or to copy stuff around.

    If I understand what you're describing, B_EXTERN may be a better term.

    B_EXTERN? why named extern? FreeBSD uses the term B_MANAGED too for this
    kind of buffers and since its a `managed' buffer by some other party and
    not a part of the normal queues. The name B_EXTERN doesn't ring a bell with
    me really.

    Managed buffers are claimed and released directly from/to the bufpool to
    not destroy valueable data. Managed buffers can be brelse()'d.

    How is this different from how LFS uses B_LCKED to manage buffers itself?

    Say a filingsystem has a malloc'd space in wich it keeps for example a FAT
    or some other filing system structure. The only thing to write it out in
    one go is then:

    buf = getmanagedbuf(xxx->yyyloc, structp, structlen);
    buf->b_flags |= B_WRITE;
    VP_STRATEGY(vp, buf)
    brelse(buf);

    it would need either a custom made callback system with a custom
    created buffer and a custom cleanup afterwards in the callback or create a
    buffer and copy stuff in. Reduces complexity a lot for such situations.

    LFS use of locked buffers is different since that only keeps buffers around
    and prevents reuse. Its also handy for the nested case since its also
    managed by definition.

    B_NESTED
    marks the buffer as a nested buffer. Nested buffers are currently used in
    genfs and lfs (get/put pages) though implemented as two special case
    generated buffers with call-backs.

    Why do we need a flag? Why not just continue using the callback method? We
    already have B_CALL, which lets us do anything. Why do we need a special
    flag?

    the call back method sure is a very generic way and very handy indeed. To
    overload/abuse this to provide other functionality like for the managed and
    for the nested buffers is IMH clumsy. The call back method is also
    hardcoded to be for `managed' buffers only since the call back handler has
    to free the buffers themselves.

    vfs_bio.c states: (biodone)
    /*
    * Mark I/ complete on a buffer.
    *
    * If a callback has been requested, e.g. the pageout
    * daemon, do so. , awaken waiting processes.
    *
    * [ Leffler, et al., says on p.247:
    * "This routine wakes up the blocked process, frees the buffer
    * for an asynchronous write, or, for a request by the pagedaemon
    * process, invokes a procedure specified in the buffer structure" ]
    *
    * In real life, the pagedaemon (or other system processes) wants
    * to do async stuff to, and doesn't want the buffer brelse()'d.
    * (for swap pager, that puts swap buffers on the free lists (!!!),
    * for the vn device, that puts malloc'd buffers on the free lists!)
    */

    /*
    * If necessary, call out. Unlock the buffer before calling
    * iodone() as the buffer isn't valid any more when it return.
    */
    if (ISSET(bp->b_flags, B_CALL)) {
    CLR(bp->b_flags, B_CALL); /* but note callout done */
    simple_unlock(&bp->b_interlock);
    (*bp->b_iodone)(bp);
    } else {
    if (ISSET(bp->b_flags, B_ASYNC)) { /* if async, release */
    simple_unlock(&bp->b_interlock);
    brelse(bp);
    } else { /* or just wakeup the buffer */
    CLR(bp->b_flags, B_WANTED);
    wakeup(bp);
    simple_unlock(&bp->b_interlock);
    }
    }

    The callback facility ought to be available to all kinds of buffer actions
    without them also needing to clean up afterwards but thats another patch
    and would change current behaviour. That'll be a lot larger a patch :)

    The pagedeamon could use B_MANAGED buffers :)

    If multiple call paths are doing the same buffer batching operations, then
    I can certainly see adding a buffer cache callback that all these uses can
    share.

    A buffer could be split up for various reasons and all sheduled
    independently. The call-back could then optionally be used for whatever
    reason the function had to split up the buffer.

    While it means more work, I think you should change both of them over to
    the new scheme. As it is, you're adding code to handle a special case,
    UDF. If, however, your patch shows us adding common code and then removing
    duplicates in both genfs and lfs, then it is VERY CLEAR what the
    advantages of the change are.

    Its not for UDF only really its just that i saw it come by in genfs and
    lfs and figured out a more generic way would be preferable too. UDF sure
    could benefit a lot from it too i'll have to admit.

    Yeah, something like B_EXTERN sounds right. Though there should be a
    release callback. Just because it's not buffer cache data doesn't mean
    there shouldn't be any cleanup. :-)

    If one needs cleanup, biodone() has allready called the callback function
    for the part. The brelse is only for the struct buf itself, not the data in
    it. This brelse() will normally be called within this callback.

    Hope this answers some of your questions,

    With regards,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =8pfZ
    PGP SIGNATURE
  • No.4 | | 1498 bytes | |

    Hiya Greg,

    Tue, Jan 03, 2006 at 02:43:39PM -0600, Greg wrote:
    Reinoud Zandijk writes:
    is locking/unlocking bqueue_slock really needed here? Isn't that
    just for mucking with bufqueues[] and friends? Also, are you allowed
    to PR_WAITK w/ bqueue_slock locked? (I don't think so) If nothing
    else, can the unlock and splx() be moved up to right after the
    pool_get() ? (i.e. only block things for as long as really necessary? )

    (sys/kern_physio.c:getphysbuf() seems to indicate that the
    simple_lock/simple_unlock for pool_get(&bufpool,_) isn't needed.)

    You are right i think that was left from an earlier solution silly me. I
    used to claim a buffer normally until i decided that would kill off data
    unnessisarily!

    Thanks for pointing out!

    Again, I don't think you can PR_WAITK w/ a lock held. A LCKDEBUG
    kernel would have a fit :)

    i noticed before ;)

    +/* avoid confusion for softdep? */
    +LIST_INIT(&bp->b_dep);

    Missing a:

    simple_unlock(&bqueue_slock);
    splx(s);

    here? (or, preferably, right after the pool_get() if that would be
    sufficient.

    ARGH! shows i haven't executed the code :-) My idea of delaying the splx
    is prolly paranoia to be sure all is set up correctly just in case an
    interrupt comes along.

    Fixed the places, thanks :)

    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =Xsmc
    PGP SIGNATURE
  • No.5 | | 830 bytes | |

    B_EXTERN? why named extern? FreeBSD uses the term B_MANAGED too for this
    kind of buffers and since its a `managed' buffer by some other party and
    not a part of the normal queues. The name B_EXTERN doesn't ring a bell with
    me really.

    i don't think freebsd's B_MANAGED is such a thing.
    afaik, it's something like B_LCKED.

    Say a filingsystem has a malloc'd space in wich it keeps for example a FAT
    or some other filing system structure. The only thing to write it out in
    one go is then:

    buf = getmanagedbuf(xxx->yyyloc, structp, structlen);
    buf->b_flags |= B_WRITE;
    VP_STRATEGY(vp, buf)
    brelse(buf);

    struct buf is used for two different purposes. ie. buffer cache and i/o.
    i guess that you are mixing them.

    YAMAMT Takashi
  • No.6 | | 1081 bytes | |

    Jan 3, 2006, at 1:40 PM, Reinoud Zandijk wrote:

    B_EXTERN? why named extern? FreeBSD uses the term B_MANAGED too for
    this
    kind of buffers and since its a `managed' buffer by some other
    party and
    not a part of the normal queues. The name B_EXTERN doesn't ring a
    bell with
    me really.

    Why are you using the bio cache at all? File data in UDF should be
    managed by UBC.

    Say a filingsystem has a malloc'd space in wich it keeps for
    example a FAT
    or some other filing system structure. The only thing to write it
    out in
    one go is then:

    buf = getmanagedbuf(xxx->yyyloc, structp, structlen);
    buf->b_flags |= B_WRITE;
    VP_STRATEGY(vp, buf)
    brelse(buf);

    it would need either a custom made callback system with a
    custom
    created buffer and a custom cleanup afterwards in the callback or
    create a
    buffer and copy stuff in. Reduces complexity a lot for such
    situations.

    Why would you do that at all? Why not just use getbuf() or bread()
    or whatever?
    -- thorpej
  • No.7 | | 2011 bytes | |

    Dear Jason,

    Tue, Jan 03, 2006 at 09:14:45PM -0800, Jason Thorpe wrote:
    >B_EXTERN? why named extern? FreeBSD uses the term B_MANAGED too for
    >this
    >kind of buffers and since its a `managed' buffer by some other
    >party and
    >not a part of the normal queues. The name B_EXTERN doesn't ring a
    >bell with
    >me really.


    Why are you using the bio cache at all? File data in UDF should be
    managed by UBC.

    All file data in UDF is indeed managed by UBC and thats ok with me. This
    patch has nothing to do with my UDF implementation though it could use
    such functionality.

    Also in the example i'm refering to descriptors (inodes) and other meta
    information that is read in from the disc. Its part of the vnode's v_data.
    Such information could be stored in buffers with a B_LCKED but that would
    be abusing the bio cache.

    >Say a filingsystem has a malloc'd space in wich it keeps for
    >example a FAT
    >or some other filing system structure. The only thing to write it
    >out in
    >one go is then:
    >
    >buf = getmanagedbuf(xxx->yyyloc, structp, structlen);
    >buf->b_flags |= B_WRITE;
    >VP_STRATEGY(vp, buf)
    >brelse(buf);
    >

    it would need either a custom made callback system with a
    >custom
    >created buffer and a custom cleanup afterwards in the callback or
    >create a
    >buffer and copy stuff in. Reduces complexity a lot for such
    >situations.


    Why would you do that at all? Why not just use getbuf() or bread()
    or whatever?

    For its for metadata only like file allocation lists etc? that is not part
    of directory contents and not part of file data. See my explanation on the
    example above.

    With regards,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =j0Cs
    PGP SIGNATURE
  • No.8 | | 1836 bytes | |

    Jan 4, 2006, at 5:15 AM, Reinoud Zandijk wrote:

    Also in the example i'm refering to descriptors (inodes) and other
    meta
    information that is read in from the disc. Its part of the vnode's
    v_data.
    Such information could be stored in buffers with a B_LCKED but
    that would
    be abusing the bio cache.

    So, you're reading a structure from disk and storing it directly in
    the udf_inode (or whatever you call the v_data structure) I guess
    that's one way to do it UFS does it a different way it bread
    ()'s the data, then copies the data from the on-disk representation
    into a (more convenient) in-core representation, possibly byte-
    swapping it.

    Anyway, what you're really looking for is "struct buf as an I/
    descriptor". Lots of parts of the kernel do what you're doing
    already. They don't use getbuf() / brelse() at all.

    Some of them do this:

    int
    dkwedge_read(struct disk *pdk, struct vnode *vp, daddr_t blkno, void
    *tbuf,
    size_t len)
    {
    struct buf b;

    BUF_INIT(&b);

    b.b_vp = vp;
    b.b_dev = vp->v_rdev;
    b.b_blkno = blkno;
    b.b_bcount = b.b_resid = len;
    b.b_flags = B_READ;
    b.b_proc = curproc;
    b.b_data = tbuf;

    VP_STRATEGY(vp, &b);
    return (biowait(&b));
    }

    Some of them dynamically allocate the buf from the bufpool using
    pool_get()/pool_put().

    I don't see any reason to muddy the buffer cache waters anymore than
    they already are by adding B_MANAGED (or B_EXTERN or whatever).

    Eventually, what we probably ought to do is break "struct buf" into
    two different types:

    struct bio -- This is the I/ descriptor.

    struct buf -- A "subclass" of "struct bio" that handles the caching
    aspects.
    -- thorpej
  • No.9 | | 1369 bytes | |

    Wed, Jan 04, 2006 at 09:42:30AM -0800, jonathan (AT) dsg (DOT) stanford.edu wrote:
    In message <@shagadelic.org>,
    Which tends to suggest that the current UDF code probably won't work
    on a wrong-endian host. Has UDFit been tested by moving a single
    physical media device between big- and little-endian hosts?
    Ideally, both write-on-big, read-on-little, and write-on-littel,read-on-big.

    If it hasn't, that should be tested and verified to work beforte the
    code is imported.

    The current code is still read-only :-/ and allthough it doesn't byswap all
    structures in advance it byteswaps the values retrieved from them when
    requested. This has to do with the quite `flexible' structures with lots of
    concatinated tables and variable offsets etc. Translating them all in one
    go would require quite a lot of code and is error prone. I've seen it done
    before and believe me, its not a nice thing to do.

    My in-kernel UDF implemenation is a re-implementation of
    pkgsrc/misc/udfclient only a lot cleaner and simpler. UDFclient has and is
    run on 32 or 64 bits machines and is endian-proof.

    My in-kernel UDF implementation has been given special care to be
    endian-proof.

    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =FW
    PGP SIGNATURE
  • No.10 | | 2225 bytes | |

    Dear Jason,

    Wed, Jan 04, 2006 at 09:09:57AM -0800, Jason Thorpe wrote:
    So, you're reading a structure from disk and storing it directly in
    the udf_inode (or whatever you call the v_data structure) I guess
    that's one way to do it UFS does it a different way it bread
    ()'s the data, then copies the data from the on-disk representation
    into a (more convenient) in-core representation, possibly byte-
    swapping it.

    this copying is what i tried to avoid too though argumently it could be
    considered more atomically read/write and take the copy for granted.

    Anyway, what you're really looking for is "struct buf as an I/
    descriptor". Lots of parts of the kernel do what you're doing
    already. They don't use getbuf() / brelse() at all.

    Some of them dynamically allocate the buf from the bufpool using
    pool_get()/pool_put().

    I don't see any reason to muddy the buffer cache waters anymore than
    they already are by adding B_MANAGED (or B_EXTERN or whatever).

    one of the goals was rather to un-muddy the buffer stuff i.e. cleaning up
    by trying to orthogonise buffer flags and actions. Splitting up the struct
    buf came to my mind too but i considered that to be too big a change for
    now.

    of the results of the orthogonising efford was the introduction of the
    B_MANAGED. i only intended to put the RFC on the B_NESTED
    proposal but since part of what was needed also included the not owning its
    memory space i introduced the B_MANAGED.

    Eventually, what we probably ought to do is break "struct buf" into
    two different types:

    struct bio -- This is the I/ descriptor.

    struct buf -- A "subclass" of "struct bio" that handles the caching
    aspects.

    maybe with a generic callback that takes as 1st argument a reason for
    calling ? But speaking of wich if there is no plan laid out for it,
    shouldn't we then at least try to arthogonise the functionality and clean
    up the special cases first so we end of with easier to replace items?

    Cheers,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    +
    =GMf9
    PGP SIGNATURE
  • No.11 | | 683 bytes | |

    Jan 4, 2006, at 2:05 PM, Reinoud Zandijk wrote:

    one of the goals was rather to un-muddy the buffer stuff i.e.
    cleaning up
    by trying to orthogonise buffer flags and actions.

    Except that getbuf() / brelse() imply the use of the caching
    mechanism. You then set a flag that says "oh, I'm just using this as
    an I/ descriptor", which basically makes brelse() mean nothing other
    than "pool_put()".

    So, why not just remain consistent with the other parts of the kernel
    that already use a buf as an I/ descriptor rather than adding
    something that isn't actually needed just so it can be ripped out
    later anyway?
    -- thorpej
  • No.12 | | 1160 bytes | |

    Wed, Jan 04, 2006 at 02:16:19PM -0800, Jason Thorpe wrote:
    Jan 4, 2006, at 2:05 PM, Reinoud Zandijk wrote:
    >one of the goals was rather to un-muddy the buffer stuff i.e.
    >cleaning up
    >by trying to orthogonise buffer flags and actions.


    Except that getbuf() / brelse() imply the use of the caching
    mechanism. You then set a flag that says "oh, I'm just using this as
    an I/ descriptor", which basically makes brelse() mean nothing other
    than "pool_put()".

    So, why not just remain consistent with the other parts of the kernel
    that already use a buf as an I/ descriptor rather than adding
    something that isn't actually needed just so it can be ripped out
    later anyway?

    maybe i'm too focused on performace or rather see stuff changed the sooner
    the better.

    What do you think about the B_NESTED? This would eventually go into the
    iobuf part and is significant addition in functionality at practically no
    cost.

    With regards,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    +

    =eszM
    PGP SIGNATURE
  • No.13 | | 638 bytes | |

    Jan 4, 2006, at 3:45 PM, Reinoud Zandijk wrote:

    What do you think about the B_NESTED? This would eventually go into
    the
    iobuf part and is significant addition in functionality at
    practically no
    cost.

    B_NESTED sounds K but I don't think you actually need a separate
    bit for it; you should be able to tell if its nested just from the
    parent buf pointer.

    I would also like to see the parent buf have a list of the sub-bufs
    it depends on.

    Can you please produce a version of the patch that only contains the
    nested buf changes?

    With regards,
    Reinoud
    -- thorpej
  • No.14 | | 2286 bytes | |

    Thu, Jan 05, 2006 at 12:45:06AM +0100, Reinoud Zandijk wrote:
    Wed, Jan 04, 2006 at 02:16:19PM -0800, Jason Thorpe wrote:

    Except that getbuf() / brelse() imply the use of the caching
    mechanism. You then set a flag that says "oh, I'm just using this as
    an I/ descriptor", which basically makes brelse() mean nothing other
    than "pool_put()".

    So, why not just remain consistent with the other parts of the kernel
    that already use a buf as an I/ descriptor rather than adding
    something that isn't actually needed just so it can be ripped out
    later anyway?

    maybe i'm too focused on performace or rather see stuff changed the sooner
    the better.

    Are you sure that these steps are a performance bottleneck?

    Yes, if we can not copy, we can have some performance gains. However we
    need to look at the whole picture. First off, the read-a-buffer-and-copy
    technique lends itself easily to cross-endian operation. Also, for ffs we
    have the advantage that we read a bunch of inode structures into memory in
    one buffer. So yes, we copy, but one i/o reads multiple inodes in one
    step. Saving the i/o MRE than outweighs the copies.

    What do you think about the B_NESTED? This would eventually go into the
    iobuf part and is significant addition in functionality at practically no
    cost.

    I can't tell you what Jason thinks, just what I think. I think that you
    have a view of how the buffer system works which is different from the
    majority of the developers. I think you really need to understand how it
    works now before suggesting changes. we can end up in a real
    mess of conflicting expectations and behaviors.

    My understanding of your B_NESTED proposal is that you want to take
    functionality in genfs and lfs and soon to be in udf and consolidate it.
    So show us a diff that does that. Demonstrate that the functionality
    can be shared by making lfs and genfs use it. A diff doing that will be a
    MUCH stronger arguement that you have factored things correctly and that
    we end up reducing code duplication.

    Take care,

    Bill

    PGP SIGNATURE
    Version: GnuPG v1.2.3 (NetBSD)

    ty7zHK3HPLP6q4CN2blFb2A=
    =
    PGP SIGNATURE
  • No.15 | | 1852 bytes | |

    Dear folks,

    i've got a new patch ready for the nesting of buffers complete with patch
    for genfs. I've also added a patch for LFS but since LFS is messing around
    with B_LCKED buffers something went wrong. I've discussed with with Greg
    and it needs some more work. In the mean time LFS can still use the old
    method without problems; this means that the UVM patch can't be applied
    either since the function removed from UVM is needed still by LFS.

    Basicly, i've tried to make the nested buffers interface agnostic as much
    as possible and indifferent to other layers they can be passed to. The
    buffers can even be brelse()'d or issued with B_ASYNC without a callback;
    the code will do the right thing. If code uses callbacks they can just
    brelse() them like it is a normal buffer; no need for special knowledge.
    This is documented in the source. A patch for manpages is not added yet.

    If the splitup between iobuffers and cached buffers is made, this behaviour
    is then more logically implemented and can be easily taken care of.

    I've also checked if nested buffers can be nested again and this has proven
    to work fine. Nested buffers can be used for read and for write actions
    with the appropriate write counter updated.

    Also included - on request by Jason - is maintenance of a list of dependent
    buffers that is checked for sanity at biodone() time. This could also be
    used later to recall buffer operations if for such a thing is decided on.

    I'd like to post the BI en GENFS patch this week or so (if only to clean
    up my tree ;)). The LFS and its dependent UVM patch can be dealt with
    later.

    With regards,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =xu1W
    PGP SIGNATURE
  • No.16 | | 744 bytes | |

    hi,

    Basicly, i've tried to make the nested buffers interface agnostic as much
    as possible and indifferent to other layers they can be passed to. The
    buffers can even be brelse()'d or issued with B_ASYNC without a callback;
    the code will do the right thing. If code uses callbacks they can just
    brelse() them like it is a normal buffer; no need for special knowledge.
    This is documented in the source. A patch for manpages is not added yet.

    using brelse() for non-buffercache buffers is a bug.

    i don't understand what exactly you want to achieve. can you explain?
    to me, it seems that you are going to increase the sizeof(struct buf)
    without any benefits.

    YAMAMT Takashi
  • No.17 | | 3087 bytes | |

    Hi Takashi,

    Sat, Jan 07, 2006 at 12:41:53PM +0900, YAMAMT Takashi wrote:
    Basicly, i've tried to make the nested buffers interface agnostic as much
    as possible and indifferent to other layers they can be passed to. The
    buffers can even be brelse()'d or issued with B_ASYNC without a callback;
    the code will do the right thing. If code uses callbacks they can just
    brelse() them like it is a normal buffer; no need for special knowledge.
    This is documented in the source. A patch for manpages is not added yet.

    using brelse() for non-buffercache buffers is a bug.

    maybe in `the new world order' of iobufs and buffercache bufs?

    Its basicly a fundemental problem. When a buffer (regardless of what kind)
    is processed, a biodone() is issued on the buffer. If a B_CALL is
    specified, the callback is issued and the callback is responsable for the
    cleaning up and - due to the hack in biodone() for the origional pager -
    the callback has to clean up and has to know if it was a normal buffer or
    an iobuffer.

    The best solution would be to break this callback hack in that the call
    back is issued when requested and then the normal processing and optional
    cleanup in biodone() is done as origionally intended by the design. (see
    source code for the comment on this) relieving the callback of its cleaning
    up burden.

    If i pass a nested buffer to say a device with a B_ASYNC flag this device
    calls biodone() on it and as long as it doesn't call brelse() all ought to
    go K. Since this is normally considered `not-done` it is K.

    For now, I could let biodone() clean up the nested iobuffer itself but
    wouldn't that break the semantics that biodone() leaves a buffer alive if
    B_ASYNC is *not* specified? should all the nested buffers kept alive
    (unless the callback explicitly removes them offcource) until the master
    buffer is biodone()'d eventually?

    Would a nested buffer cleanup in `putiobuf' satisfy you? (and removal of
    the addition to the brelse code), Then biodone() can decide to either call
    brelse() or putiobuf().

    i don't understand what exactly you want to achieve. can you explain?
    to me, it seems that you are going to increase the sizeof(struct buf)
    without any benefits.

    The main benefit to nesting is that an action on an (io)buf can be split up
    into several independently sheduled actions on the nested buf's like
    genfs's getpages/putpages do whatever the reason is. Genfs only uses a
    hardcoded mechanism to skip over dirty pages and the nested buffer
    implementation is a generalisation of this mechanism.

    By implementing it this way this splitting up is streamlined and reusable
    and provides another set of b_private, b_bp, b_call and other
    flags/values/callbacks that can be reused by the code that decided to split
    up the buffer.

    with regards,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =zh5U
    PGP SIGNATURE
  • No.18 | | 627 bytes | |

    Dear Takashi,

    in a conversation with Greg we've concluded that since the struct buf is
    currently in flux and iobuf's are just to be introduced its better to
    implement the nestediobuf's in a less intrusive way I'll create a new
    DIFF_BI patch for that. The other patches don't have to changed.

    The B_CALL semantics as cleaning up agent is still broken IMH I think
    they really only envisioned async writings whereas its also used for async
    reads.

    with regards,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =umG4
    PGP SIGNATURE
  • No.19 | | 464 bytes | |

    Dear Takashi,

    in a conversation with Greg we've concluded that since the struct buf is
    currently in flux and iobuf's are just to be introduced its better to
    implement the nestediobuf's in a less intrusive way I'll create a new
    DIFF_BI patch for that. The other patches don't have to changed.

    Here it is,

    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =1LNj
    PGP SIGNATURE
  • No.20 | | 1128 bytes | |

    i don't understand what exactly you want to achieve. can you explain?
    to me, it seems that you are going to increase the sizeof(struct buf)
    without any benefits.

    The main benefit to nesting is that an action on an (io)buf can be split up
    into several independently sheduled actions on the nested buf's like
    genfs's getpages/putpages do whatever the reason is. Genfs only uses a
    hardcoded mechanism to skip over dirty pages and the nested buffer
    implementation is a generalisation of this mechanism.

    to me, it seems reverse.
    your proposed patch changes how to implement nested buffers,
    from B_CALL and b_private (generic way) to
    a special-purpose hook and members (less generic way).

    By implementing it this way this splitting up is streamlined and reusable
    and provides another set of b_private, b_bp, b_call and other
    flags/values/callbacks that can be reused by the code that decided to split
    up the buffer.

    do you mean, you want to use the "another set of flags/values/callbacks"
    for your filesystem? for what?

    YAMAMT Takashi
  • No.21 | | 1914 bytes | |

    Sun, Jan 08, 2006 at 11:35:00AM +0900, YAMAMT Takashi wrote:
    The main benefit to nesting is that an action on an (io)buf can be split up
    into several independently sheduled actions on the nested buf's like
    genfs's getpages/putpages do whatever the reason is. Genfs only uses a
    hardcoded mechanism to skip over dirty pages and the nested buffer
    implementation is a generalisation of this mechanism.

    to me, it seems reverse.
    your proposed patch changes how to implement nested buffers,
    from B_CALL and b_private (generic way) to
    a special-purpose hook and members (less generic way).

    The special-purpose hook (i think you mean b_masterbuf) is to allow the
    split up code in a routine that creates the nested buffers to use a
    b_private for its own use. The newest patch does use the B_CALL method to
    be less intrusive, see my latest patch.

    The members are for diagnostic/debug only (now) and i've added them on
    request of Jason. They can be defined away when not running such a kernel.

    do you mean, you want to use the "another set of flags/values/callbacks"
    for your filesystem? for what?

    maybe my wording wasn't that clear but i meant to say that implementing it
    in this way the parts that do split up buffers dont have to know all
    details on the current buffer implementation's quirks. Especially if we're
    going to restreamline the buffer code, the less knowledge is needed for a
    correct operation the better otherwise odd bugs can suddenly show up.

    A simple deligation system might be plausable in the future design to
    greatly reduce struct buf sizes and provide more customised design without
    the need for the overloading and/or adding of variables.

    Regards,
    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    +

    =awXM
    PGP SIGNATURE
  • No.22 | | 1553 bytes | |

    to me, it seems reverse.
    your proposed patch changes how to implement nested buffers,
    from B_CALL and b_private (generic way) to
    a special-purpose hook and members (less generic way).

    The special-purpose hook (i think you mean b_masterbuf) is to allow the
    split up code in a routine that creates the nested buffers to use a
    b_private for its own use. The newest patch does use the B_CALL method to
    be less intrusive, see my latest patch.

    the above comment was after seeing your latest patch.
    my question is, what is "its own use".
    i don't understand why your "less intrusive" version can't use b_private.

    do you mean, you want to use the "another set of flags/values/callbacks"
    for your filesystem? for what?

    maybe my wording wasn't that clear but i meant to say that implementing it
    in this way the parts that do split up buffers dont have to know all
    details on the current buffer implementation's quirks. Especially if we're
    going to restreamline the buffer code, the less knowledge is needed for a
    correct operation the better otherwise odd bugs can suddenly show up.

    A simple deligation system might be plausable in the future design to
    greatly reduce struct buf sizes and provide more customised design without
    the need for the overloading and/or adding of variables.

    what you want is an API to reduce the code to touch bp->b_* directly?
    if so, i'd suggest something like the attached patch.

    YAMAMT Takashi
  • No.23 | | 1701 bytes | |

    Dear Yamt,

    Mon, Jan 09, 2006 at 01:04:38PM +0900, YAMAMT Takashi wrote:
    the above comment was after seeing your latest patch.
    my question is, what is "its own use".
    i don't understand why your "less intrusive" version can't use b_private.

    it could though aren't there allready a lot of situations where devices
    bluntly use the b_private? or are you sure they only use that on their own
    created buffers? IF they would bluntly overwrite/use b_private it could
    result in a panic or am i paranoid in this?

    A simple deligation system might be plausable in the future design to
    greatly reduce struct buf sizes and provide more customised design without
    the need for the overloading and/or adding of variables.

    what you want is an API to reduce the code to touch bp->b_* directly?
    if so, i'd suggest something like the attached patch.

    offcource, that was my intention too from the start. I've looked at your
    patch and can agree with it (though i haven't tested it yet) but its
    important to also export nestiobuf_iodone() to provide support for custom
    callbacks.

    Furthermore it looks like you took a rewrite/simplification of genfs as
    basis and created the nestediobuf semantics from that whereas my
    implementation started as a mechanism and i modified genfs to use it.

    It solves one minor issue i've overlooked but thought of y/day night: the
    case that no subbuffer was issued. Your explicit nestiobuf_done() solves
    this.

    Do you plan to commit it soon?

    Reinoud

    PGP SIGNATURE
    Version: GnuPG v1.2.6 (NetBSD)

    =ZV0P
    PGP SIGNATURE
  • No.24 | | 591 bytes | |

    Please stop using the Mail-Followup-To: that you have been using on
    tech-kern.

    There is no reason that the folks who were CC'd on the notes you reply to
    need to be in the "To:" of the reply.

    Also, is there a specific reason you do not include yourself in the
    Followup? When I'm replying to your notes, I am addressing you and thus
    find the headers including a lot of names EXCEPT yours irritating. :-)

    Take care,

    Bill

    PGP SIGNATURE
    Version: GnuPG v1.2.3 (NetBSD)

    mc374FJDJpFNiCjnXA6vw=
    =sN6w
    PGP SIGNATURE
  • No.25 | | 882 bytes | |

    the above comment was after seeing your latest patch.
    my question is, what is "its own use".
    i don't understand why your "less intrusive" version can't use b_private.

    it could though aren't there allready a lot of situations where devices
    bluntly use the b_private? or are you sure they only use that on their own
    created buffers? IF they would bluntly overwrite/use b_private it could
    result in a panic or am i paranoid in this?

    why do you think so? i don't think they do.
    if they do, even the current genfs code shouldn't work. :-)

    but its
    important to also export nestiobuf_iodone() to provide support for custom
    callbacks.

    again, for what do you want to use custom callbacks on these buffers?

    Do you plan to commit it soon?

    if there is no objection, yes.

    YAMAMT Takashi

Re: RFC: addition of B_MANAGED and B_NESTED to buf.h and vfs_bio.c


max 4000 letters.
Your nickname that display:
In order to stop the spam: 4 + 3 =
QUESTION ON "BSD"

EMSDN.COM