  • A memory management question

    Can someone explain the use of SETLENGTH() and SETTRUELENGTH()?
    I would like to allocate a vector and reserve some space at the end,
    so that it appears shorter than the allocated size. So that I can
    more efficiently append to the vector, without requiring a new copy
    every time. So I'd like to use SETLENGTH() with a shorter apparent
    length, and bump this up as needed until I've used the entire space.
    There are only a couple users of SETLENGTH() in R, and they all appear
    at first glance to be pointless: a few routines use allocVector() and
    then call SETLENGTH() to set the vector length to the value that was
    just allocated. What are valid uses for SETLENGTH()? And what are
    the intended semantics for "truelength" as opposed to the regular
    length?
    If GC happens and an object is moved, and its apparent LENGTH()
    differs from its allocated length, does GC preserve the allocated
    length, or the updated LENGTH()? Is there any way to get at the
    original allocated length, given an SEXP?
    -- Dave
    R-devel (AT) r-project (DOT) org mailing list
  • No.1 | | 2378 bytes | |

    On Sun, 4 Sep 2005, dhinds (AT) sonic (DOT) net wrote:

    Can someone explain the use of SETLENGTH() and SETTRUELENGTH()?

    I would like to allocate a vector and reserve some space at the end,
    so that it appears shorter than the allocated size. So that I can
    more efficiently append to the vector, without requiring a new copy
    every time. So I'd like to use SETLENGTH() with a shorter apparent
    length, and bump this up as needed until I've used the entire space.

    There are only a couple users of SETLENGTH() in R, and they all appear
    at first glance to be pointless: a few routines use allocVector() and
    then call SETLENGTH() to set the vector length to the value that was
    just allocated. What are valid uses for SETLENGTH()? And what are
    the intended semantics for "truelength" as opposed to the regular
    length?

    This is not supported by the memory manager. Using SETLENGTH to
    change the length would confuse the garbage collector; we should
    probably remove SETLENGTH from the headers.

    TRUELENGTH is unused except for something very different in envir.c.
    Again we should probably remove or rename this to reflect how it is
    currently used. At one point, well before the current memory manager,
    I believe there was thought that we might allow this sort of
    over-allocation but I don't believe it was ever implemented.

    The memory manager does over-allocate small vectors by rounding up to
    convenient sizes, and the real size could be computed, but this is not
    true for large allocations, which correspond to malloc calls for the
    requested size. In any case the memory manager relies on LENGTH
    giving the correct amount (maybe not heavily but this could change).
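
    A minimal sketch of the rounding described above, in plain C with no R
    headers. The class table and the name reserved_units() are hypothetical
    and are not R's actual values; the point is only that a small request
    may reserve more than was asked for, while a large one gets exactly
    the requested size.

```c
#include <stddef.h>

/* Hypothetical size classes, in 8-byte units. Not R's real table. */
static const size_t size_classes[] = {1, 2, 4, 8, 16};
#define NUM_CLASSES (sizeof size_classes / sizeof size_classes[0])

/* Number of 8-byte units reserved for a request of `units`.
   Small requests are rounded up to the next class; anything larger
   than the biggest class is treated like a plain malloc of the
   requested size, so nothing extra is reserved. */
size_t reserved_units(size_t units)
{
    for (size_t i = 0; i < NUM_CLASSES; i++)
        if (units <= size_classes[i])
            return size_classes[i];
    return units;
}
```

    With this table a request for 3 units reserves 4, so one spare unit
    exists, but a request for 100 units has no slack at all.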

    If GC happens and an object is moved, and its apparent LENGTH()
    differs from its allocated length, does GC preserve the allocated
    length, or the updated LENGTH()? Is there any way to get at the
    original allocated length, given an SEXP?

    A GC does not move objects.

    Using R level vectors for the purpose you describe is in any case
    tricky since it is hard to reliably prevent copying. You are better
    off using something like an external pointer into an R-allocated
    object that is only accessible through the external pointer. Then you
    can manage the filled length yourself.
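
    A minimal sketch of "manage the filled length yourself", in plain C
    with no R headers. The growbuf type and its functions are hypothetical
    stand-ins for storage that is reachable only through a handle (the
    role an external pointer would play); the allocator never sees the
    filled length, only the capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle: the storage is touched only via the functions below. */
typedef struct {
    unsigned char *data;  /* over-allocated storage */
    size_t capacity;      /* bytes actually allocated */
    size_t filled;        /* bytes in use, tracked here, not by the allocator */
} growbuf;

growbuf *growbuf_new(size_t capacity)
{
    growbuf *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    if (capacity == 0) capacity = 1;
    b->data = malloc(capacity);
    if (b->data == NULL) { free(b); return NULL; }
    b->capacity = capacity;
    b->filled = 0;
    return b;
}

/* Append n bytes; returns 0 on success, -1 if reallocation fails.
   Capacity doubles when the reserved space runs out. */
int growbuf_append(growbuf *b, const void *src, size_t n)
{
    assert(b != NULL);
    if (n > b->capacity - b->filled) {
        size_t cap = b->capacity;
        while (n > cap - b->filled) cap *= 2;
        unsigned char *p = realloc(b->data, cap);
        if (p == NULL) return -1;
        b->data = p;
        b->capacity = cap;
    }
    memcpy(b->data + b->filled, src, n);
    b->filled += n;
    return 0;
}

size_t growbuf_length(const growbuf *b) { return b->filled; }

/* Read access; the pointer is valid only until the next append. */
const unsigned char *growbuf_data(const growbuf *b) { return b->data; }

void growbuf_free(growbuf *b)
{
    if (b != NULL) { free(b->data); free(b); }
}
```

    Because callers only ever see growbuf_length() bytes, nothing outside
    these functions depends on the difference between the two sizes.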

    luke
  • No.2 | | 2601 bytes | |

    Luke Tierney <luke (AT) stat (DOT) uiowa.edu> wrote:

    This is not supported by the memory manager. Using SETLENGTH to
    change the length would confuse the garbage collector; we should
    probably remove SETLENGTH from the headers.

    The memory manager does over-allocate small vectors by rounding up to
    convenient sizes, and the real size could be computed, but this is not
    true for large allocations, which correspond to malloc calls for the
    requested size. In any case the memory manager relies on LENGTH
    giving the correct amount (maybe not heavily but this could change).

    A GC does not move objects.

    Using R level vectors for the purpose you describe is in any case
    tricky since it is hard to reliably prevent copying. You are better
    off using something like an external pointer into an R-allocated
    object that is only accessible through the external pointer. Then you
    can manage the filled length yourself.

    Since GC does not move objects, and large vectors are allocated
    using a regular malloc, and malloc/free manages space independent of
    the LENGTH information, it seems that SETLENGTH would be "safe" if it
    was possible to guarantee for an interval of time that this particular
    value would not be moved or released due to any user activity?

    What if I create the full-length vector, make it visible using
    defineVar(), then protect the vector by creating a reference with
    R_MakeExternalPtr(), and R_PreserveObject() this reference? Then
    shouldn't the vector be left alone until I release the reference?
    And I could then play with SETLENGTH() on that vector safely, so long
    as I restore it before releasing the reference, and so long as I only
    perform operations that modify the vector in-place?

    i.e., should something like this work:

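    #include <string.h>
    #include <Rinternals.h>

    static SEXP ptr;

    /* Allocate 1000 bytes, bind them to "mystuff", and hide all of them. */
    void do_init(void)
    {
        SEXP s = PROTECT(allocVector(RAWSXP, 1000));
        defineVar(install("mystuff"), s, R_BaseEnv);
        ptr = R_MakeExternalPtr(RAW(s), R_NilValue, s);
        R_PreserveObject(ptr);
        SETLENGTH(s, 0);
        UNPROTECT(1);
    }

    /* Append four bytes into the reserved space and grow the apparent length. */
    void do_extend(void)
    {
        SEXP s = R_ExternalPtrProtected(ptr);
        memcpy(RAW(s) + LENGTH(s), "xxxx", 4);
        SETLENGTH(s, LENGTH(s) + 4);
    }

    /* Restore the allocated length before releasing the reference. */
    void do_finish(void)
    {
        SEXP s = R_ExternalPtrProtected(ptr);
        SETLENGTH(s, 1000);
        R_ReleaseObject(ptr);
    }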

    i.e., if the user tries to modify "mystuff", they'll end up with a
    copy, but the value pointed to by ptr will hang around (no longer
    accessible by the user) until do_finish() is called?
    -- Dave

  • No.3 | | 3333 bytes | |

    On Mon, 5 Sep 2005, dhinds (AT) sonic (DOT) net wrote:

    Luke Tierney <luke (AT) stat (DOT) uiowa.edu> wrote:
    >
    >This is not supported by the memory manager. Using SETLENGTH to
    >change the length would confuse the garbage collector; we should
    >probably remove SETLENGTH from the headers.
    >
    >The memory manager does over-allocate small vectors by rounding up to
    >convenient sizes, and the real size could be computed, but this is not
    >true for large allocations, which correspond to malloc calls for the
    >requested size. In any case the memory manager relies on LENGTH
    >giving the correct amount (maybe not heavily but this could change).
    >
    >A GC does not move objects.
    >
    >Using R level vectors for the purpose you describe is in any case
    >tricky since it is hard to reliably prevent copying. You are better
    >off using something like an external pointer into an R-allocated
    >object that is only accessible through the external pointer. Then you
    >can manage the filled length yourself.
    >

    Since GC does not move objects, and large vectors are allocated
    using a regular malloc, and malloc/free manages space independent of
    the LENGTH information, it seems that SETLENGTH would be "safe" if it
    was possible to guarantee for an interval of time that this particular
    value would not be moved or released due to any user activity?

    What if I create the full-length vector, make it visible using
    defineVar(), then protect the vector by creating a reference with
    R_MakeExternalPtr(), and R_PreserveObject() this reference? Then
    shouldn't the vector be left alone until I release the reference?
    And I could then play with SETLENGTH() on that vector safely, so long
    as I restore it before releasing the reference, and so long as I only
    perform operations that modify the vector in-place?

    It might or might not work now but is not guaranteed to do so reliably
    in the future. Seeing the risks of leaving SETLENGTH exposed, it is
    very likely that SETLENGTH will be removed from the sources after the
    2.2.0 release.

    If you provide your own methods to read and write the external pointer
    then you don't need this; this is safer than relying on undocumented
    behavior of [ and [<- in any case. You also then don't need to use
    R_PreserveObject unless you really need to use it from the C level
    outside of a context where an R reference exists.

    luke

    i.e., should something like this work:

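    #include <string.h>
    #include <Rinternals.h>

    static SEXP ptr;

    /* Allocate 1000 bytes, bind them to "mystuff", and hide all of them. */
    void do_init(void)
    {
        SEXP s = PROTECT(allocVector(RAWSXP, 1000));
        defineVar(install("mystuff"), s, R_BaseEnv);
        ptr = R_MakeExternalPtr(RAW(s), R_NilValue, s);
        R_PreserveObject(ptr);
        SETLENGTH(s, 0);
        UNPROTECT(1);
    }

    /* Append four bytes into the reserved space and grow the apparent length. */
    void do_extend(void)
    {
        SEXP s = R_ExternalPtrProtected(ptr);
        memcpy(RAW(s) + LENGTH(s), "xxxx", 4);
        SETLENGTH(s, LENGTH(s) + 4);
    }

    /* Restore the allocated length before releasing the reference. */
    void do_finish(void)
    {
        SEXP s = R_ExternalPtrProtected(ptr);
        SETLENGTH(s, 1000);
        R_ReleaseObject(ptr);
    }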

    i.e., if the user tries to modify "mystuff", they'll end up with a
    copy, but the value pointed to by ptr will hang around (no longer
    accessible by the user) until do_finish() is called?

    -- Dave

  • No.4 | | 3094 bytes | |

    Luke Tierney <luke (AT) stat (DOT) uiowa.edu> wrote:

    It might or might not work now but is not guaranteed to do so reliably
    in the future. Seeing the risks of leaving SETLENGTH exposed, it is
    very likely that SETLENGTH will be removed from the sources after the
    2.2.0 release.

    If you provide your own methods to read and write the external pointer
    then you don't need this; this is safer than relying on undocumented
    behavior of [ and [<- in any case. You also then don't need to use
    R_PreserveObject unless you really need to use it from the C level
    outside of a context where an R reference exists.

    I'm not sure I follow this. Maybe I should explain the context for
    the problem.

    textConnection("xyz", "w") creates a connection, the output of which
    is deposited in a char vector named "xyz", which is updated line by
    line as output is sent to the connection. The current code maintains
    a pointer to "xyz" in the form of an unprotected SEXP. Hence if the
    user does rm(xyz), bad things happen. A small bug, I admit.

    I think the best fix is to use a protected reference to the result
    vector. I think this is safe and doesn't rely on any abuse of the
    interfaces.

    There's also a performance issue, that the result is updated after
    every line of output, resulting in a vast amount of copying if a large
    result is accumulated. This is the part that could be fixed by using
    SETLENGTH to manage the length of the protected result vector.
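
    A rough way to see the cost being described, as a plain C sketch. The
    function names are hypothetical, and the doubling policy is only one
    possible over-allocation scheme, not something textConnection() does:
    the sketch counts how many existing elements get copied over n
    appends under each policy.

```c
#include <stddef.h>

/* Elements copied when the vector is reallocated at exactly the new
   length on every append: 0 + 1 + ... + (n - 1). */
size_t copies_exact(size_t n)
{
    size_t copies = 0;
    for (size_t len = 0; len < n; len++)
        copies += len;      /* every existing element moves to the new copy */
    return copies;
}

/* Elements copied when capacity starts at 1 and doubles when full. */
size_t copies_doubling(size_t n)
{
    size_t copies = 0, cap = 1;
    for (size_t len = 0; len < n; len++) {
        if (len == cap) {   /* full: allocate twice as much and move everything */
            copies += len;
            cap *= 2;
        }
    }
    return copies;
}
```

    For 1000 appended lines, copies_exact(1000) is 499500 while
    copies_doubling(1000) is 1023, which is the quadratic versus linear
    difference behind the "vast amount of copying".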

    I'm not sure what you mean by undocumented behavior of [ and [<-. I
    think all I'm relying on is that as long as an outstanding reference
    to the result vector exists, that R has to make sure the reference
    remains valid, and hence can't change the memory allocation of the
    result vector in any way. I don't care what else happens to the
    contents of the vector, as long as I get to control when it is
    released. It is ok with me if the user modifies the result vector
    in-place, since my reference stays valid. So I don't actually care
    how [ and [<- work.

    I think the only undocumented thing I'm relying on, is that the memory
    manager doesn't pay attention to the LENGTH of objects that it isn't
    actively doing anything to. Currently, it actually only uses LENGTH
    in one spot: for updating R_LargeVallocSize when a large vector is
    released. The true allocation sizes for individual objects are always
    kept in another place (either by malloc, or in the node class of the
    object).

    It seems like in this limited usage, SETLENGTH does represent a useful
    feature, by permitting safe over-allocation of a protected object, and
    might be worth preserving (and documenting) for that purpose.

    Of course, the real problem here is the semantics of textConnection(),
    which make life much more difficult and can't be changed because they
    are specified outside of R.
    -- Dave

  • No.5 | | 4537 bytes | |

    On Mon, 5 Sep 2005, dhinds (AT) sonic (DOT) net wrote:

    Luke Tierney <luke (AT) stat (DOT) uiowa.edu> wrote:
    >
    >It might or might not work now but is not guaranteed to do so reliably
    >in the future. Seeing the risks of leaving SETLENGTH exposed, it is
    >very likely that SETLENGTH will be removed from the sources after the
    >2.2.0 release.
    >
    >If you provide your own methods to read and write the external pointer
    >then you don't need this; this is safer than relying on undocumented
    >behavior of [ and [<- in any case. You also then don't need to use
    >R_PreserveObject unless you really need to use it from the C level
    >outside of a context where an R reference exists.
    >

    I'm not sure I follow this. Maybe I should explain the context for
    the problem.

    textConnection("xyz", "w") creates a connection, the output of which
    is deposited in a char vector named "xyz", which is updated line by
    line as output is sent to the connection. The current code maintains
    a pointer to "xyz" in the form of an unprotected SEXP. Hence if the
    user does rm(xyz), bad things happen. A small bug, I admit.

    I think the best fix is to use a protected reference to the result
    vector. I think this is safe and doesn't rely on any abuse of the
    interfaces.

    There's also a performance issue, that the result is updated after
    every line of output, resulting in a vast amount of copying if a large
    result is accumulated. This is the part that could be fixed by using
    SETLENGTH to manage the length of the protected result vector.

    I'm not sure what you mean by undocumented behavior of [ and [<-. I
    think all I'm relying on is that as long as an outstanding reference
    to the result vector exists, that R has to make sure the reference
    remains valid, and hence can't change the memory allocation of the
    result vector in any way. I don't care what else happens to the
    contents of the vector, as long as I get to control when it is
    released. It is ok with me if the user modifies the result vector
    in-place, since my reference stays valid. So I don't actually care
    how [ and [<- work.

    It would have helped to explain what you are up to. I had to guess
    and guessed wrong, so forget the [ and [<- issue for now.

    I think the only undocumented thing I'm relying on, is that the memory
    manager doesn't pay attention to the LENGTH of objects that it isn't
    actively doing anything to. Currently, it actually only uses LENGTH
    in one spot: for updating R_LargeVallocSize when a large vector is
    released. The true allocation sizes for individual objects are always
    kept in another place (either by malloc, or in the node class of the
    object).

    It seems like in this limited usage, SETLENGTH does represent a useful
    feature, by permitting safe over-allocation of a protected object, and
    might be worth preserving (and documenting) for that purpose.

    I am not comfortable making this available at this point. It might be
    useful to have but would need careful thought. Without some way to
    find out the true length there are potential problems. Without some
    way of making sure the fields in VECSXP and STRSXP that are added are
    valid there are potential problems (not the first time but if the size
    is shrunk and then increased). Not that this can't be resolved but it
    would take time that I don't have now, and this isn't high priority
    enough to schedule in the near future. So for now you should not use
    SETLENGTH if you want your code to work beyond 2.2.0.
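
    A plain C sketch of the shrink-then-grow hazard for vectors whose
    elements are pointers. The strvec type is a hypothetical model, not
    R's STRSXP: slots hidden by shrinking may hold stale pointers, so
    growing the apparent length has to reset each newly exposed slot to
    something valid.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 8

static const char EMPTY[] = "";   /* always-valid placeholder element */

typedef struct {
    const char *slot[CAPACITY];   /* allocated slots */
    size_t length;                /* apparent length, at most CAPACITY */
} strvec;

void strvec_init(strvec *v)
{
    for (size_t i = 0; i < CAPACITY; i++)
        v->slot[i] = EMPTY;
    v->length = 0;
}

/* Change the apparent length. Shrinking just hides slots. Growing
   resets every newly exposed slot, because whatever it pointed to
   before it was hidden may no longer be alive. */
void strvec_set_length(strvec *v, size_t new_length)
{
    assert(new_length <= CAPACITY);
    for (size_t i = v->length; i < new_length; i++)
        v->slot[i] = EMPTY;
    v->length = new_length;
}
```

    A bare length setter skips the reset loop, which is harmless the
    first time the vector grows but not after a shrink.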

    Of course, the real problem here is the semantics of textConnection(),
    which make life much more difficult and can't be changed because they
    are specified outside of R.

    It may be possible to expand the semantics by adding a logical
    argument that controls whether the vector is to be over-allocated and
    filled with zero length strings and truncated to the true length on
    close. Another variant would be to have a logical argument that says
    to keep the input internally and provide a function, say
    textC, to retrieve the internal output. I would then
    use a linked list internally. The semantics of close complicate this
    a bit; this function would probably need to optionally close the
    connection to get a final complete line.
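
    A minimal sketch of the linked-list variant, in plain C with no R
    headers. All names here (line_list, line_list_push, line_list_value)
    are hypothetical: each line is appended in constant time and the
    result array is built only once, when it is asked for.

```c
#include <stdlib.h>
#include <string.h>

typedef struct line_node {
    char *text;
    struct line_node *next;
} line_node;

typedef struct {
    line_node *head, *tail;
    size_t count;
} line_list;

void line_list_init(line_list *l)
{
    l->head = l->tail = NULL;
    l->count = 0;
}

/* Append a copy of `text` at the tail; returns 0 on success, -1 on failure. */
int line_list_push(line_list *l, const char *text)
{
    size_t len = strlen(text) + 1;
    line_node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->text = malloc(len);
    if (n->text == NULL) { free(n); return -1; }
    memcpy(n->text, text, len);
    n->next = NULL;
    if (l->tail != NULL) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->count++;
    return 0;
}

/* Build the result array on request. It holds pointers into the list,
   so use it before line_list_free(). Returns NULL if the list is
   empty or allocation fails; the caller frees the array itself. */
const char **line_list_value(const line_list *l)
{
    if (l->count == 0) return NULL;
    const char **out = malloc(l->count * sizeof *out);
    if (out == NULL) return NULL;
    size_t i = 0;
    for (const line_node *n = l->head; n != NULL; n = n->next)
        out[i++] = n->text;
    return out;
}

void line_list_free(line_list *l)
{
    line_node *n = l->head;
    while (n != NULL) {
        line_node *next = n->next;
        free(n->text);
        free(n);
        n = next;
    }
    line_list_init(l);
}
```

    Nothing is copied per line beyond the line itself; the single pass in
    line_list_value() is the only place the whole result is walked.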

    luke
  • No.6 | | 1474 bytes | |

    Luke Tierney <luke (AT) stat (DOT) uiowa.edu> wrote:

    I am not comfortable making this available at this point. It might be
    useful to have but would need careful thought. Without some way to
    find out the true length there are potential problems. Without some
    way of making sure the fields in VECSXP and STRSXP that are added are
    valid there are potential problems (not the first time but if the size
    is shrunk and then increased). Not that this can't be resolved but it
    would take time that I don't have now, and this isn't high priority
    enough to schedule in the near future. So for now you should not use
    SETLENGTH if you want your code to work beyond 2.2.0.

    OK, that's fine. Given the lack of other valid uses of SETLENGTH, it
    doesn't seem worth preserving it just for this one debatable usage.

    It may be possible to expand the semantics by adding a logical
    argument that controls whether the vector is to be over-allocated and
    filled with zero length strings and truncated to the true length on
    close. Another variant would be to have a logical argument that says
    to keep the input internally and provide a function, say
    textC, to retrieve the internal output.

    These are possible. Or, optionally, just don't reveal the intermediate
    output at all, and just make the final result visible on close.
    -- Dave

