BSD

  • sets compression

    3 answers - 472 bytes

    When a new release comes out, I generally build a custom CD image
    containing i386, sparc, sparc64, and source. I've been noticing that
    my images are getting bigger, which is to be expected. My latest image
    is 704,512,000 bytes. This is more than the maximum size of a
    traditional CD and is getting near the limit for many older CD drives.
    I'm wondering if we should switch to bzip2 for compressing sets in
    order to reduce image size?
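
    To put the 704,512,000-byte figure in context, here is a rough capacity
    check. It assumes that "traditional CD" means a 74-minute disc, that
    the larger common size is an 80-minute disc, and that the image is
    written as Mode 1 data (2048 bytes per sector, 75 sectors per second).
    Those assumptions are not stated in the post; only the image size is.

```python
SECTOR_BYTES = 2048        # user data per Mode 1 CD-ROM sector (assumption)
SECTORS_PER_SECOND = 75    # CD sectors per second of playing time

IMAGE_BYTES = 704_512_000  # image size quoted in the post


def cd_capacity_bytes(minutes: int) -> int:
    """Data capacity of a CD with the given nominal playing time."""
    return minutes * 60 * SECTORS_PER_SECOND * SECTOR_BYTES


if __name__ == "__main__":
    for minutes in (74, 80):
        capacity = cd_capacity_bytes(minutes)
        headroom = capacity - IMAGE_BYTES  # negative means the image does not fit
        print(f"{minutes}-minute disc: {capacity:,} bytes, headroom {headroom:,} bytes")
```

    Under these assumptions the image no longer fits a 74-minute disc and
    still fits an 80-minute one.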
  • No.1 | | 316 bytes | |

    Sun, Nov 06, 2005 at 12:55:17PM -0800, John Nemeth wrote:
    I'm wondering if we should switch to bzip2 for compressing sets in
    order to reduce image size?

    Or rzip (which we could/should import in that case), which uses the
    bz2 library and adds some stuff to do much longer-range compression
    sorting.
  • No.2 | | 2293 bytes | |

    Mon, Nov 07, 2005 at 12:06:16PM +1100, Simon Burge wrote:
    rzip can't decompress to stdout, making it harder to use for extracting
    sets. We'd need to have separate "decompress the set" and "extract the
    set" stages.

    Huh. I knew about the compress stage, but I must admit I hadn't gotten
    around to realizing that this applied to decompression, too. I've been
    using it to compress .iso files so far, and this hadn't been an issue.
    Bummer (for this usage).

    I agree that we need streaming decompression for installs, so never
    mind about my suggestion.
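
    As an illustration of why streaming matters, here is a minimal Python
    sketch (not the installer's actual code) that decompresses and unpacks
    a gzip-compressed tar in one pass from a source that only supports
    forward reads, such as a pipe or a network connection. The helper
    names (make_set, ForwardOnly, extract_streamed) are made up for this
    example; tarfile's "r|gz" stream mode is the standard-library feature
    being shown.

```python
import io
import tarfile


def make_set(files: dict) -> bytes:
    """Build a small gzip-compressed tar in memory (a stand-in for a set)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


class ForwardOnly:
    """Expose only read(), so nothing can seek backwards (like a pipe)."""

    def __init__(self, data: bytes):
        self._src = io.BytesIO(data)

    def read(self, n: int = -1) -> bytes:
        return self._src.read(n)


def extract_streamed(stream) -> dict:
    """Decompress and unpack in a single forward pass over the stream."""
    contents = {}
    # "r|gz" tells tarfile to treat the input as a non-seekable gzip stream.
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                # Each member must be read before advancing to the next one.
                contents[member.name] = tar.extractfile(member).read()
    return contents


if __name__ == "__main__":
    demo = {"etc/motd": b"hello\n", "bin/true": b"\x00" * 1000}
    unpacked = extract_streamed(ForwardOnly(make_set(demo)))
    print(sorted(unpacked))
```

    A format that cannot decompress to stdout would need the whole
    decompressed tar written somewhere first, which is the extra stage
    described above.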

    It also uses _much_ more memory on compression - an otherwise idle box
    here with 256MB of RAM started swapping when trying to rzip a recent
    i386 comp set. After 8 minutes it had made a 50 byte output file, while
    a similar speed box with 2GB of RAM took 45 seconds to rzip the same
    file.

    It uses more memory, certainly, but don't read too much into the
    initial data production rate. It's quite variable, and seems to go in
    several stages. You probably hadn't gotten past the initial
    rsync-like mapping.

    It also seems to be heavily disk/seek bound rather than cpu bound, for
    the bulk of the compression work once the initial mapping is done.
    Unsurprisingly, it would appear to be selectively reading a big memory
    buffer of blocks from all over the file, and then compressing that
    quite quickly and writing it out before reading again.

    This makes for some new and different tradeoffs with respect to
    multiple jobs or other processes on a machine.

    rzip however does produce a smaller compressed file in this case:

    15104 -rw-r 1 simonb simonb - 15451684 Nov 7 11:57 comp.tar.rz
    18680 -rw-r 1 simonb simonb - 19097153 Nov 7 11:56 comp.tar.bz2
    23016 -rw-r 1 simonb simonb - 23542910 24 23:00 comp.tar.gz
    82992 -rw-r 1 simonb simonb - 84920320 Nov 7 11:55 comp.tar

    Yes. I've seen considerably better results than this, for some files,
    too. It certainly gives size benefits in return for its other
    constraints.
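
    For a sense of scale, the sizes in the listing above can be turned into
    ratios. This sketch uses only the four byte counts from that listing;
    the percent_of helper is just for illustration.

```python
# Byte counts copied from the listing in the post.
SIZES = {
    "comp.tar": 84_920_320,
    "comp.tar.gz": 23_542_910,
    "comp.tar.bz2": 19_097_153,
    "comp.tar.rz": 15_451_684,
}


def percent_of(size: int, baseline: int) -> float:
    """Return size as a percentage of baseline."""
    return 100.0 * size / baseline


if __name__ == "__main__":
    tar_size = SIZES["comp.tar"]
    gz_size = SIZES["comp.tar.gz"]
    for name, size in SIZES.items():
        print(
            f"{name:13} {size:>11,} bytes  "
            f"{percent_of(size, tar_size):5.1f}% of tar  "
            f"{percent_of(size, gz_size):5.1f}% of gz"
        )
```

    For this one set, bzip2 comes out roughly 19% smaller than gzip and
    rzip roughly 34% smaller. Other sets may behave differently.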

    Personally, I think we should stick with gzip for sets, but maybe we
    have an option for using bzip2 so people who want to use it locally can?

    That would be good.
  • No.3 | | 440 bytes | |

    Mon, Nov 07, 2005 at 12:06:16PM +1100, Simon Burge wrote:
    Personally, I think we should stick with gzip for sets, but maybe we
    have an option for using bzip2 so people who want to use it locally can?

    Could the binary sets be compressed with gzip and source sets with bzip2?
    If somebody wants to spend time on compiling from source, he could also
    accept the overhead of bzip2 decompression?

    Pavel Cahyna
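
    The split suggested here could be expressed as a simple per-set policy.
    The following is only a sketch of the idea, not how the real set build
    works: the set kinds ("binary", "source") and the function names are
    hypothetical, and it uses Python's gzip and bz2 modules rather than the
    command-line tools.

```python
import bz2
import gzip

# Hypothetical policy: fast-to-unpack gzip for binary sets,
# smaller but slower bzip2 for source sets.
COMPRESSORS = {
    "binary": (gzip.compress, gzip.decompress),
    "source": (bz2.compress, bz2.decompress),
}


def compress_set(kind: str, tar_bytes: bytes) -> bytes:
    """Compress an uncompressed tar image according to the set kind."""
    compress, _ = COMPRESSORS[kind]
    return compress(tar_bytes)


def decompress_set(kind: str, blob: bytes) -> bytes:
    """Reverse compress_set for the same set kind."""
    _, decompress = COMPRESSORS[kind]
    return decompress(blob)


if __name__ == "__main__":
    payload = b"int main(void) { return 0; }\n" * 2000  # stand-in for tar data
    for kind in COMPRESSORS:
        blob = compress_set(kind, payload)
        assert decompress_set(kind, blob) == payload
        print(f"{kind}: {len(payload):,} -> {len(blob):,} bytes")
```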
