Perl

  • beg for Bag

    Hi,
    as a spin-off of the 'Set-returning .keys (was Re: Smart Matching
    clarification)' thread I want to propose the addition of a Bag
    type that completes the set of immutable types. It shall have the
    following properties.
    1) It is a multiset generalization of Set
    2) It is a supertype of Set and Seq (a Set can of course be built
    from a Seq). That is 'Set does Bag' and 'Seq does Bag'. Note
    that a Seq is a ready-made Bag and if it happens to have no
    duplicates it behaves like a Set.
    3) It has set operations as generalizations of the Set operations
    4) It provides some Bag specific ops like (+) that return a Bag
    even when called with Sets
    5) It provides the iteration interface (which in turn is applicable
    to the subtypes Set and Seq, of course)
    6) %hash.values returns a Bag
    7) %hash.keys returns a Set
    The wording in section 'Immutable Types' in S06 concerning Set as
    "Unordered Seqs that allow no duplicates" is a bit misleading because
    it hints at Set being a subtype of Seq. The Mapping could be explained
    as Set of Pair.
    The Bag type could be implemented in a module but then we need a
    way to do supertyping. The added Bag type would need to add a
    multiplicity accessor that returns 1 to the Set implementation and
    add lazy multiplicity counting to Seq. And I don't know how Hash::values
    would be augmented to return a Bag. This Bag return type would guarantee
    that comparing %h1.values with %h2.values for equality yields true when the
    two hashes return the same values in different orders.
    Comments?
    --
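
As an illustration only, the properties above can be sketched in Python, with `collections.Counter` standing in for the proposed Bag. The helper names `bag` and `multiplicity` are hypothetical and appear in no Synopsis; the sketch only shows why a Bag comparison ignores the order in which a hash returns its values.

```python
from collections import Counter

def bag(items):
    """Build a multiset (element -> multiplicity) from any iterable."""
    return Counter(items)

def multiplicity(collection, item):
    """Hypothetical multiplicity accessor from the proposal.

    A set reports 1 for every member; a sequence is counted on demand.
    """
    if isinstance(collection, (set, frozenset)):
        return 1 if item in collection else 0
    return bag(collection)[item]

# Two hashes holding the same values, inserted in a different order.
h1 = {"a": 1, "b": 2, "c": 2}
h2 = {"c": 2, "a": 1, "b": 2}

# Compared as sequences, the order of the values matters.
seq_equal = list(h1.values()) == list(h2.values())

# Compared as bags, only the multiplicities matter.
bag_equal = bag(h1.values()) == bag(h2.values())

# Keys are unique, so a plain set is enough for them.
key_set = set(h1.keys())

print(seq_equal, bag_equal, sorted(key_set))
```

A sequence without duplicates has multiplicity 1 for every element, which is the sense in which it "behaves like a Set" in point 2.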
  • No.1 | | 168 bytes | |

    TSa writes:
    >I want to propose the addition of a Bag type

    Different from the C<Bag> that's already mentioned in Synopsis 3?
    Smylers
  • No.2 | | 755 bytes | |

    At 7:08 PM +0000 11/28/06, Smylers wrote:
    >TSa writes:
    >I want to propose the addition of a Bag type
    >
    >Different from the C<Bag> that's already mentioned in Synopsis 3?
    >Smylers


    TSa wasn't the first person to ask for an explicit Bag type. I did
    too, a few weeks ago. And one reason for that was exactly what you
    mention. Various other parts of the Synopsis documents mention Bag
    in examples and such, but the list of built-in types in Synopsis 6
    does not include it. In my mind, unless Bag appears in the Synopsis
    6 list, its references elsewhere count as nothing more than an
    example stand-in for some arbitrary user-defined type. -- Darren
    Duncan
  • No.3 | | 568 bytes | |

    TSa wrote:
    >1) It is a multiset generalization of Set
    >2) It is a supertype of Set and Seq (a Set can of course be built
    >from a Seq). That is 'Set does Bag' and 'Seq does Bag'. Note
    >that a Seq is a ready-made Bag and if it happens to have no
    >duplicates it behaves like a Set.
    >3) It has set operations as generalizations of the Set operations

    Note that this would mean that Seq would also have set operations.

    >4) It provides some Bag specific ops like (+) that return a Bag
    >even when called with Sets

    or Seqs.
  • No.4 | | 1041 bytes | |

    Hi,

    Jonathan Lang wrote:
    >Note that this would mean that Seq would also have set operations.

    I count this as an advantage. So one can write (1,2,3) (|) (2,2,3,4,4)
    to get a result of (1,2,2,3,4,4). As long as the Seq is a Set, that is
    it has no duplicates, you get Set behavior through the Bag ops:
    (1,2,3) (|) (2,3,4) gives (1,2,3,4); (1,2,3) (&) (2,3,4) gives (2,3).

    BTW, the set/bag operations are not yet mentioned in S03 as new
    operators. Here's a list of what I think they should be:

    (|) union
    (&) intersection
    (^) symmetric difference
    (/) disjoint union?
    (!) complement, this is difficult because you need the surrounding set
    (-) difference
    (+) join, returns a bag
    (*) cartesian product
    (**) powerset
    (in) membership
    (!in) negated membership
    (<) proper subset
    (>) proper superset
    (<=) subset
    (>=) superset
    (=) equality, also with
    (!=) inequality, also with !

    Did I forget something?

    Regards, TSa.
    --
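
A rough sketch of how several of the listed operations behave on multisets, again using Python's `collections.Counter` as a stand-in for Bag. The mapping of `(|)`, `(&)`, `(+)`, `(-)` and `(^)` onto Counter operators is an assumption made for illustration, and `as_seq` and `is_subbag` are hypothetical helpers.

```python
from collections import Counter

def as_seq(b):
    """Sorted list view of a Counter, used only for readable output."""
    return sorted(b.elements())

def is_subbag(small, big):
    """True if every element occurs in `big` at least as often as in `small`."""
    return all(big[item] >= count for item, count in small.items())

a = Counter([1, 2, 3])
b = Counter([2, 2, 3, 4, 4])

union = a | b                 # (|): maximum of the multiplicities
intersection = a & b          # (&): minimum of the multiplicities
join = a + b                  # (+): sum of the multiplicities
difference = b - a            # (-): subtraction, floored at zero
symmetric = (a - b) + (b - a) # (^): what is left over on either side

print(as_seq(union))          # [1, 2, 2, 3, 4, 4]
print(as_seq(intersection))   # [2, 3]
print(as_seq(join))           # [1, 2, 2, 2, 3, 3, 4, 4]
print(as_seq(difference))     # [2, 4, 4]
print(as_seq(symmetric))      # [1, 2, 4, 4]
```

When neither operand has duplicates, the union and intersection reduce to the ordinary Set results quoted in the post.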
  • No.5 | | 3781 bytes | |

    TSa wrote:
    >Jonathan Lang wrote:
    >>Note that this would mean that Seq would also have set operations.
    >
    >I count this as an advantage. So one can write (1,2,3) (|) (2,2,3,4,4)
    >to get a result of (1,2,2,3,4,4). As long as the Seq is a Set, that is
    >it has no duplicates, you get Set behavior through the Bag ops:
    >(1,2,3) (|) (2,3,4) gives (1,2,3,4); (1,2,3) (&) (2,3,4) gives (2,3).

    Would (1,2,2,3,4,4) be a Seq or a Bag? IMHO, the _only_ way this
    could work would be if it's a Bag: if it's a Seq, I see no way that
    one could resolve '(1,2,3) ∪ (3,1,2)'.

    Mind you, I'm still not sold on the idea of performing set operations
    on Seqs - it may be technically feasible to do so, but it strikes me
    as fundamentally unintuitive.

    >BTW, the set/bag operations are not yet mentioned in S03 as new
    >operators. Here's a list of what I think they should be:
    >
    >(|) union
    >(&) intersection
    >(^) symmetric difference
    >(/) disjoint union?
    >(!) complement, this is difficult because you need the surrounding set
    >(-) difference
    >(+) join, returns a bag
    >(*) cartesian product
    >(**) powerset
    >(in) membership
    >(!in) negated membership
    >(<) proper subset
    >(>) proper superset
    >(<=) subset
    >(>=) superset
    >(=) equality, also with
    >(!=) inequality, also with !

    Initial thought: overkill. Several of these operations (e.g., the
    Cartesian product) are obscure and only of interest to mathematicians.
    This isn't a reason to exclude them; but a non-mathematician should
    not be made to feel like he needs to get a math degree in order to use
    sets. (On a tangent, he also shouldn't be made to feel like he has to
    learn Type Theory in order to use Perl6's type system.)

    I'm still bothered by the idea that you have to wrap _every_ ASCII
    representative of a set operation in parentheses - something which is
    only necessary when you start applying the full range of set
    operations to non-Set entities. In particular, I want 'Set - Set' to
    produce the difference of the two Sets.

    Setting aside the issue of the notation to be used, there are several
    concerns that I have with this:

    If set operations also apply to Seq, then (=) is not the same as the
    ordinary equality operator. The former ignores the order of the terms;
    the latter only does so for Sets and Bags. In a way, this is what
    started the whole debate.

    You mention a single "disjoint union" operator: is it supposed to be
    the "disjoint union" comparison operator (i.e., it returns true if the
    sets are disjoint), or the "disjoint union" composition operator
    (which returns a Set of Pairs, with each element being keyed according
    to the Set that it was originally in)?

    Saying that "complement" is difficult is an understatement. I suppose
    you _could_ get it to work by having a 'complemented Set' that would keep
    track of which elements it _doesn't_ have; but this opens a can of
    worms that I really don't think we want to get into (e.g., "A
    (!)B"). And the notion of a complement with regard to the surrounding
    set is already handled by the difference operator.

    A Cartesian product would return a Set (or Bag, depending on the left
    term) of Pairs, keyed by the elements of the left term.

    A powerset strikes me as something that you'd want to do as a 0-ary
    method, rather than as an operator.

    >Did I forget something?

    You did highlight some things - for instance, a Set of Pairs is _not_
    a Hash: a Hash has a further requirement that every key must be
    unique, whereas a Set of Pairs allows for duplicate keys (but not
    duplicate key-value pairs; go with a Bag of Pairs for that).
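
To make the distinctions above concrete, here is a minimal Python sketch. It assumes plain Python sets and tuples as stand-ins for Set and Pair; the function names and the "left"/"right" tags are hypothetical and chosen only for this illustration.

```python
from itertools import chain, combinations, product

def are_disjoint(left, right):
    """The comparison reading: true if the two sets share no element."""
    return left.isdisjoint(right)

def disjoint_union(left, right):
    """The composition reading: a set of pairs, each element keyed by
    the operand it originally came from."""
    return {("left", x) for x in left} | {("right", x) for x in right}

def cartesian_product(left, right):
    """A set of pairs keyed by the elements of the left operand."""
    return set(product(left, right))

def powerset(s):
    """All subsets of s, as a set of frozensets."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}

a = {1, 2}
b = {2, 3}

tagged = disjoint_union(a, b)      # the shared element 2 appears twice
pairs = cartesian_product(a, b)
subsets = powerset(a)

# A set of pairs may repeat a key; a hash keeps one value per key.
pair_set = {("k", 1), ("k", 2)}
as_hash = dict(pair_set)

print(len(tagged), len(pairs), len(subsets), len(pair_set), len(as_hash))
```

The last two values show the difference between a Set of Pairs and a Hash: the set keeps both pairs, while the hash collapses them onto the single key.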
  • No.6 | | 4657 bytes | |

    Hi,

    Jonathan Lang wrote:
    >Would (1,2,2,3,4,4) be a Seq or a Bag?

    Comma constructs a Seq, of course.

    >IMHO, the _only_ way this
    >could work would be if it's a Bag: if it's a Seq, I see no way that
    >one could resolve '(1,2,3) ∪ (3,1,2)'.

    This is not any different from '3' + '4' resulting in numeric 7.
    It is the operator that builds Bags from (1,2,3) and (3,1,2) and
    then calculates the union.

    >Mind you, I'm still not sold on the idea of performing set operations
    >on Seqs - it may be technically feasible to do so, but it strikes me
    >as fundamentally unintuitive.

    But you have no problem with doing numeric stuff? E.g. (1,2,3) + (3,4)
    actually means +(1,2,3) + +(3,4) == 3 + 2 == 5. It is the operator that
    indicates Set/Bag operations. And 'Seq does Bag' comes in handy here.
    Actually, no set operation is performed on Seq. The Seq is just used to
    build a Bag. This is like there is no string concatenation for numbers,
    arrays etc. They are stringified first.

    One could also write @a (|) @b and the two arrays would be flattened to
    Seq and then used to build a Bag. The only drawback with Bag being the
    Seq supertype is that people might want @a (|) @b to mean Set(@a) (|)
    Set(@b). The Set enforcement could also come after the Bag union:
    Set(@a (|) @b). Perhaps @a (|) @b delivers the result in an Array
    instead of a Bag value.

    So in general the parens ops should not be called set ops but unordered
    ops because this is their core meaning. They are overloaded for Bag and
    Set.

    >I'm still bothered by the idea that you have to wrap _every_ ASCII
    >representative of a set operation in parentheses - something which is
    >only necessary when you start applying the full range of set
    >operations to non-Set entities. In particular, I want 'Set - Set' to
    >produce the difference of the two Sets.

    But that should be the numeric difference of the cardinalities of the
    Sets. Plain - means numeric. It's the same thing that we don't do string
    concatenation with + but have a stringish ~ for the task. Hence it works
    to say 3 ~ 4 and end up with '34'.

    >If set operations also apply to Seq, then (=) is not the same as the
    >ordinary equality operator. The former ignores the order of the terms;
    >the latter only does so for Sets and Bags. In a way, this is what
    >started the whole debate.

    You are correct. This emphasizes the need for (=).

    >You mention a single "disjoint union" operator: is it supposed to be
    >the "disjoint union" comparison operator (i.e., it returns true if the
    >sets are disjoint), or the "disjoint union" composition operator
    >(which returns a Set of Pairs, with each element being keyed according
    >to the Set that it was originally in)?

    Perhaps we need both. I've chosen (/) for its visual similarity with
    (|) like || and //. We could use (/?) for the disjointness test.
    Or, doing visual games again: (%). You see the disjoint sets in the glyph.
    (:) might work, too.

    >Saying that "complement" is difficult is an understatement. I suppose
    >you _could_ get it to work by having a 'complemented Set' that would keep
    >track of which elements it _doesn't_ have; but this opens a can of
    >worms that I really don't think we want to get into (e.g., "A
    >(!)B"). And the notion of a complement with regard to the surrounding
    >set is already handled by the difference operator.

    Isn't that similar in semantics to the none junction? If you have e.g.
    a Set of Int (1,2,3) you can express (!)(1,2,3) as Set(*..0, 4..*).
    But I agree that it is difficult in general. Well, (!)$x might actually
    create a large set of almost all currently known objects in some scope.
    This might just be written as * (-) $x. IOW, Set(*) means all known
    objects. As long as a program is not wielding large data sets, complement
    might just work; the complement of a large set is small, of course.

    Note that $a (-) $b is equivalent to $a (&) (!)$b; you have it as the
    union. Or did you mean that $a (|) (!)$b equals (!)($b (-) $a)?

    Hmm, how is Bag complement defined in the first place? Does it just
    complement the underlying set and use multiplicities of 1?

    >You did highlight some things - for instance, a Set of Pairs is _not_
    >a Hash: a Hash has a further requirement that every key must be
    >unique, whereas a Set of Pairs allows for duplicate keys (but not
    >duplicate key-value pairs; go with a Bag of Pairs for that).

    Correct.

    Regards, TSa.
    --
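
The two identities discussed above can be checked with a small Python sketch, assuming an explicit finite universe as the "surrounding set". The range -5..9 is an arbitrary choice for illustration, and `complement` is a hypothetical helper, not a proposed built-in.

```python
# Assumed finite surrounding set; a real Set(*) would not be enumerable.
UNIVERSE = frozenset(range(-5, 10))

def complement(s, within=UNIVERSE):
    """Complement of s relative to an explicit surrounding set."""
    return frozenset(within) - frozenset(s)

a = frozenset({1, 2, 3, 4})
b = frozenset({3, 4, 5})

# $a (-) $b is the same as $a (&) (!)$b
difference_matches = (a - b) == (a & complement(b))

# $a (|) (!)$b is the same as (!)($b (-) $a)
union_matches = (a | complement(b)) == complement(b - a)

# (!)(1,2,3) within the assumed universe: everything up to 0 and from 4 on.
outside_123 = complement({1, 2, 3})

# The complement of a large set is small.
small = complement(UNIVERSE - {0})

print(difference_matches, union_matches, sorted(outside_123), sorted(small))
```

Both identities hold for any choice of universe that contains the operands; only the size of the complement depends on it.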
  • No.7 | | 3901 bytes | |

    TSa wrote:
    >Jonathan Lang wrote:
    >>Would (1,2,2,3,4,4) be a Seq or a Bag?
    >
    >Comma constructs a Seq, of course.

    The context of the question was that you provided the above as the
    result of unioning two Seqs; as such, I was trying to find out whether
    you meant that the union of two Seqs should be a Seq, or if that was
    an unintended artifact of the way you said it.

    >>IMHO, the _only_ way this
    >>could work would be if it's a Bag: if it's a Seq, I see no way that
    >>one could resolve '(1,2,3) ∪ (3,1,2)'.
    >
    >This is not any different from '3' + '4' resulting in numeric 7.
    >It is the operator that builds Bags from (1,2,3) and (3,1,2) and
    >then calculates the union.

    IOW, you're going with the notion that Seqs should be treated like
    Bags when you apply set operations to them - that is, the order of
    their elements becomes irrelevant.

    >>Mind you, I'm still not sold on the idea of performing set operations
    >>on Seqs - it may be technically feasible to do so, but it strikes me
    >>as fundamentally unintuitive.
    >
    >But you have no problem with doing numeric stuff? E.g. (1,2,3) + (3,4)
    >actually means +(1,2,3) + +(3,4) == 3 + 2 == 5.

    I didn't say that. For the record, I _do_ have a problem with saying
    that '(1,2,3) + (3,4)' should be equivalent to '+(1,2,3) + +(3,4)'.
    It's counter-DWIMish, and I'd rather see the compiler complain about
    the former than to do the latter when that's not what I wanted. As a
    general rule of thumb, I do _not_ want data types being coerced behind
    my back. '3' + '4' producing 7 should be an exception to the rule,
    not the basis for it; and the reason for the exception is that the
    fact that '3' is supposed to mean the same as 3 is not very hard to
    spot. Stating that (1,2,3) is supposed to mean the same as 3 is
    considerably more murky.

    >>I'm still bothered by the idea that you have to wrap _every_ ASCII
    >>representative of a set operation in parentheses - something which is
    >>only necessary when you start applying the full range of set
    >>operations to non-Set entities. In particular, I want 'Set - Set' to
    >>produce the difference of the two Sets.
    >
    >But that should be the numeric difference of the cardinalities of the
    >Sets. Plain - means numeric. It's the same thing that we don't do string
    >concatenation with + but have a stringish ~ for the task. Hence it works
    >to say 3 ~ 4 and end up with '34'.

    Plain '-' means numeric when the terms are scalars - and possibly not
    even then. Perl 6 allows for operator overloading, and it does so
    because sometimes it makes more sense to have the same operator mean
    different things depending on what it's operating on. The interplay
    between Strings and Nums shouldn't be taken as indicative of how
    everything should work.

    >>If set operations also apply to Seq, then (=) is not the same as the
    >>ordinary equality operator. The former ignores the order of the terms;
    >>the latter only does so for Sets and Bags. In a way, this is what
    >>started the whole debate.
    >
    >You are correct. This emphasizes the need for (=).

    or for allowing explicit conversion of Seqs to Bags or Sets, and
    demanding that it be done before set operations are permitted.

    >Note that $a (-) $b is equivalent to $a (&) (!)$b; you have it as the
    >union. Or did you mean that $a (|) (!)$b equals (!)($b (-) $a)?

    No, I goofed and meant A ∩ !B when I said A ∪ !B.

    >Hmm, how is Bag complement defined in the first place? Does it just
    >complement the underlying set and use multiplicities of 1?

    Conceptually, it would be going for multiplicities of Inf. Even if
    you define Set complement, you probably don't want to define Bag
    complement.
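
The alternative raised above, demanding an explicit conversion before any set operation is allowed, might look roughly like this in Python. The function `strict_union` is hypothetical, and `collections.Counter` again stands in for Bag; nothing here reflects an agreed Perl 6 design.

```python
from collections import Counter

def strict_union(left, right):
    """Bag union that refuses to coerce sequences behind the caller's back.

    Only Counter (Bag) and set/frozenset (Set) operands are accepted;
    a list or tuple must be converted explicitly first.
    """
    for operand in (left, right):
        if not isinstance(operand, (Counter, set, frozenset)):
            raise TypeError(
                "convert %s to a Bag or Set before applying set operations"
                % type(operand).__name__
            )
    # A set contributes multiplicity 1 for each of its members.
    return Counter(left) | Counter(right)

# Explicit conversion: the caller states that order is irrelevant.
result = strict_union(Counter([1, 2, 2]), {2, 3})

# Implicit use of a plain sequence is rejected.
try:
    strict_union([1, 2, 3], [3, 1, 2])
    rejected = False
except TypeError:
    rejected = True

print(sorted(result.elements()), rejected)
```

The trade-off is the one argued in the thread: the caller writes a little more, and in exchange no ordered value silently loses its order.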
