Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • DB with list datatype?

    11 answers - 406 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hi, I am looking for a small embeddable database that allows lists as
    datatype, similar to the internal database of a Prolog programming system.
    I.e. the data are organized in tables, so it is a relational database. But
    an element in a table can be a list of variable length, eg. a string list or
    a number list
    Any idea? Could SQlite do that with some magic tricky things?
  • No.1 | | 3063 bytes | |

    Andreas Kochenburger wrote:
    Hi, I am looking for a small embeddable database that allows lists as
    datatype

    Firebird is an example of an RDBMS that has a built-in datatype that
    more or less does what you describe. It has a datype that is stored
    like a BLB, but it is structured like a multi-dimensional array. There
    is an API that allows you to fetch "slices" of the array into
    application arrays. You can also reference individual elements of the
    array in SQL expressions.

    However, this is a proprietary extension, not standard SQL. Standard
    programming interfaces like JDBC, DBC, Perl DBI, PHP PD, etc. have no
    clue how to fetch array slices and return them to the calling
    application. You'd have to do all your programming in C, and learn
    Firebird's proprietary API for fetching array slices.

    Anyway, it's generally considered a bad practice to store multiple
    values in one column. It breaks rules of normalization. (NB: this is
    true for traditional RDBMS, but not necessarily for experimental
    technology such as "network databases").

    Normalization says that if you have multiple values associated with an
    entity, you should create a dependent table. Each value in the list
    should go in a separate row of the additional table. These rows also
    include a foreign key referencing the row in the primary table with
    which the list is associated.

    Some people resist the database normalization guideline above, and store
    lists as a single column, containing long strings with comma-separated
    values.

    The disadvantages of using this list-in-a-string design include:
    - The list has a hard maximum length limit, because the column itself
    must have a size limit. But it's hard to predict how many values you
    can put in the list, if values are of variable length themselves.
    - You can't easily use the list-separator character as part of the
    values in the list.
    - You need to fetch the list data into your application to do many
    calculations on it, e.g. counting the number of values in the list,
    calculating the minimum/maximum/average value, etc.
    - You need to fetch the entire list even if you're only interested in a
    small portion of it.
    - Inserting or deleting a value from the list requires fetching the
    list, altering it, and then updating it. Keeping the list in sorted
    order is additional work.
    - No way to use referential integrity, if the values in the list need to
    reference other database entities.
    - No way to ensure values in the list are valid. What would prevent
    someone from entering "1,2,3,banana,5" into a column that is supposed to
    list integers?
    - No way to associate additional information with each element of the list.

    Any primitive "list" datatype in a database should account for these
    disadvantages. That would be difficult. Even Firebird's arrays
    implementation fails on several of these points.

    Regards,
    Bill K.
  • No.2 | | 1272 bytes | |


    "Bill Karwin" <bill@karwin.comwrote in message
    news:ed7to102968@enews2.newsguy.com
    Andreas Kochenburger wrote:
    Hi, I am looking for a small embeddable database that allows lists as
    datatype

    Firebird is an example of an RDBMS that has a built-in datatype that
    more or less does what you describe. It has a datype that is stored
    like a BLB, but it is structured like a multi-dimensional array. There
    is an API that allows you to fetch "slices" of the array into
    application arrays. You can also reference individual elements of the
    array in SQL expressions.

    Unless I miss my guess, Firebird's BLBS are very much like DEC Rdb/VMS's
    BLBS.

    DEC Rdb/VMS still survives as /Rdb.

    If I'm right in the above, then the "slices" you refer to are "segments",
    and they are physical components of the BLBS rather than logical
    components. You could certainly store a list in a BLB if you wanted to.
    But then again you could store anything else, also.

    My question is this:

    Why not split out the array of lists into a table of its own, and reference
    the lists via a FK? In short, what can you do with lists that you can't do
    just as conveniently with tables?

  • No.3 | | 1161 bytes | |

    David Cressey wrote:
    If I'm right in the above, then the "slices" you refer to are "segments",
    and they are physical components of the BLBS rather than logical
    components.

    No, Firebird supports "array" as a distinct datatype. Arrays are
    logical datatypes in Firebird. They are not the same thing as physical
    blob segments.

    Internally, arrays are implemented as blobs, but there are SQL
    extensions to declare logical arrays with multiple dimensions. There is
    also expression syntax for referencing single elements or "slices" which
    are sub-arrays (e.g. "elements 4 through 6").

    I'm pretty confident I know what I'm talking about in this case. I
    worked for InterBase for five years. Firebird is based on a forked
    source tree of InterBase 6.0 beta.

    Why not split out the array of lists into a table of its own, and reference
    the lists via a FK? In short, what can you do with lists that you can't do
    just as conveniently with tables?

    Right; using a normalized table structure is a better solution. That
    was the point of my post.

    Regards,
    Bill K.
  • No.4 | | 1321 bytes | |


    "Bill Karwin" <bill@karwin.comwrote in message
    news:edlp1g01u5g@enews2.newsguy.com
    David Cressey wrote:
    If I'm right in the above, then the "slices" you refer to are
    "segments",
    and they are physical components of the BLBS rather than logical
    components.

    No, Firebird supports "array" as a distinct datatype. Arrays are
    logical datatypes in Firebird. They are not the same thing as physical
    blob segments.

    Internally, arrays are implemented as blobs, but there are SQL
    extensions to declare logical arrays with multiple dimensions. There is
    also expression syntax for referencing single elements or "slices" which
    are sub-arrays (e.g. "elements 4 through 6").

    Thanks for clearing that up. I've never worked with Firebird or Interbase.
    I downloaded Firebird
    about a month ago, but I haven't gotten around to fooling around with it.
    There are some ways in which it's reminiscent of Rdb. When I mentioned
    BLBS, I figured they were like Rdb BLBS.

    Right; using a normalized table structure is a better solution. That
    was the point of my post.

    Agreed. While I don't take normalization as some sort of holy grail, there
    are some rpoblems it deals with neatly. This appears to be one of them.

  • No.5 | | 305 bytes | |

    David Cressey wrote:
    There are some ways in which it's reminiscent of Rdb. When I mentioned
    BLBS, I figured they were like Rdb BLBS.

    It wouldn't be surprising, because the same software architect developed
    both RDB/ELN and InterBase! :-)

    Regards,
    Bill K.
  • No.6 | | 851 bytes | |

    Andreas Kochenburger wrote:
    Hi, I am looking for a small embeddable database that allows lists as
    datatype, similar to the internal database of a Prolog programming system.
    I.e. the data are organized in tables, so it is a relational database. But
    an element in a table can be a list of variable length, eg. a string list or
    a number list
    Any idea? Could SQlite do that with some magic tricky things?

    IBM U2 databases (UniVerse and UniData)
    are the leaders in MultiValue category
    of databases
    open source product with the same MultiValue model is QM
    are several other MultiValue databases including jBASE and
    Revelation
    ' from InterSystems is another possibility, supporting the
    Multivalue Model and the multi-dimensional MUMPS model (in addition to
    SQL)

    best wishes.

  • No.7 | | 676 bytes | |

    Andreas Kochenburger wrote:

    Hi, I am looking for a small embeddable database that allows lists as
    datatype, similar to the internal database of a Prolog programming system.
    I.e. the data are organized in tables, so it is a relational database. But
    an element in a table can be a list of variable length, eg. a string list
    or a number list
    Any idea? Could SQlite do that with some magic tricky things?

    PostgreSQL has an array data type and several language extensions to assist
    in using them.

    However, I would echo all of Bill's warning, with this conclusion about
    putting lists into columns: "Abandon hope all ye who enter here."
  • No.8 | | 516 bytes | |

    However, I would echo all of Bill's warning, with this conclusion about
    putting lists into columns: "Abandon hope all ye who enter here."

    Thanks to all for your responses.

    However Prolog is a powerful programming language and command environment,
    and one of its key features is indeed the built-in database which can
    "digest" lists. I say: "Have hope all ye who enter here!"

    It works! ;-))
    I am just looking for such a database without the other Prolog stuff.

  • No.9 | | 1299 bytes | |

    Andreas Kochenburger wrote:
    However, I would echo all of Bill's warning, with this conclusion about
    putting lists into columns: "Abandon hope all ye who enter here."

    Thanks to all for your responses.

    However Prolog is a powerful programming language and command environment,
    and one of its key features is indeed the built-in database which can
    "digest" lists. I say: "Have hope all ye who enter here!"

    It works! ;-))

    Much to the chagrin of many relational proponents, eh?

    I am just looking for such a database without the other Prolog stuff.

    You might be interested in reading the new IDC white paper entitled
    "Because Not All Data is Flat: IBM's U2 Extended Relational DBMSs".
    (I'm not sure why he opted to use the oft inflammatory "flat" word
    except that the subject will trip emotions whether he uses such terms
    or not, it seems.)

    This paper is posted on the right had side of the U2 home page:
    and I think the URL is

    You can see that the U2 products are positioned along with other
    not-exactly-relational dbms's as an "embedded" database. However, many
    companies use such RDBMS's for enterprise systems, so this is a bit of
    a misleading term.

    Cheers!

  • No.10 | | 1894 bytes | |

    dawn wrote:
    You might be interested in reading the new IDC white paper entitled
    "Because Not All Data is Flat: IBM's U2 Extended Relational DBMSs".
    (I'm not sure why he opted to use the oft inflammatory "flat" word

    -- I thought you were going to say that the inflammatory wording is
    "data is"! <g>

    , I've read the white paper. U2 provides physical storage cohesion
    for "multivalued fields" and "nested tables". That is, related data is
    stored close together.

    This is based on a DBMS architecture that dates back to products such as
    Pick, UniData, and VMark. These offered multivalued and
    multidimensional data modeling. The paper claims that this achieves
    greater performance and scalability than relational systems, and
    represents the data in a model that is easier to understand.

    However, it is not clear from the white paper that the performance
    improvement of the multidimensional data model is relevant today as it
    was 10, 15, or 20 years ago. Processing power, quantity of high-speed
    RAM, and even speed of disk devices is orders of magnitude higher than
    it was back then. For instance, decreasing the need for disk seeks
    doesn't provide as much benefit, when a portions of data and indexes are
    cached in random-access memory. Today's hardware resources can, to some
    extent, change where the "bottleneck" is in data retrieval systems.

    I would be surprised if the performance advantage of multidimensional
    systems over relational systems is significant, given current hardware
    resources. The IBM white paper fails to give any quantitative measure
    to show this advantage.

    I'm assuming that both multidimensional and relational systems require
    "tuning" so that they make best use of the resources available.

    Regards,
    Bill K.
  • No.11 | | 4469 bytes | |


    Bill Karwin wrote:
    dawn wrote:
    You might be interested in reading the new IDC white paper entitled
    "Because Not All Data is Flat: IBM's U2 Extended Relational DBMSs".
    (I'm not sure why he opted to use the oft inflammatory "flat" word

    -- I thought you were going to say that the inflammatory wording is
    "data is"! <g>

    As the daughter of a linguist and a grammar school teacher, it took a
    long time before I gave up and conformed on that one (at least some of
    the time). Language does evolve, and it seems that "data" is now used
    both as singular for a collection and plural. Because he is referring
    to the shape of a collection in the title, he is using the singular.
    course, I do understand if you wish to hold on to a purer approach.

    , I've read the white paper. U2 provides physical storage cohesion
    for "multivalued fields" and "nested tables". That is, related data is
    stored close together.

    This is based on a DBMS architecture that dates back to products such as
    Pick, UniData, and VMark. These offered multivalued and
    multidimensional data modeling. The paper claims that this achieves
    greater performance and scalability than relational systems, and
    represents the data in a model that is easier to understand.

    However, it is not clear from the white paper that the performance
    improvement of the multidimensional data model is relevant today as it
    was 10, 15, or 20 years ago.

    I agree. Unless you are working with a single processor, this is less
    relevant today.

    Processing power, quantity of high-speed
    RAM, and even speed of disk devices is orders of magnitude higher than
    it was back then. For instance, decreasing the need for disk seeks
    doesn't provide as much benefit, when a portions of data and indexes are
    cached in random-access memory. Today's hardware resources can, to some
    extent, change where the "bottleneck" is in data retrieval systems.

    I would be surprised if the performance advantage of multidimensional
    systems over relational systems is significant, given current hardware
    resources. The IBM white paper fails to give any quantitative measure
    to show this advantage.

    I was surprised they did not give some of the benchmarks too. I can
    find this one to quote. From an IBM employee in Nov '05

    "This last weekend, we attained a new high number of users on a U2
    database
    of 15,200 on a single system running an application!

    Principle Consultant of the U2 Lab Services group, performed
    this formal benchmark at one of the IBM benchmark centers on an
    IBM p590
    + 64 dual core CPUs for a total of 128 processors
    + 124gb memory (no, that's not a typo - that's all the benchmark center
    had available - odd number though)
    + 34 disk drives, striped with JFS2 on two FAStT controllers

    The database had over 1 billion records in it."

    I'm assuming that both multidimensional and relational systems require
    "tuning" so that they make best use of the resources available.

    There are tuning techniques, but many sites I know surprisingly employ
    few of them. Some of the biggest performance gains can come from
    distributing files wisely on disks (said loosely).

    >From my perspective, the huge gains are in people performance. I have

    worked with teams working on U2 and on relational/SQL-DBMS's. My
    anecdotal experience is that the "Pick" developers developed and
    maintained software with much higher productivity.

    Since I believed the claims of relational theory, I started researching
    it myself, getting some blog entries in earlier this year at
    www.tincat-group.com/mewsings (starting with the "Is Codd Dead?"
    entry).

    Given what I have read to date and what I have seen in practice, I
    think the industry would be well-served to move more toward the
    U2/MV/Pick approach with non-1NF data structures (such as multivalued
    attributes), two-valued logic (a point on which many relational
    proponents agree), variable length data as the norm, and possibly even
    descriptive rather than restrictive/prescriptive schema (with duck
    typing rather than strong typing).

    Given that I cared about language at the start of this response, I
    apologize for that last sentence. Cheers!

Re: DB with list datatype?


max 4000 letters.
Your nickname that display:
In order to stop the spam: 7 + 7 =
QUESTION ON "Databases"

EMSDN.COM