  • A precedent suggesting a compromise for the SWHCLS IG Best Practices (ARK)


    Hi All,

    >I frequently see genes, transcripts, DNA and mRNA and their
    >sequences, proteins, protein sequences, transcripts, and peptides
    >all confusedly identified by overlapping identifiers. I don't see
    >how any identifier scheme, in itself, LSIDs included, currently
    >fixes this problem. It is this problem that I personally want to
    >see progress on.

    You're correct here, but it is the state of the art. Interestingly
    enough, I've found that in general the biology-based scientists and
    investigators are not all that bothered by this confusion and,
    despite the confusion, seem to make their way through it.
    cheers,
    Michael
    Michael Miller
    Lead Software Developer
    Rosetta Biosoftware Business Unit
    www.rosettabio.com
    -----Original Message-----
    From: public-semweb-lifesci-request (AT) w3 (DOT) org On Behalf Of
    Alan Ruttenberg
    Sent: Sunday, July 30, 2006 7:08 PM
    To: Mark Wilkinson
    Cc: Alan Ruttenberg; public-semweb-lifesci (AT) w3 (DOT) org;
    noah_mendelsohn (AT) us (DOT) ibm.com; Sean Martin; Henry S. Thompson;
    Phillip Lord; www-tag (AT) w3 (DOT) org; Dan Connolly
    Subject: Re: A precedent suggesting a compromise for the
    SWHCLS IG Best Practices (ARK)
    Excellent response! I 95% heartedly agree (all but the "I stand by
    LSIDs" part :)
    I will note, however, that whenever there are versions of something,
    there tends to be some concept of the thing that they are versions of.
    So even though there are versions of the sequence, there ought to
    still be some thing which represents the thing that all the versions
    are of.
    Back to your point, is there anyone out there who has minted LSIDs
    for genes and for the sequences distinctly and related them? Do the
    gene LSIDs ever get versions? Do the sequence LSIDs ever not have
    versions? When there are different authorities for the genes and
    sequences, what are the relations that people use to relate them?
    Let's put these examples on the table.
    If anyone has done this in the context of NCBI databases in
    particular, I think it would be helpful to share the specifics of how
    these IDs were used and conceptualized.
    My experience has been that there is routine confusion of the sort
    that you describe throughout the life sciences community and that
    this bleeds into the discussion of identifiers (as it just did,
    though I have to admit I was baiting for exactly this discussion :)
    I frequently see genes, transcripts, DNA and mRNA and their
    sequences, proteins, protein sequences, transcripts, and peptides
    all confusedly identified by overlapping identifiers. I don't see
    how any identifier scheme, in itself, LSIDs included, currently
    fixes this problem. It is this problem that I personally want to
    see progress on.
    The LSID contract seems to have more to do with persistence,
    mutability, cacheability, and discoverability of byte sequences
    than with whether the identifiers and their relations make
    ontological sense.
    While I understand that in some contexts the issues around data
    management are central, they aren't in all contexts. Because I think
    that optimizing data management, while in some ways elegantly
    handled by the LSID protocol, isn't central to the issue of
    representation in the life sciences, and because I don't see LSID
    addressing the representation issues, I worry that imposing the use
    of the LSID protocol puts a burden on all for the benefit of
    relatively few. And for those relatively few who are going to go
    out of their way to have internal copies of data and the like, I
    don't see why a custom system that circumvents HTTP for efficiency
    reasons is too much of a burden.
    How do you see things otherwise?
    -Alan
    (Being deliberately provocative here - my assigned role in this
    debate :)
    Jul 30, 2006, at 9:06 PM, Mark Wilkinson wrote:
    Sun, 30 Jul 2006 16:46:21 -0700, Alan Ruttenberg
    <alanruttenberg (AT) gmail (DOT) com> wrote:
    I may be speaking out of turn here, and should probably let Sean
    answer this one since he may have (no doubt) thought it through
    more deeply than I have; however, I think you may be mixing up
    several different entities here (as so often happens in a URL
    world ;-) )
    In the case you cite above you are likely talking about a "gene",
    not a "sequence". A "gene" will have its own LSID, and it is (even
    by the strict genetic definition) a conceptual entity defined by
    complementation. A "gene" and its "sequence" are not the same
    thing! So I don't see a problem. When you need to refer to the
    gene in the abstract, you can refer to the gene's LSID. When you
    need to talk about a concrete sequence, you refer to *its* LSID.
    The metadata of the gene will (in a sensible world) include triples
    that describe its possible sequences, and these will have versions.
    Genes have many, many, many properties, so we cannot munge them all
    into "sequence". Certainly, this is how we are modelling our data
    locally.
    I stand by LSIDs :-)
    Mark
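
    Mark's distinction between a gene and its versioned sequences, together
    with Alan's point that versions should hang off some unversioned "thing",
    can be sketched in Python. This is a minimal sketch under stated
    assumptions: the example.org authority, the gene/sequence namespaces, and
    the hasSequence predicate are hypothetical; only the general
    urn:lsid:<authority>:<namespace>:<object>[:<revision>] shape follows the
    LSID URN syntax.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID: urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

    def unversioned(self) -> str:
        # Identifier for the "thing" that all revisions are versions of.
        return f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"


def parse_lsid(text: str) -> Lsid:
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical data: the gene (a conceptual entity) is linked to concrete,
# versioned sequences through a made-up "hasSequence" predicate.
GENE_X = "urn:lsid:example.org:gene:X"
TRIPLES = [
    (GENE_X, "hasSequence", "urn:lsid:example.org:sequence:X-mrna:1"),
    (GENE_X, "hasSequence", "urn:lsid:example.org:sequence:X-mrna:2"),
    (GENE_X, "hasSequence", "urn:lsid:example.org:sequence:X-genomic:1"),
]


def sequence_versions(gene: str) -> Dict[str, List[Optional[str]]]:
    """Group a gene's sequence LSIDs by unversioned identifier, listing revisions."""
    grouped: Dict[str, List[Optional[str]]] = defaultdict(list)
    for subject, predicate, obj in TRIPLES:
        if subject == gene and predicate == "hasSequence":
            lsid = parse_lsid(obj)
            grouped[lsid.unversioned()].append(lsid.revision)
    return dict(grouped)


if __name__ == "__main__":
    print(parse_lsid(GENE_X).revision)  # None: the gene itself carries no version
    for seq, revisions in sequence_versions(GENE_X).items():
        print(seq, revisions)
```

    The design choice here is that the gene identifier is never versioned,
    while each sequence identifier can be reduced to an unversioned form that
    groups its revisions.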
  • No.1

    Quoting "Miller, Michael D (Rosetta)" <Michael_Miller (AT) Rosettabio (DOT) com>:

    >You're correct here but it is the state of the art. Interestingly
    >enough, I've found that in general the biology-based scientists and
    >investigators are not all that bothered by this confusion and despite
    >the confusion seem to make their way through it.

    The problem is that the semantic web is intended to make machines
    understand, and clarity is a prerequisite for instructing machines
    unambiguously.

    Xiaoshu
  • No.3

    [wangxiao (AT) musc (DOT) edu]

    >Quoting "Miller, Michael D (Rosetta)" <Michael_Miller (AT) Rosettabio (DOT) com>:
    >
    >>You're correct here but it is the state of the art. Interestingly
    >>enough, I've found that in general the biology-based scientists and
    >>investigators are not all that bothered by this confusion and despite
    >>the confusion seem to make their way through it.
    >
    >The problem is that the semantic web is intended to make machines
    >understand, and clarity is a prerequisite for instructing machines
    >unambiguously.

    Naming genes is an interesting case where proper names shade into
    generic names. However, I think on balance genes tend to have so many
    idiosyncratic properties that their names are never going to fit into
    a systematic naming scheme very well. But remember, the key
    contribution of the semantic-web methodology is to use URIs as names
    period. So long as a URI means only one gene, and everyone agrees
    what gene it means, there is no ambiguity problem. It's also a good
    idea to avoid having more than one name for a gene, but multiple names
    do not constitute ambiguity, merely inefficiency.
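
    The distinction above between ambiguity (one URI standing for several
    genes) and mere inefficiency (several URIs for one gene) can be checked
    mechanically once usage is collected as (URI, referent) pairs. The
    following Python is a minimal sketch under that assumption; the function
    name audit_names, the sample URIs, and the gene labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def audit_names(
    denotations: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split naming problems into ambiguity and mere redundancy.

    denotations: (uri, referent) pairs gathered from different sources,
    where `referent` is whatever curators agree the URI denotes.
    Returns (ambiguous, redundant):
      ambiguous: URIs said to denote more than one referent (a real problem)
      redundant: referents reachable through more than one URI (inefficiency only)
    """
    referents_by_uri: Dict[str, Set[str]] = defaultdict(set)
    uris_by_referent: Dict[str, Set[str]] = defaultdict(set)
    for uri, referent in denotations:
        referents_by_uri[uri].add(referent)
        uris_by_referent[referent].add(uri)
    ambiguous = {u: sorted(r) for u, r in referents_by_uri.items() if len(r) > 1}
    redundant = {r: sorted(u) for r, u in uris_by_referent.items() if len(u) > 1}
    return ambiguous, redundant


# Hypothetical usage data; the URIs and gene labels are placeholders.
SAMPLE = [
    ("http://example.org/gene/1", "gene A"),
    ("http://example.org/gene/2", "gene A"),  # second name for gene A: redundant
    ("http://example.org/gene/3", "gene B"),
    ("http://example.org/gene/3", "gene C"),  # one URI, two genes: ambiguous
]

if __name__ == "__main__":
    ambiguous, redundant = audit_names(SAMPLE)
    print("ambiguous:", ambiguous)
    print("redundant:", redundant)
```

    Only the first dictionary signals a problem for machine processing; the
    second merely lists names that could be consolidated.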
  • No.4

    Drew McDermott wrote:

    >>[wangxiao (AT) musc (DOT) edu]
    >>
    >>Quoting "Miller, Michael D (Rosetta)" <Michael_Miller (AT) Rosettabio (DOT) com>:
    >>
    >>>You're correct here but it is the state of the art. Interestingly
    >>>enough, I've found that in general the biology-based scientists and
    >>>investigators are not all that bothered by this confusion and despite
    >>>the confusion seem to make their way through it.
    >>
    >>The problem is that the semantic web is intended to make machines
    >>understand, and clarity is a prerequisite for instructing machines
    >>unambiguously.
    >
    >Naming genes is an interesting case where proper names shade into
    >generic names. However, I think on balance genes tend to have so many
    >idiosyncratic properties that their names are never going to fit into
    >a systematic naming scheme very well. But remember, the key
    >contribution of the semantic-web methodology is to use URIs as names
    >period. So long as a URI means only one gene, and everyone agrees
    >what gene it means, there is no ambiguity problem. It's also a good
    >idea to avoid having more than one name for a gene, but multiple names
    >do not constitute ambiguity, merely inefficiency.


    Hi Drew et al.,

    I agree that gene names are interesting use cases for URI/LSID. In
    addition to synonyms (different terms may be used to refer to the same
    concept), we need to deal with homonyms (the same term may mean
    different things). As discussed in the BioRDF call yesterday, I promised
    to come up with some neuroscience examples for URI/LSID; here is one
    such example, looking up the definition of "spine" in Wikipedia.
    Notice that this term has different meanings in different contexts
    (biological vs. anatomical). It looks like we might want to think about
    the possibility of providing such a context in LSID for disambiguation.

    (biology)

    (anatomy),

    Cheers,
    -Kei
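
    The suggestion of supplying a context to disambiguate homonyms such as
    "spine" can be illustrated with a lookup that refuses to guess. This
    Python sketch assumes a hypothetical lexicon keyed by term and then by
    context; it does not reflect any existing LSID feature, and the URIs,
    the AmbiguousTerm exception, and the resolve function are made up for
    illustration.

```python
from typing import Dict, Optional

# Hypothetical lexicon: each term maps to {context: concept URI}.
# The example.org URIs are placeholders, not real LSIDs or ontology terms.
LEXICON: Dict[str, Dict[str, str]] = {
    "spine": {
        "biology": "http://example.org/concept/spine-biology",
        "anatomy": "http://example.org/concept/spine-anatomy",
    },
    "neuron": {
        "neuroscience": "http://example.org/concept/neuron",
    },
}


class AmbiguousTerm(ValueError):
    """Raised when a homonym is looked up without a disambiguating context."""


def resolve(term: str, context: Optional[str] = None) -> str:
    senses = LEXICON.get(term.lower())
    if not senses:
        raise KeyError(term)
    if context is not None:
        return senses[context]  # KeyError if the context is unknown for this term
    if len(senses) == 1:
        # A term with a single sense needs no context.
        return next(iter(senses.values()))
    raise AmbiguousTerm(f"{term!r} is a homonym; pass one of {sorted(senses)}")


if __name__ == "__main__":
    print(resolve("spine", "anatomy"))
    print(resolve("Neuron"))
    try:
        resolve("spine")
    except AmbiguousTerm as err:
        print(err)
```

    Raising an error instead of picking a default sense keeps the human (or
    the calling program) responsible for supplying the missing context.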
  • No.5

    Yes, indeed. Machine processing of information relies on
    consistent usage of terms. You can't reuse information for
    new problems when its use requires human intervention to disambiguate
    it.

    Tim Berners-Lee

    Aug 10, 2006, at 21:54, wangxiao (AT) musc (DOT) edu wrote:

    >Quoting "Miller, Michael D (Rosetta)" <Michael_Miller (AT) Rosettabio (DOT) com>:
    >
    >>You're correct here but it is the state of the art. Interestingly
    >>enough, I've found that in general the biology-based scientists and
    >>investigators are not all that bothered by this confusion and despite
    >>the confusion seem to make their way through it.
    >
    >The problem is that the semantic web is intended to make machines
    >understand, and clarity is a prerequisite for instructing machines
    >unambiguously.
    >
    >Xiaoshu
  • No.7

    Tim --
    At 10:54 AM 8/21/2006 -0400, you wrote:

    >Machine processing of information relies on
    >consistent usage of terms. You can't reuse information for
    >new problems when its use requires human intervention to disambiguate
    >it.


    But perhaps ontologies can help in mapping various human usages to one
    another in such a way that the machines can then work uninterrupted?

    A little example in this direction that you can run in a browser is

    Does anyone have a pointer, please, to a similar task in the biology or
    health sciences?

    Thanks, -- Adrian

    Internet Business Logic (R)
    Executable open vocabulary English
    at www.reengineeringllc.com
    Shared use is free

    Adrian Walker
    Reengineering
    P Box 1412
    Bristol
    CT 06011-1412 USA

    Phone: USA 860 583 9677
    Cell: USA 860 830 2085
    Fax: USA 860 314 1029
  • No.8

    I agree, consistent use of terms makes life easier for machines and for
    humans too, when the terms have been agreed on, learned, and understood.
    Unfortunately, this takes a lot of effort and dedication from the
    humans. Learning a whole ontology before anything can be done is a bit
    like reading the whole manual of a DVD player before one can use it.
    And we all know that while there are people who actually read the whole
    manual, they are a minority.

    As a usability person I always like to see the machines support the
    humans as much as possible and not vice versa.
    In my view, new inventions often start from not-so-great terms and
    evolve stepwise as learning happens. Terms are first shared and
    polished in small groups, and later links are made between groups that
    may use different terminologies for similar things. If we want to
    support humans doing inventions, I think we should support, as much as
    possible, the use of different terms, their evolution, and making
    connections between similar terms when they are discovered. And I think
    the Semantic Web is great for that.

    Marja

    Tim Berners-Lee wrote:

    >Yes, indeed. Machine processing of information relies on
    >consistent usage of terms. You can't reuse information for
    >new problems when its use requires human intervention to disambiguate
    >it.
    >
    >Tim Berners-Lee
    >
    >Aug 10, 2006, at 21:54, wangxiao (AT) musc (DOT) edu wrote:
    >
    >>Quoting "Miller, Michael D (Rosetta)" <Michael_Miller (AT) Rosettabio (DOT) com>:
    >>
    >>>You're correct here but it is the state of the art. Interestingly
    >>>enough, I've found that in general the biology-based scientists and
    >>>investigators are not all that bothered by this confusion and despite
    >>>the confusion seem to make their way through it.
    >>
    >>The problem is that the semantic web is intended to make machines
    >>understand, and clarity is a prerequisite for instructing machines
    >>unambiguously.
    >>
    >>Xiaoshu
  • No.10

    Hi All,

    There are examples of systems that strive to separate the lexicon
    from the ontology, so as to ensure one particular lexical view of the
    underlying semantics doesn't "lock out" either humans or machines who
    do not "understand" that lexicon. Few are perfect, but many have
    effectively handled the issue of semantic interoperability, though
    often not at the level of semantic granularity required by experts at
    the bleeding edge of a specific scientific field.

    An ontology is of little use to anyone - person or machine - without
    instantiating it via a lexicon. Very significant problems arise
    when the lexicon is confused with the universals the ontology is
    intended to formally represent. I realize this boundary may appear
    artificial to some, but those who've worked on such issues for
    decades in the library & info sciences and in computational
    linguistics - despite some disagreement at the edges - will generally
    see this boundary as useful, even if they agree to disagree on
    whether it is in fact an artifact of human linguistic expression or a
    more fundamental expression of a sort of Heisenberg Uncertainty
    principle of KE/KR/KD. What I mean is that the moment an algorithm
    tries to compute on an ontological expression in the context of
    specific data instances - whether the algorithm resides in silico or
    in a human brain - it "breaks" the universal nature of the principles
    and grounds them in a lexicon used to address the specific existential
    instances being manipulated within the domain of a specific
    application. I believe this issue is at the heart of some
    significant confusion regarding what an ontology is and the tasks it
    can help to implement.

    An effective and practical knowledge resource needs to include both
    ontological graphs and a complex lexical repository.

    I think "ontology" construction often goes wrong when it is not
    EXPLICIT and - of equal importance - quite SYSTEMATIC regarding
    the lexical extensions it includes: e.g., abbreviations,
    misspellings, various types of synonyms, homographic homonyms (the
    bane of NLP efforts everywhere), etc.

    I was just listening to Michio Kaku discussing the recent controversy
    regarding the redefinition of "planet" status. As he and the
    astronomer Ken Croswell were discussing the issue, Dr. Kaku brought
    up the story from Richard Feynman's biography regarding the
    difference between "naming" an entity and studying the fundamental
    properties and rules relating the continuum of entities in the
    physical world. Both the naming and the formalisms for
    characterizing the fundamentals are human artifacts - BUT what
    separates the naming from the expression of universals is that the
    latter is guided by our increasing level of insight and understanding of
    real-world entities and the ways in which they relate to one
    another. No such criterion exists for the naming process, and this
    is why it is extremely helpful to keep the lexicon characterizing
    these names distinct from the expression of fundamentals (the
    ontologies). This is also an issue addressed by Gottfried W. von
    Leibniz in his philosophical works which all derived from the insight
    he had as a child that it MIGHT be possible to create a computable
    formalism for ontological entities analogous to the system created by
    mathematicians for performing axiomatic proofs in geometry. In MANY
    ways, our efforts here date back to this work by Leibniz via several,
    related historical threads in mathematics, philosophy, and various
    computationally-oriented scientific fields.

    One other general point: obviously the strategies and "best
    practices" for addressing these issues in the context of existing
    (and historical) data records, including the literature, are somewhat
    different from what we hope to see researchers doing going
    forward. In an ideal world - say 10 years from now - we can hope to
    see publication mechanisms in place both for primary data, supporting
    , and the larger world of the
    scientific literature - systems such as SWAN and some of the more
    advanced systems in development at BioMed Central and PLoS - to help
    reduce the complexity of the lexical Babel-esque landscape we must
    currently contend with. This needs to be done in a manner that
    doesn't in any way restrict the expressiveness of the lexicon or the
    ontological foundations, while also being implemented in a highly
    intuitive manner that does not require the researcher to learn a
    complex formal means to express themselves beyond the complexity typically
    used amongst domain experts. This is why I'd still place this 10
    years out. I don't think that's too optimistic a duration, however,
    given some of the revolutionary changes being introduced both by the
    SWTech C.S. community, as well as by the community of researchers
    embedded in the increasingly less messy process of biomedical
    ontology development and use. Some of these more modern scientific
    publication systems will come on line much sooner than this, but
    probably only in restricted contexts where there is a centralized
    authority that can both provide technical resources to develop,
    support, and evolve the systems, as well as enforce a certain level
    of compliance amongst its users - e.g., caBIG, the eScience myGRID
    project, REWERSE, The MIND Center at MGH, the BIRN project, etc.
    For better or worse, as great a profile as these organizations
    represent, the landscape of working neuroscientists extends way
    beyond this privileged environment, and we all hope to see our
    efforts be of use and relevant to all neuroscientists (given that the
    current scope of the HCLSIG-hosted efforts is focused on the
    neurosciences), along with the value they can help neuroscientists
    realize for society at large.

    As an example of where things can go wrong when convolving the
    lexicon with the ontology, take an artifact as relatively simple and
    seemingly "self-evident" as the "preferred label" or "preferred term"
    for a node in an ontological graph. In making the assertion
    "preferred", there is the implication some person or agency has
    passed judgement on the term. Reconciling two ontologies with
    overlapping knowledge domains can be made unnecessarily difficult
    when this implied contract is not made explicit. In other words, if
    you focus on reconciling the terms rather than reconciling the
    underlying semantic graphs, you can run into many unnecessary
    problems. I believe this issue is related to many of the discussions
    we've had on this list over the past 3 months both regarding ontology
    construction and use, as well as URI uniqueness and versioning
    contract. Formalisms such as SKOS can be extremely helpful in this
    regard, as we need to compute on the lexicon, as well as the
    ontological graph.

    To offer a relatively simple and ubiquitous example from neuroscience:
    on one side of the pond they prefer "neurone", while on the other
    "neuron" is the standard term. Is one more true? Do they refer to
    different, underlying fundamental entities? Can we even call the
    underlying entities "fundamental" when any neuroscientist would admit
    there is no neuron/neurone which has been explicitly qualified down
    to the level of all its constituent molecules**, along with their
    explicit disposition in space and time?

    I won't hold you at bay. I'll give you my sense of the "practical"
    answers to these questions.

    Is one more true?
    not, since they are just lexical habits, as opposed to
    fundamental differences in the view of the world.

    Do they refer to different, underlying fundamental entities?
    This is a harder call - and very context dependent, obviously. It
    will be acutely sensitive to the level of granularity of the
    information provided on the neuron/neurone. If you presented two
    neuroscientists with coarse-grained data on a neuron/neurone, it is
    likely they could come to agreement they both were referring to the
    same fundamental entity when they named the source of that data as a
    neuron/neurone.

    Can we call the underlying entities "fundamental" when any
    neuroscientist would admit there is no neuron/neurone which has been
    explicitly qualified down to the level of all its constituent
    molecules, along with their explicit disposition through time?
    What happens when you provide more detailed information regarding the
    purported neuron/neurone - say, sufficient detail that the two
    neuroscientists find aspects of data interpretation that are
    incommensurable in the Kuhnian sense
    (http://plato.stanford.edu/entries/thomas-kuhn/)? Then, even if the
    two referred to the biological material entity that was the source of
    the data as a "neuron", they would likely not agree they were
    referring to the same, underlying fundamental entity. This is not
    unlike the situation described several posts below in this thread
    regarding a "gene". There could be a gene X identified by
    gene-finding algorithm 1, an "identical" gene X (in terms of the
    coding sequences it contains) derived from gene-finding algorithm 2,
    the same gene X defined via a chromosomal walk, and finally a gene X
    defined via conventional genetic complementation or hybrid mapping.
    They could all contain the same coding sequence - or the same
    as-yet functionally unidentified ESTs. What it comes down to here, as
    Mark Wilkinson stated deep in the thread, is that there is much
    confusion regarding what actual material entity is being referenced -
    or whether a material entity is being referenced at all.
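
    The separation of the lexicon from the ontology, including making
    explicit who declared a label "preferred", can be sketched as two
    distinct structures, using the neuron/neurone pair as data. In this
    Python sketch the concept URIs, the agency names, and the functions
    concepts_for and preferred_label are hypothetical; the pref/alt
    distinction is only modelled on SKOS prefLabel/altLabel, not an
    implementation of SKOS.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ontology side: concepts are identified only by (hypothetical) URIs and
# related to each other without reference to any particular label.
NEURON = "http://example.org/onto/Neuron"
CELL = "http://example.org/onto/Cell"
IS_A = {NEURON: CELL}


@dataclass(frozen=True)
class Label:
    """Lexicon side: one label for one concept, with explicit provenance."""
    concept: str      # URI of the concept named
    text: str         # the label itself
    lang: str         # language tag, e.g. "en-GB"
    kind: str         # "pref" or "alt", modelled on skos:prefLabel / skos:altLabel
    asserted_by: str  # who judged the label "preferred" (hypothetical agencies)


LEXICON: List[Label] = [
    Label(NEURON, "neuron", "en-US", "pref", "agency-us"),
    Label(NEURON, "neurone", "en-GB", "pref", "agency-uk"),
    Label(NEURON, "nerve cell", "en", "alt", "agency-us"),
]


def concepts_for(text: str) -> List[str]:
    """Every concept a label may name; more than one result means a homonym."""
    return sorted({entry.concept for entry in LEXICON if entry.text.lower() == text.lower()})


def preferred_label(concept: str, lang: str) -> Optional[Label]:
    """The preferred label for a locale, returned together with who asserted it."""
    for entry in LEXICON:
        if entry.concept == concept and entry.kind == "pref" and entry.lang == lang:
            return entry
    return None


if __name__ == "__main__":
    # "neuron" and "neurone" differ only lexically: both resolve to one concept.
    print(concepts_for("neuron") == concepts_for("neurone"))  # True
    print(preferred_label(NEURON, "en-GB"))
```

    Because the ontology never mentions a label, reconciling two vocabularies
    becomes a matter of adding or comparing Label entries rather than
    rewriting the concept graph.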

    In the end, I hope what SWTech can help us do is provide a robust,
    shared means to express the semantic facts about the data collected,
    as well as providing a dynamic and semi-automatic means to improve
    our characterization of the fundamentals - semi-automatic in the
    sense of "augmentation" of human intellectual abilities along the
    lines pursued by Doug Engelbart and Vannevar Bush before him. If we
    can devise a technical infrastructure allowing the formal, shared,
    semantic description of data to evolve toward an ever converging
    sense of what the true underlying entities are, then many of the
    misgivings folks have regarding the use of ontological frameworks to
    formally express semantic information will very likely fade.

    Cheers,
    Bill

    **Biophysicists who study ion-channel kinetics, protein folding
    dynamics, rhodopsin-based photon detection, mitochondrial energy
    transfer, etc. would probably also include quantum level formalisms
    to represent the states and dynamics of atoms, electrons, and sub-
    atomic particles.

    Aug 22, 2006, at 3:57 PM, Marja Koivunen wrote:

    >I agree, consistent use of terms makes life easier for machines and
    >for humans too, when the terms have been agreed on, learned, and
    >understood. Unfortunately, this takes a lot of effort and
    >dedication from the humans. Learning a whole ontology before
    >anything can be done is a bit like reading the whole manual of a
    >DVD player before one can use it. And we all know that while
    >there are people who actually read the whole manual, they are a
    >minority.
    >
    >As a usability person I always like to see the machines support the
    >humans as much as possible and not vice versa.
    >In my view, new inventions often start from not-so-great terms and
    >evolve stepwise as learning happens. Terms are first shared
    >and polished in small groups, and later links are made between
    >groups that may use different terminologies for similar things. If
    >we want to support humans doing inventions, I think we should
    >support, as much as possible, the use of different terms, their
    >evolution, and making connections between similar terms when they
    >are discovered. And I think the Semantic Web is great for that.
    >
    >Marja
    >
    >Tim Berners-Lee wrote:
    >
    >>Yes, indeed. Machine processing of information relies on
    >>consistent usage of terms. You can't reuse information for
    >>new problems when its use requires human intervention to
    >>disambiguate it.
    >>
    >>Tim Berners-Lee
    >>
    >>Aug 10, 2006, at 21:54, wangxiao (AT) musc (DOT) edu wrote:
    >>
    >>>Quoting "Miller, Michael D (Rosetta)"
    >>><Michael_Miller (AT) Rosettabio (DOT) com>:
    >>>
    >>>>You're correct here but it is the state of the art. Interestingly
    >>>>enough, I've found that in general the biology-based scientists and
    >>>>investigators are not all that bothered by this confusion and
    >>>>despite the confusion seem to make their way through it.
    >>>
    >>>The problem is that the semantic web is intended to make machines
    >>>understand, and clarity is a prerequisite for instructing machines
    >>>unambiguously.
    >>>
    >>>Xiaoshu


    Bill Bug
    Senior Research Analyst/ Engineer

    Laboratory for Bioimaging & Anatomical Informatics
    www.neuroterrain.org
    Department of Neurobiology & Anatomy
    Drexel University College of Medicine
    2900 Queen Lane
    Philadelphia, PA 19129
    215 991 8430 (ph)
    610 457 0443 (mobile)
    215 843 9367 (fax)

    Please Note: I now have a new email - William.Bug (AT) DrexelMed (DOT) edu

  • No.11 | | 14256 bytes | |

    Hi All,

    There are examples of systems that strive to separate the lexicon
    from the ontology, so as to ensure one particular lexical view of the
    underlying semantics doesn't "lock out" either humans or machines who
    do not "understand" that lexicon. Few are perfect, but many have
    effectively handled the issue of semantic interoperability, though
    often not at the level of semantic granularity required by experts at
    the bleeding edge of a specific scientific field.

    An ontology is of little use to anyone - person or machine - without
    instantiating it via a lexicon. Where very significant problems
    arise is when the lexicon is confused with the universals the
    ontology is intended to formally represent. I realize this boundary
    may appear artificial to some, but those who've worked on such issues
    for decades in the library & info sciences and in computational
    linguistics - despite some disagreement at the edges - will generally
    see this boundary as useful - even if they agree to disagree on
    whether it is in fact an artifact of human linguistic expression or a
    more fundamental expression of a sort of Heisenberg Uncertainty
    principle of KE/KR/KD. What I mean is that the moment an algorithm tries
    to compute on an ontological expression in the context of specific
    data instances - whether the algorithm resides in silico or in a
    human brain - it "breaks" the universal nature of the principles and
    grounds it in a lexicon used to address the specific existential
    instances being manipulated within the domain of a specific
    application. I believe this issue is at the heart of some
    significant confusion regarding what an ontology is and the tasks it
    can help to implement.

    An effective and practical knowledge resource needs to include both
    ontological graphs and a complex lexical repository.

    I think "ontology" construction often goes wrong when it is not
    EXPLICIT and - of equal importance - not SYSTEMATIC about the
    lexical extensions it includes, e.g., abbreviations, misspellings,
    various types of synonyms, homographic homonyms (the bane of NLP
    efforts everywhere), etc.
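    To make the separation concrete, here is a minimal Python sketch. The
    `Lexicon` class and the concept identifiers (e.g. `ex:CerebralCortex`)
    are hypothetical, chosen only for illustration. Lexical entries
    (preferred terms, synonyms, abbreviations, misspellings) live in
    their own repository and point at ontology nodes only by identifier,
    so a homographic homonym like "cortex" simply resolves to more than
    one concept instead of being forced onto one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """One surface string and the ontology node it names."""
    text: str
    concept_id: str  # identifier of an ontology node (made-up IDs below)
    kind: str        # "preferred", "synonym", "abbreviation" or "misspelling"


class Lexicon:
    """A lexical repository kept apart from the ontology graph.

    The ontology is referenced only through concept identifiers, so adding
    a spelling variant never touches the graph itself.
    """

    KINDS = {"preferred", "synonym", "abbreviation", "misspelling"}

    def __init__(self):
        # Case-folded surface string -> every entry using that string.
        self._by_text = defaultdict(set)

    def add(self, text, concept_id, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown lexical kind: {kind!r}")
        self._by_text[text.lower()].add(LexicalEntry(text, concept_id, kind))

    def lookup(self, text):
        """Return all concepts a string may denote."""
        return {e.concept_id for e in self._by_text.get(text.lower(), set())}

    def is_homograph(self, text):
        """True when one string names more than one concept."""
        return len(self.lookup(text)) > 1


lex = Lexicon()
lex.add("cerebral cortex", "ex:CerebralCortex", "preferred")
lex.add("CTX", "ex:CerebralCortex", "abbreviation")
lex.add("cerebal cortex", "ex:CerebralCortex", "misspelling")
lex.add("adrenal cortex", "ex:AdrenalCortex", "preferred")
lex.add("cortex", "ex:CerebralCortex", "synonym")
lex.add("cortex", "ex:AdrenalCortex", "synonym")

print(sorted(lex.lookup("Cortex")))  # ['ex:AdrenalCortex', 'ex:CerebralCortex']
print(lex.is_homograph("cortex"))    # True
print(lex.lookup("ctx"))             # {'ex:CerebralCortex'}
```

    The design choice is that the lexicon records *how* a string relates
    to a concept (the `kind` field), which is the systematic treatment of
    lexical extensions argued for above.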

    I was just listening to Michio Kaku discussing the recent controversy
    regarding the redefinition of "planet" status. As he and the
    astronomer Ken Croswell were discussing the issue, Dr. Kaku brought
    up the story from Richard Feynman's biography regarding the
    difference between "naming" an entity and studying the fundamental
    properties and rules relating the continuum of entities in the
    physical world. Both the naming and the formalisms for
    characterizing the fundamentals are human artifacts - BUT what
    separates the naming from the expression of universals is that the
    latter is guided by our increasing level of insight and understanding
    of real-world entities and the ways in which they relate to one
    another. No such criterion exists for the naming process, and this
    is why it is extremely helpful to keep the lexicon characterizing
    these names distinct from the expression of fundamentals (the
    ontologies). This is also an issue addressed by Gottfried W. von
    Leibniz in his philosophical works, all of which derived from the
    insight he had as a child that it MIGHT be possible to create a
    computable formalism for ontological entities analogous to the system
    created by mathematicians for performing axiomatic proofs in geometry.
    In MANY ways, our efforts here date back to this work by Leibniz via
    several related historical threads in mathematics, philosophy, and
    various computationally-oriented scientific fields.

    One other general point - obviously the strategies and "best
    practices" for addressing these issues in the context of existing
    (and historical) data records, including the literature, are somewhat
    different from what we hope to see researchers doing going
    forward. In an ideal world - say 10 years from now - we can hope to
    see publication mechanisms in place both for primary data, supporting
    , and the larger world of the
    scientific literature - systems such as SWAN and some of the more
    advanced systems in development at BioMed Central and PLoS - to help
    reduce the complexity of the lexical Babel-esque landscape we must
    currently contend with. This needs to be done in a manner that
    doesn't in any way restrict the expressiveness of the lexicon or the
    ontological foundations, while also being implemented in a highly
    intuitive manner that does not require researchers to learn a complex
    formal means of expressing themselves beyond the existing complexity typically
    used amongst domain experts. This is why I'd still place this 10
    years out. I don't think that's too optimistic a duration, however,
    given some of the revolutionary changes being introduced both by the
    SWTech C.S. community and by the community of researchers
    embedded in the increasingly less messy process of biomedical
    ontology development and use. Some of these more modern scientific
    publication systems will come on line much sooner than this, but
    probably only in restricted contexts where there is a centralized
    authority that can both provide technical resources to develop,
    support, and evolve the systems and enforce a certain level
    of compliance amongst its users - e.g., caBIG, the eScience myGRID
    project, REWERSE, The MIND Center at MGH, the BIRN project, etc.
    For better or worse, as high a profile as these organizations
    have, the landscape of working neuroscientists extends well
    beyond this privileged environment, and we all hope our efforts
    will be useful and relevant to all neuroscientists (given that the
    current scope of the HCLSIG-hosted efforts is focused on the
    neurosciences) and will add to the value neuroscientists can
    realize for society at large.

    As an example of where things can go wrong when convolving the
    lexicon with the ontology, take an artifact as relatively simple and
    seemingly "self-evident" as the "preferred label" or "preferred term"
    for a node in an ontological graph. In making the assertion
    "preferred", there is the implication that some person or agency has
    passed judgement on the term. Reconciling two ontologies with
    overlapping knowledge domains can be made unnecessarily difficult
    when this implied contract is not made explicit. In other words, if
    you focus on reconciling the terms rather than reconciling the
    underlying semantic graphs, you can run into many unnecessary
    problems. I believe this issue is related to many of the discussions
    we've had on this list over the past 3 months both regarding ontology
    construction and use, as well as URI uniqueness and versioning
    contract. Formalisms such as SKOS can be extremely helpful in this
    regard, as we need to compute on the lexicon, as well as the
    ontological graph.
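    A minimal Python sketch of the difference between reconciling terms
    and reconciling the underlying graph. It assumes (and this is the
    hard part in practice) that both ontologies already share concept
    identifiers such as the made-up `ex:Neuron`, and it records
    "preferred" as an assertion attributed to a hypothetical authority
    rather than as a property of the concept. Matching on strings finds
    nothing in common between "neuron" and "neurone"; matching on the
    concepts they label does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferredLabel:
    """A 'preferred' label is a judgement, so record who made it."""
    concept_id: str
    label: str
    asserted_by: str  # person or agency making the judgement (made-up names)


def reconcile_by_terms(labels_a, labels_b):
    """Naive reconciliation: match the two vocabularies on label strings."""
    terms_b = {p.label for p in labels_b}
    return {p.concept_id for p in labels_a if p.label in terms_b}


def reconcile_by_concepts(labels_a, labels_b):
    """Match on the concept identifiers the labels point to."""
    ids_b = {p.concept_id for p in labels_b}
    return {p.concept_id for p in labels_a if p.concept_id in ids_b}


def preferred_for(labels, concept_id, authority):
    """The label a given authority prefers for a concept, if any."""
    for p in labels:
        if p.concept_id == concept_id and p.asserted_by == authority:
            return p.label
    return None


# Two vocabularies that share a concept but prefer different spellings.
ontology_a = [PreferredLabel("ex:Neuron", "neuron", asserted_by="authority-A")]
ontology_b = [PreferredLabel("ex:Neuron", "neurone", asserted_by="authority-B")]

print(reconcile_by_terms(ontology_a, ontology_b))     # set()
print(reconcile_by_concepts(ontology_a, ontology_b))  # {'ex:Neuron'}
print(preferred_for(ontology_a + ontology_b, "ex:Neuron", "authority-B"))  # neurone
```

    Because the preference carries its `asserted_by` source, both
    spellings can coexist without either side's judgement overriding the
    other's.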

    To offer a relatively simple and ubiquitous example from neuroscience:
    on one side of the pond they prefer "neurone", while on the other
    "neuron" is the standard term. Is one more true? Do they refer to
    different, underlying fundamental entities? Can we even call the
    underlying entities "fundamental" when any neuroscientist would admit
    there is no neuron/neurone which has been explicitly qualified down
    to the level of all its constituent molecules**, along with their
    explicit disposition in space and time?

    I won't hold you at bay. I'll give you my sense of the "practical"
    answers to these questions.

    Is one more true?
    No, since they are just lexical habits, as opposed to
    fundamental differences in the view of the world.

    Do they refer to different, underlying fundamental entities?
    This is a harder call - and very context dependent, obviously. It
    will be acutely sensitive to the level of granularity of the
    information provided on the neuron/neurone. If you presented two
    neuroscientists with coarse-grained data on a neuron/neurone, it is
    likely they could come to agreement they both were referring to the
    same fundamental entity when they named the source of that data as a
    neuron/neurone.

    Can we call the underlying entities "fundamental" when any
    neuroscientist would admit there is no neuron/neurone which has been
    explicitly qualified down to the level of all its constituent
    molecules, along with their explicit disposition through time?
    What happens when you provide more detailed information regarding the
    purported neuron/neurone - say sufficient detail so that the two
    neuroscientists find aspects of data interpretation that are
    incommensurable in the Kuhnian sense
    (http://plato.stanford.edu/entries/thomas-kuhn/)? Then, even if
    the two referred to the
    biological material entity that was the source of the data as a
    "neuron", they would likely not agree they were referring to the
    same, underlying fundamental entity. This is not unlike the
    situation described several posts below in this thread regarding a
    "gene". There could be a gene X identified by gene finding algorithm
    1, an "identical" gene X (in terms of the coding sequences it
    contains) derived from gene finding algorithm 2, the same gene X
    defined via a chromosomal walk, and finally a gene X defined via
    conventional genetic complementation or hybrid mapping. They could
    all contain the same coding sequence - or the same as yet
    functionally unidentified ESTs. What it comes down to here, as Mark
    Wilkinson stated deep in the thread, is that there is much confusion
    regarding what actual material entity is being referenced - or
    whether a material entity is being referenced at all.
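    A hypothetical Python sketch of the gene X situation. The record
    fields, the `same_referent` function and the explicit equivalence set
    are assumptions for illustration, and the sequence is a placeholder,
    not real data. Identical coding sequences can be checked
    mechanically, but whether two "gene X" records denote the same
    material entity is only answerable if someone has asserted it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """One definition of "gene X", tagged with how it was obtained."""
    record_id: str        # made-up identifier for this particular definition
    name: str             # the shared label, e.g. "gene X"
    method: str           # how the definition was derived
    coding_sequence: str


def same_sequence(a, b):
    """A fact about the data: computable directly from the records."""
    return a.coding_sequence == b.coding_sequence


def same_referent(a, b, asserted_equivalences):
    """A claim about the world: true only if someone has asserted it."""
    if a.record_id == b.record_id:
        return True
    return frozenset((a.record_id, b.record_id)) in asserted_equivalences


cds = "ATGGCCAAGTAA"  # placeholder sequence, not real data
g1 = GeneRecord("rec:1", "gene X", "gene finding algorithm 1", cds)
g2 = GeneRecord("rec:2", "gene X", "gene finding algorithm 2", cds)
g3 = GeneRecord("rec:3", "gene X", "chromosomal walk", cds)

# Only records 1 and 2 have been explicitly asserted to denote the same entity.
equivalences = {frozenset(("rec:1", "rec:2"))}

print(same_sequence(g1, g3))                # True
print(same_referent(g1, g3, equivalences))  # False
print(same_referent(g2, g1, equivalences))  # True
```

    Keeping the two checks separate means a shared name and a shared
    sequence never silently stand in for a shared referent.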

    In the end, I hope what SWTech can help us do is provide a robust,
    shared means to express the semantic facts about the data collected,
    as well as providing a dynamic and semi-automatic means to improve
    our characterization of the fundamentals - semi-automatic in the
    sense of "augmentation" of human intellectual abilities along the
    lines pursued by Doug Engelbart and Vannevar Bush before him. If we
    can devise a technical infrastructure allowing the formal, shared,
    semantic description of data to evolve toward an ever converging
    sense of what the true underlying entities are, then many of the
    misgivings folks have regarding the use of ontological frameworks to
    formally express semantic information will very likely fade.

    Cheers,
    Bill

    **Biophysicists who study ion-channel kinetics, protein folding
    dynamics, rhodopsin-based photon detection, mitochondrial energy
    transfer, etc. would probably also include quantum level formalisms
    to represent the states and dynamics of atoms, electrons, and
    subatomic particles.

    Aug 22, 2006, at 3:57 PM, Marja Koivunen wrote:

    I agree, consistent use of terms makes life easier for machines and
    for humans too when the terms have been agreed on, learned, and
    understood. Unfortunately, this takes a lot of effort and
    dedication from the humans. Learning a whole ontology before
    anything can be done is a bit like reading the whole manual of a
    DVD player before one can use it. And we all know that while
    there are people who actually read the whole manual, they are a
    minority.

    As a usability person I always like to see the machines support the
    humans as much as possible and not vice versa.
    In my view, new inventions often start from not-so-great terms and
    evolve stepwise as learning happens. Terms are first shared
    and polished in small groups and later links are made between
    groups that may use different terminologies for similar things. If
    we want to support humans doing inventions I think we should
    support the use of different terms, their evolution, and making
    connections between similar terms when they are discovered as much
    as possible. And I think Semantic Web is great for that.

    Marja

    Tim Berners-Lee wrote:

    > Yes, indeed. Machine processing of information relies on
    > consistent usage of terms. You can't reuse information for
    > new problems when its use requires human intervention to
    > disambiguate it.
    >
    > Tim Berners-Lee
    >
    > Aug 10, 2006, at 21:54, wangxiao (AT) musc (DOT) edu wrote:
    >
    >> Quoting "Miller, Michael D (Rosetta)"
    >> <Michael_Miller (AT) Rosettabio (DOT) com>:
    >>
    >>> You're correct here but it is the state of the art. Interestingly
    >>> enough, I've found that in general the biology-based scientists and
    >>> investigators are not all that bothered by this confusion and
    >>> despite the confusion seem to make their way through it.
    >>
    >> The problem is that the Semantic Web is intended to make machines
    >> understand, and clarity is a prerequisite for instructing machines
    >> unambiguously.
    >>
    >> Xiaoshu


    Bill Bug
    Senior Research Analyst/ Engineer
