Hi All,
There are examples of systems that strive to separate the lexicon
from the ontology, so as to ensure one particular lexical view of the
underlying semantics doesn't "lock out" either humans or machines who
do not "understand" that lexicon. Few are perfect, but many have
effectively handled the issue of semantic interoperability, though
often not at the level of semantic granularity required by experts at
the bleeding edge of a specific scientific field.
An ontology is of little use to anyone - person or machine - without
instantiating it via a lexicon. Where very significant problems
arise is when the lexicon is confused with the universals the
ontology is intended to formally represent. I realize this boundary
may appear artificial to some, but those who've worked on such issues
for decades in the library & info sciences and in computational
linguistics - despite some disagreement at the edges - will generally
see this boundary as useful - even if they agree to disagree on
whether it is in fact an artifact of human linguistic expression or a
more fundamental expression of a sort of Heisenberg Uncertainty
principle of KE/KR/KD. What I mean is that the moment an algorithm tries
to compute on an ontological expression in the context of specific
data instances - whether the algorithm resides in silico or in a
human brain - it "breaks" the universal nature of the principles and
grounds them in a lexicon used to address the specific existential
instances being manipulated within the domain of a specific
application. I believe this issue is at the heart of some
significant confusion regarding what an ontology is and the tasks it
can help to implement.
An effective and practical knowledge resource needs to include both
ontological graphs and a complex lexical repository.
I think where "ontology" construction often goes wrong is when it is
not EXPLICIT and - of equal importance - quite SYSTEMATIC regarding
the lexical extensions it includes - e.g., abbreviations,
misspellings, various types of synonyms, homographic homonyms (the
bane of NLP efforts everywhere), etc.
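To make the separation concrete, here is a minimal sketch in Python of keeping the lexicon explicit and systematic while the ontology side holds only identifiers and relations. Everything in it is an assumption made for illustration: the concept identifiers (C:0000 and so on), the entry kinds, and the helper names build_index and resolve are invented, not taken from any existing resource.

```python
from collections import defaultdict
from dataclasses import dataclass

# Ontology side: opaque concept identifiers and is_a edges only.
# No human-readable names live here. All identifiers are hypothetical.
IS_A = {
    "C:0001": "C:0000",  # (neuron) is_a (cell)
    "C:0003": "C:0002",  # (cell nucleus) is_a (organelle)
    "C:0004": "C:0005",  # (brain nucleus) is_a (brain region)
}


@dataclass(frozen=True)
class LexicalEntry:
    form: str     # surface string as it appears in text
    concept: str  # concept identifier the string may denote
    kind: str     # explicit type of the lexical relation


# Lexicon side: every surface form is typed, so abbreviations,
# misspellings and synonyms are never confused with the concept itself.
LEXICON = [
    LexicalEntry("neuron", "C:0001", "label"),
    LexicalEntry("neurone", "C:0001", "spelling-variant"),
    LexicalEntry("nerve cell", "C:0001", "synonym"),
    LexicalEntry("nueron", "C:0001", "misspelling"),
    LexicalEntry("nucleus", "C:0003", "label"),  # homograph, sense 1
    LexicalEntry("nucleus", "C:0004", "label"),  # homograph, sense 2
]


def build_index(entries):
    """Index lexical entries by lower-cased surface form."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.form.lower()].append(entry)
    return index


def resolve(index, form):
    """Return the set of concept ids a surface form may denote."""
    return {entry.concept for entry in index.get(form.lower(), [])}


INDEX = build_index(LEXICON)
print(resolve(INDEX, "neurone"))          # same concept as "neuron"
print(sorted(resolve(INDEX, "nucleus")))  # two candidate concepts
```

The point of the design is that a homograph such as "nucleus" resolves to more than one concept and the ambiguity is visible in the lexicon, rather than being hidden inside the ontological graph.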
I was just listening to Michio Kaku discussing the recent controversy
regarding the redefinition of "planet" status. As he and the
astronomer Ken Croswell were discussing the issue, Dr. Kaku brought
up the story from Richard Feynman's biography regarding the
difference between "naming" an entity and studying the fundamental
properties and rules relating the continuum of entities in the
physical world. Both the naming and the formalisms for
characterizing the fundamentals are human artifacts - BUT what
separates the naming from the expression of universals is that the latter
is guided by our increasing level of insight and understanding of
real-world entities and the ways in which they relate to one
another. No such criterion exists for the naming process, and this
is why it is extremely helpful to keep the lexicon characterizing
these names distinct from the expression of fundamentals (the
ontologies). This is also an issue addressed by Gottfried W. von
Leibniz in his philosophical works which all derived from the insight
he had as a child that it MIGHT be possible to create a computable
formalism for ontological entities analogous to the system created by
mathematicians for performing axiomatic proofs in geometry. In MANY
ways, our efforts here date back to this work by Leibniz via several,
related historical threads in mathematics, philosophy, and various
computationally-oriented scientific fields.
One other general point - obviously the strategies and "best
practices" for addressing these issues in the context of existing
(and historical) data records including the literature are somewhat
different, as opposed to what we hope to see researchers doing going
forward. In an ideal world - say 10 years from now - we can hope to
see publication mechanisms in place both for primary data, supporting
, and the larger world of the
scientific literature - systems such as SWAN and some of the more
advanced systems in development at BioMed Central and PLoS - to help
reduce the complexity of the lexical Babel-esque landscape we must
currently contend with. This needs to be done in a manner that
doesn't in any way restrict the expressiveness of lexicon or the
ontological foundations, while also being implemented in a highly
intuitive manner not requiring the researcher to learn a complex formal
means to express themselves beyond the existing complexity typically
used amongst domain experts. This is why I'd still place this 10
years out. I don't think that's too optimistic a duration, however,
given some of the revolutionary changes being introduced both by the
SWTech C.S. community, as well as by the community of researchers
embedded in the increasingly less messy process of biomedical
ontology development and use. Some of these more modern scientific
publication systems will come on line much sooner than this, but
probably only in restricted contexts where there is a centralized
authority that can both provide technical resources to develop,
support, and evolve the systems, as well as enforce a certain level
of compliance amongst its users - e.g., caBIG, the eScience myGRID
project, REWERSE, The MIND Center at MGH, the BIRN project, etc.
For better or worse, as high a profile as these organizations
have, the landscape of working neuroscientists extends well
beyond this privileged environment. We all hope to see our
efforts be useful and relevant to all neuroscientists (given that the
current scope of the HCLSIG-hosted efforts is focused on the
neurosciences), and to the value neuroscientists can in turn realize
for society at large.
As an example of where things can go wrong when convolving the
lexicon with the ontology, take an artifact as relatively simple and
seemingly "self-evident" as the "preferred label" or "preferred term"
for a node in an ontological graph. In making the assertion
"preferred", there is the implication some person or agency has
passed judgement on the term. Reconciling two ontologies with
overlapping knowledge domains can be made unnecessarily difficult
when this implied contract is not made explicit. In other words, if
you focus on reconciling the terms rather than reconciling the
underlying semantic graphs, you can run into many unnecessary
problems. I believe this issue is related to many of the discussions
we've had on this list over the past 3 months both regarding ontology
construction and use, as well as URI uniqueness and versioning
contract. Formalisms such as SKOS can be extremely helpful in this
regard, as we need to compute on the lexicon, as well as the
ontological graph.
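As a rough illustration of the point about "preferred" terms, the sketch below records who asserted each preferred label and reconciles two vocabularies on their relations rather than on their labels. It is a toy under stated assumptions, not a real alignment method: the node identifiers, the authorities, the shared UPPER: identifiers, and the function reconcile are all hypothetical, and exact equality of relation sets stands in for genuine graph matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferredLabel:
    text: str
    asserted_by: str  # the person or agency that judged the term "preferred"


@dataclass(frozen=True)
class Node:
    node_id: str
    preferred: PreferredLabel
    relations: frozenset  # (predicate, target) pairs against shared ids


# Two hypothetical vocabularies with overlapping domains.
UK = [
    Node("UK:7", PreferredLabel("neurone", "hypothetical UK board"),
         frozenset({("is_a", "UPPER:cell"),
                    ("part_of", "UPPER:nervous_system")})),
    Node("UK:2", PreferredLabel("nucleus", "hypothetical UK board"),
         frozenset({("is_a", "UPPER:brain_region")})),
]
US = [
    Node("US:3", PreferredLabel("neuron", "hypothetical US board"),
         frozenset({("is_a", "UPPER:cell"),
                    ("part_of", "UPPER:nervous_system")})),
    Node("US:9", PreferredLabel("nucleus", "hypothetical US board"),
         frozenset({("is_a", "UPPER:organelle")})),
]


def reconcile(left, right):
    """Pair nodes whose relation sets are equal, ignoring their labels."""
    by_relations = {}
    for node in right:
        by_relations.setdefault(node.relations, []).append(node)
    return [(l.node_id, r.node_id)
            for l in left
            for r in by_relations.get(l.relations, [])]


def reconcile_by_label(left, right):
    """Naive alternative: pair nodes whose preferred label text is equal."""
    return [(l.node_id, r.node_id)
            for l in left for r in right
            if l.preferred.text == r.preferred.text]


print(reconcile(UK, US))           # pairs the neurone/neuron nodes
print(reconcile_by_label(UK, US))  # pairs the two unrelated "nucleus" nodes
```

Matching on labels misses the neurone/neuron pair and wrongly joins the two senses of "nucleus"; matching on the graph does neither, and the asserted_by field keeps the implied contract behind each "preferred" term explicit.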
To offer a relatively simple and ubiquitous example from neuroscience
- on one side of the pond they prefer "neurone", while on the other
"neuron" is the standard term. Is one more true? Do they refer to
different, underlying fundamental entities? Can we even call the
underlying entities "fundamental" when any neuroscientist would admit
there is no neuron/neurone which has been explicitly qualified down
to the level of all its constituent molecules**, along with their
explicit disposition in space and time?
I won't hold you at bay. I'll give you my sense of the "practical"
answers to these questions.
Is one more true?
No, since they are just lexical habits, as opposed to
fundamental differences in the view of the world.
Do they refer to different, underlying fundamental entities?
This is a harder call - and very context dependent, obviously. It
will be acutely sensitive to the level of granularity of the
information provided on the neuron/neurone. If you presented two
neuroscientists with coarse-grained data on a neuron/neurone, it is
likely they could come to agreement they both were referring to the
same fundamental entity when they named the source of that data as a
neuron/neurone.
Can we call the underlying entities "fundamental" when any
neuroscientist would admit there is no neuron/neurone which has been
explicitly qualified down to the level of all its constituent
molecules, along with their explicit disposition through time?
What happens when you provide more detailed information regarding the
purported neuron/neurone - say sufficient detail so that the two
neuroscientists find aspects of data interpretation that are
incommensurable in the Kuhnian sense
(http://plato.stanford.edu/entries/thomas-kuhn/)? Then, even if the two referred to the
biological material entity that was the source of the data as a
"neuron", they would likely not agree they were referring to the
same, underlying fundamental entity. This is not unlike the
situation described several posts below in this thread regarding a
"gene". There could be a gene X identified by gene finding algorithm
1, an "identical" gene X (in terms of the coding sequences it
contains) derived from gene finding algorithm 2, the same gene X
defined via a chromosomal walk, and finally a gene X defined via
conventional genetic complementarity or hybrid mapping. They could
all contain the same coding sequence - or the same as yet
functionally unidentified ESTs. What it comes down to here, as Mark
Wilkinson stated deep in the thread, is that there is much confusion
regarding what actual material entity is being referenced - or
whether a material entity is being referenced at all.
In the end, I hope what SWTech can help us do is provide a robust,
shared means to express the semantic facts about the data collected,
as well as providing a dynamic and semi-automatic means to improve
our characterization of the fundamentals - semi-automatic in the
sense of "augmentation" of human intellectual abilities along the
lines pursued by Doug Engelbart and Vannevar Bush before him. If we
can devise a technical infrastructure allowing the formal, shared,
semantic description of data to evolve toward an ever converging
sense of what the true underlying entities are, then many of the
misgivings folks have regarding the use of ontological frameworks to
formally express semantic information will very likely fade.
Cheers,
Bill
**Biophysicists who study ion-channel kinetics, protein folding
dynamics, rhodopsin-based photon detection, mitochondrial energy
transfer, etc. would probably also include quantum level formalisms
to represent the states and dynamics of atoms, electrons, and sub-
atomic particles.
On Aug 22, 2006, at 3:57 PM, Marja Koivunen wrote:
I agree, consistent use of terms makes life easier for machines and
for humans too when the terms have been agreed on, learned, and
understood. Unfortunately, this takes a lot of effort and
dedication from the humans. Learning a whole ontology before
anything can be done is a bit like reading the whole manual of a
DVD player before one can use that. And we all know that while
there are people who actually read the whole manual, they are a
minority.
As a usability person I always like to see the machines support the
humans as much as possible and not vice versa.
In my view, new inventions often start from not so great terms and
evolve stepwise as learning happens. Terms are first shared
and polished in small groups and later links are made between
groups that may use different terminologies for similar things. If
we want to support humans doing inventions I think we should
support the use of different terms, their evolution, and making
connections between similar terms when they are discovered as much
as possible. And I think Semantic Web is great for that.
Marja
Tim Berners-Lee wrote:
>
>>
>Yes, indeed. Machine processing of information relies on
>consistent usage of terms. You can't reuse information for
>new problems when its use requires human intervention to
>disambiguate it.
>>
>Tim Berners-Lee
>>
>Aug 10, 2006, at 21:54, wangxiao (AT) musc (DOT) edu wrote:
>>
Quoting "Miller, Michael D (Rosetta)"
<Michael_Miller (AT) Rosettabio (DOT) com>:
You're correct here but it is the state of the art. Interestingly
enough, I've found that in general the biology-based scientists and
investigators are not all that bothered by this confusion and
despite the confusion seem to make their way through it.
The problem is that the Semantic Web is intended to make machines
understand, and clarity is a prerequisite for instructing a machine
unambiguously.
Xiaoshu
>>
>>
>
>
Bill Bug
Senior Research Analyst/ Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)
Please Note: I now have a new email - William.Bug (AT) DrexelMed (DOT) edu