notes at contepts vs notes at terms
13 answers - 8628 bytes -

Hi Mark,
> Note that I'm referring to use cases other than annotation for
> document retrieval, for which I agree you should annotate with the
> concept, not the term.
Can you please describe these use cases in detail, explaining in each case exactly what it is you want to be able to assert, what those assertions would mean, and what exactly is the nature of the resources involved in those assertions.
> These are just additional arguments on top of
> the "we need a Term class to attach properties to" argument
What are these properties? Please list, with an explanation of the meaning of any assertions made using them.
FWIW:
'Term' is the most hideous word. It means a million different things to a million different people. A 'term' from a controlled vocabulary, and a 'term' from a terminology are *completely different things* [1][2]. In metadata applications, 'terms' can be properties of things, or values of those properties, or classes of things, or meaningless strings, or all of the above - cf. the 'Dublin Core Metadata Terms' [3]. The SKOS Core Vocabulary Specification [4] uses 'term' to refer to the classes and properties of the SKOS Core Vocabulary itself, a usage that is consistent with Dublin Core and other RDF documentation.
Because of this incredibly overloaded usage in overlapping fields of discourse, the SKOS Core Guide [5] contains virtually no occurrences of the character string 'term' in prose. This is *very* deliberate. (I just found a couple that slipped through, doh.)
The lesson Dublin Core folks have learned is: be precise. The meaning of several of the properties of the Dublin Core Element Set is now so overloaded in practice as to render them effectively meaningless. This is a huge problem for the DCMI architecture and usage teams.
If we were to coin a class 'Term' for SKOS Core, I'm quite certain that the incredible variation that would be found in its practical usage would render it, and all the associated parts of SKOS Core, effectively meaningless. We would be contributing confusion to an already very confused field of discourse.
Bottom line: If you can define a class of resources that isn't called 'Term', whose meaning is clear and easily defined, whose application is straightforward and unambiguous, and whose supporting use cases can be justified by a significant body of practice, then great, let's talk about it.
If you can't, think outside the box. Think about n-ary relations. If you're finding it hard to define the nature (i.e. type) of the things you're trying to relate, perhaps you're conflating resources. Perhaps what you understand as a 'thesaurus term' is actually an instance of an n-ary relationship between several things. If you don't like n-ary relations, make an effort to differentiate what you mean by the word 'term' in all the different contexts in which you use it, then start defining classes from there. I'll bet you end up with about 12 classes, almost all of which are disjoint.
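One possible reading of the n-ary suggestion above, as a rough sketch only: instead of a single 'term' resource, one node relates a concept, a character string, a vocabulary and a status. The ex: namespace, ex:Labelling and every ex: property here are invented for illustration and are not part of SKOS Core.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus#> .

# Hypothetical sketch: ex:Labelling and the ex: properties below are
# invented for illustration; none of them is part of SKOS Core.
# One node ties together a concept, a character string, a vocabulary
# and a status, instead of treating 'the term' as a single thing.
ex:labelling1 a ex:Labelling ;
    ex:concept     ex:conceptA ;
    ex:lexicalForm "Fauna"@en ;
    ex:inScheme    ex:thesaurusT ;
    ex:status      ex:nonPreferred .

ex:conceptA a skos:Concept .
```

Whether this matches the intended meaning of 'n-ary relation' here is an assumption; it only shows that the participants can be typed separately.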
Cheers,
Al.
[1]
[2]
[3]
[4]
[5]
-----Original Message-----
From: Mark van Assem [mailto:mark (AT) cs (DOT) vu.nl]
Sent: 26 October 2005 12:01
To: Miles, AJ (Alistair)
Cc: public-esw-thes (AT) w3 (DOT) org
Subject: Re: notes at contepts vs notes at terms
Hi Alistair,
> I don't know how to say this without sounding like an arse
> but I'm pretty sure that what you're suggesting
> contradicts the basic principles of thesaurus construction
> and use, as I've learned them from ISO 2788, the new BS 8723,
> and directly from folks like Stella and Leonard.
Probably you're right, but I think that some of the thesaurus folk are
in favour of having a Term class for the reason of attaching
properties to them. The result is that you can have URIs for them, and
use the terms in the ways I suggest. And I guess that if people find
those useful, they *will*, no matter what any standard is saying. And
I don't think they would be wrong in doing so.
> then thesaurus T term <rock> and thesaurus T term
> <basalt> are semantically equivalent tokens.
Yep, in the thesaurus they are, just like (I think) in WN the
WordSenses are equivalent within one Synset. But for some practical
uses (which you agreed to exist for WordSenses) they are not.
> Therefore, 'annotating' a document with the thesaurus T
> term <basalt> is semantically equivalent to 'annotating' the
> document with the thesaurus T term <rock>. Therefore, there's
> no point in doing it.
Would someone using that thesaurus agree that <basalt> and <rock> are
equivalent?
> If you want to say something more specific, using a
> thesaurus, then you need a thesaurus that has <basalt> as a
> preferred term.
But if there isn't any?
> Alternatively, use free text keyword annotations.
Note that I'm referring to use cases other than annotation for
document retrieval, for which I agree you should annotate with the
concept, not the term.
> The words 'rock' and 'basalt' may have quite different
> meanings to you when used in natural language discourse, but
> that is completely irrelevant. The word 'rock', and thesaurus
> T term <rock>, are entirely separate entities.
>>A more probable/useful scenario is that a prefterm in one
>>language is mapped to
>>a nonpref term in another, because it is a more accurate
>>translation of the
>>word. It enables a more finegrained mapping than just between
>>concepts.
> If you are talking about semantic mapping, then whether you
> choose thesaurus T term <rock> or thesaurus T term <basalt>
> as your mapping target makes no difference to the meaning of
> the mapping, because thesaurus T term <rock> and thesaurus T
> term <basalt> are semantically equivalent tokens. Therefore,
> if you are talking about semantic mapping, it is not possible
> to create a 'more fine-grained mapping' than that which is
> possible by mapping between the concepts.
Not on the concept level, but it is possible on the term level?
What is wrong with stating that prefTerm A in language X is usually
displayed/used in texts in language Y with nonPrefTerm B? It gives
you additional information that you are free to ignore, because the
concept-to-concept mappings are implied by term-to-term mappings
(well, if you define your mapping vocabulary in that way). It may help
e.g. in translation or displays.
Maybe this is not extremely useful, but I don't see anything
fundamentally wrong with it, either.
>>A first use is if you are really interested in that specific
>>term instead of its
>>synonyms. For example if you want to count the number of
>>times a certain concept
>>is misspelled. Counting the # occurrences of a specific term.
> How can you misspell a 'concept'? What are you counting
> exactly? What do you mean by an 'occurrence of a specific term'?
A concept cannot be misspelled because it is nameless. You are
counting the terms, not the concept.
> N.B. A word, or collocations of words, that appears in a
> natural language document, and a thesaurus term that shares
> an identical character sequence, are entirely separate
> entities. The fact that they share an identical character
> sequence allows you to infer absolutely nothing at all.
Why not? Of course you may need to assume that the meaning of term and
word overlap, but I think that programmers might just do that.
> Am I making any sense?
I can see perfectly clearly where you're coming from, and my use cases
may turn out to be complete DB after all, but I do think that people
would try to (ab)use a thesaurus in all kinds of ways, and would not
be wrong in doing so. These are just additional arguments on top of
the "we need a Term class to attach properties to" argument (which is
probably a more compelling argument). And, if we do introduce a Term
class, they are possible uses which we cannot prohibit.
Cheers,
Mark.
No.1 | | 2594 bytes |
| 
Hi Alistair,
I'm leaving the "use cases" that I wrote about for later; in this post
I try to summarize what I think Sue Ellen Wright, Bernard Vatant, Phil
Carlisle and Stella Dextre Clarke have been saying (please correct me
if I'm wrong!) [1,2,3,4,5]. I'm hoping Sue and Stella can find time to
provide some more examples.
About the word "term": I agree that it is overloaded. Should I call
"the thing in the range of the skos:prefLabel and skos:altLabel
properties" a Label or a Token?
If a class e.g. Label is introduced instead of the literal currently
defined as the range of skos:prefLabel and skos:altLabel, additional
information can be attached to Labels. The categories of information
that can be attached to instances of a class Label or Token are
(summarizing other people's posts):
- scope notes for terms, also referring to other terms to use [4]
- lexical information about the term [2,5]
- scope of usage of the lexical term [5]
- etymological, register-related and standardization-related
information [2] (I hope Sue can find time to clarify this further)
(what follows is not summarizing [1-5])
Furthermore, some examples from MeSH [6]:
- TermUI (local identifier)
- date created
- source thesaurus (MeSH groups different thesauri into one)
- abbreviation
which mostly fall under a category "editorial information". I also
remember someone posting that some thesauri attach different
definitions to different terms.
A class Label would make it possible to extend the SKOS schema for
categories of information (attached to Label) we can't foresee right now.
You provide an alternative way to attach notes to labels [7]:
ex:conceptA a skos:Concept;
    skos:prefLabel 'Animals';
    skos:altLabel 'Fauna';
    skos:editorialNote [
        skos:onLbl 'Fauna';
        rdf:value 'Check with Mr.X. whether to keep "Fauna".';
    ] .
I think this is a solution, but in principle this method could be used
everywhere you normally use a class to group information about an
entity. I think the more usual way to do this in RDF or OWL is to
introduce a class. Also, this is harder to maintain (changing
skos:altLabel 'Fauna' without changing skos:onLbl 'Fauna' leads to
errors).
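For comparison, a minimal sketch of the class-based alternative, assuming a Label class existed. ex:Label, ex:altLabelResource and ex:literalForm are hypothetical names chosen for this illustration; they are not defined by SKOS Core, and attaching skos:editorialNote to such a resource is likewise an assumption.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus#> .

# Hypothetical: ex:Label, ex:altLabelResource and ex:literalForm are
# illustrative names only; none of them is defined by SKOS Core.
ex:conceptA a skos:Concept ;
    skos:prefLabel "Animals" ;
    ex:altLabelResource ex:labelFauna .

# The string "Fauna" is stated once, on the label resource, so the
# note cannot drift out of step with the label it is about.
ex:labelFauna a ex:Label ;
    ex:literalForm "Fauna" ;
    skos:editorialNote "Check with Mr. X whether to keep \"Fauna\"." .
```

The trade-off is one extra resource per label in exchange for a single place to change the string.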
Of course we can choose not to support this kind of information
attached to terms, but then we should say so explicitly.
Cheers,
Mark.
[1]
[2]
[3]
[4]
[5]
[6]
[7]
No.2 | | 5814 bytes |
| 
In a posting way back on 1 March 2005, I mentioned a couple of problems
with the underlying SKOS model in terms of thesaurus management. The one
issue seems to have attracted a lot of attention ('notes at term level')
and some progress seems to be being made on accommodating that issue.
However I feel I also have to raise again the second problem, that of
semantic associations between non-preferred terms (or tokens as Mark
suggests).
Frequently in multilingual thesauri, we want to capture a relationship of
equivalence between different tokens used as altLabels in different
languages. This feature is frequently used in multilingual thesauri such as
the OECD Macrothesaurus, the ILO Thesaurus or AGROVOC (there are
undoubtedly others, but these are the ones that I know about). In semantic
terms, these altLabel tokens have some close relationship (usually they all
equivalently express an underlying shared concept [1] that has been
specifically excluded from the accepted concepts within the thesaurus
itself, often for reasons of literary warrant). In practical terms, the
different altLabel tokens are then managed together as a single unit: for
example, in doing a translation into a new thesaurus language, there would
be a special effort to try to find an equivalent altLabel in the target
language corresponding to that excluded concept, and in revising or
adapting the thesaurus to local conditions and use, the non-descriptors may
be promoted together to descriptor level (or descriptors demoted to
non-descriptors, while retaining the semantic relationship between the
altLabel tokens that exists in the original descriptor/concept record).
If SKOS is meant to be a way of interoperating for query expansion and
similar kinds of uses, this feature is not (I think) necessary. In fact, in
my personal opinion, I think at this stage we should be keeping SKOS as
simple as possible to promote wide implementation.
However if we talk about SKOS as a way of exchanging thesaurus information
for purposes of thesaurus management (which I know is one of the uses that
Alistair talks about), then it is both a necessary and reasonable
requirement. A multilingual thesaurus won't be able to use SKOS for
management purposes without a significant loss of information.
I'm hoping that some of the proposals being put forward for accommodating
notes on altLabel tokens will also be able to accommodate this multilingual
requirement.
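As a rough sketch of what that multilingual requirement might look like if altLabel tokens were resources: two non-preferred labels in different languages are linked directly to each other. Everything in the ex: namespace (ex:Label, ex:altLabelResource, ex:literalForm, ex:equivalentLabel) is hypothetical and not part of SKOS Core, and the French strings are illustrative data only.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus#> .

# Hypothetical names throughout: ex:Label, ex:altLabelResource,
# ex:literalForm and ex:equivalentLabel are not part of SKOS Core.
ex:conceptA a skos:Concept ;
    skos:prefLabel "Animals"@en , "Animaux"@fr ;
    ex:altLabelResource ex:faunaEn , ex:fauneFr .

# The two non-preferred tokens are managed as a unit via a direct link.
ex:faunaEn a ex:Label ;
    ex:literalForm "Fauna"@en ;
    ex:equivalentLabel ex:fauneFr .

ex:fauneFr a ex:Label ;
    ex:literalForm "Faune"@fr ;
    ex:equivalentLabel ex:faunaEn .
```

With plain literals as the range of skos:altLabel there is nothing to hang such a link on, which is the information loss described above.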
Ron
[1] Sorry if this use of the word concept offends but I don't know what
other word to use here.
No.3 | | 6220 bytes |
| 
From a terminological standpoint, what Ron writes is absolutely appropriate
(also his use of "concept" because concepts occur at all levels and are
many-faceted). Because of diversification and neutralization of concepts
when we move from one language to another, multilingual resources have to
distinguish between apparent sub-concepts that seem trivial sometimes in one
language, but which form critical distinctions in another. In terminology
management, however, these issues would all be handled with notes on concept
rather than notes on terms. (We use a special note called a "transfer
comment", which would certainly violate the principle of simplicity that is
the focus of SKOS.) We would need these kinds of notations in a
terminological extension.
Sue Ellen
No.4 | | 6847 bytes |
| 
Alas, some wise person once said that one should be careful what you wish
for. I mentioned a couple instances of very clear term notes, leaving out
the full list because these are mostly elements of information that are
relevant for terminologies but not always significant for thesauri. But
since you asked, here's the list of term-related information -- most of
which we distinguish with a full variety of different data elements in
terminology work, but all of which under some circumstances might be
relevant as information in a "term note". The numbers are from the old
classification system we used in ISO 12620:1999. Here goes:
A.2 term-related information
A.2.1 term type
DESCRIPTION: An attribute assigned to a term.
NOTE: Term types can include:
A.2.1.1 main entry term
A.2.1.2 synonym
A.2.1.3 quasi-synonym
A.2.1.4 international scientific term
A.2.1.5 common name
A.2.1.6 internationalism
A.2.1.7 full form
A.2.1.8 abbreviated form of term
NOTE 2: Types of abbreviated form can include:
A.2.1.8.1 abbreviation
A.2.1.8.2 short form of term
ADMITTED NAME: short form
DESCRIPTION: A variant of a multiword term that includes fewer words than
the full form of the term.
A.2.1.8.3 initialism
A.2.1.8.4 acronym
A.2.1.8.5 clipped term
A.2.1.9 variant
A.2.1.10 transliterated form
A.2.1.11 transcribed form
A.2.1.12 romanized form
A.2.1.13 symbol
A.2.1.14 formula
A.2.1.15 equation
A.2.1.16 logical expression
A.2.1.17 materials management categories
A.2.1.17.1 sku
A.2.1.17.2 part number
A.2.1.18 phraseological unit
A.2.1.18.1 collocation
A.2.1.18.2 set phrase
A.2.1.19 standard text
A.2.2 grammar
A.2.2.1 part of speech
PERMISSIBLE INSTANCES: Examples of parts of speech commonly documented in
terminology databases can include:
a) noun
b) verb
c) adjective
A.2.2.2 grammatical gender
a) masculine
b) feminine
c) neuter
d) other
A.2.2.3 grammatical number
a) singular
b) plural
c) dual
d) mass noun
e) other
A.2.2.4 animacy
a) animate
b) inanimate
c) other
A.2.2.5 noun class
a) proper noun
b) common noun
A.2.2.6 adjective class
a) proper adjective
b) common adjective
A.2.3 usage
A.2.3.1 usage note
A.2.3.2 geographical usage
A.2.3.3 register
a) neutral register
b) technical register
c) in house register
d) bench level register
e) slang register
f) vulgar register
A.2.3.4 frequency
a) commonly used
b) infrequently used
c) rarely used
A.2.3.5 temporal qualifier
a) archaic term
b) outdated term
c) obsolete term
A.2.3.6 time restriction
A.2.3.7 proprietary restriction
a) trademark
b) trade name
A.2.4 term formation
A.2.4.1 term provenance
a) transdisciplinary borrowing
b) translingual borrowing
c) loan translation
d) neologism
A.2.4.2 etymology
A.2.5 pronunciation
A.2.6 syllabification
A.2.7 hyphenation
A.2.8 morphology
A.2.8.1 morphological element
A.2.8.2 term element
A.2.9 term status
Data categories associated with term status include:
normative authorization
administrative status
process status
language-planning qualifier
A.2.9.1 normative authorization
a) standardized term
b) preferred term
c) admitted term
d) deprecated term
e) superseded term
f) legal term
g) regulated term
A.2.9.2 language-planning qualifier
a) recommended term
b) nonstandardized term
c) proposed term
d) new term
A.2.9.3 administrative status
A.2.9.4 process status
a) unprocessed
b) provisionally processed
c) finalized
A.2.10 degree of synonymy
(Equivalence relations and synonymy could be construed as concept related
rather than term related.)
Of course this is much too much stuff to burden SKOS with and I would never
suggest it. But since some items might be interesting within a thesaurus
environment, the notion of a term note is not a bad idea if it doesn't
overburden the system.
Sue Ellen
No.5 | | 9231 bytes |
| 
I do agree with the rant on the word "term". That doesn't mean that there
should be a note related to whatever you choose to use instead (label?). But
the word "term" is very problematic because each community of practice uses
it in a different way.
Sue Ellen
10/26/05, Miles, AJ (Alistair) <A.J.Miles (AT) rl (DOT) ac.ukwrote:
--
Hi Mark,
Note that I'm referring to use cases other than annotation for
document retrieval, for which I agree you should annotate with the
concept, not the term.
Can you please describe these use cases in detail, explaining in each case
exactly what it is you want to be able to assert, what those assertions
would mean, and what exactly is the nature of the resources involved in
those assertions.
These are just additional arguments on top of
the "we need a Term class to attach properties to" argument
What are these properties? Please list, with an explanation of the meaning
of any assertions made using them.
Fwiw
'Term' is the most hideous word. It means a million different things to a
million different people. A 'term' from a controlled vocabulary, and a
'term' from a terminology are *completely different things* [1][2]. In
metadata applications, 'terms' can be properties of things, or values of
those properties, or classes of things, or meaningless strings, or all of
the above - cf. the 'Dublin Core Metadata Terms' [3]. The SKS Core
Vocabulary Specification [4] uses 'term' to refer to the classes and
properties of the SKS Core Vocabulary itself, a usage that is consistent
with Dublin Core and other RDF documentation.
Because of this incredibly overloaded usage in overlapping fields of
discourse, the SKS Core Guide [5] contains virtually no occurrences of the
character string 'term' in prose. This is *very* deliberate. (I just found a
couple that slipped through, doh.)
The lesson Dublin Core folks have learned is: be precise. The meaning of
several of the properties of the dublin core element set is now so
overloaded in practice as to render them effectively meaningless. This is a
huge problem for the DCMI architecture and usage teams.
If we were to coin a class 'Term' for SKS Core, I'm quite certain that
the incredible variation that would be found in its practical usage would
render it, and all the associated parts of SKS Core, effectively
meaningless. We would be contributing confusion to an already very confused
field of discourse.
Bottom line: If you can define a class of resources that isn't called
'Term', whose meaning is clear and easily defined, whose application is
straightforward and unambiguous, and whose supporting use cases can be
justified by a significant body of practice, then great, let's talk about
it.
If you can't, think outside the box. Think about n-ary relations. If
you're finding it hard to define the nature (i.e. type) of the things
you're trying to relate, perhaps you're conflating resources. Perhaps what
you understand as a 'thesaurus term' is actually an instance of an n-ary
relationship between several things. If you don't like n-ary relations, make
an effort to differentiate what you mean by the word 'term' in all the
different contexts in which you use it, then start defining classes from
there. I'll bet you end up with about 12 classes, almost all of which are
disjoint.
Cheers,
Al.
>
>
>
[1]
[2]
[3]
[4]
[5]
>
>
>
Message
From: Mark van Assem [mailto:mark (AT) cs (DOT) vu.nl]
Sent: 26 2005 12:01
To: Miles, AJ (Alistair)
Cc: public-esw-thes (AT) w3 (DOT) org
Subject: Re: notes at contepts vs notes at terms
--
Hi Alistair,
I don't know how to say this without sounding like an arse
but I'm pretty sure that what you're suggesting
contradicts the basic principles of thesaurus construction
and use, as I've learned them from IS 2788, the new BS 8723,
and directly from folks like Stella and Leonard.
Probably you're right, but I think that some of the thesaurus
folk are
in favour of having a Term class for the reason of attaching
properties to them. The result is that you can have URIs for
them, and
use the terms in the ways I suggest. And I guess that if people find
those useful, they *will*, no matter what any standard is saying. And
I don't think they would be wrong in doing so.
then thesaurus T term <rockand thesaurus T term
<basaltare semantically equivalent tokens.
Yep, in the thesaurus they are, just like (I think) in WN the
WordSenses are equivalent within one Synset. But for some practical
uses (which you agreed to exist for WordSenses) they are not.
Therefore, 'annotating' a document with the thesaurus T
term <basaltis semantically equivalent to 'annotating' the
document with the thesarus T term <rock>. Therefore, there's
no point in doing it.
Would someone using that thesaurus agree that <basaltand <rockare
equivalent?
If you want to say something more specific, using a
thesaurus, then you need a thesaurus that has <basaltas a
preferred term.
But if there isn't any?
Alternatively, use free text keyword annotations.
Note that I'm referring to use cases other than annotation for
document retrieval, for which I agree you should annotate with the
concept, not the term.
The words 'rock' and 'basalt' may have quite different
meanings to you when used in natural language discourse, but
that is completely irrelevant. The word 'rock', and thesaurus
T term <rock>, are entirely separate entities.
>
>
>>A more probable/useful scenario is that a prefterm in one
>>language is mapped to a nonpref term in another, because it is
>>a more accurate translation of the word. It enables a more
>>finegrained mapping than just between concepts.
>
>
If you are talking about semantic mapping, then whether you
choose thesaurus T term <rock> or thesaurus T term <basalt>
as your mapping target makes no difference to the meaning of
the mapping, because thesaurus T term <rock> and thesaurus T
term <basalt> are semantically equivalent tokens. Therefore,
if you are talking about semantic mapping, it is not possible
to create a 'more fine-grained mapping' than that which is
possible by mapping between the concepts.
Not on the concept level, but it is possible on the term level?
What is wrong with stating that prefTerm A in language X is usually
displayed/used in texts in language Y with nonPrefTerm B? It gives
you additional information that you are free to ignore, because the
concept-to-concept mappings are implied by term-to-term mappings
(well, if you define your mapping vocabulary in that way). It may help
e.g. in translation or displays.
Maybe this is not extremely useful, but I don't see anything
fundamentally wrong with it, either.
>
>>A first use is if you are really interested in that specific
>>term instead of its synonyms. For example if you want to count
>>the number of times a certain concept is misspelled, i.e. counting
>>the number of occurrences of a specific term.
>
>
How can you misspell a 'concept'? What are you counting
exactly? What do you mean by an 'occurrence of a specific term'?
A concept cannot be misspelled because it is nameless. You are
counting the terms, not the concept.
N.B. A word, or collocations of words, that appears in a
natural language document, and a thesaurus term that shares
an identical character sequence, are entirely separate
entities. The fact that they share an identical character
sequence allows you to infer absolutely nothing at all.
Why not? Of course you may need to assume that the meaning of
term and word overlap, but I think that programmers might just do that.
Am I making any sense?
I can see perfectly clear where you're coming from, and my use cases
may turn out to be complete DB after all, but I do think that people
would try to (ab)use a thesaurus in all kinds of ways, and would not
be wrong in doing so. These are just additional arguments on top of
the "we need a Term class to attach properties to" argument (which is
probably a more compelling argument). And, if we do introduce a Term
class, they are possible uses which we cannot prohibit.
Cheers,
Mark.
No.6 | | 4939 bytes |
| 
Hi Ron, Sue,
Frequently in multilingual thesauri, we want to capture a relationship
of equivalence between different tokens used as altLabels in different
languages. This feature is frequently used in multilingual thesauri such
as the OECD Macrothesaurus, the ILO Thesaurus or Agrovoc (there are
undoubtedly others, but these are the ones that I know about). In
Could you give some concrete examples from these thesauri?
I'm hoping that some of the proposals being put forward for
accommodating notes on altLabel tokens will also be able to accommodate
this multilingual requirement.
Sue also refers to a future terminological extension of SKOS. I think
it might be possible to make the SKOS Core a little bit more complex
(by introducing a class Label or Token as range for alt/prefLabel)
that allows extension, while in the current design (literals as range
for alt/prefLabel) this would be much more difficult to accommodate.
Regards,
Mark.
P.S. sorry for my sloppy wording by using "term"; so deep into this
community-of-practice that I thought it was the right word to use.
Ron
[1] Sorry if this use of the word concept offends but I don't know what
other word to use here.
At 12:29 1/11/2005, Mark van Assem wrote:
>Hi Alistair,
>>
>I'm leaving the "use cases" that I wrote about for later; in this post
>I try to summarize what I think Sue Ellen Wright, Bernard Vatant, Phil
>Carlisle and Stella Dextre Clarke have been saying (please correct me
>if I'm wrong!) [1,2,3,4,5]. I'm hoping Sue and Stella can find time to
>provide some more examples.
>>
>About the word "term": I agree that it is overloaded. Should I call
>"the thing in the range of the skos:prefLabel and skos:altLabel
>properties" a Label or a Token?
>>
>If a class e.g. Label is introduced instead of the literal currently
>defined as the range of skos:prefLabel and skos:altLabel, additional
>information can be attached to Labels. The categories of information
>that can be attached to instances of a class Label or Token are
>(summarizing other people's posts):
>>
>- scope notes for terms, also referring to other terms to use [4]
>- lexical information about the term [2,5]
>- scope of usage of the lexical term [5]
>- etymological, register-related, standardization-related [2]
>  (I hope Sue can find time to clarify this further)
>>
>(what follows is not summarizing [1-5])
>>
>Furthermore, some examples from MeSH [6]:
>>
>- TermUI (local identifier)
>- date created
>- source thesaurus (MeSH groups different thesauri into one)
>- abbreviation
>>
>which mostly fall under a category "editorial information". I also
>remember someone posting that some thesauri attach different
>definitions to different terms.
>>
>A class Label would make it possible to extend the SKOS schema for
>categories of information (attached to Label) we can't foresee right now.
>>
>You provide an alternative way to attach notes to labels [7]:
>>
>ex:conceptA a skos:Concept;
>skos:prefLabel 'Animals';
>skos:altLabel 'Fauna';
>skos:editorialNote [
>skos:onLbl 'Fauna';
>rdf:value 'Check with Mr.X. whether to keep "Fauna".';
>];
>>
>I think this is a solution, but in principle this method could be used
>everywhere you normally use a class to group information about an
>entity. I think the more usual way to do this in RDF or OWL is to
>introduce a class. Also, this is harder to maintain (changing
>skos:altLabel 'Fauna' without changing skos:onLbl 'Fauna' leads to
>errors).
>>
>Of course we can choose not to support this kind of information
>attached to terms, but then we should say so explicitly.
>>
>Cheers,
>Mark.
>--
>Mark F.J. van Assem - Vrije Universiteit Amsterdam
>mark (AT) cs (DOT) vu.nl - http://www.cs.vu.nl/~mark
>>
>>
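To make the maintenance concern in the quoted message concrete (a note that identifies its label by repeating the label string can silently go stale), here is a minimal Python sketch. It is only an illustration under stated assumptions: the dictionary layout and the function name `dangling_notes` are invented for this example, and `onLbl` simply mirrors the `skos:onLbl` property used in the quoted snippet.

```python
# Hypothetical in-memory stand-in for the quoted RDF example; the
# dictionary layout is an assumption made for this sketch, not SKOS.
concept_a = {
    "prefLabel": "Animals",
    "altLabels": ["Fauna"],
    "editorialNotes": [
        {"onLbl": "Fauna", "value": 'Check with Mr.X. whether to keep "Fauna".'},
    ],
}


def dangling_notes(concept):
    """Return notes whose onLbl string no longer matches any label."""
    labels = {concept["prefLabel"], *concept["altLabels"]}
    return [n for n in concept["editorialNotes"] if n["onLbl"] not in labels]


# While the two strings agree, nothing dangles.
assert dangling_notes(concept_a) == []

# Renaming the altLabel without touching the note leaves the note
# pointing at a string that is no longer a label of the concept.
concept_a["altLabels"] = ["Wildlife"]
stale = dangling_notes(concept_a)
```

If the label were a resource of its own, the note would hang off that resource and a rename would not orphan it; that is the trade-off the message argues for.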
No.7 | | 1853 bytes |
| 
Mark,
Yes, what you say makes sense, but only if we are allowed to enter in
a SKOS structure a non-preferred concept (Scientific research), and link
that concept to a preferred concept (Research). The term 'Scientific
research' then becomes a prefLabel for a non-preferred concept, i.e. a
concept that is specifically excluded from the thesaurus. And if you accept
that 'Scientific research' labels a non-preferred concept, does that mean
that an altLabel that is a "lonely term" doesn't label a non-preferred
concept? How do we make this distinction?
I don't think this is a helpful avenue to pursue. This whole approach is a
long and very slippery slope, as we discovered within the BS8723 working
group when we tried to replace 'term' with 'concept'.
Ron
At 18:25 1/11/2005, Mark van Assem wrote:
>Hi Ron,
>
>Thanks for the examples, but I'm not sure I understand. Is every entry
>below a "relationship of equivalence between different tokens used as
>altLabels in different languages" ? E.g. does the example below say
>
>>Scientific research
>>  USE Research
>>Recherche scientifique
>>  EM Recherche
>
>altLabel "Scientific research" equivalent to altLabel "Recherche
>scientifique" ?
>
>This only makes sense if "Research" and "Recherche" are prefLabels for the
>concept and the concepts are NT equivalent to each other, right? Else the
>equivalence between the concepts instead of between the labels does the trick.
>
>Mark.
>
Mark F.J. van Assem - Vrije Universiteit Amsterdam
mark (AT) cs (DOT) vu.nl - http://www.cs.vu.nl/~mark
No.8 | | 5569 bytes |
| 
Actually, "lonely terms" or lacunae are a standard concern in detailed
terminology management. Some systems even set up monosemic/mononymic terms
in order to accommodate the fine nuances of distinction that appear in
multilingual environments and then build links between these very granular
entries in order to accommodate these problems.
Bye for now
Sue Ellen
On 11/1/05, Ron Davies <ron (AT) rondavies (DOT) be> wrote:
Hi Mark,
there are hundreds of examples. But because I don't work with a
thesaurus directly they are harder to dig up and it took me ten minutes to
find these ;-)
Scientific research
  USE Research
Recherche scientifique
  EM Recherche
Decorative arts
  USE Fine arts
Arts
  EM Beaux arts
Medical costs
  USE Health expenditure
C
  EM D de
Poultry rearing
  USE Aviculture
Elevage de volaille
  EMP Aviculture
I hope that helps give you the idea. Monique Bonnichon (once responsible
for Agrovoc) used to refer to single non-descriptors, i.e. non-descriptors
that DIDN'T have correspondences in other languages, as "lonely terms"
because they had no companions in other languages to keep them company. It's
an image I've never been able to get out of my head.
As for Sue Ellen's comment, I don't have a lot of experience with
terminology applications, but I've never seen a terminology database that
used this kind of approach. But again, terminology is a different type of
application.
Ron
No.9 | | 665 bytes |
| 
Hi Ron,
Thanks for the examples, but I'm not sure I understand. Is every entry
below a "relationship of equivalence between different tokens used as
altLabels in different languages" ? E.g. does the example below say
Scientific research
  USE Research
Recherche scientifique
  EM Recherche
altLabel "Scientific research" equivalent to altLabel "Recherche
scientifique" ?
This only makes sense if "Research" and "Recherche" are prefLabels for
the concept and the concepts are NT equivalent to each other, right?
Else the equivalence between the concepts instead of between the
labels does the trick.
Mark.
No.10 | | 5537 bytes |
| 
Don't apologize for "sloppy" use of "term". Term is an accepted term in each
of our subdisciplines, but it's just problematic because it means something
different in each context. And of course, we are each convinced that our own
personal "revelation" is the correct one after all!
Bye for now
Sue Ellen
On 11/1/05, Mark van Assem <mark (AT) cs (DOT) vu.nl> wrote:
Hi Ron, Sue,
Frequently in multilingual thesauri, we want to capture a relationship
of equivalence between different tokens used as altLabels in different
languages. This feature is frequently used in multilingual thesauri such
as the OECD Macrothesaurus, the ILO Thesaurus or Agrovoc (there are
undoubtedly others, but these are the ones that I know about). In
Could you give some concrete examples from these thesauri?
I'm hoping that some of the proposals being put forward for
accommodating notes on altLabel tokens will also be able to accommodate
this multilingual requirement.
Sue also refers to a future terminological extension of SKOS. I think
it might be possible to make the SKOS Core a little bit more complex
(by introducing a class Label or Token as range for alt/prefLabel)
that allows extension, while in the current design (literals as range
for alt/prefLabel) this would be much more difficult to accommodate.
Regards,
Mark.
P.S. sorry for my sloppy wording by using "term"; so deep into this
community-of-practice that I thought it was the right word to use.
Ron
[1] Sorry if this use of the word concept offends but I don't know what
other word to use here.
No.11 | | 5623 bytes |
| 
Hi Mark,
there are hundreds of examples. But because I don't work with a
thesaurus directly they are harder to dig up and it took me ten minutes to
find these ;-)
Scientific research
  USE Research
Recherche scientifique
  EM Recherche
Decorative arts
  USE Fine arts
Arts
  EM Beaux arts
Medical costs
  USE Health expenditure
C
  EM D de
Poultry rearing
  USE Aviculture
Elevage de volaille
  EMP Aviculture
I hope that helps give you the idea. Monique Bonnichon (once responsible
for Agrovoc) used to refer to single non-descriptors, i.e. non-descriptors
that DIDN'T have correspondences in other languages, as "lonely terms"
because they had no companions in other languages to keep them company.
It's an image I've never been able to get out of my head.
As for Sue Ellen's comment, I don't have a lot of experience with
terminology applications, but I've never seen a terminology database that
used this kind of approach. But again, terminology is a different type of
application.
Ron
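As a rough illustration of the pattern in the examples above (non-descriptors paired across languages, and "lonely terms" that have no such companion), the following Python sketch may help. Everything about the data layout is an assumption made for this example: the record structure, the language keys, and the function names are hypothetical, and the third record uses made-up strings rather than terms from any real thesaurus.

```python
# Each record pairs a non-descriptor with its descriptor, per language.
# Field layout is an assumption for this sketch; "Lonely example" and
# "Example" are invented illustration strings, not thesaurus terms.
entries = [
    {"en": ("Scientific research", "Research"),
     "fr": ("Recherche scientifique", "Recherche")},
    {"en": ("Poultry rearing", "Aviculture"),
     "fr": ("Elevage de volaille", "Aviculture")},
    {"en": ("Lonely example", "Example")},  # no companion in French
]


def label_equivalences(records, src="en", dst="fr"):
    """Pairs of non-descriptors recorded as equivalent across languages."""
    return [(rec[src][0], rec[dst][0]) for rec in records
            if src in rec and dst in rec]


def lonely_terms(records, languages=("en", "fr")):
    """Non-descriptors that lack a counterpart in some other language."""
    lonely = []
    for rec in records:
        if any(lang not in rec for lang in languages):
            lonely.extend(non_descriptor for non_descriptor, _ in rec.values())
    return lonely


pairs = label_equivalences(entries)
alone = lonely_terms(entries)
```

Note that the pairing lives at the level of the labels, not the concepts: the concept-level link (Research / Recherche) would exist either way, which is exactly the point under discussion in this thread.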
At 16:41 1/11/2005, Mark van Assem wrote:
>Hi Ron, Sue,
>
>>Frequently in multilingual thesauri, we want to capture a relationship of
>>equivalence between different tokens used as altLabels in different
>>languages. This feature is frequently used in multilingual thesauri such
>>as the OECD Macrothesaurus, the ILO Thesaurus or Agrovoc (there are
>>undoubtedly others, but these are the ones that I know about). In
>
>Could you give some concrete examples from these thesauri?
>
>>I'm hoping that some of the proposals being put forward for accommodating
>>notes on altLabel tokens will also be able to accommodate this
>>multilingual requirement.
>
>Sue also refers to a future terminological extension of SKOS. I think it
>might be possible to make the SKOS Core a little bit more complex (by
>introducing a class Label or Token as range for alt/prefLabel) that allows
>extension, while in the current design (literals as range for
>alt/prefLabel) this would be much more difficult to accommodate.
>
>Regards,
>Mark.
>
>P.S. sorry for my sloppy wording by using "term"; so deep into this
>community-of-practice that I thought it was the right word to use.
>
>>Ron
>>[1] Sorry if this use of the word concept offends but I don't know what
>>other word to use here.
No.12 | | 15110 bytes |
| 
In terminology management, history notes, editorial notes, and for us,
administrative notes can all be attached to the term. Multiple definitions
can be used at some stages in terminology management and in descriptive
work, but we would never construe a definition as term-related rather than
concept-related, in which case we would begin to split entries and treat
these concepts as associated with homophones. But I would not argue against
the practice in thesauri because they behave differently.
Actually, I don't have a problem with using term as long as we know the
context where we are using it.
Bye for now
Sue Ellen
On 11/1/05, Stella Dextre Clarke <sdclarke (AT) lukehouse (DOT) demon.co.uk> wrote:
This time I don't see this quite the same way as Sue, or Alistair for that
matter. I agree that the term "term" may be used in a lot of different
contexts; I agree this can cause confusion in communications. But I don't
believe you can get rid of "term". Terms do happen to exist, they are very
important in thesauri, and we have to deal with them, whether we like the
name or not. If necessary, we could call them "thesaurus terms", but we
cannot pretend they are not there, and we *do* need to be able to refer to
them without calling them "concepts" (because they are *not* concepts - they
only represent concepts.)
Moving on from that, I must try to fulfil my promise to provide examples
of when thesaurus editors may like to attach notes to terms *not* concepts.
You may find the examples more convincing if you imagine them all being
applied to non-preferred terms:
1. History notes.
For example, a non-preferred term "Beagles" might need the following
history note: 'Previously a non-preferred term of "Dogs"; became a
non-preferred term of "Hounds" when the latter was introduced as a preferred
term in 2003.'
As it happens I have never myself used this type of note, and we have not
provided for it (yet) in BS8723. But I have been sorely tempted on several
occasions during a recent project. Of course, it is possible to attach the
information to the concept History Note(s) - in this case you'd need to say
something in the HNs of both "Dogs" and "Hounds" - but it gets cumbersome.
2. Editorial Notes.
Example A: "Term proposed for upgrading to preferred status on 2004-10-01
Proposal rejected on grounds of File reference XYZ-123"
Example B: "Term requested by Bloggins on 2002-03-03"
Example C: "Term source: ABC Thesaurus"
In a recent project I have been merging three vocabularies into one, and
there are vested interests behind the retention of some terms that might
otherwise have been dropped. Sometimes it is useful to keep an audit trail
of exactly where the term came from, who wants it, why they want it, and
what arguments have already been had about it. Some of the arguments may be
about the underlying concept; but sometimes they are really focussed on a
particular term.
3. Definitions.
Sometimes it is useful to retain definitions of terms gleaned from various
sources - even when several definitions for the same term conflict with each
other. They do *not* constitute definitions of the concept that is wanted
for retrieval purposes. But they may come in handy when thesaurus changes
are proposed, or for associated scholarly work. To see examples, look at the
AAT.
Look at the record for any preferred term - take "drug jars" for example.
Last time I looked, 14 different non-preferred terms were listed, and for
each of these there was a reference to the sources where it was found e.g
Webster's Dictionary, the OED, Spillman's "Glass Bottles", etc. Not everyone
can afford to do scholarly work on this scale, and you could say the AAT is
an example in a class of its own. But work like this does happen, you do
find it in real live thesauri, and people do want to exchange such data.
4. Mappings
I've heard some people say they want to be able to map to/from
non-preferred terms (separately from the mappings between their
corresponding preferred terms). I've yet to be convinced of this in a real
case, but some people do believe in it strongly.
OK, I hope that's enough examples. I agree with the argument that a
capability for having notes on terms is not nearly such a high priority as
that for notes on concepts. But the need occurs commonly enough to make a
case for accommodating it in a model that aims to be comprehensive. Perhaps
it could be in a model for more advanced users, so as not to create
unnecessary difficulties for users with simpler needs?
Then there's a parallel argument, the one Ron raised about relationships
between non-preferred terms in different languages of one multilingual
thesaurus. He and I have discussed this before, and he knows I'm not keen on
this practice. (It has a lot in common with the case of mappings, mentioned
above.) But he is right to say that a number of well-known multilingual
thesauri do follow this practice. If you want to keep their editors on side,
you have to provide for their needs.
Plenty to keep us all busy thinking
Stella
Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, , X12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
SDClarke (AT) LukeHouse (DOT) demon.co.uk
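The history-note and editorial-note examples in the message above can be pictured with a small Python sketch in which a thesaurus term is an object of its own, so notes can be attached to the term rather than to the concept. This is only one possible reading: the class names `Label` and `Concept` and all field names are hypothetical, chosen for illustration, and do not come from SKOS Core or BS 8723.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """A thesaurus term treated as a resource in its own right (hypothetical)."""
    text: str
    lang: str = "en"
    history_notes: list = field(default_factory=list)
    editorial_notes: list = field(default_factory=list)


@dataclass
class Concept:
    """A concept with one preferred and any number of non-preferred labels."""
    pref_label: Label
    alt_labels: list = field(default_factory=list)
    scope_notes: list = field(default_factory=list)  # notes about the concept


beagles = Label("Beagles")
beagles.history_notes.append(
    'Previously a non-preferred term of "Dogs"; became a non-preferred '
    'term of "Hounds" when the latter was introduced as a preferred term in 2003.'
)
beagles.editorial_notes.append("Term source: ABC Thesaurus")

hounds = Concept(pref_label=Label("Hounds"), alt_labels=[beagles])

# The note travels with the term, so nothing has to be repeated in the
# concept-level notes of "Dogs" and "Hounds".
notes_on_terms = {lbl.text: lbl.history_notes for lbl in hounds.alt_labels}
```

The design choice being illustrated is simply where the note is anchored; whether a model should offer that anchor to all users or only to advanced ones is the open question raised in the message.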
-----Original Message-----
*From:* public-esw-thes-request (AT) w3 (DOT) org
[mailto:public-esw-thes-request (AT) w3 (DOT) org] *On Behalf Of* Sue Ellen Wright
*Sent:* 01 November 2005 15:13
*To:* Miles, AJ (Alistair)
*Cc:* Mark van Assem; public-esw-thes (AT) w3 (DOT) org
*Subject:* Re: notes at contepts vs notes at terms
I do agree with the rant on the word "term". That doesn't mean that there
should be a note related to whatever you choose to use instead (label?). But
the word "term" is very problematic because each community of practice uses
it in a different way.
Sue Ellen
-----Original Message-----
From: Mark van Assem [mailto:mark (AT) cs (DOT) vu.nl]
Sent: 26 October 2005 12:01
To: Miles, AJ (Alistair)
Cc: public-esw-thes (AT) w3 (DOT) org
Subject: Re: notes at contepts vs notes at terms
--
Hi Alistair,
I don't know how to say this without sounding like an arse
but I'm pretty sure that what you're suggesting
contradicts the basic principles of thesaurus construction
and use, as I've learned them from IS 2788, the new BS 8723,
and directly from folks like Stella and Leonard.
Probably you're right, but I think that some of the thesaurus folk are
in favour of having a Term class for the reason of attaching
properties to them. The result is that you can have URIs for them, and
use the terms in the ways I suggest. And I guess that if people find
those useful, they *will*, no matter what any standard is saying. And
I don't think they would be wrong in doing so.
> then thesaurus T term <rock> and thesaurus T term
> <basalt> are semantically equivalent tokens.
Yep, in the thesaurus they are, just like (I think) in WN the
WordSenses are equivalent within one Synset. But for some practical
uses (which you agreed to exist for WordSenses) they are not.
> Therefore, 'annotating' a document with the thesaurus T
> term <basalt> is semantically equivalent to 'annotating' the
> document with the thesaurus T term <rock>. Therefore, there's
> no point in doing it.
Would someone using that thesaurus agree that <basalt> and <rock> are
equivalent?
> If you want to say something more specific, using a
> thesaurus, then you need a thesaurus that has <basalt> as a
> preferred term.
But if there isn't any?
> Alternatively, use free text keyword annotations.
Note that I'm referring to use cases other than annotation for
document retrieval, for which I agree you should annotate with the
concept, not the term.
> The words 'rock' and 'basalt' may have quite different
> meanings to you when used in natural language discourse, but
> that is completely irrelevant. The word 'rock', and thesaurus
> T term <rock>, are entirely separate entities.
>
>
>>A more probable/useful scenario is that a prefterm in one
>>language is mapped to
>>a nonpref term in another, because it is a more accurate
>>translation of the
>>word. It enables a more fine-grained mapping than just between
>>concepts.
>
>
> If you are talking about semantic mapping, then whether you
> choose thesaurus T term <rock> or thesaurus T term <basalt>
> as your mapping target makes no difference to the meaning of
> the mapping, because thesaurus T term <rock> and thesaurus T
> term <basalt> are semantically equivalent tokens. Therefore,
> if you are talking about semantic mapping, it is not possible
> to create a 'more fine-grained mapping' than that which is
> possible by mapping between the concepts.
Not on the concept level, but it is possible on the term level?
What is wrong with stating that prefTerm A in language X is usually
displayed/used in texts in language Y with nonPrefTerm B? It gives
you additional information that you are free to ignore, because the
concept-to-concept mappings are implied by term-to-term mappings
(well, if you define your mapping vocabulary in that way). It may help
e.g. in translation or displays.
Maybe this is not extremely useful, but I don't see anything
fundamentally wrong with it, either.
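A minimal sketch of that idea, assuming a mapping vocabulary defined so
that every term-to-term mapping implies a mapping between the concepts the
two terms belong to. The names Term and implied_concept_mappings, and the
identifiers concept:1 and concept:2, are hypothetical; A, B, X and Y are the
placeholders used above.

```python
from typing import NamedTuple


class Term(NamedTuple):
    """A label attached to a concept in one language (hypothetical shape)."""
    concept: str      # hypothetical concept identifier
    label: str
    language: str
    preferred: bool


# prefTerm A in language X is mapped to nonPrefTerm B in language Y.
term_a = Term("concept:1", "A", "X", True)
term_b = Term("concept:2", "B", "Y", False)
term_mappings = [(term_a, term_b)]


def implied_concept_mappings(mappings):
    """Derive the concept-to-concept mappings implied by term mappings.

    The term-level detail (which label, preferred or not) is dropped, so a
    consumer that only cares about concepts can ignore it.
    """
    return {(source.concept, target.concept) for source, target in mappings}
```

A consumer interested only in concepts would call
implied_concept_mappings(term_mappings); one interested in display or
translation would keep the term pairs themselves.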
>
>>A first use is if you are really interested in that specific
>>term instead of its
>>synonyms. For example if you want to count the number of
>>times a certain concept
>>is misspelled, i.e. counting the number of occurrences of a specific term.
>
>
> How can you misspell a 'concept'? What are you counting
> exactly? What do you mean by an 'occurrence of a specific term'?
A concept cannot be misspelled because it is nameless. You are
counting the terms, not the concept.
> N.B. A word, or collocation of words, that appears in a
> natural language document, and a thesaurus term that shares
> an identical character sequence, are entirely separate
> entities. The fact that they share an identical character
> sequence allows you to infer absolutely nothing at all.
Why not? Of course you may need to assume that the meanings of term and
word overlap, but I think that programmers might just do that.
> Am I making any sense?
I can see perfectly clearly where you're coming from, and my use cases
may turn out to be complete DB after all, but I do think that people
would try to (ab)use a thesaurus in all kinds of ways, and would not
be wrong in doing so. These are just additional arguments on top of
the "we need a Term class to attach properties to" argument (which is
probably a more compelling argument). And, if we do introduce a Term
class, they are possible uses which we cannot prohibit.
Cheers,
Mark.
No.13 | | 1498 bytes |
We were talking about non-preferred terms and lacunae in certain languages.
Your example here of "USE Research" is wonderful in this respect: in German
there is a strict distinction between *Forschung*, which is original
investigative, experimental research, and *Recherche*, which is research
involving the collection of information and data from existing sources.
Both, of course, can be scientific in nature. If I were mapping an English
thesaurus using this heading to a similar German one, I'd need to be able to
split the concept. German colleagues are inevitably miffed that we don't
make the same distinction in English and French, but of course, they have
stolen our term in order to split theirs.
Sue Ellen
On 11/1/05, Mark van Assem <mark (AT) cs (DOT) vu.nl> wrote:
Hi Ron,
Thanks for the examples, but I'm not sure I understand. Is every entry
below a "relationship of equivalence between different tokens used as
altLabels in different languages" ? E.g. does the example below say
Scientific research
USE Research
Recherche scientifique
EM Recherche
altLabel "Scientific research" equivalent to altLabel "Recherche
scientifique" ?
This only makes sense if "Research" and "Recherche" are prefLabels for
the concept and the concepts are NT equivalent to each other, right?
Else the equivalence between the concepts instead of between the
labels does the trick.
Mark.