  • Performance issues with OWL Reasoners => subclass vs instance-of

    With InstanceStore, the genes and gene products are treated as OWL
    individuals, belonging to the ABox. However, the ontologically
    correct representation recognises that p53 is the name of a universal
    that is instantiated in trillions of cells, and not the name of an
    individual region of DNA in an individual nucleus, and thus is best
    represented in the TBox. This is how we are thinking of presenting GO
    annotations in OWL. This is obviously problematic from a practical POV.
    It seems we need general patterns for transforming certain subsets of
    TBoxes into ABoxes for the purposes of reasoning. Any thoughts on how
    this should be done?
    Chris
    [VK] Actually this gets to the heart of one of the design issues identified by
    the BINT Task Force in designing the Parkinson's Disease
    Is a given gene a subclass of, or an instance of, a general "Gene" class?
    (point number 2)
    An initial draft of the ontology is available at:
    , if mapping into instances gives better performance for a given set of
    inferences, that might be the basis of choosing the instance-of relationship.
    Towards this end I have the following questions for Phil:
    1. What is the set of ABox inferences implemented in the GO example?
    2. What would be the corresponding set of TBox inferences implemented if the
    design choice proposed by Chris was adopted, i.e., p53 is a subclass of Gene
    (assuming a general "Gene" class)?
    3. What are the performance and scalability implications of (1) vs (2)?
    4. What are the expressiveness implications of (1) vs (2), i.e., can we express
    some statements using subclass-of based modeling which are not possible using
    instance-of modeling, or vice versa?
    I look forward to a good use case illustrating the above and discussing its
    possible consequences.
    Thanks,
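
The two modelling choices in question 2 can be made concrete with a toy sketch. Assumptions, all hypothetical: the TBox is reduced to named subclass links, the ABox to named-class assertions, and the individuals `p53_region_cell_1` and `p53_region_cell_2` stand for concrete stretches of DNA in particular nuclei. This is not OWL syntax or any reasoner's API; it only shows which questions each choice lets you ask.

```python
# Toy contrast of the two modelling choices for p53 (hypothetical names,
# not OWL syntax and not any reasoner's API).

def superclasses(cls, subclass_of):
    """Reflexive-transitive closure of the subclass links starting at cls."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, abox, tbox):
    """Individuals with at least one asserted type that the TBox places under cls."""
    return {ind for ind, asserted in abox.items()
            if any(cls in superclasses(t, tbox) for t in asserted)}


# Choice 1, instance-of: p53 is a single ABox individual of Gene.
tbox_1 = {}
abox_1 = {"p53": {"Gene"}}

# Choice 2, subclass-of: P53 is a TBox class under Gene, and the
# individuals are (hypothetical) concrete DNA regions in particular cells.
tbox_2 = {"P53": {"Gene"}}
abox_2 = {"p53_region_cell_1": {"P53"}, "p53_region_cell_2": {"P53"}}

print(instances_of("Gene", abox_1, tbox_1))          # {'p53'}
print(sorted(instances_of("Gene", abox_2, tbox_2)))  # ['p53_region_cell_1', 'p53_region_cell_2']
print("Gene" in superclasses("P53", tbox_2))         # True: "p53 is a kind of Gene" is a TBox fact
```

Under the instance-of choice there is exactly one p53, so nothing can be said about its many copies in different cells; under the subclass-of choice those copies are the individuals, and "p53 is a Gene" becomes a subsumption answered from the TBox.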
  • No.1

    Chris is right, but the IS itself has no view on the matter. It does,
    I believe, play some tricks internally, turning instances into classes
    to do the reasoning; what the user sees are instances. When we use the
    IS to classify proteins, we have a class "p53", and we translate all
    the genes in a genome into their proteins and classify them. In this
    way we find instances of the defined protein classes (in our case
    phosphatases). It is not "real", in that each gene is translated into
    only one protein, but it is still an instance; it is simply not
    realistic. We weren't doing systems biology!

    robert.
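
The workflow Robert describes (translate each gene into its protein, treat the protein's description as something to classify, and read off which defined classes such as phosphatases it falls under) can be sketched in a deliberately restricted setting. Assumptions: descriptions are plain conjunctions of atomic features, for which subsumption reduces to a subset test (structural subsumption), and the feature and class names are hypothetical. A real InstanceStore deployment hands each description to a DL reasoner instead; this is not its code.

```python
# A description is a conjunction of atomic features, modelled as a frozenset.
# For such descriptions, D is subsumed by C exactly when every conjunct of C
# also appears in D (structural subsumption). All names are hypothetical.

DEFINED_CLASSES = {
    "Phosphatase": frozenset({"has_phosphatase_domain"}),
    "TyrosinePhosphatase": frozenset({"has_phosphatase_domain", "has_tyrosine_specificity"}),
}


def classify(description, defined_classes=DEFINED_CLASSES):
    """Return the names of all defined classes that subsume the description."""
    return {name for name, conjuncts in defined_classes.items() if conjuncts <= description}


# Each translated gene product becomes an 'instance' described by its features.
proteins = {
    "protein_A": frozenset({"has_phosphatase_domain", "has_tyrosine_specificity"}),
    "protein_B": frozenset({"has_phosphatase_domain"}),
    "protein_C": frozenset({"has_kinase_domain"}),
}

for name, description in sorted(proteins.items()):
    print(name, sorted(classify(description)))
# protein_A ['Phosphatase', 'TyrosinePhosphatase']
# protein_B ['Phosphatase']
# protein_C []
```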
    At 01:12 15/09/2006, Kashyap, Vipul wrote:

  • No.3

    "KV" == Kashyap, Vipul <VKASHYAP1@PARTNERS.ORG> writes:

    KV> , if mapping into instances gives better performance
    KV> for a given set of inferences, that might be the basis of
    KV> choosing the instance-of relationship. Towards this end I have
    KV> the following questions for Phil:

    KV> 1. What is the set of ABox inferences implemented in the GO
    KV> example?

    In that example, there aren't any. At that stage, the instance store
    was not doing ABox reasoning at all, just TBox reasoning made to look
    like ABox reasoning.

    The system is richer now, and you can express some relationships
    between individuals in the ABox (as well as any expressivity you like
    in the TBox). But I don't have the details, I am afraid.

    KV> 2. What would be the corresponding set of TBox inferences
    KV> implemented if the design choice proposed by Chris was adopted,
    KV> i.e., p53 is a subclass of Gene (assuming a general "Gene" class)?

    I am presuming that by "set of inferences" you mean: what can you
    express? The TBox supports OWL-DL in full. Actually, as the
    InstanceStore punts much of the work to the reasoner, any limits here
    are set by the reasoner, not by the InstanceStore per se. So it does
    whatever your reasoner does.

    KV> 3. What are the performance and scalability implications of (1)
    KV> vs (2)?

    ABox reasoning is harder than TBox reasoning. As is the way with DLs,
    exactly what the implications are depends on exactly what you
    express, and I am not really an expert.

    KV> 4. What are the expressiveness implications of (1) vs (2), i.e.,
    KV> can we express some statements using subclass-of based modeling
    KV> which are not possible using instance-of modeling, or vice versa?

    KV> I look forward to a good use case illustrating the above and
    KV> discussing its possible consequences.

    The limitation is that if your entities are in the ABox in this
    case, there is a very limited number of things that you can say about
    their relationships to other entities in the ABox, although you have
    the full expressivity of OWL to relate them to the TBox. The flip side
    is that if you put everything into the TBox, then you get nothing from
    the relational backend of the instancestore. In the GO example, for
    instance, you could put all the associations into a reasoner, modelled
    as OWL classes, but the reasoner will probably not scale to 6 million
    instances.

    Separating entities into ABox and TBox depending on how many of them
    there are is, of course, unsatisfying from an ontological perspective,
    but as you are asking about scalability of computational reasoning I
    don't think you have any choice but to be pragmatic.

    Phil
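
Phil's point about the relational backend can be sketched with SQLite from Python's standard library. The schema is hypothetical, not the InstanceStore's: the TBox is classified once (here the subsumption closure for a single axiom P53 ⊑ Gene is written by hand, standing in for a reasoner's output) and stored as a table, so retrieving all instances of a class becomes a join rather than a reasoner call per individual. This only covers individuals asserted to belong to named classes; richer ABox descriptions or relationships between individuals would need the reasoner again, which is exactly the limitation Phil describes.

```python
import sqlite3

# Hypothetical schema (not the InstanceStore's). The TBox is classified once,
# offline; only its reflexive-transitive subsumption closure is stored here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subsumption (sub_class TEXT, super_class TEXT);
    CREATE TABLE assertion   (individual TEXT, class_name TEXT);
    CREATE INDEX idx_super ON subsumption(super_class);
    CREATE INDEX idx_class ON assertion(class_name);
""")

# Closure of the single axiom P53 ⊑ Gene, standing in for a reasoner's output.
conn.executemany("INSERT INTO subsumption VALUES (?, ?)",
                 [("Gene", "Gene"), ("P53", "P53"), ("P53", "Gene")])

# ABox: many individuals, each asserted to belong to one named class.
conn.executemany("INSERT INTO assertion VALUES (?, ?)",
                 [(f"region_{i}", "P53" if i % 2 else "Gene") for i in range(10_000)])


def count_instances(cls):
    """Instance retrieval as a single join; no reasoner call per individual."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT a.individual) FROM assertion AS a "
        "JOIN subsumption AS s ON a.class_name = s.sub_class "
        "WHERE s.super_class = ?", (cls,)).fetchone()
    return row[0]


print(count_instances("Gene"))  # 10000: half asserted directly, half via P53 ⊑ Gene
print(count_instances("P53"))   # 5000
```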
  • No.4

    Hi All,

    Just as a clarification for the less informed - myself included -
    we're discussing the subtle and extremely difficult aspects of
    creating knowledge maps/annotation repositories/KBs/KR repositories
    (what have you) ultimately capable of supporting reasoning (simple
    classification through more complex reasoning) for both UNIVERSALS
    and INSTANCES.

    Some DEFINITIONS:

    CLASSes represent UNIVERSALs or TYPEs. The TBox is the set of
    CLASSes and the ASSERTIONs associated with CLASSes.

    INSTANCEs represent EXISTENTIALs or INDIVIDUALs instantiating a CLASS
    in the real world. The ABox is the set of INSTANCEs and the
    ASSERTIONs associated with those INSTANCEs.

    Properly specified CLASSes are defined in the context of the
    INSTANCEs whose PROPERTIES and RELATIONs they formally represent.

    Properly specified INSTANCEs are defined via their reference to an
    appropriate set of CLASSes.

    Reasoners (RacerPro, Pellet, FaCT++) generally have optimizations
    specific either to reasoning on the TBox or to reasoning on the ABox,
    but it's difficult (i.e., there are no existing examples that experts
    such as Phil and others can cite) to optimize for reasoning on the
    TBox, on the ABox AND, most importantly, on TBox + ABox together
    (across these sets).

    All of us trying to apply ontology-based formalisms to create
    machine-parsable representations of real-world biomedical continuants
    and occurrents have banged our heads bloody against this
    UNIVERSAL-EXISTENTIAL border. Even determining which of the many
    biomedical informatics resources to employ when you seek to reference
    relevant UNIVERSALs can be a very difficult task. We're in the midst
    of an extended debate within the BIRN Task Force on how best to do
    this for proteins relevant to the cross-species representation of
    neurodegenerative disease, such as Glial Fibrillary Acidic Protein
    (GFAP).

    I strongly encourage the experts to clarify, embellish, or correct
    the above definitions as they see fit for the edification of all of
    us disciples. :-)

    Cheers,
    Bill

    On Sep 15, 2006, at 8:30 AM, Phillip Lord wrote:

    Bill Bug
    Senior Research Analyst/ Engineer

    Laboratory for Bioimaging & Anatomical Informatics
    www.neuroterrain.org
    Department of Neurobiology & Anatomy
    Drexel University College of Medicine
    2900 Queen Lane
    Philadelphia, PA 19129
    215 991 8430 (ph)
    610 457 0443 (mobile)
    215 843 9367 (fax)

    Please Note: I now have a new email - William.Bug (AT) DrexelMed (DOT) edu

  • No.6

    "WB" == William Bug <William.Bug (AT) DrexelMed (DOT) edu> writes:

    WB> CLASSes represent UNIVERSALs or TYPEs. The TBox is the set of
    WB> CLASSes and the ASSERTIONs associated with CLASSes.

    WB> INSTANCEs represent EXISTENTIALs or INDIVIDUALs instantiating a
    WB> CLASS in the real world. The ABox is the set of INSTANCEs and
    WB> the ASSERTIONs associated with those INSTANCEs.

    I'd take a slight step back from this. You can think of classes and
    instances in this way. But in the OWL sense, a class is a logical
    construct with a set of computational properties. "Instances" is a
    more difficult term: OWL actually has individuals. The instance store
    uses "instances" because they are not really OWL individuals.
    There is also a philosophical concept of what a class is, what a
    universal is, and so on, which may be somewhat different and is also
    open to debate.

    WB> Properly specified CLASSes are defined in the context of the
    WB> INSTANCEs whose PROPERTIES and RELATIONs they formally
    WB> represent.

    WB> Properly specified INSTANCEs are defined via their reference to
    WB> an appropriate set of CLASSes.

    I think this would be circular. An OWL class is defined by the
    individuals that it might have in any model which fits the
    ontology, not just the individuals it has in a specific model.

    WB> Reasoners (RacerPro, Pellet, FaCT++) generally have
    WB> optimizations specific to either reasoning on the TBox or
    WB> reasoning on the ABox, but it's difficult (i.e., no existing
    WB> examples experts such as Phil and others can cite) to optimize
    WB> both for reasoning on the TBox, the ABox AND - most importantly
    WB> - TBox + ABox (across these sets).

    ABox reasoning is more complex than TBox reasoning, although I believe
    the difference is not that profound (i.e., they are both really
    complex). For a DL as expressive as the one OWL is based on, the
    worst-case complexities are always really bad. In other words, no
    reasoner can ever guarantee to scale well in all circumstances. This
    does not mean that you cannot build reasoners which will scale well in
    practice.

    Make sense?

    Phil
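
Phil's point that an OWL class is characterised by the individuals it could have in *any* model of the ontology can be illustrated by brute force. This toy sketch enumerates every interpretation of two hypothetical classes `Gene` and `P53` and one individual `a` over a two-element domain, keeps those that satisfy the axiom P53 ⊑ Gene plus the ABox, and counts an assertion as entailed only if it holds in all of them. Real OWL semantics quantifies over every non-empty domain, so fixing the domain size makes this an illustration, not a decision procedure.

```python
from itertools import chain, combinations, product

# Deliberately tiny domain; OWL semantics ranges over every non-empty domain.
DOMAIN = (0, 1)


def subsets(xs):
    """All subsets of xs as frozensets (candidate class extensions)."""
    return [frozenset(c)
            for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]


def interpretations():
    """Every way to interpret the hypothetical classes Gene, P53 and the individual a."""
    for gene, p53, a in product(subsets(DOMAIN), subsets(DOMAIN), DOMAIN):
        yield {"Gene": gene, "P53": p53, "a": a}


def is_model(i, abox):
    """The TBox axiom P53 ⊑ Gene holds, and so does every (individual, class) assertion."""
    return i["P53"] <= i["Gene"] and all(i[ind] in i[cls] for ind, cls in abox)


def entails(abox, ind, cls):
    """An assertion is entailed iff it holds in every model (over this small domain)."""
    return all(i[ind] in i[cls] for i in interpretations() if is_model(i, abox))


print(entails([("a", "P53")], "a", "Gene"))  # True: no model puts a in P53 but outside Gene
print(entails([("a", "Gene")], "a", "P53"))  # False: some models leave a outside P53
```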
  • No.7

    On Fri, 15 Sep 2006, Phillip Lord wrote:

    > ABox is more complex than TBox, although I believe the difference is
    > not that profound (ie they are both really complex). For a DL as
    > expressive as that which OWL is based on, the complexities are always
    > really bad. In other words, no reasoner can ever guarantee to scale
    > well in all circumstances.

    Again: pure production/rule-oriented systems *are* built to scale
    well in *all* circumstances (this is the primary advantage they have
    over DL reasoners, i.e., reasoners tuned specifically to DL
    semantics). This distinction is critical: not every reasoner is the
    same, and this is why there is interest in using translations to
    datalog and other logic programming systems (per Ian Horrocks's
    suggestion below):

    >Another interesting approach that has only recently been presented by
    >Motik et al. is to translate a DL terminology into a set of disjunctive
    >datalog rules, and to use an efficient datalog engine to deal with
    >large numbers of ground facts. This idea has been implemented in the
    >KAON2 system, early results with which have been quite encouraging (see
    >http://kaon2.semanticweb.org/). It can deal with expressive languages
    >(such as OWL), but it seems to work best in data-centric applications,
    >i.e., where the terminology is not too large and complex.


    I'd go a step further and suggest that even large terminologies aren't
    a problem for such systems, as their primary bottlenecks are memory
    (very cheap) and the complexity of the rule set. The set of Horn-like
    rules that express DL semantics is *very* small.

    Chimezie
    Lead Systems Analyst
    Thoracic and Cardiovascular Surgery
    Cleveland Clinic Foundation
    9500 Euclid Avenue/ W26
    Cleveland, 44195
    : (216)444-8593
    ogbujic (AT) ccf (DOT) org
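
The claim that the rule set for DL semantics can be small is easiest to see for the Horn fragment. This sketch is plain (not disjunctive) datalog over subject/predicate/object triples, evaluated naively to a fixpoint; it covers only subclass, domain and range reasoning, which is far less than the disjunctive-datalog translation Horrocks describes for KAON2. The class and property names are hypothetical.

```python
# Plain datalog (not disjunctive) over (subject, predicate, object) triples.
# Rules, read as Horn clauses:
#   type(x, D)       :- type(x, C), subClassOf(C, D)
#   subClassOf(C, E) :- subClassOf(C, D), subClassOf(D, E)
#   type(x, C)       :- P(x, y), domain(P, C)
#   type(y, C)       :- P(x, y), range(P, C)


def saturate(triples):
    """Naive bottom-up evaluation: apply every rule until no new triple appears."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        dom = {(s, o) for s, p, o in facts if p == "domain"}
        rng = {(s, o) for s, p, o in facts if p == "range"}
        derived = set()
        for s, p, o in facts:
            if p == "type":
                derived |= {(s, "type", d) for c, d in sub if c == o}
            elif p == "subClassOf":
                derived |= {(s, "subClassOf", e) for d, e in sub if d == o}
            derived |= {(s, "type", c) for q, c in dom if q == p}
            derived |= {(o, "type", c) for q, c in rng if q == p}
        if derived <= facts:
            return facts  # fixpoint reached: no rule adds anything new
        facts |= derived


kb = {
    ("P53", "subClassOf", "TumourSuppressorGene"),
    ("TumourSuppressorGene", "subClassOf", "Gene"),
    ("encodes", "domain", "Gene"),
    ("encodes", "range", "Protein"),
    ("region_1", "type", "P53"),
    ("region_2", "encodes", "protein_2"),
}
closure = saturate(kb)
print(sorted(o for s, p, o in closure if s == "region_1" and p == "type"))
# ['Gene', 'P53', 'TumourSuppressorGene']
print(("protein_2", "type", "Protein") in closure)  # True
```

Each pass re-derives everything; practical engines use semi-naive evaluation and indexes, but they reach the same fixpoint.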
  • No.9

    Thanks, Phil.

    This all makes perfect sense.

    Please see below for a brief clarification.

    Cheers,
    Bill

    On Sep 15, 2006, at 11:13 AM, Phillip Lord wrote:

    >
    >
    >
    >

    "WB" == William Bug <William.Bug (AT) DrexelMed (DOT) eduwrites:

    WB> CLASSes represent UNIVERSALs or TYPEs. The TBox is the set of
    WB> CLASSes and the ASSERTIONs associated with CLASSes.

    WB> INSTANCEs represent EXISTENTIALs or INDIVIDUALs instantiating a
    WB> CLASS in the real world. The ABox is the set of INSTANCEs and
    WB> the ASSERTIONs associated with those INSTANCEs.

    I'd take a slight step back from this. You can think of classes and
    instances in this way. But in the OWL sense, a class is a logical
    construct with a set of computational properties. "Instances" is a
    more difficult term: OWL actually has individuals. The instance store
    uses "instances" because they are not really OWL individuals.
    There is also a philosophical concept of what a class is, what a
    universal is and so on, which may be somewhat different, and is also
    open to debate.

    Admittedly, some of the issues addressed in discussions of biomedical
    ontology theory, especially those derived directly from formal
    Frege-style FOL expressions, may not be supported in OWL. What is
    supported in OWL are expressions one can construct in the specific DL
    OWL is based on, which, from what I've read on the OWL normative
    syntax page, is roughly equivalent to SHIF(D) and SHOIN(D) (http://
    #2).

    WB> Properly specified CLASSes are defined in the context of the
    WB> INSTANCEs whose PROPERTIES and RELATIONs they formally
    WB> represent.

    WB> Properly specified INSTANCEs are defined via their reference to
    WB> an appropriate set of CLASSes.

    I think this would be circular. An OWL class is defined by the
    individuals that it might have in any model which fits the
    ontology, not just the individuals it has in a specific model.
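    Phil's point that a class is characterised by every model of the
    ontology, not by one particular model, can be illustrated by brute force.
    The sketch below is a toy. It assumes a hypothetical three-element domain
    and only atomic subclass axioms, and the class names are illustrative.
    It enumerates every interpretation and keeps those that satisfy the
    axioms (the models). "A subClassOf B" counts as entailed only if it
    holds in all of them. Real OWL semantics allow arbitrary, even infinite,
    domains. A finite check like this can find counter-models, but it cannot
    prove entailment in general.

```python
from itertools import product

DOMAIN = [0, 1, 2]  # hypothetical three-element domain (finite toy only)
CLASSES = ["Mitochondrion", "Organelle", "CellPart"]  # illustrative names


def interpretations():
    """Yield every assignment of an extension (a subset of DOMAIN) to each class."""
    subsets = [frozenset(x for x, keep in zip(DOMAIN, bits) if keep)
               for bits in product([False, True], repeat=len(DOMAIN))]
    for extensions in product(subsets, repeat=len(CLASSES)):
        yield dict(zip(CLASSES, extensions))


def is_model(interp, axioms):
    """An interpretation is a model if every (A, B) subclass axiom holds in it."""
    return all(interp[a] <= interp[b] for a, b in axioms)


def entails(axioms, a, b):
    """A subClassOf B is entailed if it holds in every model (over DOMAIN only)."""
    return all(interp[a] <= interp[b]
               for interp in interpretations() if is_model(interp, axioms))


axioms = [("Mitochondrion", "Organelle"), ("Organelle", "CellPart")]
models = [i for i in interpretations() if is_model(i, axioms)]

# The extension of Mitochondrion differs from model to model...
print(len({m["Mitochondrion"] for m in models}))
# ...but the subsumption below holds in all of them, and its converse does not.
print(entails(axioms, "Mitochondrion", "CellPart"))
print(entails(axioms, "CellPart", "Mitochondrion"))
```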

    I didn't mean it as literally as it sounds - more in the sense of how
    you'd express this in Frege-style formalism:
    For the CLASS "mitochondrion", there exists some biomaterial entity
    for which all the assertions associated with the CLASS
    "mitochondrion" are true.

    What this admittedly open-ended definition allows is for the
    definition of CLASS "mitochondrion" to evolve as more detailed
    existential entities are described and new universal properties and
    relations for the CLASS "mitochondrion" are identified in the lab.

    You are absolutely right, of course, in the case of OWL, to say a
    CLASS == {the set of all INDIVIDUALs who derive from that CLASS}
    would definitely be a circular definition that would likely confound
    a reasoner.

    --
    WB> Reasoners (RacerPro, Pellet, FaCT++) generally have
    WB> optimizations specific to either reasoning on the TBox or
    WB> reasoning on the ABox, but it's difficult (i.e., no existing
    WB> examples experts such as Phil and others can cite) to optimize
    WB> both for reasoning on the TBox, the ABox AND - most importantly
    WB> - TBox + ABox (across these sets).

    ABox is more complex than TBox, although I believe the difference is
    not that profound (i.e., they are both really complex). For a DL as
    expressive as that which OWL is based on, the complexities are always
    really bad. In other words, no reasoner can ever guarantee to scale
    well in all circumstances. This does not mean that you cannot build
    reasoners which will scale well in practice.
    --
    Make sense?

    Phil

    Bill Bug
    Senior Research Analyst/ Engineer

    Laboratory for Bioimaging & Anatomical Informatics
    www.neuroterrain.org
    Department of Neurobiology & Anatomy
    Drexel University College of Medicine
    2900 Queen Lane
    Philadelphia, PA 19129
    215 991 8430 (ph)
    610 457 0443 (mobile)
    215 843 9367 (fax)

    Please Note: I now have a new email - William.Bug (AT) DrexelMed (DOT) edu

    This email and any accompanying attachments are confidential.
    This information is intended solely for the use of the individual
    to whom it is addressed. Any review, disclosure, copying,
    distribution, or use of this email communication by others is strictly
    prohibited. If you are not the intended recipient please notify us
    immediately by returning this message to the sender and delete
    all copies. Thank you for your cooperation.
  • No.11 | | 2805 bytes | |

    "C" == Chimezie <ogbujic (AT) bio (DOT) ri.ccf.org> writes:

    >ABox is more complex than TBox, although I believe the difference
    >is not that profound (i.e., they are both really complex). For a DL
    >as expressive as that which OWL is based on, the complexities are
    >always really bad. In other words, no reasoner can ever guarantee
    >to scale well in all circumstances.


    C> again: pure production/rule-oriented systems *are* built to
    C> scale well in *all* circumstances (this is the primary advantage
    C> they have over DL reasoners - i.e., reasoners tuned specifically
    C> to DL semantics). This distinction is critical: not every
    C> reasoner is the same and this is the reason why there is
    C> interest in considerations of using translations to datalog and
    C> other logic programming systems (per Ian Horrocks' suggestion
    C> below):

    Well, as I am speaking at the limit of my knowledge I cannot be sure
    about this, but I strongly suspect that what you say is wrong.

    Any computational system can only be guaranteed to work well in all
    circumstances if it is of very low expressivity. If a system
    implements expressivity equivalent to Turing/Lambda calculus, then no
    such guarantees are ever possible, nor can you determine
    algorithmically which code will perform well and which not.

    Part of the problem with DL reasoners and their scalability is,
    indeed, their relative immaturity. But part of the problem is simply
    that this is the way the universe is built. Ain't much that can be
    done about this.


    >Another interesting approach that has only recently been
    >presented by Motik et al is to translate a DL terminology into a
    >set of disjunctive datalog rules, and to use an efficient datalog
    >engine to deal with large numbers of ground facts. This idea has
    >been implemented in the Kaon2 system, early results with which
    >have been quite encouraging (see
    >http://kaon2.semanticweb.org/). It can deal with expressive
    >languages (such as OWL), but it seems to work best in
    >data-centric applications, i.e., where the terminology is not too
    >large and complex.


    C> I'd go a step further and suggest that even large terminologies
    C> aren't a problem for such systems, as their primary bottlenecks are
    C> memory (very cheap) and the complexity of the rule set. The set
    C> of Horn-like rules that express DL semantics is *very* small.

    Memory is not cheap if the requirements scale non-polynomially.
    Besides, what is the point of suggesting that large terminologies
    are not a problem? Why not try it, and report the results?
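    The contrast behind "memory is not cheap if the requirements scale
    non-polynomially" can be put in numbers. For plain datalog (Horn rules,
    no disjunction), the number of distinct ground atoms is at most the sum
    of n^k over the predicates. Here n is the number of constants and k is a
    predicate's arity, so the fixpoint is polynomial in n for fixed arities.
    Once rule heads contain disjunction, a naive engine doing case analysis
    doubles its candidates with each independent "A or B" fact. This is a
    back-of-the-envelope sketch under those assumptions, and the function
    names are hypothetical. It does not model how KAON2 or any tableau
    reasoner actually prunes its search.

```python
from itertools import product


def horn_atom_bound(n_constants, predicate_arities):
    """Most ground atoms a disjunction-free datalog program can hold:
    one per tuple of constants per predicate, i.e. the sum of n**arity."""
    return sum(n_constants ** k for k in predicate_arities)


def naive_case_splits(n_disjunctive_facts):
    """Count the branches of naive case analysis over facts 'x_i is A or B'.
    Each fact contributes an independent binary choice."""
    return sum(1 for _ in product(("A", "B"), repeat=n_disjunctive_facts))


# 1,000 constants, one unary and one binary predicate: about a million atoms.
print(horn_atom_bound(1000, [1, 2]))
# Just 16 independent disjunctions already give 65,536 branches,
# and each additional one doubles the count.
print(naive_case_splits(16))
```

    Doubling the constants roughly quadruples the Horn bound here, because
    the binary predicate dominates. Each extra disjunction, by contrast,
    doubles the naive branch count.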

    Phil
  • No.13 | | 2585 bytes | |

    >Well, as I am speaking at the limit of my knowledge I cannot be sure
    >about this, but I strongly suspect that what you say is wrong.

    >Any computational system can only be guaranteed to work well in all
    >circumstances if it is of very low expressivity. If a system
    >implements expressivity equivalent to Turing/Lambda calculus, then no
    >such guarantees are ever possible, nor can you determine
    >algorithmically which code will perform well and which not.

    >Part of the problem with DL reasoners and their scalability is,
    >indeed, their relative immaturity. But part of the problem is simply
    >that this is the way the universe is built. Ain't much that can be
    >done about this.

    I disagree; my point is that the universe you speak of is framed by a
    specific reasoning algorithm. But your point (below) is taken that
    experimentation and results are what is needed. The reality is that the
    worlds of production systems and DL/FOL reasoning are somewhat isolated
    from each other, and each can benefit greatly from the other.

    >Another interesting approach that has only recently been
    >presented by Motik et al is to translate a DL terminology into a
    >set of disjunctive datalog rules, and to use an efficient datalog
    >engine to deal with large numbers of ground facts. This idea has
    >been implemented in the Kaon2 system, early results with which
    >have been quite encouraging (see
    >http://kaon2.semanticweb.org/). It can deal with expressive
    >languages (such as WL), but it seems to work best in
    >data-centric applications, i.e., where the terminology is not too
    >large and complex.
    >

    C> I'd go a step further and suggest that even large terminologies
    C> aren't a problem for such systems, as their primary bottlenecks are
    C> memory (very cheap) and the complexity of the rule set. The set
    C> of Horn-like rules that express DL semantics is *very* small.

    >Memory is not cheap if the requirements scale non-polynomially.
    >Besides, what is the point of suggesting that large terminologies
    >are not a problem? Why not try it, and report the results?

    I plan to. I simply don't think the assumption that Tableau Calculus
    represents the known limitations of DL reasoning is a very useful one.

    Chimezie
  • No.14 | | 1735 bytes | |

    "C" == Chimezie <ogbujic (AT) bio (DOT) ri.ccf.org> writes:

    >Part of the problem with DL reasoners and their scalability is,
    >indeed, their relative immaturity. But part of the problem is
    >simply that this is the way the universe is built. Ain't much
    >that can be done about this.


    C> I disagree and my point is that the universe you speak of is
    C> framed by a specific reasoning algorithm.

    No, it isn't. Part of the issue with scalability is that the
    complexity of OWL reasoning is fairly high; it's just a hard
    problem. This is independent of the algorithm being used to solve
    it. It is a bound on worst-case time performance that applies to any
    algorithm, including those not yet developed.

    Again, this is not necessarily a problem. Almost all programming
    languages have unbounded worst-case complexity: for an arbitrary Java
    program you can never guarantee that it will complete.


    >Memory is not cheap if the requirements scale non-polynomially.
    >Besides, what is the point of suggesting that large terminologies
    >are not a problem? Why not try it, and report the results?


    C> I plan to. I simply don't think the assumption that Tableau
    C> Calculus represents the known limitations of DL reasoning is a
    C> very useful one.

    I didn't say this. I am sure someone will come up with algorithms
    which run quicker than at present. But there are fundamental
    limitations there also. In my experience so far, systems which reason
    over OWL significantly faster tend not to be doing the same thing, but
    something simpler.

    Phil
