Andrew,
thanks again for pointing out this error in the semantics of the distance
functions. Sorry for the late response.
Here is how the function fts:ApplyFTWordDistanceExactly should be. Please
note the change in the return clause. As a result your query[2] will then
evaluate to False, because SE-3 will not be eliminated.
declare function fts:ApplyFTWordDistanceExactly(
      $ as element(, fts:FTM),
      $allMatches as element(allMatches, fts:AllMatches),
      $n as xs:integer
   ) as element(allMatches, fts:AllMatches) {
   <allMatches>
   {
      for $match in $allMatches/match
      let $sorted := for $si in $match/stringInclude
                     order by $si/tokenInfo/@pos ascending
                     return $si
      where every $idx in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                         $sorted[$idx]/tokenInfo,
                         $sorted[$idx+1]/tokenInfo,
                         $) = $n
      return
         <match>
         {$match/stringInclude}
         {
            for $stringExcl in $match/stringExclude
            where some $stringIncl in $match/stringInclude
                  satisfies fts:wordDistance(
                               $stringIncl/tokenInfo,
                               $stringExcl/tokenInfo,
                               $) = $n
            return $stringExcl
         }
         </match>
   }
   </allMatches>
};
So, yes, as you pointed out, it is sufficient for a StringExclude to be at
the required distance from one of the remaining StringIncludes in order to
be kept.
Actually the same correction has to be applied to the other distance
functions (replacing "where every $stringIncl" with "where some
$stringIncl" in the return clause).
The corrections will be included in the next Working Draft.
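
To make the difference between the two quantifiers concrete, here is a
minimal Python sketch of the filtering step. It is not the normative XQuery.
It assumes that a Match can be modelled as two tuples of token positions,
that the word distance is the number of words strictly between two
positions, and that a Match with a surviving StringExclude cannot be
satisfied by the node. The names Match, word_distance,
apply_distance_exactly and evaluates_to_true are made up for this
illustration.

```python
from dataclasses import dataclass


def word_distance(pos_a: int, pos_b: int) -> int:
    """Assumed definition: number of words strictly between two positions."""
    return abs(pos_a - pos_b) - 1


@dataclass(frozen=True)
class Match:
    includes: tuple          # token positions of the StringIncludes
    excludes: tuple = ()     # token positions of the StringExcludes


def apply_distance_exactly(matches, n, quantifier=any):
    """Model of the distance filter.

    quantifier=all mimics the old "where every $stringIncl" wording,
    quantifier=any mimics the corrected "where some $stringIncl".
    """
    result = []
    for m in matches:
        ordered = sorted(m.includes)
        # Keep the Match only if consecutive includes are exactly n words apart.
        if all(word_distance(a, b) == n for a, b in zip(ordered, ordered[1:])):
            kept = tuple(
                se for se in m.excludes
                if quantifier(word_distance(si, se) == n for si in m.includes)
            )
            result.append(Match(m.includes, kept))
    return result


def evaluates_to_true(matches):
    # Simplification: a Match that still carries a StringExclude is treated
    # as unsatisfied, because the excluded word does occur in the node.
    return any(not m.excludes for m in matches)
```

For your query[2] the input would be [Match((1, 2), (3,))] with n = 0.
With quantifier=all, SE-3 is dropped and the model answers True; with
quantifier=any, SE-3 is kept and the model answers False.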
Here are some more examples showing how distance and negation are intended
to interact.
query[2] = . ftcontains ("word1" && "word2" && ! "word3") with distance
exactly 0 words
The query matches, for example:
<node>word0 word1 word2 word4 </node>
and also
<node>word0 word2 word1 word4 </node>
provided none of the given words is matched by "word3". Loosely speaking,
the query returns true for a node if it contains word1 and word2
adjacently, in any order, and not preceded or followed by an occurrence of
word3.
Hence, the following do not match:
<node>word1 word2 word3 </node>
<node>word2 word1 word3 </node>
<node>word3 word2 word1 </node>
<node>word3 word1 word2 </node>
<node>word1 word4 word2 </node> <!-- word1 and word2 need to be adjacent -->
<node>word13 word2 </node> <!-- where word13 is matched by both word1 and word3 -->
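
The examples above can be checked with a small Python sketch of the loose
description. It is only a token-level approximation under assumptions: the
text is split on whitespace, hits() is a hypothetical matcher (exact match,
except that the token "word13" counts as matched by both "word1" and
"word3"), and a word3 occurrence blocks a pair when it lies at distance 0
from either word of the pair.

```python
def hits(query, token):
    """Hypothetical matcher: exact match, plus the special token "word13",
    which is treated as matched by both "word1" and "word3"."""
    return token == query or (token == "word13" and query in ("word1", "word3"))


def loosely_matches(text):
    """True if word1 and word2 occur adjacently (either order) and no word3
    occurrence is at word distance 0 from either of them."""
    tokens = text.split()
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        adjacent_pair = (hits("word1", a) and hits("word2", b)) or (
            hits("word2", a) and hits("word1", b)
        )
        if not adjacent_pair:
            continue
        # Positions i-1 .. i+2 are exactly those at distance 0 from
        # position i or position i+1 (the pair itself included).
        window = tokens[max(i - 1, 0): i + 3]
        if not any(hits("word3", t) for t in window):
            return True
    return False
```

Calling loosely_matches("word0 word1 word2 word4") gives True, while
loosely_matches("word13 word2") gives False, because word13 itself counts
as an occurrence of word3 next to word2.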
Yours sincerely / Mit freundlichen Grüßen,
Jochen D
IBM Germany B Laboratory
DB2 Information Management Software
Phone: +49-7031-16-2992, Fax: -4891, Email: doerre (AT) de (DOT) ibm.com
Dear editors,
Given the node <Node>word1 word2 word3</Node>, I apply query[1]:
/Node ftcontains ("word1" && "word2" && "word3") with distance exactly 0
words
I get AllMatches[1] as:
AllMatches
  Match
    StringInclude (pos = 1)
    StringInclude (pos = 2)
    StringInclude (pos = 3)
The final result is True.
I then apply query[2]:
/Node ftcontains ("word1" && "word2" && ! "word3") with distance exactly
0 words
I seem to get AllMatches[2] as:
AllMatches
  Match
    StringInclude (pos = 1)
    StringInclude (pos = 2)
The final result is also True.
The reason for AllMatches[2] is that the StringExclude (pos = 3), which
is generated by ! "word3", has been dropped according to the semantics of
ApplyFTWordDistanceExactly, because SE-3 does not have a word distance of 0
to both SI-1 and SI-2.
Are my two results correct? If they are, isn't this inconsistent? What is
the intuition when "word3" is a don't-care?
Could SE-3 be compared to any one of SI-1 and SI-2 rather than to both of
them?
Thanks,