Some special characters are interpreted by Recoll in search strings to expand or specialize the search. Wildcards expand a root term in controlled ways. Anchor characters can restrict a search to succeed only if the match is found at or near the beginning of the document or one of its fields.
All words entered in Recoll search fields will be processed for wildcard expansion before the request is finally executed.
The wildcard characters are:
An asterisk (*) matches 0 or more characters.
A question mark (?) matches a single character.
Square brackets ([]) define a set of characters to be matched. For example, [abc] matches a single character which may be 'a' or 'b' or 'c', and [0-9] matches any single digit.
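The three wildcard forms can be tried out on a small term list. The sketch below uses Python's standard fnmatch module, whose glob syntax covers the same three forms. It is only an illustration of the matching rules, not Recoll's own matcher, and the term list is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms, for illustration only.
TERMS = ["recoll", "record", "recorded", "recall", "red", "rec0", "rec1"]


def expand(pattern, terms=TERMS):
    """Return the terms that match a glob-style wildcard pattern."""
    return [t for t in terms if fnmatchcase(t, pattern)]


if __name__ == "__main__":
    print(expand("rec*"))      # '*' matches 0 or more characters
    print(expand("rec?ll"))    # '?' matches exactly one character
    print(expand("rec[0-9]"))  # '[0-9]' matches one digit
```

Note that rec* also picks up recall and the numbered terms, which shows how quickly a trailing wildcard widens a search.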
You should be aware of a few things when using wildcards.
Using a wildcard character at the beginning of a word can make for a slow search because Recoll will have to scan the whole index term list to find the matches. However, this is much less of a problem for field searches, and queries like author:*@domain.com can sometimes be very useful.
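The cost difference between a trailing and a leading wildcard can be illustrated with a sorted term list. This is a simplified model that assumes the terms are kept in sorted order. It is not Recoll's actual code, and the terms and function names are hypothetical.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Hypothetical sorted term list standing in for an index term list.
TERMS = sorted([
    "alice@domain.com", "bob@domain.com", "carol@other.org",
    "domain", "record", "recoll",
])


def prefix_matches(prefix, sorted_terms=TERMS):
    """Expand 'prefix*': binary search to the first candidate, then
    read consecutive terms until the prefix stops matching."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def scan_matches(pattern, sorted_terms=TERMS):
    """Expand a pattern with a leading wildcard: there is no fixed
    prefix to search for, so every term has to be tested."""
    return [t for t in sorted_terms if fnmatchcase(t, pattern)]


if __name__ == "__main__":
    print(prefix_matches("rec"))         # touches only the 'rec...' range
    print(scan_matches("*@domain.com"))  # tests every term in the list
```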
For Recoll version 18 only, when working with a raw index (preserving character case and diacritics), the literal part of a wildcard expression will be matched exactly for case and diacritics. This is not true any more for versions 19 and later.
Using a * at the end of a word can produce more matches than you would think, and strange search results. You can use the term explorer tool to check what completions exist for a given term. You can also see exactly what search was performed by clicking on the link at the top of the result list. In general, for natural language terms, stem expansion will produce better results than an ending * (stem expansion is turned off when any wildcard character appears in the term).
Due to the way that Recoll processes wildcards inside dir path filtering clauses, they will have a multiplicative effect on the query size. A clause containing wildcards in several path elements, for example dir:/home/me/*/*/docdir, will almost certainly fail if your indexed tree is of any realistic size.
Depending on the case, you may be able to work around the issue by specifying the path elements more narrowly, with a constant prefix, or by using two separate dir: clauses instead of multiple wildcards, as in dir:/home/me dir:docdir. The latter query is not equivalent to the initial one because it does not specify a number of directory levels, but that's the best we can do (and it may actually be more useful in some cases).
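A rough way to picture the multiplicative effect is to count combinations. The sketch below assumes each wildcard path element is expanded independently against every known directory name. That is a simplification for illustration and not a description of Recoll's internals. The function name and the directory names are hypothetical.

```python
from fnmatch import fnmatchcase
from math import prod


def expansion_size(path_pattern, dir_names):
    """Estimate the number of combinations produced when every wildcard
    path element is expanded on its own against all directory names."""
    counts = []
    for element in path_pattern.split("/"):
        if not element:
            continue  # skip the empty piece before the leading '/'
        if any(c in element for c in "*?["):
            counts.append(sum(1 for n in dir_names if fnmatchcase(n, element)))
        else:
            counts.append(1)  # a literal element contributes one choice
    return prod(counts)


if __name__ == "__main__":
    # 1000 made-up directory names plus the literal ones from the example.
    names = [f"dir{i}" for i in range(1000)] + ["home", "me", "docdir"]
    print(expansion_size("/home/me/*/docdir", names))
    print(expansion_size("/home/me/*/*/docdir", names))
```

Under this model, adding a second wildcard element squares the count instead of doubling it, which is why narrowing each element with a constant prefix helps so much.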
Two characters are used to specify that a search hit should occur at the beginning or at the end of the text. ^ at the beginning of a term or phrase constrains the search to happen at the start, $ at the end forces it to happen at the end.
As this function is implemented as a phrase search it is possible to specify a maximum distance at which the hit should occur, either through the controls of the advanced search panel, or using the query language, for example, as in:
"^someterm"o10
which would force someterm to be found within 10 terms of the start of the text. This can be combined with a field search as in somefield:"^someterm"o10 or somefield:someterm$.
This feature can also be used with an actual phrase search, but in this case, the distance applies to the whole phrase and anchor, so that, for example, bla bla my unexpected term at the beginning of the text would be a match for "^my term"o5.
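The way the distance applies to the whole phrase and anchor can be sketched with a small model. It assumes that the distance counts the words between the start of the text and the last phrase term that do not belong to the phrase. The exact counting used by Recoll may differ, so treat the boundary values as approximate. The function name is hypothetical, only the ^ anchor is modelled, and the text is split on whitespace only.

```python
def anchored_at_start(text, phrase, slack):
    """Model of "^phrase"oN: the phrase terms must appear in order, and at
    most `slack` other words may occur between the start of the text and
    the last phrase term (assumed counting rule, see lead-in)."""
    words = text.lower().split()
    targets = phrase.lower().split()
    position = 0
    last = -1
    for target in targets:
        try:
            # Earliest occurrence at or after the current position.
            last = words.index(target, position)
        except ValueError:
            return False  # a phrase term is missing or out of order
        position = last + 1
    extra_words = (last + 1) - len(targets)
    return extra_words <= slack


if __name__ == "__main__":
    text = "bla bla my unexpected term and more text"
    print(anchored_at_start(text, "my term", 5))  # three extra words
    print(anchored_at_start(text, "my term", 2))  # slack too small
```

In the example text, bla, bla and unexpected are the three extra words, so a distance of 5 accepts the match and a distance of 2 rejects it.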
Anchored searches can be very useful for searches inside somewhat structured documents like scientific articles, in case explicit metadata has not been supplied (a most frequent case), for example for looking for matches inside the abstract or the list of authors (which occur at the top of the document).