If you want to perform a detailed search over multiple EAF files, but the options offered by Search multiple EAF (see Searching through multiple annotation files) are not comprehensive enough, you can use yet another search mode. It allows you to restrict the search domain to certain tiers, to use regular expressions, etc., while examining multiple annotation files at once. This search function matches whole words as well as parts of words that match the given query.
The function can be reached via Search > Structured search multiple eaf.... When you click on this option for the first time, you will be asked to define a search domain in the form of one or more .eaf files. The next time you open the Structured search, it uses the last defined search domain. The search window offers the possibility to define a new search domain: click on Define Domain and do one of the following:
Double-click an annotation file (*.eaf) to select it. It now appears in the rightmost box. Alternatively, you can click on the annotation file name and click the >> button. Repeat this for every annotation file you want to include.
It is also possible to select a complete directory. All .eaf files in a selected directory will be included.
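The way a search domain is built up from files and directories can be sketched in Python. This is only an illustration, not ELAN's own code: the function name collect_search_domain is hypothetical, and the sketch assumes that only the files directly inside a selected directory are included (whether ELAN also descends into subdirectories is not stated here).

```python
from pathlib import Path
from typing import Iterable, List


def collect_search_domain(entries: Iterable[str]) -> List[Path]:
    """Expand files and directories into a sorted list of unique .eaf files."""
    found = set()
    for entry in entries:
        path = Path(entry)
        if path.is_dir():
            # Assumption: only direct children of the directory are included.
            for child in path.iterdir():
                if child.is_file() and child.suffix.lower() == ".eaf":
                    found.add(child)
        elif path.is_file() and path.suffix.lower() == ".eaf":
            found.add(path)
    return sorted(found)
```

Selecting a directory and, in addition, one of the files inside it yields each file only once.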
After defining a search domain for the first time or when you open the Structured search with a search domain from the previous usage, the following window will open:
As you can see, there are three tabs offering different kinds of search: Substring Search, Single Layer Search and Multiple Layer Search.
The Substring Search tab offers the simplest search. It just asks for a search string. After entering the search string you can click on Find (or press Enter) to start the search process. This will result in a screen like the one below:
It shows the tokens that contain the search string, together with some surrounding tokens (the context) printed in italic typeface. The default number of context tokens is three on both sides. When the number of hits exceeds the maximum number the window can contain, you can view the rest of the hits by clicking the < and > buttons that appear above the list of hits to go back or forward one page. To view an annotation in the timeline view of the main window, simply double-click it:
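The hit-plus-context display described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ELAN's implementation: the function name substring_hits is hypothetical, tokens are assumed to be whitespace-separated, and the comparison is assumed to be case sensitive.

```python
from typing import List, Tuple


def substring_hits(tokens: List[str], query: str, context: int = 3
                   ) -> List[Tuple[List[str], str, List[str]]]:
    """Return (left context, matching token, right context) for every hit."""
    hits = []
    for i, token in enumerate(tokens):
        if query in token:  # a hit is any token that contains the query
            left = tokens[max(0, i - context):i]
            right = tokens[i + 1:i + 1 + context]
            hits.append((left, token, right))
    return hits


tokens = "and then you see um a man in maybe his fifties".split()
hits = substring_hits(tokens, "man")
```

Here hits holds a single entry with 'see um a' as left context and 'in maybe his' as right context.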
For further investigation of the results, the search window offers a context menu that enables you to view the results in other ways and to save them. To open the context menu, right-click on one of the results. The menu has the following options:
After clicking OK you can enter a file name and click Save to save the statistics file.
When you are in frequency view or frequency view (by frequency) (Figure 4.21, “Frequency View”), the context menu (right-click) has the following options:
The alignment view allows you to view your search results in an aligned time-based view. For detailed information about the Alignment View, see View search results in Alignment View.
The Single Layer Search tab offers a more elaborate search than the Substring Search tab. The first difference from the Substring Search tab is that the Single Layer Search tab has a query history. Clicking the < and > buttons moves backward and forward one query, respectively. It is also possible to save queries and to load previously saved queries.
Furthermore, the tab offers different modes to restrict the search. The first mode lets you choose the form of the results. There are three options:
The second mode offers the straightforward distinction between case sensitive and case insensitive search. The third mode lets the user choose whether the element of the first mode should contain the search string (substring match), should exactly match the search string (exact match), or should be matched by a regular expression (regular expression). Finally, one can choose to restrict the search to one tier, a tier type or a participant.
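The three match types combined with the case setting can be sketched in Python. The function matches and its mode names are hypothetical and only mirror the options described above; ELAN itself is a Java application, so its regular expression dialect may differ from Python's re module in details.

```python
import re


def matches(text: str, query: str, mode: str = "substring",
            case_sensitive: bool = False) -> bool:
    """Test one element against a query using one of the three match types."""
    flags = 0 if case_sensitive else re.IGNORECASE
    if mode == "substring":
        pattern = re.escape(query)  # literal text, anywhere in the element
    elif mode == "exact":
        pattern = r"\A" + re.escape(query) + r"\Z"  # literal text, whole element
    elif mode == "regex":
        pattern = query  # the query is itself a regular expression
    else:
        raise ValueError(f"unknown match mode: {mode}")
    return re.search(pattern, text, flags) is not None
```

For example, matches("The man", "man") is true, while matches("The man", "man", mode="exact") is false.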
When you choose an N-gram to be the form of the result, you can use two more options: a wildcard and a negation. The wildcard takes the form of a #-sign. For instance, the search string 'the # man' with the mode N-gram over annotations would return three annotations per hit: the first annotation contains 'the' (or exactly matches it, if the mode exact match is chosen), the second annotation may contain anything due to the use of the wildcard, and the third annotation contains or exactly matches 'man'. If the mode N-gram within annotation is chosen, each hit contains one annotation. In this annotation there is an N-gram consisting of three tokens where the first token contains or exactly matches 'the', the second may be anything and the third contains or exactly matches 'man'.
If you want to find N-grams where a token matches anything but one string, you can use the negation operator NOT(...), where the dots are replaced by the search string that must not be matched. For instance, the search string 'the NOT(strange) man' would return 3-grams in the same way as described above, but the hits where the second annotation or token matches 'strange' are left out.
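The wildcard and the negation can be illustrated with a small Python sketch of N-gram matching over a list of tokens. The names element_matches and find_ngrams are hypothetical, and the sketch assumes the query elements are separated by spaces; it is meant to show the logic, not ELAN's actual implementation.

```python
import re
from typing import List

_NOT = re.compile(r"NOT\((.*)\)")


def element_matches(token: str, element: str, exact: bool = False) -> bool:
    """Match one token against one query element (#, NOT(...), or plain text)."""
    if element == "#":
        return True  # the wildcard accepts any token
    negated = _NOT.fullmatch(element)
    if negated:
        return not element_matches(token, negated.group(1), exact)
    return token == element if exact else element in token


def find_ngrams(tokens: List[str], query: str, exact: bool = False) -> List[List[str]]:
    """Return every run of consecutive tokens that matches the query elements."""
    elements = query.split()
    n = len(elements)
    return [
        tokens[i:i + n]
        for i in range(len(tokens) - n + 1)
        if all(element_matches(t, e, exact) for t, e in zip(tokens[i:i + n], elements))
    ]


tokens = "the old man and the strange man saw the tall woman".split()
exact_hits = find_ngrams(tokens, "the # man", exact=True)
negated_hits = find_ngrams(tokens, "the NOT(strange) man", exact=True)
substring_hits = find_ngrams(tokens, "the # man")
```

With exact matching, the first query finds 'the old man' and 'the strange man', and the negated query keeps only 'the old man'. With substring matching, 'the tall woman' is also a hit, because 'woman' contains 'man'.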
The Multiple Layer Search tab houses the most comprehensive search in ELAN. As in the Single Layer Search tab, a Query History is kept, enabling the user to go back and forward one query by clicking the < and > buttons respectively. It is also possible to save a query or to load a previously saved query. To do so, click either the Save query or the Load query button. Queries are saved in XML format.
The two modes case sensitive/case insensitive and substring match/exact match/regular expression are also similar to the second tab. The first new element is the Clear-button. Clicking this button will clear all data of a query.
A new option has been included into the menu containing all the different types of matches (i.e. substring match, exact match, regular expression): variable match. As the name says, it has to do with using variables, and it can be used every time you want to search for two or more annotations, contained in two or more different tiers, reporting the same text and/or the same time alignment. See the image below for an example:
As you can see in the example, the variable 'X' matches any value that is shared by annotations meeting all other constraints: they are in the same time frame (overlap) and reside in the same file (the base constraint is Must be in the same file). In this case 'BONE' is found in the tier 'Gloss RH English' and in 'Gloss LH English'; the same holds for the value '(p-) leg dog'.
It is possible to use more than one variable, e.g. X and Y. This is especially useful in those cases where more than two query fields are filled in.
X and Y can either match different values or the same value. If a variable should be unique, i.e. should never match the same value as any of the other variables, it should be preceded by an exclamation mark, e.g. !Y.
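The idea behind a single variable 'X' with an overlap constraint can be sketched as follows. Everything in this sketch is an assumption made for illustration: the Annotation type, the function names, the time values and the exact definition of overlap (two annotations share at least some time) are not taken from ELAN.

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    start: int  # begin time in milliseconds
    end: int    # end time in milliseconds
    value: str


def overlaps(a: Annotation, b: Annotation) -> bool:
    """Assumed overlap rule: the two time intervals share some time."""
    return a.start < b.end and b.start < a.end


def variable_match(tier_a: List[Annotation], tier_b: List[Annotation]
                   ) -> List[Tuple[Annotation, Annotation]]:
    """Pairs on two tiers where 'X' is bound to the same value and the times overlap."""
    return [
        (a, b)
        for a in tier_a
        for b in tier_b
        if a.value == b.value and overlaps(a, b)
    ]


# Made-up annotations for a right-hand and a left-hand gloss tier.
right_hand = [Annotation(0, 500, "BONE"), Annotation(600, 900, "DOG")]
left_hand = [Annotation(100, 450, "BONE"), Annotation(2000, 2500, "DOG")]
pairs = variable_match(right_hand, left_hand)
```

Only the two 'BONE' annotations are paired: the 'DOG' annotations carry the same value but do not overlap in time.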
The buttons Minimal Duration and Maximal Duration enable you to restrict the minimal and maximal duration of each result. When you click on one of the buttons, a dialog window appears, e.g.:
Here you can enter the minimal or maximal duration as the total number of milliseconds or in the form hours:minutes:seconds.milliseconds. A value of 0 milliseconds or 00:00:00.000 is treated as undefined. Searching for annotations with a maximum duration that is less than the minimum duration is impossible. Hence, entering conflicting values results in an error message saying that the combination is impossible. After entering a correct duration, it will be displayed on the corresponding button.
The buttons Begin After and End Before open a dialog similar to that of the previous two buttons. They make it possible to restrict the annotations in the result to those that begin after a certain time and end before a certain time. Entering a Begin After time that is later than the End Before time results in an error message saying that the combination is impossible. After entering a correct time, it will be displayed on the corresponding button.
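The two accepted input formats and the consistency check can be sketched in Python. The functions parse_time and check_range are hypothetical, and the exact input syntax that the dialog accepts (for example, whether the milliseconds part may be omitted) is an assumption of this sketch.

```python
import re
from typing import Optional

_CLOCK = re.compile(r"(\d+):(\d{1,2}):(\d{1,2})(?:\.(\d{1,3}))?")


def parse_time(text: str) -> Optional[int]:
    """Parse milliseconds or hours:minutes:seconds.milliseconds; zero means undefined."""
    text = text.strip()
    if text.isdigit():
        millis = int(text)
    else:
        match = _CLOCK.fullmatch(text)
        if match is None:
            raise ValueError(f"not a valid time value: {text!r}")
        hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
        fraction = (match.group(4) or "").ljust(3, "0")  # ".5" means 500 ms
        millis = ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(fraction)
    return millis or None  # a value of 0 is treated as undefined


def check_range(lower: Optional[int], upper: Optional[int]) -> None:
    """Reject a lower bound that exceeds the upper bound; undefined bounds always pass."""
    if lower is not None and upper is not None and upper < lower:
        raise ValueError("impossible combination: upper bound is below lower bound")
```

The same check applies to both pairs of buttons: Minimal Duration against Maximal Duration, and Begin After against End Before.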
Beneath the buttons discussed above, you will find a table consisting of white and green fields. Search strings are entered in the white fields while a green field between two non-empty white fields must contain a constraint. The fields on one row give the search strings and constraints to be matched by annotations on one tier. The result of having two or more rows in the query table is that the search engine may find annotations on two or more tiers as one hit. Furthermore, it is possible to restrict the search to one (type of) tier for each row by choosing the appropriate option in the pull-down menu on the right of each row.
Let us first take a look at search strings and constraints in one row. If you enter two search strings in two white fields separated by a green field, you must fill in that green field, i.e. define a constraint. Clicking the arrow on the green field gives a menu offering the following constraints:
When you click on Find and there is an empty constraint between two non-empty search string fields, you will get an error message. You will also get an error message if there is an empty search string field and constraint fields between two non-empty search string fields.
As we saw earlier the search mechanism on this tab has the possibility to construct a query for two or more tiers (up to eight). Besides the constraints on annotations on a tier, one can also apply constraints on annotations on different tiers. This means that if the search engine has found an annotation that matches a search string on one tier, the engine looks if the search string for another tier can be matched on another tier while considering the constraint that is between the two search strings.
The top down hierarchy of the rows in the query table does not reflect the hierarchy of the tiers in your data. That means, for instance, that search strings and constraints in the upper query table row may be matched by a child tier of the tier that matches search strings and constraints in the middle query table row.
Clicking the arrow in the green field between two search strings gives a menu with the following constraints:
An example of a Multiple Layer Search with constraints is shown below:
As you can see the tiers in the result are indicated by #1 and #2, corresponding to the first and second query table row respectively. The annotations in a tier are surrounded by vertical bars indicating their start and end.
It is possible to add or remove columns and/or layers to your search query. To do so, click the respective button:
It is also possible to hide the query once there are search results. This allows you to see more query results within a single window, which can be helpful when using the Alignment View (see View search results in Alignment View).
Figure 4.26, “Multiple Layer query” also illustrates what to do if you would like to use both Exact match and Substring match in one query: use Regular expression. In places where you would like to have an exact match, use the ^ and $ signs to match the beginning and end of a string (e.g. ^of$); otherwise just enter a word for the substring match.
The figure also shows how to use a wildcard to match anything. Instead of using the # as in the Single Layer Search, you can use the regular expression .+ to indicate any character (the dot) one or more times (the plus). See also Appendix A, REGULAR EXPRESSION SEARCH for more on regular expressions. The NOT(...) construction, on the other hand, can be used in the Multiple Layer Search in the same way as described for the Single Layer Search tab.
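The effect of the anchors and of .+ can be tried out with a few lines of Python. Note the assumption: ELAN is a Java application and uses Java regular expressions, while this sketch uses Python's re module; for the simple patterns shown here the two behave the same. The helper regex_hits and the sample values are made up for illustration.

```python
import re
from typing import List


def regex_hits(pattern: str, values: List[str]) -> List[str]:
    """Return the values in which the pattern is found somewhere."""
    return [value for value in values if re.search(pattern, value)]


values = ["of", "off", "proof of concept", "n", ""]

exact_of = regex_hits(r"^of$", values)   # anchored: behaves like an exact match
substring_of = regex_hits(r"of", values) # unanchored: behaves like a substring match
anything = regex_hits(r".+", values)     # any value with at least one character
```

The anchored pattern selects only 'of', the unanchored pattern also selects 'off' and 'proof of concept', and .+ selects every non-empty value.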
One final but no less important remark concerns the placement of more and less restrictive search strings. Figure 4.26, “Multiple Layer query” shows a very restrictive search string in the upper row: ^n$. The less restrictive, or should we say non-restrictive, search string .+ is in the middle row. As we saw earlier, the hierarchy of the rows in the query does not reflect the hierarchy in the data. That means that the search string ^n$ could also be placed in the lower row without affecting the outcome of the search. While this is perfectly true, we advise you to place restrictive search strings in the leftmost field of the uppermost row possible and the least restrictive search string in the rightmost field of the lowest row possible. The reason for this is the order in which the search engine considers the search strings in the query. If it finds a restrictive search string it can filter out all the other possibilities, but if it finds a less restrictive search string it has to consider all the matches of this search string. In the example of Figure 4.26, “Multiple Layer query” it is clear that if ^n$ is in the bottom row, the search engine first considers all annotations matching .+, which is in fact all annotations in the search domain. Because of this, the search takes much more time than if ^n$ were in the upper row.
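Why the order matters can be made concrete with a toy model in Python. This is emphatically not ELAN's search engine: the function run_query is hypothetical and simply assumes that, for every annotation matching the first search string, all annotations are examined as possible partners for the second search string. The data are made up: 2 annotations with the value 'n' among 100.

```python
import re
from typing import List, Tuple


def run_query(first: str, second: str, annotations: List[str]) -> Tuple[int, int]:
    """Return (hits, checks) when the first pattern is evaluated before the second."""
    hits = 0
    checks = 0
    for a in annotations:
        if not re.search(first, a):
            continue  # rejected early: no partners are examined for this annotation
        for b in annotations:
            checks += 1  # one candidate partner examined
            if re.search(second, b):
                hits += 1
    return hits, checks


annotations = ["n"] * 2 + ["word"] * 98
restrictive_first = run_query(r"^n$", r".+", annotations)
permissive_first = run_query(r".+", r"^n$", annotations)
```

Both orders find the same 200 pairs, but the restrictive-first order examines 200 candidate partners whereas the permissive-first order examines 10,000.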
From the context menu (right-click the search results), you can view query results from the Multiple Layer Search in Alignment View:
There are a number of options you can set when viewing the query results. Firstly, you can adjust the time scale of the results:
When choosing 'Scale to fit', every query result will be scaled to fit the window, which means the time scale for every result will differ.
There is also the possibility to hide the alignment time scale altogether. To do so, go to the context menu (right-click) and uncheck Show alignment times by clicking on it.
You can set the visible columns to the right of the query results through the context-menu (right-click anywhere in the results). You can show or hide the following columns:
The blue bars above every query result graphically show the duration of each annotation and the position of the annotations with respect to each other.
There are also two indicators visible, depending on the length of the query result and the setting of the time scale. These indicators are either red or green.
A green indicator means that the annotation does not fit in the current time scale. In the example above, the bottom annotation 'and then you see um a man in maybe his fifties' has a duration of 5.060 seconds. The time scale is set to 1 second, so 4.060 seconds are outside the current view.
The red indicator means that the annotation in the query result starts outside of the current time scale. The top annotation 'fifties' overlaps the bottom annotation, but starts at 9.177 seconds. This causes it not to be visible in the current time scale, which is set to display 1 second. You would need to set the time scale to 10 seconds to see both annotations visualised completely (as the blue bars) and how they overlap.
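The rule behind the two indicators, and the arithmetic in the example, can be sketched in Python. The sketch rests on assumptions: times are taken as millisecond offsets from the left edge of the displayed time scale, the function names are hypothetical, and the end time of 'fifties' (9700 ms) is invented because it is not given in the text.

```python
from typing import Optional


def indicator(start_ms: int, end_ms: int, scale_ms: int) -> Optional[str]:
    """Return 'red', 'green' or None for one annotation at a given time scale."""
    if start_ms >= scale_ms:
        return "red"    # the annotation starts outside the visible time scale
    if end_ms > scale_ms:
        return "green"  # the annotation starts inside but does not fit
    return None         # the annotation is shown completely


def hidden_ms(start_ms: int, end_ms: int, scale_ms: int) -> int:
    """Number of milliseconds of the annotation that fall outside the time scale."""
    visible = min(end_ms, scale_ms) - min(start_ms, scale_ms)
    return (end_ms - start_ms) - visible


# The long annotation from the example: 5.060 s shown at a time scale of 1 s.
long_state = indicator(0, 5060, 1000)
long_hidden = hidden_ms(0, 5060, 1000)

# 'fifties' starts at 9.177 s; its end time of 9.700 s is an invented value.
short_state = indicator(9177, 9700, 1000)
```

For the long annotation this gives a green indicator with 4060 ms hidden, which is the 4.060 seconds mentioned above; the annotation starting at 9.177 seconds gets a red indicator at a time scale of 1 second and no indicator at 10 seconds.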