Interlinearization mode is a text-oriented mode designed for adding parse and gloss annotations to one or more lines of interlinearized text. This can be done manually or with the help of one or more so-called Analyzers. The segmentation and (typically) the transcription of speech events need to be done in one or more of the other modes before interlinearization can be added in this mode.
Analyzers are software modules that accept an annotation as input and produce suggestions for one or more annotations, on one or more tiers, as output. Examples of the kind of processing analyzers can perform are tokenization, morphological parsing and lookup of glosses. The behavior of some analyzers can be configured in a settings panel. Some analyzers need a connection to a lexicon, others can perform their task based on the input alone. Analyzers are implemented as extensions, so that third-party users and developers can create and add their own analyzers. At least, that is the intention: the LEXAN API, as it is called, still has to be finalized, documented and published.
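Although the LEXAN API has not been published yet, the general shape of such a module can be sketched. The interface and type names below are purely hypothetical, chosen only to illustrate the annotation-in, suggestions-out idea; they do not reflect the actual API.

import java.util.List;

// Hypothetical sketch of an analyzer module; the actual LEXAN API is not yet published,
// so all names and signatures here are illustrative only.
interface Analyzer {
    /** Produces zero or more suggestions (tokens, parses, glosses, ...) for a source annotation. */
    List<Suggestion> analyze(SourceAnnotation input);
}

/** One suggestion: a set of candidate values, each aimed at a target tier. */
record Suggestion(List<TargetValue> values) {}

/** A single proposed annotation value for a specific target tier. */
record TargetValue(String targetTierName, String value) {}

/** Minimal view of the source annotation an analyzer receives. */
record SourceAnnotation(String tierName, String value, long beginTime, long endTime) {}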
Part of the user interface of this mode is a Lexicon panel, the front end of a Lexicon Component module. It allows the user to create, import and edit a lexicon and its entries. Lexicons are stored separately from annotation data, in a new data format. These are the lexicons that analyzers can access.
To start the Interlinearization mode, click Options > Interlinearization Mode from the main window.
The main screen is split in two: the left side contains two panels, the right side consists of a single panel.
To start working in Interlinearization Mode, you need to have already set up a tier structure and to have some segmentations (annotations on a top-level tier). The values of annotations can be edited in this mode and annotations on dependent tiers, including subdivisions, can be created, but not primary segmentations on top-level, independent tiers. Those can be created in Annotation mode and/or Segmentation mode. It is still possible to add new tier types and tiers in this mode (please refer to How to define a tier type and How to define a tier and its attributes for more information about tier structures).
If you want to use an analyzer that requires a connection to a lexicon, you should first create or import a lexicon and link one or more tier types to specific fields in a lexical entry (see Adding new tier types and Set the Lexicon Service and entry field information for a Tier Type).
In order to analyze/interlinearize annotations with the assistance of an analyzer (i.e. other than strictly manually), some configuration is required first. The top-left panel, titled Configure Analyzer Settings, shows the number of current configurations below the Analyzer & Source-Target Configuration button, which opens a separate window when clicked (see below).
Configuration of analyzer settings consists of two parts:
By ticking the Show tier mapping checkbox, the table shows to which tiers each "analyzer-tier type" configuration applies. E.g. in case there are three speakers and the speech tiers of all speakers use the same tier type, three tiers will be listed if that type is selected as the source of an analyzer, etc.
To add new configurations or edit existing ones, click the Edit configuration... button to the right. The Remove configuration... button removes the selected configuration, if any.
In the dialog that appears, you can, working from left to right, choose the analyzers you would like to use and set the source and target tiers for each analyzer.
First, you choose a certain analyzer, as described in Types of analyzers and their settings. You can configure multiple analyzers, one per line.
Each chosen analyzer will need a source and at least one target tier type for it to function. The source and target tier should not be the same. By default, the user interface tries to assist in the setup by only listing types in the source and target columns if certain constraints are met. E.g. in the column for the source a tier type is only listed if there is at least one tier based on that tier type and if that tier has at least one dependent tier (which can then be selected as target).
In the columns of the target tier types only tier types are listed for which there is at least one tier created as dependent tier of a tier of the source type. If the analyzer supports two target tiers the rightmost column will allow selection of the type for the second target, otherwise this column will be disabled.
When the List all available types... checkbox is ticked, the check on source and target types is not performed and all tier types of the transcription are listed. After selecting the target type(s) a warning message might still be shown that there are possible issues with the configuration, but the user can choose to ignore this.
Figure 3.28. A warning concerning a missing link to a lexicon field or the absence of suitable tiers
Some constraints are not checked:
If an analyzer needs access to a certain field in a lexicon the selected type for the source and/or the target should be linked to the proper field in the right lexicon, see Adding new tier types. This way the analyzer knows which lexicon and which field to query.
When you are done with the configuration, click Apply to finish and go back to the main dialog with the table listing the current configurations.
Sometimes, especially after changing an existing configuration, it is necessary to save the file and open it again to see the effect of the changes.
A selected configuration can also be removed here by clicking the Remove configuration button. If the selected analyzer supports customization of settings, the Configure <analyzer name> button will be active and clicking it will show a Configure Analyzer Settings window (double-clicking the analyzer in the table has the same effect). But before the actual analyzer settings window opens, you can choose whether global settings or configuration-specific settings are going to be updated. This allows for different settings for an analyzer for different source-target combinations, e.g. depending on the language of the tiers involved. In case of doubt, choose Global Settings.
The next section describes some analyzers and their settings.
The following analyzers are distributed with ELAN:
The names are somewhat misleading; all of the Parse, Gloss and Lexicon analyzers require access to a lexicon. The Parse analyzer morphologically parses annotations from a word (or token) level tier, based on lexical units (prefixes, stems, suffixes etc.) available in the lexicon (internally the parser is implemented as a state machine with a stack). The results are shown as parse suggestions in a suggestion window from which the user can select one. This analyzer requires one source tier and one target tier, where the target is of a subdivision tier type.
The Gloss analyzer looks up the source annotation in the lexicon and lists all glosses found in the matched entries. The results are again presented as suggestions from which the user can select one. This analyzer requires one source tier and one target tier, where the target is of a symbolic association tier type.
The Lexicon analyzer is a combination of the parse and the gloss analyzer. By configuring the lexicon analyzer, the source tier containing the annotations will both be parsed and glossed in one action. This analyzer requires one source tier and two target tiers.
The Whitespace analyzer splits the selected source annotation at white spaces and places the results on the target tier. It does not need any user confirmation. This analyzer requires one source tier and one target tier, where the target is of a subdivision tier type. The way e.g. punctuation marks are treated can be configured in the analyzer's settings (see below).
When configuring analyzers and their source and target tiers, it is possible that the target tier of one analyzer is the source tier of the next analyzer. The configuration of the tiers is based on tier types rather than on individual tiers.
Configuration on the basis of individual tiers might be added later as an option as well.
The Lexicon analyzer is a combination of the parse and the gloss analyzer. When a lexical entry matches part of the input token during the matching process (and thus becomes part of one of the suggestions), the glosses of that entry are added to the suggestions too (these "glosses" can come from any field of the entry, depending on the tier type configuration). By configuring the lexicon analyzer, the annotations of the source tier will be both parsed and glossed in one action. This analyzer requires one source tier and two target tiers. (The LEXAN API currently limits the number of target tiers to two; this might be too restrictive and may need to be reconsidered in a future release.)
The Lexicon analyzer supports the following configurable settings:
Changes in these settings will only be passed to the analyzer after clicking Apply Settings!
This analyzer is largely the same as the parser part of the Lexicon analyzer, with the same configurable settings. It does not support the Match entry field language against tier content language option, which belongs to the glossing part of the process.
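To give an impression of the kind of lexicon-based segmentation the Parse analyzer (and the parsing part of the Lexicon analyzer) performs, the sketch below naively tries to cover an input token with morphs found in a small in-memory lexicon and collects all complete segmentations. It is only an illustration of the idea: the real analyzer uses a state machine with a stack, consults the actual lexicon and applies more constraints (e.g. on the order of prefixes, stems and suffixes).

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Naive illustration of lexicon-based morphological parsing; not the actual analyzer code.
public class ParseSketch {
    // Toy "lexicon": morph form -> morph type.
    static final Map<String, String> LEXICON =
            Map.of("un", "prefix", "break", "stem", "able", "suffix");

    public static void main(String[] args) {
        // Prints the single complete parse: [un (prefix), break (stem), able (suffix)]
        parse("unbreakable", new ArrayList<>()).forEach(System.out::println);
    }

    /** Returns all ways to cover the remaining input with morphs from the toy lexicon. */
    static List<List<String>> parse(String rest, List<String> soFar) {
        List<List<String>> results = new ArrayList<>();
        if (rest.isEmpty()) {                        // the whole token is covered: one complete parse
            results.add(new ArrayList<>(soFar));
            return results;
        }
        for (Map.Entry<String, String> morph : LEXICON.entrySet()) {
            if (rest.startsWith(morph.getKey())) {   // try every morph matching the start of the rest
                soFar.add(morph.getKey() + " (" + morph.getValue() + ")");
                results.addAll(parse(rest.substring(morph.getKey().length()), soFar));
                soFar.remove(soFar.size() - 1);      // backtrack
            }
        }
        return results;
    }
}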
This analyzer performs a look-up of the input token in the lexicon and returns all values of the lexical entry field it is configured for (via the tier type). This doesn't have to be the "gloss" field of the lexical entries, but can be any field.
This analyzer supports the following configurable setting:
This analyzer splits the input text it receives into multiple tokens based on white spaces. It allows the user to configure how e.g. punctuation marks should be treated.
The + (Add) and - (Remove) buttons can be used to add or remove a category of characters, represented by a row in the table. A category can contain one or more characters; if there are more than one, each character is separately treated according to the setting for that category. The table has two columns, one labelled Marks, where the special characters or marks can be entered, and one labelled Action, specifying the way those characters should be handled in the tokenization process. When clicked on, the second column shows a dropdown list with predefined actions:
The Apply button has to be clicked to inform the analyzer of the changes and to put them into effect.
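A minimal sketch of the behaviour described above is given below: the input is split on white space and each configured character is treated according to the action of its category. The action names used here (SEPARATE_TOKEN, ATTACH_TO_PREVIOUS, REMOVE) are made up for the example and need not match the labels in the analyzer's dropdown list.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of whitespace tokenization with configurable punctuation handling.
// The action names are hypothetical; the real analyzer's options may differ.
public class WhitespaceSketch {
    enum Action { SEPARATE_TOKEN, ATTACH_TO_PREVIOUS, REMOVE }

    public static void main(String[] args) {
        // "Marks" column -> "Action" column of the settings table.
        Map<Character, Action> config = Map.of(
                ',', Action.SEPARATE_TOKEN,
                '.', Action.SEPARATE_TOKEN,
                '"', Action.REMOVE);
        System.out.println(tokenize("He said, \"go home.\"", config));
        // -> [He, said, ,, go, home, .]
    }

    static List<String> tokenize(String input, Map<Character, Action> config) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isWhitespace(c)) {
                flush(tokens, current);
            } else if (config.containsKey(c)) {
                switch (config.get(c)) {
                    case SEPARATE_TOKEN -> { flush(tokens, current); tokens.add(String.valueOf(c)); }
                    case ATTACH_TO_PREVIOUS -> current.append(c);
                    case REMOVE -> { /* drop the character */ }
                }
            } else {
                current.append(c);
            }
        }
        flush(tokens, current);
        return tokens;
    }

    static void flush(List<String> tokens, StringBuilder sb) {
        if (sb.length() > 0) { tokens.add(sb.toString()); sb.setLength(0); }
    }
}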
The main purpose of the Lexicon Component in ELAN is to support the (semi-automated) interlinearization process. It is not intended as a full-fledged lexicon tool, though the data model supports a bit more than strictly necessary for its main purpose. The data model and the XML-based data format are similar to the LIFT format (Lexicon Interchange Format), but simplified. These are the main fields of a lexical entry:
The main field is lexical-unit (equivalent to lemma, headword, the primary lexical form). morph-type indicates the word part (e.g. stem, prefix, suffix); analyzers can use this information when processing the input text. The grammatical-category field holds the category of the lexical item, the part of speech. The Edit Entry window shows which other fields can currently be added. (The data model defines more fields than are visible in the entry window, but support for and documentation of these are still pending.) The user can add custom fields at the level of the entry and at the level of the sense; these will be visible as field: name.
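To make the structure of an entry a bit more concrete, the sketch below models the fields mentioned above as plain Java types. This is only an approximation of the data model as described here (for instance, the exact placement of the grammatical-category and gloss fields may differ); it does not represent ELAN's actual classes or the on-disk XML format.

import java.util.List;
import java.util.Map;

// Rough illustration of the lexical entry fields described above; not ELAN's actual model.
record LexicalEntry(
        String lexicalUnit,               // lemma / headword / primary lexical form
        String morphType,                 // e.g. "stem", "prefix", "suffix"
        String grammaticalCategory,       // part of speech of the lexical item
        List<Sense> senses,               // one or more senses, e.g. carrying glosses
        Map<String, String> customFields  // user-defined fields, shown as "field: name"
) {}

record Sense(
        List<String> glosses,             // gloss values
        Map<String, String> customFields  // user-defined fields at sense level
) {}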
In the lexicon panel of the interlinearization mode the contents of lexicons can be displayed, one lexicon at a time. It is, however, possible to store and manage multiple lexicons on disc and to choose which one to display. The leftmost drop-down box above the table lists all lexicons that have been created or imported.
If a lexicon has been selected and is displayed in the panel, lexical entries can be added, edited or removed.
The lexicon overview can be adjusted to display or hide certain columns. To do so, you have to right-click on a lexical entry to display the context-menu. From there, you can show or hide columns of the lexicon. Some fields can occur more than once in an entry (like variant or gloss), these will be displayed in a single cell, each value surrounded by square brackets. The context menu also has Add, Remove and Edit items, which have the same function as the buttons below the table (see Editing lexical entries).
The order of the entries in the table can be changed by clicking on any of the column headers, a little arrow indicates whether the items are sorted in ascending or descending order.
Actions for creating and editing lexicons are available in the Lexicon Actions drop-down menu. The actions always apply to the lexicon that is selected in the lexicons list and visible in the table. The actions will be discussed below.
When the details have been filled out (a Name and a Language are required), click Apply to create the lexicon. The lexicon file will be stored in a predefined folder labeled LexanLexicons inside the special ELAN data folder. The file name is based on the name of the lexicon and the file extension is .xml.
The Custom Fields tab allows the user to specify the names of custom fields (at the level of the entry, the sense, or both) to be used in lexical entries of this lexicon.
In the Sort Order tab a preferred sort order can be specified by entering an ordered list of tokens consisting of one or more characters. The sort order will be applied to the lexical-unit field, but also to the variant and citation fields. After changing the sort order it might be necessary to click the header of the lexical-unit column once to enforce re-sorting of the table.
It is advised to use the Save Lexicon with Current Entry Order option to apply the new sort order to the underlying data structure as well (not only to the view). After that, new lexical entries will be inserted according to the custom sort order.
A lexicon can be imported from a Toolbox dictionary file (.dic, .db, .txt), a lexicon in LIFT format (.lift) or a CorpAfroAs lexicon (.eafl). In case a Toolbox file has been selected, a configuration window will be shown (see below); otherwise a converter will import as much as possible from the original lexicon data into an ELAN lexicon and will add the new lexicon to the list of lexicons.
This window allows the user to specify mappings from Toolbox field markers to ELAN lexicon entry fields. The main element in the window is a table with, in the first column, the list of markers that have been found in the Toolbox dictionary file. In the second column the corresponding field name can be selected from a list, or a custom field name can be entered (either custom-field-name or sense/custom-field-name). For example, the Toolbox marker \lx (the lexeme) would typically be mapped to the lexical-unit field. For some fields it will be possible to enter a language code in the third column, depending on the value in the second column. Markers that don't have a mapping in the second column will be ignored during the import process.
Other elements in this configuration window are:
After clicking the OK button several warning messages might be shown, e.g. if required information is missing, if a required field has not been selected in the second column (e.g. lexical-unit) or if a field has been selected more often than allowed.
There is currently no Undo/Redo mechanism for lexicon edit actions!
If multiple transcription windows are open, using the same lexicon, modifying the lexicon should preferably be done in only one window. On Windows there is no guarantee that all windows are updated correctly after a change in a lexicon made in a different window!
More documentation about the structure of lexical entries, about which fields are required, what is hard-coded, etc. will follow. The same holds for the native format and the import and export formats.
After creating or importing a lexicon, lexical entries can be added, edited or removed. Adding a new entry can be started by clicking the Add button at the bottom of the Lexicon main panel or by choosing the Add menu item from the context-menu, if there are already entries in the table. A new dialog will appear, allowing you to create a new entry by entering values for, at least, the required fields.
When you are finished, click Apply to add the entry to the lexicon. Required fields will be highlighted if you click Apply while not all required fields are filled in.
Editing a lexical entry can be done by clicking the Edit button at the bottom of the panel, by right-clicking the entry and choosing Edit from the context-menu, or by double-clicking the entry in a cell that cannot be edited directly in the table (i.e. the lexical-unit field or any field of which there can be more than one value). A dialog will open, displaying the chosen entry as a tree structure. Some general information is shown, such as the ID and the date of creation.
Removal of a lexical entry is done by highlighting the entry in the lexical entries table and then either clicking the Remove button at the bottom of the Lexicon panel, or right-clicking the entry and choosing Remove from the context-menu.
The Lexicon Editor combines the editing actions described in the previous sections in a separate, blocking window. The window is split in two: the left side shows the same lexical entry table as in the main window, the right side contains the lexical entry editor.
The menu bar has three menus:
The left panel shows the name of the current lexicon at the top, with visual highlighting when there are unsaved changes to the lexicon. The lexical entry table covers most of the space; it has the same column showing/hiding and sorting mechanisms as mentioned above. A single click on an entry loads it in the entry edit panel to the right. The UP and DOWN keys select and load the entry in the previous or the next row. The Add button creates a new entry and loads it. The required fields in a new entry are filled with template text, to be modified by the user.
Navigation from one field to the next is again performed with the TAB key, the ENTER key applies the current changes to the entry. If another entry is selected and loaded, changes in the current entry will be applied without a prompt.
If there are multiple ELAN windows open with the same lexicon visible, it might be necessary to reload the lexicon in those windows to see the modifications made in the Lexicon Editor.
It is possible to filter the entries in the table by entering a search string in the Filter Entries text field and pressing Enter. The input is treated as a regular expression. The filter can either be applied to all visible columns or to a single column, selected in the Column drop-down box. This allows, for example, showing only the entries with a citation form starting with a 'c'. The filter can be removed again by clicking the Reset button, after which all entries will be listed again.
The interlinearization panel shows the tier structures and the annotations that were created in e.g. the annotation mode. The annotations are displayed in cells, like in a table or a list, each cell containing a top level annotation and its dependent annotations. The annotations of all visible top level tiers are sorted on time and then added to the cells. This is the main editing area in this mode, for manual and for assisted annotation. In this panel the configured analyzers can be invoked, either for an individual annotation or for a sequence of annotations, via the Analyze/Interlinearize button. The result(s) produced by the analyzer will either be shown here in a Suggestions Window (if there are suggestions to choose from) or immediately applied and added to the designated tier(s) as determined by the analyzer configuration.
The panel has a small "toolbar" at the top with the following options:
The following properties can be set:
The tiers that are visible in the editor can be configured via the right-click context menu of the tier names area.
The Show / Hide More... option opens the same window as described in Switching tiers on/off and in View tiers by Type/Participant/Annotator. The Speaker tier is not a real tier but it shows the Participant attribute of the top level tier in this cell. The TC (time code) tier shows the begin and end time of the top level annotation in this cell.
Hiding the top level tier(s) hides all dependent tiers, effectively removing the corresponding cells.
A context-menu will also be shown when right-clicking an annotation. Depending on the annotation there can be different options. This allows you to start interlinearization of an annotation, delete an annotation, or add new annotations.
If the active annotation is on a tier that is the source tier for any of the analyzers, there will be the Analyze / Interlinearize option, which invokes the analyzer with this annotation as input. The option for deleting annotations will always be there, while options for entering annotations before/after other annotations and for creating dependent annotations for an active annotation are available depending on the type of tier the annotation is on.
One more option is only available for certain annotations. The Add to Lexicon option is only available for annotations that are on a tier that is linked to a field in a lexicon (via its tier type). This action opens the new entry window (Figure 3.44, “Add a Lexical Entry”) and adds the value of the annotation to the corresponding lexical entry field.
When you click the Analyze / Interlinearize button or right-click and choose Analyze / Interlinearize from the context-menu, the process of analyzing or interlinearization will commence. If the analyzer produces multiple, alternative suggestions that need to be disambiguated, a window will appear showing the suggestions in a layout similar to that of the interlinearization panel. This is called the Suggestion View. The window is positioned just below the source annotation but it can be moved and resized. If you tick the Remember window position checkbox, the Suggestion View will show up with the size and location of the last time it was shown.
The image above displays the Suggestion View with suggestions produced by the Lexicon analyzer, which has two target tiers. It suggests both possible parsings of the input annotation and glosses, based on lexical entries found in the current lexicon. You can select the suggestion that best matches your expectations by clicking it. This is recorded by the analyzer and the next time the same input occurs, the suggestions that have been selected most often will be moved to the top of the list. A little header saying "chosen x times before" will appear in those suggestions. If the width of a suggestion makes part of the header invisible, hovering the mouse over the header will show the text in a tooltip. Hovering the mouse over a part of a suggestion will show relevant fields of the lexical entry that part is based on.
There are several keyboard shortcuts for mouse-less interaction with the Suggestion View.
It can happen that there are so many suggestions that it is hard to get an overview of them. There may be many similar-looking options, e.g. similar-looking parses. To get some visual aid, you can press Shift and hover the mouse pointer over the fragments of a suggestion. This will trigger a colouring effect: all suggestions with the same value at that position will be displayed with the same background colour (i.e. there will be as many different colours as there are different values in that position). By then clicking one of the suggestions (with Shift still pressed), only the suggestions with the same colour will remain in view; all others are removed. This can be a way to narrow down the available choices.
Alternatively it is possible to switch to Incremental selection mode. In that mode disambiguation of the fragments is supported by showing only the alternatives for one specific position, starting with the first one from the left. After choosing the best option, the remaining alternatives for the second fragment are shown, and so on. When filtering for alternatives for a fragment, only the surface form is taken into account (different entries with the same value in the relevant field are shown as one).
The right mouse button context menu of a suggestion contains one option, Don't show this suggestion again, which, if the analyzer supports this, will exclude this output or this suggestion (i.e. this combination of elements in the suggestion) for this input from future suggestions.