ELAN offers various export options. To export, click on File > Export As and one of the options.
Apart from these export options for single files, ELAN also supports multiple file exporting options. More details regarding these options can be found here: Multiple file export options
Different ways to select tiers:
Select the tiers by checking the boxes before each tier name.
This tab shows a list of the tier types available in the current transcription. Select the types by checking the boxes before each type name. Selecting a type selects all the tiers of that type. To modify the selected tiers, switch back to By Tier Names.
This tab has a list of all the participants in the transcription. Select the participants by checking the boxes before each participant name. Selecting a participant selects all the tiers of that participant. To modify the selected tiers, switch back to By Tier Names.
This tab has a list of all the annotators in the transcription. Select the annotators by checking the boxes before each annotator name. Selecting an annotator selects all the tiers of that annotator. To modify the selected tiers, switch back to By Tier Names.
This tab has a list of all the languages in the transcription. Select the language(s) by checking the boxes before each language name. Selecting a language selects all the tiers of that language. To modify the selected tiers, switch back to By Tier Names.
To select multiple tiers, press Shift and click on the successive tiers, or click and drag the mouse along the tiers to select them.
Other options:
Similar to exporting a document to Shoebox (see Shoebox file), ELAN data can be exported to a Toolbox document with UTF-8 encoding. This export provides more options for output customization.
To export a file into Toolbox, do the following:
The Toolbox Export dialog box appears:
For ELAN tier names that contain an @, only the part to the left of the @ is used as the tier marker for Toolbox. These markers form a block in the exported file. The part to the right of the @ is treated as the participant name and is exported with the marker ELANParticipant (see the figure below):
If you use a Shoebox *.typ file to specify the Toolbox database type, ELAN extracts the database type name from the first line of the type file (e.g. the database type name Text in \+DatabaseType Text) and puts it in the first line of the exported file (e.g. \_sh v3.0 400 Text).
When there is only one root tier (tier without a parent tier) in the transcription (e.g. ref) this will be used as the record marker by default. When there are multiple root tiers "\block" will be added as record marker. In both cases it is possible to specify a custom record marker instead.
Some options not covered in Figure 1.33, “Toolbox Export dialog window”:
Make a choice and click on OK to continue.
The file is exported as a *.txt | *.sht | *.tbt file.
If there already exists a file of the same name, ELAN will ask you whether or not it should overwrite the existing file.
It contains the following information:
Each ELAN parent annotation (including all its referring annotations) corresponds to one Toolbox record. E.g., in the illustration below, the ELAN parent annotation “CLLDCh3R02S01.001” corresponds to the Toolbox record “CLLDCh3R02S01.001”.
Each ELAN parent annotation (i.e., each Toolbox record) contains the additional field markers \ELANBegin and \ELANEnd (i.e., the begin and end time of the parent annotation).
This time code information allows you to import the Toolbox file back into ELAN, without having to manually re-align the file (see Shoebox file).
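As a rough illustration of how these time code fields can be read outside ELAN, the following Python sketch splits an exported Toolbox text into records and collects the \ELANBegin and \ELANEnd values. The record marker ref, the \tx field, the sample text and the time values are assumptions made for the example. The time values are kept as plain strings, because their exact format depends on the export settings.

```python
def parse_toolbox_records(text, record_marker="ref"):
    """Split Toolbox text into records; each record maps a marker to its values."""
    records = []
    current = None
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # this sketch skips blank lines and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}          # a record marker starts a new record
            records.append(current)
        if current is not None:   # header lines before the first record are ignored
            current.setdefault(marker, []).append(value.strip())
    return records


# Hypothetical export; only the marker names ELANBegin/ELANEnd come from the text above.
sample = (
    "\\_sh v3.0 400 Text\n"
    "\n"
    "\\ref CLLDCh3R02S01.001\n"
    "\\ELANBegin 00:00:01.000\n"
    "\\ELANEnd 00:00:02.500\n"
    "\\tx example text\n"
)

for record in parse_toolbox_records(sample):
    print(record["ref"][0], record["ELANBegin"][0], record["ELANEnd"][0])
```

Each record keeps a list per marker, since a marker can occur more than once within one record.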
ELAN allows you to export your project to the SIL FieldWorks Language Explorer software, also referred to as FLEx. The data exchange is realized through .flextext files, a file type that defines several container elements and attributes (see below), onto which ELAN's tiers (via their tier type) and annotations have to be mapped. These mappings are configured in the multi-step export window described below. Configuration is less complicated if the .eaf was created by importing a FLEx .flextext file. On import, some FLEx attributes are "encoded" in the names of tiers; on export these attributes are reconstructed by "decoding" the tier names. To better understand the options in the user interface, a simplified representation of the structure of a .flextext file follows here.
<interlinear-text>
    <item lang="" type="">...</item>
    <paragraph>
        <phrase>
            <item lang="" type="">...</item>
            <word>
                <item lang="" type="">...</item>
                <morph type="">
                    <item lang="" type="">...</item>
                </morph>
            </word>
        </phrase>
    </paragraph>
</interlinear-text>
All elements can occur multiple times, e.g. there can always be multiple item child elements for any parent element.
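To make the container/item structure more concrete, here is a small Python sketch that walks a document shaped like the simplified representation above and lists the item children of every container element. The sample content (languages, item types and texts) is invented for the illustration, and a real .flextext file may contain further elements and attributes that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Invented sample that follows the simplified structure shown above.
SAMPLE = """
<interlinear-text>
  <item lang="en" type="title">Sample text</item>
  <paragraph>
    <phrase>
      <item lang="en" type="gls">the dog runs</item>
      <word>
        <item lang="xx" type="txt">word1</item>
        <morph type="root">
          <item lang="xx" type="txt">morph1</item>
        </morph>
      </word>
    </phrase>
  </paragraph>
</interlinear-text>
"""


def items_of(element):
    """Return (type, lang, text) for the direct item children of an element."""
    return [(i.get("type"), i.get("lang"), i.text) for i in element.findall("item")]


def summarize(xml_text):
    """List every container element together with its own item children."""
    root = ET.fromstring(xml_text.strip())
    return [(el.tag, items_of(el)) for el in root.iter() if el.tag != "item"]


for tag, items in summarize(SAMPLE):
    print(tag, items)
```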
If your .eaf file contains multiple participants, make sure you have given each participant a name value. You can set a participant value under Tier > Change Tier Attributes....
Choosing File > Export as > FLEx file … will give you the following screen:
In this screen you can specify:
Whether a tier corresponds to the interlinear-text element and, if so, which tier it is. This determines whether a tier and its dependent tiers provide the contents for item child elements of interlinear-text.
Whether a tier corresponds to the paragraph element. If so, its segmentation is used for grouping phrase child elements; if not, each phrase will be embedded in its own paragraph element.
The second screen allows you to:
item child element of the correct, corresponding container element
item
type attribute of the .flextext morph element. This should be a valid FLEx morph type. If this option is deselected, each morph element will be exported with attribute type="root".
The third screen allows you to customize the output of the FLEx lang (language) and type attributes:
type is based on a FLEx controlled vocabulary, which could be out-of-date at the time of use; therefore new values can be added manually. The list of languages is currently based on "decoding" the tier names and on the content languages of the tiers. The list can be empty; in that case it should be filled manually.
FLEx requires that for languages that have both a two letter ISO 639-1 code and a three letter ISO 639-3 code, the two letter code should be used. This is not enforced by the export function.
The final screen allows you to save the file as a flextext file, so it can be used in FLEx.
On the third-party resources page of ELAN (https://tla.mpi.nl/tools/tla-tools/elan/thirdparty/), you can find a teaching set which covers importing from FLEx to ELAN and exporting back to FLEx.
CHAT labels must be preceded by * (for root tiers) or % (for dependent tiers). While root tier names have to contain exactly 3 characters, dependent tier names can have up to 7 characters.
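The naming rule can be checked before exporting with a few lines of Python. This is only a sketch: it assumes that the character counts refer to the name without the leading * or %, and that a dependent tier name needs at least one character. The function name is hypothetical.

```python
def is_valid_chat_label(label):
    """Check a tier label against the CHAT naming rule described above."""
    if label.startswith("*"):
        return len(label) - 1 == 3        # root tier: exactly 3 characters
    if label.startswith("%"):
        return 1 <= len(label) - 1 <= 7   # dependent tier: up to 7 characters
    return False                          # no valid prefix


for label in ["*CHI", "*CHILD", "%mor", "%toolongname", "CHI"]:
    print(label, is_valid_chat_label(label))
```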
All documents can be exported into a tabular format for purposes of further analysis and/or printing. This includes documents that were created by ELAN itself (see Creating a new document and Opening an existing document) as well as documents that were imported into ELAN from any of the supported formats. Do the following:
The Export as tab-delimited text dialog window is displayed, e.g.:
In the Export as tab-delimited text dialog window, select those tiers that you want to export. A check mark appears next to any selected tier.
By default, the output contains one annotation per row, with the tier name in one of the columns, time information in several following columns and then the annotation value.
Repeat values of annotations spanning other annotations: the spanning annotation is put in each row containing an annotation it spans. The spanning annotation is not in a row by itself.
Only repeat within annotation hierarchies limits the previous option. An annotation is only repeated if it is on one of the ancestor tiers in the annotation hierarchy.
Sliced annotation output showing temporal co-occurrences is an alternative way to repeat annotation values based on overlaps. In this export all unique begin and end times of all annotations in the export are placed in one list, creating new intervals (between each two successive time values). Each interval is exported if there is at least one annotation overlapping that interval, and in the column of each tier the value of the overlapping annotation, if any, is exported.
Include the annotation id appends the annotation identifier between brackets to the annotation value (e.g. [a13]). This makes it possible to distinguish annotations in the output, which is hard to do in the case of repeated values.
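The slicing described for the sliced annotation output can be sketched in Python as follows. This illustrates the idea and is not ELAN's implementation: the data layout (a dictionary of tiers holding begin, end and value in milliseconds) and the tier names are assumptions, and at most one annotation per tier is assumed to overlap any interval.

```python
def slice_annotations(tiers):
    """tiers: dict mapping a tier name to a list of (begin_ms, end_ms, value)."""
    # Collect all unique begin and end times into one sorted list.
    times = sorted({t for anns in tiers.values() for b, e, _ in anns for t in (b, e)})
    rows = []
    # Every pair of successive time values forms a candidate interval.
    for start, end in zip(times, times[1:]):
        row = {}
        for name, anns in tiers.items():
            for b, e, value in anns:
                if b < end and e > start:  # the annotation overlaps the interval
                    row[name] = value
                    break
        if row:  # export the interval only if at least one annotation overlaps it
            rows.append((start, end, row))
    return rows


example = {
    "A": [(0, 1000, "a1"), (2000, 2500, "a2")],
    "B": [(500, 1500, "b1")],
}
for start, end, row in slice_annotations(example):
    print(start, end, row.get("A", ""), row.get("B", ""))
```

In the example, the interval from 1500 to 2000 ms is not exported because no annotation overlaps it.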
If you choose the SMPTE (hh:mm:ss.ff) format, the selected video standard (PAL or NTSC) just indicates the way seconds and milliseconds are converted to frame numbers. This is independent of the actual video standard of the associated video(s).
*.txt saves a tab-delimited text file, *.csv saves the annotations in a comma separated values file, placing all text values between double quotes. Make an appropriate choice and click on Save.
Some Mac applications, like TextEdit, have difficulty loading UTF-8 encoded files. This is most noticeable for “special” characters, e.g. IPA. Using UTF-16 is recommended in that case.
A message appears to inform you that the file has been exported.
The contents and the layout of the exported file depend on the selected options. It can be opened with any program that can handle tab-delimited or comma separated texts, e.g., Microsoft Excel.
Some versions of Excel seem to have problems importing tab-separated files (white rectangles are shown instead of the column borders). As a workaround you can open the text file first in a text editor (e.g. Notepad) and copy and paste the content into Excel.
If your ELAN annotations contain syntactic elements, it is possible to export these to Synpathy[2] (see https://tla.mpi.nl/tools/tla-tools/older-tools/synpathy/). This function is available via File > Export as > Tiger-xml…
First select the tier you want to export from the candidate tiers. Afterwards, map the tiers onto the correct description ("word" or "pos"). Finally enter the name of the file (*.tig).
This function (File > Export as > Interlinearized Text...) is very similar to ELAN’s printing system. Therefore more information can be found in Previewing the printed pages. The main difference is that the width of the exported text depends in this case on the number of characters that fit on one line.
After selecting an appropriate layout, click on Save as and choose a location and file name. These files can afterwards easily be edited with any text editor (preferably using a fixed-width font). Optionally tick the Insert tabs between annotations box if you prefer the white space between annotations to be filled with tabs instead of spaces (especially useful when importing a text file into Word). If Insert tabs between annotations is selected, you can also have a single tab instead of multiple white spaces; to do that, tick the Tabs Instead of Spaces box.
Similarly to the export to interlinear text (see Interlinear text file), you can also export annotations to an HTML file, through the File > Export as > HTML... menu.
The only extra option for the HTML export is
To play the media, HTML5 is required. The exported HTML file must be placed in the same location as the media file in order to play the file from the HTML export.
In some situations a straightforward list of the annotation units, one after another, can be handy. For that purpose an export option to a “traditional transcript text” has been added to ELAN. In its simplest form it creates a text file containing the successive annotations of several tiers, in chronological order. This feature can be found under File > Export as > Traditional Transcript Text....
'Restrict to the selected time interval' allows you to export only the data that is currently selected (see Making a selection on an independent tier).
'Wrap lines' sets a maximum number of characters before the line gets wrapped.
'Merge annotations on the same tier...' makes it possible to merge annotations on the same tier if the gap between these annotations is less than a given number of milliseconds.
You can number the annotations, each wrapped line, and include or exclude tier labels or participant labels in the export.
One of the options enables you to include silences with a minimal duration. The figure shows there is a silence of 0.2 seconds between 'yeah' on the tier K-Spch and 'and then you go the other ...' on the tier W-Spch. The first annotation ends at 00:00:04.400 and the next annotation begins at 00:00:04.600, resulting in a silence of 0.2 seconds. If a silence is shorter than the minimal silence duration entered in the export dialog window (20 ms in the figure), it is not included in the exported file. The silence duration indication can have 1, 2 or 3 digits after the decimal point.
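The silence calculation can be written out as a short Python sketch. It is an illustration under assumptions, not ELAN's code: times are given in milliseconds, the begin time of 'yeah' and the end time of the second annotation are invented, and a silence is measured from the latest end time seen so far to the next begin time.

```python
def find_silences(annotations, min_silence_ms):
    """annotations: list of (begin_ms, end_ms, tier, value) from all exported tiers."""
    silences = []
    latest_end = None
    for begin, end, _tier, _value in sorted(annotations):
        # A gap counts as a silence only if it reaches the minimal duration.
        if latest_end is not None and begin - latest_end >= min_silence_ms:
            silences.append((latest_end, begin, (begin - latest_end) / 1000))
        latest_end = end if latest_end is None else max(latest_end, end)
    return silences


annotations = [
    (3000, 4400, "K-Spch", "yeah"),                           # begin time assumed
    (4600, 6000, "W-Spch", "and then you go the other ..."),  # end time assumed
]
for start, stop, seconds in find_silences(annotations, min_silence_ms=20):
    print(f"silence from {start} ms to {stop} ms: ({seconds:.1f})")
```

With a minimal silence duration of 20 ms, the 200 ms gap is reported as a silence of 0.2 seconds. With a minimal duration above 200 ms it would be left out.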
Empty lines after each annotation (block) can also be included or excluded in the generated output file. Lastly, you can set a fixed width (in number of characters) for the tier labels.
The option to use Jefferson-style alignment based on "[" characters in overlapping annotations, can change the position of parts of annotations by vertically aligning corresponding "[" characters. (Alignment of matching "]" characters is not supported yet.)
This export function (File > Export as > Time-aligned Interlinear Text...) produces interlinear output but, unlike standard Interlinear Gloss, the formatting is based on time alignment. This is achieved by using a monospaced (fixed width) font in combination with a customizable character-to-milliseconds calculation factor. As a consequence, depending on this factor, the export might cut off part of the annotation value.
The export offers a few text styling options (underline, bold, italic) and the output format is (simple) HTML.
The output can be customized in various ways:
After changes in settings, the Apply Changes button updates the preview. The Save As... button starts the actual export; currently HTML is the only supported format.
When you wish to work with your annotations in Praat, ELAN enables you to export your annotations to a Praat TextGrid. To do this, click File > Export as > Praat TextGrid.... In the dialog window that appears you can select the tiers you wish to export (see How to select tiers) and specify whether you want to restrict the output to the selected interval.
After clicking OK, you can enter a file name and select an encoding. In addition to TextGrid files in the default encoding for the operating system, ELAN supports Praat TextGrid files with UTF-8 and UTF-16 encoding. Finally click on Save.
The preliminary export function File > Export as > WebAnnotation JSON... stores annotations according to the W3C Web Annotation Data Model specifications. This model and format are intended to enable sharing and reuse of annotations across applications and platforms.
The export window offers a few options to customize the output. Apart from the possibility to select the tiers to export and to only export the selected interval, there are a few format-specific options which determine which information is included and how it is structured. After changing settings, the Update button applies the settings and updates the preview on the left side of the window. The Export button initiates the actual export to a .json text file.
Sometimes it can be very useful to have an alphabetical list of (unique) words from one or more tiers. ELAN offers a way to generate such lists. Go to File > Export as > List of Words ... and select the tiers (see How to select tiers) from which you want to extract the words. The annotations of the selected tiers will be tokenized (split into words) using either a default set of delimiters or a user-definable set. Check Count occurrences if you want the list to include the number of occurrences for each token. The Include overall totals in the export file option results in some basic overall statistics at the end of the file. The Include frequency percentages in the export option adds another column to the output, containing the percentage of each unique word (or annotation) of the total word count. After selecting tiers (or better, deselecting unwanted tiers) you can click OK and choose a file name. Clicking Save will save the word list.
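The kind of list this export produces can be approximated with the following Python sketch, which tokenizes annotation values, counts the occurrences and computes the frequency percentages. The delimiter set used here is an assumption for the example and is not necessarily ELAN's default set. The sorting is plain string sorting.

```python
import re
from collections import Counter


def word_list(annotation_values, delimiters=" \t\n.,!?"):
    """Return (word, occurrences, percentage) tuples sorted by word."""
    pattern = "[" + re.escape(delimiters) + "]+"
    tokens = [t for value in annotation_values
              for t in re.split(pattern, value) if t]
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(word, n, 100.0 * n / total) for word, n in sorted(counts.items())]


for word, n, pct in word_list(["the dog", "the cat."]):
    print(f"{word}\t{n}\t{pct:.1f}")
```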
ELAN supports export to SMIL[3]-compliant clips. With a suitable player this enables you to view media files and the associated annotations as a subtitled movie.
Exporting SMIL for QuickTime is very much the same as exporting SMIL for RealPlayer (see Export SMIL for Real Player). To export SMIL for QuickTime, go to File > Export As > QuickTime.... This will bring up a dialog box very similar to the one for exporting SMIL for RealPlayer. The only extra option, which is not available for RealPlayer, is Merge tiers into one QuickTime text file. If selected, all tiers are merged into one file; if not selected, a separate text file will be generated for each tier. It is also possible to set a transparent background for the subtitles. This is done by selecting Transparent background in the dialog (see Figure 1.48, “Change subtitle text settings”) which pops up when you click the Edit Font and Display Settings... button. Finally click on OK to export.
Another format you can export to from ELAN is QuickTime subtitle text. To do this, go to File > Export As > QuickTime Text.... Select the tiers (see How to select tiers) you want to be included in the subtitles. Optionally specify the following options:
Finally click on OK. By default the subtitles are stored in a QTtext .txt file. If you enter a file name with the extension .xml, the subtitles are stored in a TeXML - tx3g formatted XML file (the merge tiers option is ignored in that case).
Besides the QuickTime subtitle text (see QuickTime Text), ELAN can export annotations to a few other subtitle formats: SubRip (.srt), Spruce (.stl), Timed Text Markup Language (TTML, .xml) and LRC (.lrc). Click on File > Export As > Subtitle Text... and select the tiers (see How to select tiers) you want to include in the subtitle file. Specify whether the subtitles should be restricted to annotations in the selected time interval, whether the time of the selected interval should be recalculated from zero and whether the master media time offset should be added to the annotation times. Another option lets you specify the minimal display duration of a subtitle. For instance, if an annotation is only 0.3 seconds long, but you want to display a subtitle for at least 0.5 seconds, enter 500 (ms).
After you have selected tiers and specified the options, click on OK. Enter a file name in the next window and click on Save.
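The effect of the minimal display duration can be illustrated with a small Python sketch that writes annotations in the SubRip (.srt) layout. The function names and the input layout (begin, end and value in milliseconds) are assumptions for the example. How ELAN itself handles a lengthened subtitle that runs into the next one is not covered here.

```python
def to_srt_time(ms):
    """Format milliseconds as the SubRip time stamp hh:mm:ss,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(annotations, min_duration_ms=0):
    """annotations: list of (begin_ms, end_ms, value) in chronological order."""
    blocks = []
    for index, (begin, end, value) in enumerate(annotations, start=1):
        end = max(end, begin + min_duration_ms)  # lengthen subtitles that are too short
        blocks.append(f"{index}\n{to_srt_time(begin)} --> {to_srt_time(end)}\n{value}\n")
    return "\n".join(blocks)


# A 0.3 second annotation is displayed for 0.5 seconds.
print(to_srt([(1000, 1300, "hi")], min_duration_ms=500))
```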
Tiers for the recognizers are exported in the AVATech tier format. For more information on the AVATech tier format see https://tla.mpi.nl/projects_info/avatech/. Files can be exported as .txt, .csv and .xml.
ELAN supports any command line tool that can extract clips from a video (or audio) file. For that purpose it uses a script file named "clip-media.txt", which can be found in the folder where ELAN is installed. In most cases some configuration needs to be performed in the script file, e.g. which command line tool to use, before clipping can succeed. Therefore ELAN first checks the ELAN data folder (see Special ELAN data folder) for the presence of the "clip-media.txt" file, before trying the file in its installation folder. By copying the customized "clip-media.txt" file to the data folder, the changes are accessible to all versions of ELAN.
Mac OS users will have a default execution line in "clip-media.txt" looking like this:
osascript ./scripts/qtp_clip_10_7_export.scpt $in_file $out_file $begin(sec.ms) $end(sec.ms)
This means that an AppleScript script in the "scripts" folder will be executed when clipping media. There is also a PDF file in the ELAN installation folder to help Mac OS users with editing the syntax.
Windows users can e.g. put a copy of ffmpeg.exe (or ffmbc.exe for clipping mp4 files) in the folder where ELAN is installed (or modify the execution line such that the full path to ffmpeg is included). You can find ffmpeg and ffmbc online.
If you want to use the syntax for ffmpeg, remove the # in front of the line starting with 'ffmpeg.exe -i .........'. If you want to use the syntax for ffmbc, remove the # in front of 'ffmbc.exe -vcodec copy.......'. Make sure the syntax you do not want to use has a # in front of it; this comments the line out.
The syntax for ffmpeg can be: ffmpeg.exe -i $in_file -vcodec copy -acodec copy -ss $begin(sec.ms) -t $duration(sec.ms) $out_file
ffmpeg.exe : the path of the application
$in_file : the input file
$out_file : the output file
-vcodec copy -acodec copy : copy both the video and the audio stream without re-encoding
$begin(sec.ms) : the begin time of the clip
$duration(sec.ms) : the duration of the clip
Look in the script file for more explanation and examples. If it is not possible to edit the script file due to file permissions, copy "clip-media.txt" to the Special ELAN data folder (and modify it to use an absolute path to the clipping application).
A few examples for command line tools are:
C:\ffmpeg.exe -i $in_file -vcodec copy -acodec copy -ss $begin(sec.ms) -t $duration(sec.ms) $out_file
C:\ffmbc.exe -vcodec copy -acodec copy -ss $begin(hour:min:sec.ms) -t $duration(hour:min:sec.ms) -i $in_file $out_file
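To show what such an execution line amounts to once the placeholders are filled in, the following Python sketch expands a template into an argument list. It only mimics the substitution for illustration. ELAN performs the replacement itself, and the function name, the file names and the formatting of the time values with three decimals are assumptions.

```python
def build_clip_command(template, in_file, out_file, begin_ms, end_ms):
    """Replace the placeholders of an execution line and return an argument list."""
    values = {
        "$in_file": in_file,
        "$out_file": out_file,
        "$begin(sec.ms)": f"{begin_ms / 1000:.3f}",
        "$end(sec.ms)": f"{end_ms / 1000:.3f}",
        "$duration(sec.ms)": f"{(end_ms - begin_ms) / 1000:.3f}",
    }
    # Tokens that are not placeholders (the tool name and its options) are kept as they are.
    return [values.get(token, token) for token in template.split()]


template = ("ffmpeg.exe -i $in_file -vcodec copy -acodec copy "
            "-ss $begin(sec.ms) -t $duration(sec.ms) $out_file")
print(build_clip_command(template, "in.mp4", "out.mp4", 4400, 4600))
```

The sketch only builds the command; it does not run ffmpeg. Splitting on white space means that file names containing spaces would need extra care.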
To clip a media file, first make a time selection and choose File > Export As > Media Clip using Script.... A dialog will appear in which you can set the file name and the location to save the clipped file to. You can specify more options for clipping in the Preferences dialog, see Editing preferences.
If you have more than one media file to be clipped, typing a file name with an extension in the 'Save as' dialog will use the same extension for all the files that will be clipped. If you want the clipped files to keep the extension of the original media files, do not type an extension with the file name in the 'Save as' dialog.
To export an image from the ELAN window (i.e. to make a screenshot):
*.jpg, *.jpeg, *.png or *.bmp)
If you are using Windows, it sometimes happens that ELAN’s video window is black in the picture created using this function. This can be solved by temporarily disabling the hardware video acceleration:
Don’t forget to re-enable the hardware acceleration afterwards, because this has a strong effect on the system’s graphical performance.
To export a Filmstrip Image, first select the time segment you want the filmstrip of. Then click File > Export As > Filmstrip Image.... In the dialog window (see Figure 1.52, “Exporting to a filmstrip image”) you can define the width of each video frame, which frames to include and whether ELAN must add a time code in each frame. Moreover, ELAN can add the waveform, with or without a ruler, and you can specify its height. You can also specify whether the stereo channels should be displayed separately, merged or blended. Click on OK to generate the image.
Finally select a destination folder, enter a file name and click on Save.
An example of an exported filmstrip image can be seen in Figure 1.51, “An exported filmstrip image”.
This option allows you to save an image of a graphical representation of the density of annotations on selected tiers. This is the same functionality, with the same customization options, as in View > Annotation Density Plot... (see Annotation Density Plot).
All Shoebox files that were imported into ELAN (see Shoebox file) can be exported back into Shoebox. In this case, the time code information is kept.
To export a file into Shoebox, do the following:
The Shoebox Export dialog box appears. Make a choice and click on OK to continue.
The file is exported as a *.txt | *.sht | *.tbt file.
If there already exists a file of the same name, ELAN will ask you whether or not it should overwrite the existing file, e.g.:
It contains the following information:
Each ELAN parent annotation (including all its referring annotations) corresponds to one Shoebox record. E.g., in the illustration below, the ELAN parent annotation “Ligya-001” corresponds to the Shoebox record “Ligya-001”.
Each ELAN parent annotation (i.e., each Shoebox record) contains the additional field markers \ELANBegin and \ELANEnd (i.e., the begin and end time of the parent annotation).
This time code information allows you to import the Shoebox file back into ELAN, without having to manually re-align the file (see Shoebox file).
ELAN supports importing files from:
There are also options in ELAN available to import multiple files at once. More details regarding these options can be found here: Multiple file import options
ELAN supports the import of documents from Toolbox, allowing you to link transcribed and/or interlinearized documents to the time axis of media files. In order to import from Toolbox, you need at least the following two files:
*.txt, *.sht, *.tbt);
*.mpg, *.mov, *.wav etc.);
Optionally you can use the corresponding Toolbox database type file (*.typ). If this is not available, you have to provide a list of field markers (= tier names).
If you do not know the Toolbox database type file, do the following:
*.txt | *.sht | *.tbt file in Toolbox. Make sure it is the active window (click on it to activate it).
To import a Toolbox file into ELAN, do the following:
*.eaf documents, the Toolbox file and the media file(s) do not necessarily need to have the same name, and they do not need to be in the same directory (see Basic Information).
If the Toolbox file contains both aligned (i.e. containing time information) and non-aligned records, the aligned ones will maintain the timing, whereas the location of the non-aligned records will be interpolated automatically.
An ELAN window containing the imported Toolbox file appears.
Instead of using a Toolbox *.typ file, there is also an option in ELAN to define the field markers yourself when importing a Toolbox file.
|*.sht |*.tbt
*.txt|*.sht |*.tbt
file
Some markers are already 'built-in' in ELAN and do not need to be set: ELANBegin, ELANParticipant, ELANEnd.
Once you have manually created a set of field markers, you might want to reuse them later on. ELAN provides support for this:
Once the import has succeeded, you can add a reference to a media file via the Edit > Linked Files… menu, as described in Changing the links to media files. If the imported Toolbox file was exported from ELAN before, you won’t need to establish the link to the media file(s) again, as in that case the location information is stored in the file.
ELAN imports Toolbox files according to the following conventions:
This addition is necessary because ELAN and Toolbox differ in how they code information about multiple speakers:
When importing texts by multiple speakers, ELAN splits each Toolbox field into several ELAN tiers (one for each speaker) and adds the speaker-ID to the tier label.
If speaker information is not specified in the Toolbox file, the extension @unknown is added.
The following screenshot illustrates how ELAN treats texts by multiple speakers:
Note that ELAN can only read speaker information if:
When the file is exported back to Toolbox (see Toolbox file (UTF-8)), the extension @‘Speaker-ID’ is automatically dropped from the field marker, and the Toolbox records are sorted according to their record marker (e.g., in the above illustration, “test 001” is sorted before “test 002” etc.).
If you define the markers yourself, then there also is the possibility to choose the Time Subdivision stereotype. For example:
The time alignment has to be done manually for each Toolbox record. Do the following:
If you do not activate the Bulldozer mode, you will inadvertently overwrite and thereby delete existing annotations. Make sure that Bulldozer Mode is enabled in the Options > Propagate Time Changes menu.
The parent annotation (together with all its referring annotations) is assigned to the new time interval. All other parent annotations are moved to the right.
The following screenshot illustrates steps 1 to 4:
After you have done the time alignment, you can export the file back to Toolbox; in this case, the time code information will be kept (see Toolbox file (UTF-8)). If you then re-import the file into ELAN, ELAN automatically assigns the Toolbox records to their correct time intervals.
An imported Toolbox file can be saved as an ELAN file (see Re-open recently accessed files), exported back into Toolbox (see Toolbox file (UTF-8)), or exported as a tab-delimited text (see Tab-delimited text file).
ELAN can import documents from the SIL FieldWorks Language Explorer (FLEx). This involves a few steps:
.flextext file and relevant media files by clicking the ...-buttons.
.flextext file exported from FLEx. Optionally also add media files here (if not already in your .flextext file). There are options to exclude the interlinear-text and paragraph elements from the import, as well as the option to import participant information. When the word element is selected as the smallest time-alignable element, the time alignment for that level will be lost when exported again to FLEx. In .flextext, time alignment is stored on the phrase level.
phrase, word, morph etc.) or, more fine-grained, for each combination of major element plus item type up to a combination of major element, the type and the language.
phrase element in milliseconds. This has to be set if the FLEx export files do not contain timestamps. When importing a FLEx file that was edited in ELAN before and exported as a .flextext file, time duration information has already been stored in the file.
The tier structure created after import in ELAN is roughly like in the example above. The mapping of the FLEx structure onto ELAN tiers follows the schema <Speaker>_<element>-<item-type>-<language>, where the Speaker prefix is a generic label (A, B, C, ...).
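A tier name that follows this schema can be taken apart again with a few lines of Python. This is a sketch only: the example tier names are invented, the function name is hypothetical, and it assumes that the language part is everything after the second hyphen, so that language codes containing hyphens stay intact.

```python
def parse_flex_tier_name(name):
    """Split a tier name of the form <Speaker>_<element>-<item-type>-<language>."""
    speaker, separator, rest = name.partition("_")
    if not separator:
        return None  # the name does not follow the schema
    parts = rest.split("-", 2)  # at most three parts; the language may contain hyphens
    return {
        "speaker": speaker,
        "element": parts[0],
        "item_type": parts[1] if len(parts) > 1 else None,
        "language": parts[2] if len(parts) > 2 else None,
    }


for tier_name in ["A_word-txt-en", "B_morph-gls-qaa-x-kal", "A_phrase"]:
    print(tier_name, parse_flex_tier_name(tier_name))
```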
FLEx tiers and their representation in .flextext:
Word         | <word>  | <item type="txt">
Morphemes    | <morph> | <item type="txt">
Lex. Entries | <morph> | <item type="cf">
             | <morph> | <item type="hn">
Lex. Gloss   | <morph> | <item type="gls">
Lex. Gram.   | <morph> | <item type="msa">
Word Gloss   | <word>  | <item type="gls">
Word Cat.    | <word>  | <item type="pos">
On the third-party resources page of ELAN (https://tla.mpi.nl/tools/tla-tools/elan/thirdparty/), you can find a teaching set which covers importing from FLEx to ELAN and exporting back to FLEx.
It is possible to import CHAT files (used in e.g. the CHILDES project) into ELAN:
Some remarks about this import feature:
Remaining issues:
The feature to import Transcriber annotation files into ELAN works as follows:
*.trs) and click on Open
The Transcriber tiers will be mapped onto the ELAN equivalents:
A CSV (Comma Separated Values) or Tab-delimited Text (or Tab Separated Values) file is a text file in which one can identify rows and columns. Rows are represented by the lines in the file and the columns are created by separating the values on each line by a specific character, like a comma or a tab. CSV or Tab-delimited Text files can be compared to spreadsheets like the ones in Microsoft Excel in that they also have rows and columns. Note that .csv files can be created by Excel.
Take a look at Figure 1.68, “Tab-delimited Text”. The first row represents the event of a person saying 'so from here'. The first value (as well as the first column of the complete file) represents the tier name, the second and third represent the begin time in different formats, the fourth and fifth represent the end time, the sixth and seventh represent the duration and the last value represents the annotation.
You can import CSV or Tab-delimited Text files into ELAN via File > Import > CSV / Tab-delimited Text File.... In the dialog window browse to and select a file that contains CSV or Tab-delimited data and click Open.
The second dialog window contains two sections (see Figure 1.69, “Import CSV / Tab-delimited Text”). The upper section shows a sample table containing data from the selected file. Both rows and columns are numbered. The lower section enables you to specify which columns to include and what data type they represent. This means that the format of the files is flexible: it is not prescribed what data is expected nor how it is formatted. The numbers of the columns in the Import Options section correspond to the numbers of the columns in the sample table. The data types you can select are:
Select at least one column with data type 'Annotation'. If you select a column for begin time, end time and duration, the latter will be ignored in the import process.
The option Specify first row of data enables you to exclude a header by skipping the first few lines. The option Specify delimiter lets you specify the delimiter if ELAN did not guess the correct one. The delimiters supported by ELAN are comma, tab, colon, semicolon and the vertical line (vertical bar).
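How ELAN guesses the delimiter is not documented here. As an illustration of the general idea only, the following Python sketch picks, from the five supported delimiters, the one that occurs the same non-zero number of times on every line. The function name and the heuristic are assumptions made for this example, not ELAN's actual algorithm.

```python
# The five delimiters listed above: comma, tab, colon, semicolon, vertical bar.
CANDIDATES = [",", "\t", ":", ";", "|"]


def guess_delimiter(lines):
    """Return the most plausible delimiter, or None if none fits.

    Heuristic (an assumption for this sketch): a delimiter qualifies if
    it appears equally often on every non-blank line; among qualifying
    delimiters the one with the highest count wins, and ties go to the
    earlier entry in CANDIDATES.
    """
    best, best_count = None, 0
    for delim in CANDIDATES:
        counts = [line.count(delim) for line in lines if line.strip()]
        if not counts or min(counts) == 0:
            continue
        if len(set(counts)) == 1 and counts[0] > best_count:
            best, best_count = delim, counts[0]
    return best
```

A heuristic like this can fail, for instance when time values containing colons outnumber the real delimiter, which is why an option to set the delimiter manually is useful.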
If you enable the option Default annotation duration, ELAN creates all annotations from the selected file with a duration equal to the specified number of milliseconds. This option only has an effect if the file contains no time data, or only begin times or only end times.
The option Skip empty cells leaves out the cells in the CSV file that are empty. With this option, different tiers can be imported with different segmentations.
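The rules for begin time, end time, duration and Default annotation duration described above can be summarized in a small Python function. This is a sketch, not ELAN's implementation: the function name is hypothetical, all values are taken to be milliseconds, and the handling of "begin plus duration" and "end plus duration" is an assumption. The case without any time data is not covered and simply returns None.

```python
from typing import Optional, Tuple


def resolve_times(begin: Optional[int], end: Optional[int],
                  duration: Optional[int],
                  default_duration: Optional[int] = None
                  ) -> Optional[Tuple[int, int]]:
    """Return (begin, end) in milliseconds, or None if undetermined."""
    if begin is not None and end is not None:
        # Begin and end are both known: a duration column is ignored.
        return begin, end
    if begin is not None and duration is not None:
        return begin, begin + duration      # assumption for this sketch
    if end is not None and duration is not None:
        return end - duration, end          # assumption for this sketch
    if default_duration is not None:
        # Only a begin time or only an end time: use the default duration.
        if begin is not None:
            return begin, begin + default_duration
        if end is not None:
            return max(0, end - default_duration), end
    return None  # no time data: placement is outside this sketch
```

For example, resolve_times(1000, 2000, 500) ignores the duration and returns (1000, 2000), while resolve_times(1000, None, None, 250) returns (1000, 1250).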
Finally click OK to import the data. If a transcription document was open when starting the import, the imported tiers and annotations will be added to the already open document, otherwise a new transcription document is created with the imported annotations as its contents.
To demonstrate that the format of the imported file can be flexible, take a look at the following tab-delimited text:
In this example each column represents a tier, with the tier names in the first row and the annotations in the other rows. This file can be imported by selecting the following import options:
Note that the Specify first row of data option is set to 2. As a consequence, ELAN starts importing annotations from row 2 instead of row 1. Furthermore, ELAN tries to extract tier names from the first line of the file if the column they are part of is specified as 'Annotation'. In this example this results in two tiers: K-Spch and W-Spch.
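The column-per-tier layout can be illustrated with a short Python sketch. The sample text is made up for this example (only the tier names K-Spch and W-Spch come from the description above), and the function is a hypothetical reader, not ELAN's importer. It takes the tier names from the first line, starts reading annotations at a configurable row, and can optionally skip empty cells.

```python
import csv
import io

# Hypothetical sample: tier names in row 1, annotations from row 2 on.
SAMPLE = "K-Spch\tW-Spch\nhello\t\n\tso from here\nokay\tright\n"


def read_tier_columns(text: str, first_data_row: int = 2,
                      skip_empty: bool = True) -> dict:
    """Map each tier name (first line) to the list of its annotations."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    tier_names = rows[0]
    tiers = {name: [] for name in tier_names}
    # Row numbers are 1-based, as in the import dialog.
    for row in rows[first_data_row - 1:]:
        for name, cell in zip(tier_names, row):
            if skip_empty and not cell.strip():
                continue  # empty cell: no annotation on this tier
            tiers[name].append(cell)
    return tiers


tiers = read_tier_columns(SAMPLE)
print(tiers)
```

With empty cells skipped, the two tiers end up with independent sequences of annotations, which is what allows different segmentations per tier.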
To merge a CSV file with an existing *.eaf file, open the *.eaf file first and then choose File > Import > CSV / Tab-delimited Text File.... For information on merging a CSV file that has been imported into a new document with an existing *.eaf file, please see Merging transcriptions.
It is possible to import subtitles that are stored in the SubRip *.srt format: File > Import > Subtitle / Audacity Label File.... HTML and similar formatting tags are filtered out and multiple speakers are merged into one. The correct encoding of the file has to be specified in the import window.
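For readers unfamiliar with the SubRip format, the following Python sketch shows what such an import has to do in principle: split the text into cues, convert the time stamps to milliseconds and strip formatting tags. It is a simplified illustration with hypothetical function names, not ELAN's parser; the text is assumed to have been read already with the correct encoding, for example via open(path, encoding="utf-8").

```python
import re

TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")
TAG = re.compile(r"<[^>]+>")  # HTML-like formatting tags such as <i>


def to_ms(stamp: str) -> int:
    """Convert 'hh:mm:ss,mmm' to milliseconds."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(text: str) -> list:
    """Return a list of (begin_ms, end_ms, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        idx = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if idx is None:
            continue  # no timing line in this block
        start, end = lines[idx].split("-->")
        end = end.split()[0]  # drop optional positioning info
        body = " ".join(lines[idx + 1:])
        cues.append((to_ms(start), to_ms(end), TAG.sub("", body).strip()))
    return cues


SAMPLE = ("1\n00:00:01,000 --> 00:00:03,500\n<i>so from here</i>\n\n"
          "2\n00:00:04,000 --> 00:00:05,250\nokay\n")
print(parse_srt(SAMPLE))
```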
Audacity Label files are a specific kind of tab-delimited text (*.txt) files. They can be imported here without the configuration step that is part of the general Import CSV/Tab-delimited Text File import.
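Because the layout of an Audacity label file is fixed (begin time in seconds, end time in seconds, label text, separated by tabs), no column configuration is needed. The following Python sketch reads such a file into millisecond-based annotations. The function name is hypothetical and lines that do not start with two numbers are simply skipped, which is a simplification chosen for this example.

```python
def parse_audacity_labels(text: str) -> list:
    """Return a list of (begin_ms, end_ms, label) tuples."""
    annotations = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        try:
            begin, end = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # not a label line (e.g. extra frequency-range lines)
        label = parts[2] if len(parts) > 2 else ""
        annotations.append((round(begin * 1000), round(end * 1000), label))
    return annotations


# Hypothetical sample content of a label file.
SAMPLE = "0.780000\t1.570000\tso from here\n2.000000\t2.000000\tclick\n"
print(parse_audacity_labels(SAMPLE))
```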
If this import is started when a document is already open, the imported content is added to that transcription. Otherwise a new transcription document is created.
ELAN offers the possibility to import a Praat TextGrid file: click on File > Import > Praat TextGrid File.... In the dialog window that appears, browse to the file you wish to import. You can also include Praat PointTiers; when selecting this option, specify the default duration of PointTier annotations in milliseconds. Finally, check Skip empty intervals / annotations if empty intervals and annotations should be left out.
If an annotation document is already open in ELAN, the imported TextGrid is added to that document in one or more new tiers. If no annotation document is open, a new document consisting of the TextGrid data is created.
In addition to TextGrid files in the default encoding for the operating system, ELAN supports Praat TextGrid files with UTF-8 and UTF-16 encoding.
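When processing TextGrid files outside ELAN, the same three cases (system default encoding, UTF-8, UTF-16) have to be told apart. One common approach, shown here as a Python sketch, is to look for a byte order mark at the start of the file. This is an assumption about how one might do it, not a description of ELAN's detection; the function name is hypothetical, and files in UTF-8 without a byte order mark fall through to the fallback encoding.

```python
import codecs
import locale
from typing import Optional


def guess_textgrid_encoding(raw: bytes, fallback: Optional[str] = None) -> str:
    """Guess the encoding of a TextGrid file from its first bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the byte order mark
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # byte order is taken from the mark when decoding
    # No byte order mark: use the given fallback or the system default.
    return fallback or locale.getpreferredencoding(False)


raw = 'File type = "ooTextFile"'.encode("utf-16")
encoding = guess_textgrid_encoding(raw)
print(encoding, raw.decode(encoding))
```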
It is possible to import a WebAnnotation JSON file via File > Import > WebAnnotation JSON File.... The file extension is .json or .jsonld. There are no configuration options. The contents of the file should comply with the W3C Web Annotation Data Model specifications, even though the import function only supports a subset of those specifications (those elements that map quite naturally to ELAN elements).
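As a rough illustration of what a W3C Web Annotation looks like, the following Python sketch builds a single annotation with a textual body and a media fragment selector and converts the time fragment to milliseconds. The URLs and values are placeholders, and it is an assumption that this particular shape falls within the subset the import supports; that subset is not specified here.

```python
import json

# Minimal annotation following the W3C Web Annotation Data Model.
# All identifiers and values below are placeholders for illustration.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "so from here"},
    "target": {
        "source": "http://example.org/media/session1.wav",
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "t=0.78,1.57",
        },
    },
}


def fragment_to_ms(value: str) -> tuple:
    """Convert a media fragment such as 't=0.78,1.57' to milliseconds."""
    if value.startswith("t="):
        value = value[2:]
    start, end = value.split(",")
    return round(float(start) * 1000), round(float(end) * 1000)


text = json.dumps(annotation, indent=2)  # content of a .json or .jsonld file
loaded = json.loads(text)
print(fragment_to_ms(loaded["target"]["selector"]["value"]),
      loaded["body"]["value"])
```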
Importing tiers from recognizers creates the tiers in a new file if no file is currently open in ELAN. If a file is open, the tiers are added to that file. To import tiers from recognizers, go to File > Import > Tiers from Recognizer.... Selecting this option first prompts for the file to import. If no file is open, the tiers are imported directly into the new file. If a file is already open, a 'Create tiers from segments' dialog appears. For more information about this dialog see Figure 2.14, “Silence Recognizer”.
Importing a document from Shoebox is very much the same as importing a document from Toolbox (see Toolbox file). As with the Toolbox import, information about the tier relations can be provided by means of a .typ file or by using a marker file.
When reconstructing the vertical alignment of words on interlinearized markers, the position is recalculated based on the number of bytes per character. In some files this leads to incorrect alignment; in that case the recalculation can be turned off by unchecking Correct alignment based on the number of bytes per character. This import also tries to take non-spacing characters into account.
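The reason such a correction is needed at all is that a column position counted in bytes differs from one counted in characters as soon as a line contains characters that take more than one byte. The following Python sketch illustrates the effect for UTF-8 text; the function names are hypothetical and the code is not ELAN's implementation.

```python
import unicodedata


def byte_to_char_column(line: str, byte_column: int,
                        encoding: str = "utf-8") -> int:
    """Convert a column counted in bytes to one counted in characters."""
    prefix = line.encode(encoding)[:byte_column]
    # errors="ignore" drops a character cut in half by the byte position.
    return len(prefix.decode(encoding, errors="ignore"))


def display_width(text: str) -> int:
    """Count characters, leaving out non-spacing (combining) marks."""
    return sum(1 for ch in text if not unicodedata.combining(ch))


line = "ŋa tɛ"  # 'ŋ' and 'ɛ' each take two bytes in UTF-8
print(byte_to_char_column(line, 4))  # second word: byte 4, character 3
print(display_width("e\u0301"))      # 'e' plus combining acute accent
```

In this example the second word starts at byte position 4 but at character position 3, so aligning by bytes would shift it one column to the right.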
[2] Synpathy is a tool for annotating, analyzing, and graphically editing the syntactic structure of sentences (e.g. linguistically annotated text corpora), developed at the Max Planck Institute for Psycholinguistics. The application is based on the SyntaxViewer from the TIGER search project developed by the IMS (Institut für Maschinelle Sprachverarbeitung, University of Stuttgart).
[3] For a description of this standard and players see http://www.w3.org/AudioVideo/