Note: This recognizer runs on a web server, so the audio file will be uploaded using HTTP.
How to use the Tag Vowels recognizer from within ELAN
The Tag Vowels recognizer written at the Max Planck Institute for
Psycholinguistics uses the Praat analysis of f0 (pitch) and volume
of a given audio recording to spot timespans which are likely to
correspond to spoken vowels. This can be interesting for measuring
speech rate (counting syllables per time) or for gathering
information about prosody, since the raw Praat data is available as
additional output. The web service version has Praat installed on
the same server, but it is also possible to install the recognizer
on the computer where ELAN is running. In that case, Praat has to be
installed first. Praat is free open source software, available for
Linux and Windows from the University of Amsterdam (UvA). The
recognizer itself is also available as free open source software
from the MPI for Psycholinguistics. This recognizer also
demonstrates the various
types of possible input, output and setting objects within the
AVATecH and AUVIS frameworks.
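As a rough illustration of the speech-rate use case, the following Python sketch counts vowel annotations per second. It assumes one spotted vowel per syllable and assumes the vowel tier has been exported as (begin, end) pairs in seconds; the function name and input format are hypothetical and not part of the recognizer.

```python
from typing import List, Tuple


def speech_rate(vowels: List[Tuple[float, float]], start: float, end: float) -> float:
    """Estimate syllables per second, assuming one vowel annotation per syllable.

    vowels: (begin, end) times in seconds of vowel annotations
            (hypothetical export format of the recognizer's output tier).
    start, end: the timespan in seconds over which to measure.
    """
    if end <= start:
        raise ValueError("end must be after start")
    # Count a vowel if its midpoint falls inside the measured timespan.
    count = sum(1 for b, e in vowels if start <= (b + e) / 2 < end)
    return count / (end - start)


vowels = [(0.10, 0.18), (0.35, 0.42), (0.61, 0.70), (1.20, 1.31)]
print(speech_rate(vowels, 0.0, 2.0))  # 4 vowels in 2 s -> 2.0
```

Counting by midpoint avoids counting a vowel twice when consecutive measurement windows share a boundary.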
- Input: A single audio recording in a file format
supported by Praat, for example uncompressed PCM *.wav files.
When installed locally, it is also possible to provide
precomputed Praat output as an additional input file, overriding
the default processing of the recognizer.
- Settings:
- max_f0 defines a pitch ceiling. For example, if a
recording is known to have no young or female speakers, this
can be set to 300 Hz to keep Praat from matching pitch in a
higher octave by accident.
- epsilon selects how much the volume (in decibels) has to
change to mark the edge of a peak. With a high epsilon, only
sharp volume peaks are marked.
- Silence threshold selects the minimum amplitude (on a
scale from 0 to 1) required for pitch analysis. It is often
not interesting to assign pitch to relatively silent
timespans.
- Voicing threshold selects how regular a stretch of sound has
to be to count as voiced. Noise is very irregular, vowels
are quite regular, and a sine wave beep is very regular.
- Loglevel offers three verbosity settings: normal is the
least verbose, verbose gives some additional information,
and debug gives a lot of additional information that is
not normally of interest.
- Output: Two sets of tiers are available, each in two formats
(default XML or compact CSV): one set annotates average pitch
and volume, the other annotates initial, final, minimal and
maximal pitch. In addition, a timeseries output option is
available, which ELAN can visualize as curves of pitch and
volume. The recognizer produces it by gathering Praat output for
each 1/100 second segment. Unvoiced segments have undefined
pitch, which ELAN may visualize as gaps in the curve or as
'0 Hz' pitch.
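The recognizer's exact algorithm is not described here. As a simplified sketch of the idea (volume peaks whose edges are defined by an epsilon change in dB, kept only where pitch is defined), the following Python code works on hypothetical per-frame lists of intensity in dB and pitch in Hz at 1/100 s steps, with None marking unvoiced frames. The peak picking is a generic delta-based peak detector chosen for illustration, not necessarily what the MPI recognizer does, and all names are hypothetical.

```python
from typing import List, Optional

FRAME_STEP = 0.01  # seconds per frame: one value per 1/100 s segment


def intensity_peaks(db: List[float], epsilon: float) -> List[int]:
    """Return frame indices of volume peaks in a dB series.

    A maximum is only accepted as a peak once the volume has fallen more
    than `epsilon` dB below it, and the search for the next peak only
    resumes once the volume has risen more than `epsilon` dB above the
    lowest value since then. A final maximum that never falls by
    `epsilon` is not reported.
    """
    peaks: List[int] = []
    looking_for_max = True
    max_val, max_idx = float("-inf"), -1
    min_val = float("inf")
    for i, v in enumerate(db):
        if v > max_val:
            max_val, max_idx = v, i
        if v < min_val:
            min_val = v
        if looking_for_max and v < max_val - epsilon:
            peaks.append(max_idx)    # falling edge of the peak reached
            looking_for_max = False
            min_val = v              # start tracking the following valley
        elif not looking_for_max and v > min_val + epsilon:
            looking_for_max = True
            max_val, max_idx = v, i  # start tracking the next peak
    return peaks


def spot_vowels(db: List[float], pitch: List[Optional[float]],
                epsilon: float) -> List[float]:
    """Return times (s) of volume peaks that fall in voiced frames.

    `pitch[i]` is None for unvoiced frames (undefined pitch).
    """
    return [round(i * FRAME_STEP, 2) for i in intensity_peaks(db, epsilon)
            if pitch[i] is not None]


db = [40, 55, 62, 58, 45, 47, 60, 66, 61, 50, 52, 63, 64, 48, 40]
pitch = [None, 180, 190, 185, None, None, 170, 175, 172,
         None, None, None, None, None, None]
print(intensity_peaks(db, 6.0))     # [2, 7, 12]
print(spot_vowels(db, pitch, 6.0))  # [0.02, 0.07]: frame 12 is unvoiced
print(intensity_peaks(db, 15.0))    # [2, 7]: the 14 dB rise to frame 12 is too small
```

Raising epsilon from 6 to 15 dB drops the smaller third peak, which matches the description of a high epsilon only marking sharp volume peaks.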
AVATecH and AUVIS compatible recognizers have the following
categories of settings, input and output elements:
- input media: ELAN automatically uses the first suitable media
file of your current annotation session, but you can change
that to other supported files belonging to the session. Very
few recognizers expect multiple input media files or extra
input files in 'timeseries' or recognizer-specific formats.
- input tiers: Some recognizers need input in the form of an
annotation tier, for example to select timespans of interest.
For some recognizers, the input is expected to be the output
of another recognizer. This gives you a chance to edit and
correct data - often simply tiers - between the two steps.
- numerical input: Recognizers can be configurable by
numerical 'knobs', which ELAN can show as sliders or fields.
Recognizers often already work well enough with their defaults.
- choice input: Recognizers can give you the option
to select settings from a pre-defined list. An example can
be 'verbose/normal/silent' messages or 'high/low' sensitivity.
ELAN shows these as drop-down selectors. In special cases, a
recognizer can also have 'any text' configuration items.
- output: Recognizers often produce one or more annotation
tiers. ELAN will offer to add those to your annotation
session as new tiers. It is also possible for recognizers
to output timeseries (which ELAN can show as curves) or
even audio, video or other files. Most recognizers only
produce zero or more tiers (plus log messages) as output.
It is often possible to selectively skip some output steps.
- log: You can open a window showing general messages from
the recognizer, tagged by type (e.g. DEBUG, INFO, WARN,
ERROR, RESULT or PROGRESS). Messages of higher priority
also update the processing status display, so they can
be seen directly without having to review the log text.
- basic or advanced recognizer settings: ELAN gives you
the choice to either hide or show 'advanced' settings. Default
values will be used for those settings which are hidden.
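The interplay of numerical knobs, choice lists and hidden advanced settings can be sketched in Python as follows. The class and function names, the parameter ranges and the defaults are all hypothetical (the real Tag Vowels defaults are not given here); the sketch only illustrates the rule that hidden advanced settings fall back to their default values.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Union

Value = Union[float, str]


@dataclass
class NumericParam:
    """A numerical 'knob', shown by ELAN as a slider or field."""
    name: str
    minimum: float
    maximum: float
    default: float
    advanced: bool = False

    def validate(self, value: float) -> float:
        if not self.minimum <= value <= self.maximum:
            raise ValueError(f"{self.name}={value} outside [{self.minimum}, {self.maximum}]")
        return value


@dataclass
class ChoiceParam:
    """A choice from a pre-defined list, shown by ELAN as a drop-down selector."""
    name: str
    options: Sequence[str]
    default: str
    advanced: bool = False

    def validate(self, value: str) -> str:
        if value not in self.options:
            raise ValueError(f"{self.name}={value!r} not in {list(self.options)}")
        return value


Param = Union[NumericParam, ChoiceParam]


def effective_settings(params: Sequence[Param], user_values: Dict[str, Value],
                       show_advanced: bool) -> Dict[str, Value]:
    """Resolve the values sent to the recognizer.

    Hidden advanced settings always fall back to their defaults; visible
    settings use the user's value if one was given, else the default.
    """
    result: Dict[str, Value] = {}
    for p in params:
        if p.advanced and not show_advanced:
            result[p.name] = p.default
        elif p.name in user_values:
            result[p.name] = p.validate(user_values[p.name])
        else:
            result[p.name] = p.default
    return result


# Hypothetical ranges and defaults, loosely modelled on the Tag Vowels settings.
params = [
    NumericParam("max_f0", 75.0, 1000.0, 500.0),
    NumericParam("epsilon", 0.0, 20.0, 2.0, advanced=True),
    ChoiceParam("loglevel", ["normal", "verbose", "debug"], "normal", advanced=True),
]
user = {"max_f0": 300.0, "loglevel": "debug"}
print(effective_settings(params, user, show_advanced=False))
print(effective_settings(params, user, show_advanced=True))
```

With advanced settings hidden, the user's "debug" choice is ignored and the default "normal" is used; once they are shown, the chosen value is validated and passed on.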
Your default ELAN configuration invokes a CLAM REST web service
wrapper on catalog.clarin.eu to have your files analyzed.
In other words, your media files and, if applicable, input tiers will
be uploaded for processing and ELAN will process the downloaded (tier
or other) results as if you had done the processing locally. For use
in situations where a web service cannot be used (files too large or
no internet connection available), you can also request a copy of the
recognizer for local installation on Linux or Windows, protected by a
USB dongle.
For this and for general support with the use of this recognizer,
please contact auvis@mpi.nl or use the ELAN and AUVIS forums on the
website of The Language Archive.
CLAM, ELAN and the client-side recognizer proxy are free open source
software under the GNU General Public License; however, some of the
recognizers can be proprietary closed source software. Licenses for
academic use are
available on request. Use of the web services is free at the moment,
but may be limited to the academic community if it becomes necessary.