Note: this recognizer runs on a web server; the audio file will be uploaded using HTTP.

How to use the Speaker Diarization recognizer from within ELAN

The Speaker Diarization recognizer, developed by Fraunhofer IAIS (Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme), analyzes an audio file and groups the audio segments supplied as a second input by similarity. The similarity measure is designed so that all utterances by the same speaker end up in the same group. If you use a tier of speech segments as input and split the output into separate tiers by speaker number, you get a draft segmentation of a recording with multiple speakers. This reduces the time needed to annotate typical recordings.
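The recognizer's actual similarity measure and output format are not described here, so the following Python sketch only illustrates the idea under stated assumptions. Each input segment is assumed to carry a hypothetical feature vector. A simple greedy cosine-similarity grouping stands in for Fraunhofer's method (it is not their algorithm). The resulting speaker numbers are then used to split the segments into one tier per speaker. The names `Segment`, `assign_speakers`, `split_into_tiers`, the "Speaker N" tier names and the 0.9 threshold are all illustrative.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    start_ms: int          # segment start time in milliseconds
    end_ms: int            # segment end time in milliseconds
    features: list[float]  # hypothetical per-segment voice feature vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_speakers(segments: list[Segment], threshold: float = 0.9) -> list[int]:
    """Stand-in grouping (NOT the recognizer's real method): each segment joins
    the most similar existing group if similarity >= threshold, otherwise it
    starts a new group. Groups are represented by the mean of their features."""
    centroids: list[list[float]] = []
    counts: list[int] = []
    labels: list[int] = []
    for seg in segments:
        best, best_sim = -1, threshold
        for i, c in enumerate(centroids):
            sim = cosine(seg.features, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best == -1:
            # No group is similar enough: start a new speaker group.
            centroids.append(list(seg.features))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched group.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], seg.features)]
            counts[best] = n + 1
        labels.append(best + 1)  # speaker numbers start at 1
    return labels


def split_into_tiers(segments: list[Segment],
                     labels: list[int]) -> dict[str, list[tuple[int, int]]]:
    """Put each segment's time interval on a tier named after its speaker number."""
    tiers: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for seg, spk in zip(segments, labels):
        tiers[f"Speaker {spk}"].append((seg.start_ms, seg.end_ms))
    return dict(tiers)


# Usage with made-up feature vectors: segments 1 and 3 sound alike.
segments = [
    Segment(0, 1200, [1.0, 0.1]),
    Segment(1300, 2500, [0.1, 1.0]),
    Segment(2600, 4000, [0.95, 0.15]),
]
labels = assign_speakers(segments)
tiers = split_into_tiers(segments, labels)
print(tiers)
```

Here the first and third segments share a tier because their feature vectors are nearly parallel. In ELAN, the recognizer's output tiers play this role and serve as a draft to be corrected by hand.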

AVATecH- and AUVIS-compatible recognizers have the following categories of settings, input and output elements:

By default, ELAN invokes a CLAM REST web service wrapper on catalog.clarin.eu to have your files analyzed. In other words, your media files and, if applicable, input tiers are uploaded for processing, and ELAN handles the downloaded results (tiers or other output) as if the processing had been done locally. Where a web service cannot be used (files too large, or no internet connection available), you can request a copy of the recognizer for local installation on Linux or Windows, protected by a USB dongle.

To request a local copy, or for general support with this recognizer, please contact auvis@mpi.nl or use the ELAN and AUVIS forums on the website of The Language Archive.

CLAM, ELAN and the client-side recognizer proxy are free, open-source software under the GNU General Public License; however, some of the recognizers may be proprietary, closed-source software. Licenses for academic use are available on request. Use of the web services is currently free, but may be restricted to the academic community if that becomes necessary.