SFL-TXT Converter

Introduction

This program extracts the labels associated with regions of a sound track from SFL files generated by Sound Forge and exports them to TXT files as recognized by Audacity*. The reason for such a task is as follows: I heard about a certain linguistic lab where the research assistants had to use a Sound Forge as archaic as v4.5 to mark up sound files containing human speech. The goal of such a markup is rather obscure to me, but in layman's terms it has something to do with determining the composition of the sound flow (different sounds, pauses, etc.). A brief investigation proved Audacity to be no less suitable for the said task (and in some respects even superior), except that the existing markup files are already abundant and stored as binary SFL files, which Audacity comprehendeth not. Naturally, being a penguin-fearing Linuxoid, I decided to write a simple converter that would let those people embrace free software instead of worshipping proprietary.

A sample result

Features

Currently, the program can import labels from Sound Forge SFL, Audacity TXT and SubRip SRT files, and export them to any of these formats. As Sound Forge relies on the sample rate to store time-related parameters, the user can additionally specify the sample rate (it defaults to 22050 Hz, which can be altered in the converter.h file before compiling). SRT support is auxiliary, for SRT cannot contain overlapping regions and needs the regions to be sorted in chronological order, and neither restriction applies to the first two formats. However, the initial files originated from a TV series sound track, so I decided to leave an option for making a subtitle file. I also thought it might be handy to use an existing SRT file as a draft for more detailed markup. Nonetheless, working with SRT files is not the first priority of this converter, and you will most likely need to polish the result in a specialized subtitle editor of some kind, e.g. Gaupol.
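
The two SRT restrictions mentioned above (chronological order, no overlaps) can be illustrated with a short Python sketch. It is not taken from the converter's source: the function names are hypothetical, and cutting an overlapping region short where the next one begins is only one possible policy, assumed here for illustration.

```python
def to_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def normalize_for_srt(regions):
    """Sort (start, end, label) regions and remove overlaps.

    Assumption: a region that runs into the next one is trimmed
    so that it ends where the next one starts.
    """
    ordered = sorted(regions, key=lambda r: (r[0], r[1]))
    result = []
    for i, (start, end, label) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = min(end, ordered[i + 1][0])
        result.append((start, end, label))
    return result


def render_srt(regions):
    """Render regions as numbered SRT blocks separated by blank lines."""
    blocks = []
    for n, (start, end, label) in enumerate(normalize_for_srt(regions), 1):
        blocks.append(
            f"{n}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{label}\n"
        )
    return "\n".join(blocks)
```

Regions that start at the same moment would end up with zero length under this policy, which is one more reason to review the result in a subtitle editor.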

A note on character encoding: the converter treats the labels "as is" and performs no character set conversion of any kind. Therefore, if the source and target files need different encodings, you should use iconv or similar software to change the encoding of the TXT or SRT files. Trying to convert SFL files in the same manner will be futile.

Since the version of Sound Forge in question was 4.5, I cannot guarantee compatibility with SFL files generated by any other version of Sound Forge, although I cherish the hope that this format is unlikely to change. SFL files are assumed to be created by saving the regions list into a separate SFL file; should there be some other kind of SFL files, those most probably have nothing to do with the issue here.

The development system was Debian Lenny on x86; compiling on x86_64 has not been tested.

Draft SFL format description here (in Russian): sfl_description_ru.html

Download

Grab the latest tarball with the source code from the project's files page:
https://sourceforge.net/projects/sfltxtconverter/files/

Manual

The current manpage can be viewed here: sfl2txt.html

Usage

To compile the converter type

$ make

in the directory containing its source code.
If you so desire, you can install the program to /usr/bin by

# make install

However, this is not necessary and the converter will work right from the build directory.

To remove installed software, type

# make uninstall

To apply the program, use

$ sfl2txt [-i <infile>] [-o <outfile>] [-r <sample_rate>] [-f <source_format>] [-t <target_format>]

Depending on the file extensions, the converter will perform the appropriate task. The sample_rate should be set equal to the sample rate of the corresponding sound file, as the SFL format relies on it to store starting positions and lengths of the regions. The default value is 22050 Hz, as was the case in the original files to be processed. Failure to set the correct sample rate will result in wrong timing in the outfile.
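
To see why the sample rate matters, here is a minimal Python sketch of the arithmetic. It assumes that a region's starting position and length are stored as sample counts; the function names are hypothetical and not taken from the converter's source.

```python
DEFAULT_SAMPLE_RATE = 22050  # Hz, the converter's documented default


def samples_to_seconds(sample_count, sample_rate=DEFAULT_SAMPLE_RATE):
    """Convert a number of samples to seconds at the given rate."""
    if sample_rate <= 0:
        raise ValueError("sample rate must be positive")
    return sample_count / sample_rate


def region_to_seconds(start_sample, length_samples,
                      sample_rate=DEFAULT_SAMPLE_RATE):
    """Return (start_sec, end_sec) for a region given in samples (assumed layout)."""
    start = samples_to_seconds(start_sample, sample_rate)
    end = samples_to_seconds(start_sample + length_samples, sample_rate)
    return start, end
```

Under these assumptions, a region starting at sample 44100 and lasting 22050 samples covers 1.0 to 1.5 seconds in a 44.1 kHz recording, but would come out as 2.0 to 3.0 seconds if the default 22050 Hz were used by mistake.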

Example:

$ sfl2txt -i samplefile.sfl -o samplefile.txt -r 44100

-- will convert from SFL to TXT assuming the sample rate of 44.1 kHz. As of version 0.5, in the absence of the -i or -o option, the program will read from standard input or write to standard output, respectively. In that case, use the -f and -t options to change the input and output formats if the default values of "from sfl" and "to txt" are not what you want. If filenames are specified with the -i or -o options, the -f and -t options will override the formats determined by the filename extensions.
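
The precedence described above (explicit option first, then filename extension, then default) can be sketched in Python as follows. This is an illustration only, not the converter's code: the function name is hypothetical, and falling back to the default for an unrecognized extension and ignoring letter case are assumptions.

```python
import os

KNOWN_FORMATS = ("sfl", "txt", "srt")


def resolve_format(filename, explicit, default):
    """Pick a format: explicit option, else file extension, else default."""
    if explicit is not None:
        fmt = explicit.lower()
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unknown format: {explicit}")
        return fmt
    if filename is not None:
        ext = os.path.splitext(filename)[1].lstrip(".").lower()
        if ext in KNOWN_FORMATS:
            return ext
    # Assumption: no filename or an unrecognized extension means the default.
    return default
```

For instance, `resolve_format("out.srt", None, "txt")` gives `"srt"`, while `resolve_format("out.srt", "txt", "txt")` gives `"txt"` because the explicit option wins.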

Example:

$ sfl2txt -i ../sample.sfl | iconv -f cp1251

-- will convert from SFL to TXT and convert the encoding from CP1251 to the current system encoding.

Also, the -s option generates the output filename from the input filename, changing its extension as appropriate (e.g. from 'sfl' to 'srt', if that is the conversion mode). If there was no extension, the new one is simply appended. In this case, you'll have to rely on the -f and -t options to set the desired conversion mode.

$ sfl2txt -i ../sample.sfl -s

-- will save the output to '../sample.txt' (default setting is sfl to txt, 22050 Hz).
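
The renaming rule used by -s can be sketched in Python like this. The function name is hypothetical and the code only mirrors the behaviour described above, not the converter's actual implementation.

```python
import os


def derive_output_name(input_name, target_format):
    """Replace the extension of input_name, or append one if it has none."""
    # splitext returns an empty extension when there is none, in which
    # case root equals input_name and the new extension is appended.
    root, _ext = os.path.splitext(input_name)
    return f"{root}.{target_format}"
```

With this sketch, '../sample.sfl' becomes '../sample.txt' for the txt target, and a bare 'sample' becomes 'sample.srt' for the srt target.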

Terms of use

This piece of software is licensed under the GNU GPL v.3 or later; refer to the COPYING file for full details.

* — Strictly speaking, the format is as follows:
start_sec\tend_sec\tlabel_text
where start_sec and end_sec are floating point numbers of seconds with precision up to 6th decimal digit.
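
For reference, a label line in this format can be written and read back with a few lines of Python. This is a minimal sketch based only on the description above; the function names are hypothetical.

```python
def format_label(start_sec, end_sec, text):
    """Build one tab-separated label line with 6 decimal digits."""
    return f"{start_sec:.6f}\t{end_sec:.6f}\t{text}"


def parse_label(line):
    """Split a label line into (start_sec, end_sec, label_text)."""
    # Split on the first two tabs only, so the label itself may contain tabs.
    start, end, text = line.rstrip("\r\n").split("\t", 2)
    return float(start), float(end), text
```

Here `format_label(1.0, 2.5, "pause")` produces the line `1.000000<TAB>2.500000<TAB>pause`, and `parse_label` turns it back into the three values.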