User:Demi/audio

This documents my audio setup and how I use different tools to record spoken articles.

Hardware/Platform

I have a Dell Latitude D600 running OpenBSD 3.5 (-current as of 6/1/2005). I use a Radio Shack clip-on condenser microphone (it was about $11), and Sony over-the-ear bud-style headphones.

Software

I use sox, a fairly common "Swiss Army knife" for processing digital audio; the tools from the Ogg Vorbis toolkit and their supporting libraries; and flac.

Recording

I use the following bash function to record an audio file:

 soxrec()
 {
    sox -t ossdsp -w -s -c 2 -r 44100 /dev/sound -t raw -w -s -c 1 -r 44100 "$@"
 }

 soxrec Article_name-sect.cdda

The sox command has the general form "sox [input-options] input-file [output-options] output-file". The options above require some explanation:

-t: The filetype. The input file is the sound device and requires this filetype.
-w: Take 16-bit samples ("words").
-s: Record samples in signed format.
-c: Number of channels. My sound system is only capable of producing a stereo (-c 2) stream of samples. Thus, I'm specifying -c 2 on input and -c 1 on output, which causes sox to downmix the two channels to one.
-r: Sample rate. 44100 Hz (that is, taking 44100 samples per second) is the CD digital audio standard, and what we're using for Spoken Wikipedia. My sound system has no problem sampling at 44100 Hz or other rates; I understand some audio systems can only sample at specific rates, so you may need to use a rate of 48000 (more common for PC audio systems) or another rate instead. If you specify different input and output sampling rates, sox will convert from one to the other.
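As an illustration of the last point, here is a sketch of how the recording function might be adapted for a device that only samples at 48000 Hz. The name soxrec48 is made up for this example, and the device name and other flags are simply carried over from soxrec above; it has not been tried on such hardware.

```shell
# Hypothetical variant of soxrec for a device that only samples at 48000 Hz.
# Input: stereo at 48000 Hz. Output: mono at 44100 Hz. Because the two rates
# differ, sox converts the sample rate while writing the output file.
soxrec48()
{
    sox -t ossdsp -w -s -c 2 -r 48000 /dev/sound -t raw -w -s -c 1 -r 44100 "$@"
}

# Usage:
#   soxrec48 Article_name-00a.cdda
```

Only the input rate changes, so the files it produces can be mixed freely with ones recorded by soxrec.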

I use the ".cdda" extension because the data resembles CD digital audio; it may not literally be that format, but these files are ultimately temporary and it's just a convention I'm using.

Method

I record the article paragraph by paragraph, numbering each section (the section number is given in its "edit" link) as I go. So, for example, the first paragraph of the introduction is Article_name-00a.cdda, the second paragraph is Article_name-00b.cdda, etc.
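The naming scheme can be generated rather than typed. The helper below is only a sketch: the name section_file and its argument order are invented for this example, and it assumes no section has more than 26 paragraphs.

```shell
# Hypothetical helper: print the file name for an article, a section number,
# and a paragraph number within that section (1 = a, 2 = b, ...).
#   section_file ARTICLE SECTION PARAGRAPH
section_file()
{
    # Pick the Nth letter of the alphabet for the paragraph suffix.
    letter=$(echo abcdefghijklmnopqrstuvwxyz | cut -c "$3")
    # Zero-pad the section number to two digits, as in Article_name-00a.cdda.
    printf '%s-%02d%s.cdda\n' "$1" "$2" "$letter"
}

# Usage:
#   soxrec "$(section_file Article_name 0 2)"    # records Article_name-00b.cdda
```

The zero padding matters later: a shell glob such as Article_name-*.cdda expands in sorted order, so section 02 comes before section 10 and the files concatenate in reading order.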

Since the resulting sound files are raw audio data (i.e. they lack a descriptive header like a .wav file or any form of compression or encapsulation) they can be simply concatenated using cat.
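Because there is no header, the length of a raw file follows directly from its size: at 44100 samples per second, one channel and 2 bytes per sample, one second of audio is 88200 bytes. A small sketch (the function name raw_seconds is made up for this example):

```shell
# Print the playing time, in whole seconds, of a raw file recorded as above:
# 44100 samples per second, 1 channel, 2 bytes (16 bits) per sample.
raw_seconds()
{
    # Some versions of wc pad their output with spaces, so strip them.
    bytes=$(wc -c < "$1" | tr -d ' ')
    echo $(( bytes / (44100 * 1 * 2) ))
}

# Usage:
#   cat Article_name-*.cdda > all.cdda
#   raw_seconds all.cdda
```

The size of the concatenated file is the sum of the sizes of its parts, so this is also a quick check that nothing was left out.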

The sox command also allows you to manipulate the files in various ways. I use the "trim" effect, for example, to cut a file on time boundaries, such as when only part of it needs to be re-recorded or I've left too long a pause at the end.
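A trim of this kind might look like the following sketch. The function name soxtrim and its argument order are invented for this example; the sample-format flags are the ones used for recording, and "trim" takes a start time followed by a length, both in seconds here.

```shell
# Hypothetical helper: copy LENGTH seconds of a raw section file, starting
# START seconds in, to a new raw file with the same sample format.
#   soxtrim INPUT OUTPUT START LENGTH
soxtrim()
{
    sox -t raw -w -s -c 1 -r 44100 "$1" -t raw -w -s -c 1 -r 44100 "$2" trim "$3" "$4"
}

# Usage: keep only the first 42.5 seconds, dropping a long pause at the end,
# then replace the original file with the trimmed copy:
#   soxtrim Article_name-00a.cdda tmp.cdda 0 42.5 && mv tmp.cdda Article_name-00a.cdda
```

Writing to a temporary name first avoids reading and writing the same file at once.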

I use the "compand" filter to level and normalize the audio. First, I concatenate the raw files into a .wav file, as shown below, but with the "stat" filter. The "stat" filter prints a "Volume adjustment:" recommendation for normalizing the audio (that is, making it as loud as possible without "clipping"). I use that value as the third argument to the compand filter, where the command below has the placeholder "vol". The argument list for "compand" is then "0.1,0.3 -60,-60,-30,-15,-20,-12,-4,-8,-2,-7 vol". I also find that applying a lowpass filter with an argument of 4000 Hz helps to soften sibilants and reduce some noise.
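Picking the recommendation out of the "stat" report can be scripted. This sketch only parses the report; the function name volume_adjustment is made up, and how the report is produced is left to the caller.

```shell
# Read the output of the "stat" effect on standard input and print just the
# number from its "Volume adjustment:" line.
volume_adjustment()
{
    awk -F: '/Volume adjustment/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage. "stat" writes its report to standard error, hence the 2>&1. The -n
# (no output file) argument is an assumption here; how the null output is
# spelled depends on the sox version.
#   sox Article_name.wav -n stat 2>&1 | volume_adjustment
```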

Anyway, to produce a .wav file (this makes it slightly more convenient to encode, though the encoders all accept raw audio data):

 cat Article_name-*.cdda | sox -t raw -w -s -c 1 -r 44100 - -t wav -w -s -c 1 -r 44100 Article_name.wav lowpass 4000 compand 0.1,0.3 -60,-60,-30,-15,-20,-12,-4,-8,-2,-7 vol

I always listen to the full resulting .wav file to make sure it sounds okay before encoding. I usually decide to re-record one or two parts.

Encoding

I use oggenc to encode the .wav file with the appropriate bitrate (I'm currently using an average bitrate of 64 kb/s, not the project's recommended 48 kb/s, because it sounds better to me).

 oggenc -d 'YYYY-MM-DD' \
   -a 'Demi @ Wikipedia' \
   -t 'Article title' \
   -c source="From Wikipedia, the free encyclopedia: http://www.wiki.x.io" \
   -c copying="Licensed under GNU Free Documentation License: http://www.fsf.org/licensing/licenses/fdl.txt" \
   -c article="http://en.wikipedia.org/wiki/Article_title" \
   -c version="HH:MI, YYYY Mon DD UTC" \
   -b 64 \
   -o Article_title.ogg \
   Article_title.wav
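Since only the title, date and version change from one article to the next, the command lends itself to a wrapper. The following is a sketch under assumptions: the function name oggarticle and its argument order are invented, the file name is derived by replacing spaces in the title with underscores, and the source and copying comments are left out for brevity.

```shell
# Hypothetical wrapper: encode NAME.wav to NAME.ogg, where NAME is the
# article title with spaces replaced by underscores.
#   oggarticle 'Article title' 'YYYY-MM-DD' 'HH:MI, YYYY Mon DD UTC'
oggarticle()
{
    name=$(printf '%s' "$1" | tr ' ' '_')
    oggenc -d "$2" \
        -a 'Demi @ Wikipedia' \
        -t "$1" \
        -c article="http://en.wikipedia.org/wiki/$name" \
        -c version="$3" \
        -b 64 \
        -o "$name.ogg" \
        "$name.wav"
}
```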

Archiving

I want to make an attempt to keep article readings up-to-date as the articles change (or at least if they change significantly). So, I keep the individual section files around, compressed with the flac utility:

 flac --channels=1 --bps=16 --sample-rate=44100 --sign=signed --endian=little Article_title-*.cdda

Then, I remove the *.cdda and .wav files. When an article changes, I can re-record the changed sections, concatenate them back together and re-encode.
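Compressing and removing can be tied together so that the raw files are deleted only when flac reports success. A sketch, with the function name archive_sections made up for this example and the flac options copied from the command above:

```shell
# Hypothetical helper: compress the given raw files with flac and delete the
# originals only if flac exits successfully.
archive_sections()
{
    flac --channels=1 --bps=16 --sample-rate=44100 --sign=signed --endian=little "$@" \
        && rm -- "$@"
}

# Usage:
#   archive_sections Article_title-*.cdda
```

If flac fails on any file, nothing is removed and the function returns flac's exit status.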

To reproduce the appropriate raw audio stream for input to sox to make a .wav file, as above, I do something like:

 flac -dc --force-raw-format --endian=little --sign=signed *.flac | ...

Note that I'm using an Intel computer: your endianness may vary.
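If you are unsure which value to give --endian, the host's byte order can be checked from the shell. This is a sketch that assumes the raw files were written in the byte order of the machine that recorded them; the function name host_endian is made up.

```shell
# Print "little" or "big" according to the byte order of this machine.
host_endian()
{
    # Write the bytes 0x01 0x00 and read them back as one 16-bit word in
    # host byte order: 0001 on a little-endian machine, 0100 on a big-endian one.
    word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
    case "$word" in
        0001) echo little ;;
        0100) echo big ;;
        *)    echo unknown; return 1 ;;
    esac
}

# Usage:
#   flac --endian="$(host_endian)" ...
```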