Memory allocation error in WAV files created by arecord

19 June 2023

The ALSA tool arecord provides a very easy way to record audio. You can run it in the terminal to start recording audio, which is progressively written to a WAV file. You then press Ctrl+C to stop recording. The resulting WAV file can then be played back or edited in a variety of programs.

For my purposes, I run it in the background, then send a SIGTERM signal to stop the recording, which is a valid way to stop according to the man page. However, I found that certain programs gave some variation of “memory allocation error” when attempting to open a WAV file that had been generated using this SIGTERM method of stopping the recording.
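The background-then-SIGTERM workflow can be sketched in Python roughly as follows. This is a minimal sketch rather than my exact setup: the helper name `run_then_terminate` is hypothetical, and it accepts any command so it is not tied to arecord.

```python
import subprocess
import time


def run_then_terminate(cmd: list[str], seconds: float) -> int:
    """Start cmd in the background, wait, then stop it with SIGTERM.

    Returns the process's return code. On POSIX this is the negative
    signal number if the process was killed by the signal instead of
    handling it and exiting on its own.
    """
    proc = subprocess.Popen(cmd)
    try:
        time.sleep(seconds)
    finally:
        # Popen.terminate() sends SIGTERM on POSIX systems.
        proc.terminate()
    return proc.wait()
```

For example, `run_then_terminate(["arecord", "-f", "cd", "recording.wav"], 5.0)` would record for about five seconds; the filename and the `-f cd` format are illustrative choices, not requirements.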

As a workaround, opening the WAV file in Audacity and saving it again “fixed” the file, after which all programs could open it successfully. Diffing the two WAV files in a hex viewer showed a lot of changed bytes, so it didn’t help narrow down the exact cause, but it did suggest that something in the WAV header might be responsible.

This turned out to be true. Specifically, the culprit was the part of the WAV header that indicates the size of the audio content. It was indicating a size of ~2 GB, which is far bigger than correct! The actual size of the audio content was on the order of dozens of kilobytes. This would explain the various “memory allocation error” messages: programs that trust the WAV header attempt to allocate 2 GB of memory for a 12 kB WAV. Presumably Audacity has some special handling for this case.

At a guess, the root cause of this issue is that when arecord first creates the WAV file it does not know the size of the audio content, so it writes roughly 2 GB as a placeholder. Presumably when the SIGTERM is received, arecord stops writing the WAV file but does not update the total size in the header to reflect how much was actually written. Note that I have not confirmed this guess.

Ultimately, the fix was to use this very helpful fixwav Python script, which reads the WAV file to determine the size of the audio content, then makes minimal changes in-place to update the header. Thank you kcarnold for sharing your script!

Update 2024-05-01

The original fixwav Python script was published by kcarnold on 2011-09-17 (nearly 13 years ago!). No changes were made until April 2024, when the following updates were applied:

On 2024-04-03 a license was added to the Python script.
On 2024-04-16 the script was refactored to work with Python 3.9+ (previously it targeted Python 2.7). Thanks to KarelVesely84 for this improvement.

I’ve included a copy of the script with this update below:

`fixwav`

fix_wav_length.py

#!/usr/bin/env python3
# python 3.9+ (using PEP585 type hints)

import struct

from typing import BinaryIO

def readAt(f: BinaryIO, at: int, num_bytes: int) -> bytes:
    f.seek(at)
    b = f.read(num_bytes)
    if len(b) != num_bytes:
        raise RuntimeError('At %d, insufficient bytes read!' % (at,))
    return b

def expect(f: BinaryIO, at: int, content: bytes) -> None:
    b = readAt(f, at, len(content))
    if b != content:
        raise RuntimeError('At %d, expected %r, got %r' % (at, content, b))

def replace(f: BinaryIO, at: int, content: bytes) -> None:
    f.seek(at)
    f.write(content)

def readChunk(f: BinaryIO, pos: int) -> tuple[bytes, int]:
    chunkId = readAt(f, pos, 4)
    chunkSize = struct.unpack('<I', readAt(f, pos+4, 4))[0]
    return chunkId, chunkSize

def fix_wav_length(f: BinaryIO) -> str:
    """
    Fix wav length in the wav header.
    The length value might be incorrect,
    especially when reading wav from a streaming pipeline.

    The length is corrected "in-place".
    It can also be used as a library function.

    :param f: Opened file object with `mode='rb+'`, or an io.BytesIO object.
    :return: String "DONE".
    """
    f.seek(0, 2) # seek to end
    fileSize = f.tell()
    f.seek(0)
    expect(f, 0, b'RIFF')
    pos = 8
    while True:
        chunkId, chunkSize = readChunk(f, pos)
        effSize = (chunkSize + 1) & (~1) # RIFF chunks are padded to an even size
        if chunkId == b'WAVE':
            pos += 4
            continue
        if chunkId == b'data':
            replace(f, pos+4, struct.pack('<I', fileSize - pos - 8))
            replace(f, 4, struct.pack('<I', fileSize - 8))
            return "DONE"
        pos += 8 + effSize

if __name__ == '__main__':
    import sys
    for filename in sys.argv[1:]:
        try:
            with open(filename, 'rb+') as fd:
                fix_wav_length(fd)
        except RuntimeError as e:
            # Python 3 exceptions have no .message attribute; format the exception itself.
            print("Error processing %r: %s" % (filename, e), file=sys.stderr)

README.md

What happens when you lose power during recording? Well...

This little tool fixes the length information in the WAV file
header. To use it, just run `python fix_wav_length.py 1.wav 2.wav ...`

The WAV format is flexible, meaning the header can include arbitrary
other data as well. All the other "fixer" tools I found ignore this,
thinking that all WAV headers are 44 bytes. A few even check before
corrupting your file. This tool is a quick and dirty coding job and
I'm not proud of it, but it can deal with at least a few different
kinds of WAV headers and tries not to corrupt your file. Make a backup
anyway, though.

Oh, and be careful with files over 2 or 4 GB. Up to almost 4 GB should
probably work, but the WAV format only gives you 4 bytes for a file
size including headers. Go ahead and fork and patch this to work like
http://offog.org/darcs/misccode/fix-wav-length if you need; it should
be easy.
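The 4-byte limit mentioned in the README can be illustrated with a short sketch. The helper names `fits_in_wav_header` and `pack_size` are hypothetical and not part of the script; the only assumption is that the header size fields are little-endian unsigned 32-bit integers, which is how the script above packs them (`'<I'`).

```python
import struct

MAX_RIFF_FIELD = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def fits_in_wav_header(file_size: int) -> bool:
    """Return True if the RIFF size (file size minus 8) fits in 4 bytes."""
    return 0 <= file_size - 8 <= MAX_RIFF_FIELD


def pack_size(value: int) -> bytes:
    """Pack a size as the header stores it: little-endian unsigned 32-bit."""
    # struct.pack raises struct.error if value is outside 0..2**32 - 1.
    return struct.pack("<I", value)
```

A check like `fits_in_wav_header(os.path.getsize(path))` before patching would make the failure mode explicit for very large recordings, instead of surfacing as a `struct.error` partway through.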

LICENSE

MIT License

Copyright (c) 2024 Kenneth C. Arnold

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Tagged
Python
