Reading a Waveform Audio File

By: Kevin Scardina

Abstract: This article describes what a waveform audio file is and how to read one.

Table Of Contents
Objective
What is a Waveform Audio File?
The Structure of a Waveform Audio File
The WaveFormat Structure
Reading the Waveform Audio File
Conclusion
References

Objective

        The objective of this article is to explain how digital audio data is stored in waveform audio files. The article explains what a waveform audio file is, how the data is stored, and how to read the data out. This article assumes that the reader has the following knowledge:
  • C/C++
  • Win32 API
  • File Streams (File I/O)

What is a Waveform Audio File?

        A waveform audio file is the most common audio file format on Windows PCs, and one of the simplest ways to store sound on a computer. It uses the Resource Interchange File Format (RIFF), which stores headers and data in variable-length "chunks" that always start with a four-letter code such as "RIFF" or "data". The sound itself is stored as PCM (Pulse Code Modulation) data. A waveform audio file almost never compresses its PCM data, which makes the files large but quick and easy to read.

The Structure of a Waveform Audio File

        The structure of a waveform audio file is quite elegant and logical. In its simplest form the file has three "chunks", or sections. Every "chunk" starts with a four-byte key, or "id", that must be present for the file to be valid. The first section is the "RIFF" section, named after its key, the four bytes "RIFF". After the key comes the file size. This is a four-byte (long) little-endian value giving the number of bytes that follow it, which is the size of the whole file minus the eight bytes taken up by the key and the size field itself. Next comes the second part of the "RIFF" "chunk", the form-type id. This is always a four-byte value, and for waveform audio it must be equal to "WAVE".

That finishes the "RIFF" "chunk", which takes us to the "fmt " "chunk". This chunk stores the structure that contains the format information for the PCM data. So, the next four bytes after "WAVE" must be the four bytes "fmt ". Note that I am quoting the space after fmt. This space is important and must be there. After the "id" "fmt " comes a long value that gives the size of the wave format structure, followed by a data block that contains the wave format structure information. The wave format structure info ends the "fmt " "chunk" and brings us to our last "chunk", the "data" "chunk".

The "data" "chunk" of course starts with the four bytes "data", and is followed by a long that describes the length of the data. After that all that is left is the actual PCM data.

This table shows the structure of the waveform audio file:

Waveform Audio File Structure

Field               Size                    Value
------------------  ----------------------  -----------------------------------------
First "Chunk" ID    4 bytes (char [4])      "RIFF"
File Size           4 bytes (long)          Size of the file in bytes, minus 8
Form Type ID        4 bytes (char [4])      "WAVE"
Second "Chunk" ID   4 bytes (char [4])      "fmt "
Wave Format Size    4 bytes (long)          Size in bytes of the Wave Format Info
Wave Format Info    Wave Format Size bytes  Information on the format of the PCM data
Third "Chunk" ID    4 bytes (char [4])      "data"
Data Size           4 bytes (long)          Size in bytes of the PCM data
Data                Data Size bytes         The actual PCM data
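
The layout in the table can be walked with plain byte offsets. The following is only a sketch, and it assumes the file has already been loaded into memory and contains exactly the three chunks listed, in that order. The names WaveLayout, ReadLE32 and ParseWaveLayout are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type: the sizes and offset found while walking the header.
struct WaveLayout {
    std::uint32_t riffSize;    // value of the File Size field
    std::uint32_t formatSize;  // value of the Wave Format Size field
    std::uint32_t dataSize;    // value of the Data Size field
    std::size_t   dataOffset;  // byte offset of the first PCM byte
};

// Read a four-byte little-endian unsigned integer.
static std::uint32_t ReadLE32(const unsigned char *p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns true and fills 'out' if 'buf' follows the three-chunk layout.
bool ParseWaveLayout(const unsigned char *buf, std::size_t len, WaveLayout &out) {
    // "RIFF" + size + "WAVE" + "fmt " + size = 20 bytes
    if (len < 20) return false;
    if (std::memcmp(buf, "RIFF", 4) != 0) return false;
    out.riffSize = ReadLE32(buf + 4);
    if (std::memcmp(buf + 8, "WAVE", 4) != 0) return false;
    if (std::memcmp(buf + 12, "fmt ", 4) != 0) return false;
    out.formatSize = ReadLE32(buf + 16);

    std::size_t pos = 20;
    if (out.formatSize > len - pos) return false;
    pos += out.formatSize;                      // skip the Wave Format Info
    if (len - pos < 8) return false;            // need "data" + size
    if (std::memcmp(buf + pos, "data", 4) != 0) return false;
    out.dataSize = ReadLE32(buf + pos + 4);
    out.dataOffset = pos + 8;
    return true;
}
```

For a file whose Wave Format Size is 16, this places the first PCM byte at offset 44 (20 + 16 + 8).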

The WaveFormat Structure

        The WaveFormat structure is a struct that describes the format of the PCM data stored in a waveform audio file. Windows provides this structure, under the name WAVEFORMATEX, in the mmreg.h header file. The structure looks as follows:

typedef struct {  
    WORD  wFormatTag; 
    WORD  nChannels; 
    DWORD nSamplesPerSec; 
    DWORD nAvgBytesPerSec; 
    WORD  nBlockAlign; 
    WORD  wBitsPerSample; 
    WORD  cbSize; 
} WAVEFORMATEX; 
 

Members:

	wFormatTag

		Waveform-audio format type. Format tags are registered with Microsoft Corporation for many compression
		algorithms. A complete list of format tags can be found in the MMREG.H header file.

	nChannels

		Number of channels in the waveform-audio data. Monaural data uses one channel and stereo data uses two
		channels.

	nSamplesPerSec

		Sample rate, in samples per second (hertz), at which each channel should be played or recorded. If wFormatTag is
		WAVE_FORMAT_PCM, then common values for nSamplesPerSec are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. For
		non-PCM formats, this member must be computed according to the manufacturer's specification of the format
		tag.

	nAvgBytesPerSec

		Required average data-transfer rate, in bytes per second, for the format tag. If wFormatTag is
		WAVE_FORMAT_PCM, nAvgBytesPerSec should be equal to the product of nSamplesPerSec and nBlockAlign. For
		non-PCM formats, this member must be computed according to the manufacturer's specification of the format
		tag.
		Playback and record software can estimate buffer sizes by using the nAvgBytesPerSec member.

	nBlockAlign

		Block alignment, in bytes. The block alignment is the minimum atomic unit of data for the wFormatTag format
		type. If wFormatTag is WAVE_FORMAT_PCM, nBlockAlign should be equal to the product of nChannels and
		wBitsPerSample divided by 8 (bits per byte). For non-PCM formats, this member must be computed according to
		the manufacturer's specification of the format tag.
		Playback and record software must process a multiple of nBlockAlign bytes of data at a time. Data written
		and read from a device must always start at the beginning of a block. For example, it is illegal to start
		playback of PCM data in the middle of a sample (that is, on a non-block-aligned boundary).

	wBitsPerSample

		Bits per sample for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM, then wBitsPerSample should
		be equal to 8 or 16. For non-PCM formats, this member must be set according to the manufacturer's
		specification of the format tag. Note that some compression schemes cannot define a value for
		wBitsPerSample, so this member can be zero.

	cbSize

		Size, in bytes, of extra format information appended to the end of the WAVEFORMATEX structure. This
		information can be used by non-PCM formats to store extra attributes for the wFormatTag. If no extra
		information is required by the wFormatTag, this member must be set to zero. Note that for WAVE_FORMAT_PCM
		formats (and only WAVE_FORMAT_PCM formats), this member is ignored.
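
The two derived members, nBlockAlign and nAvgBytesPerSec, follow directly from the rules above when wFormatTag is WAVE_FORMAT_PCM. The sketch below works them out. It uses a small stand-in struct named PcmFormat and a helper named MakePcmFormat, both invented here so that the example compiles without the Windows headers. They are not part of mmreg.h.

```cpp
#include <cstdint>

// Portable stand-in for the WAVEFORMATEX members used here (an assumption
// made for illustration; real code would use WAVEFORMATEX itself).
struct PcmFormat {
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
};

// Fill in the derived members for uncompressed PCM data.
PcmFormat MakePcmFormat(std::uint16_t channels,
                        std::uint32_t samplesPerSec,
                        std::uint16_t bitsPerSample) {
    PcmFormat f;
    f.nChannels      = channels;
    f.nSamplesPerSec = samplesPerSec;
    f.wBitsPerSample = bitsPerSample;
    // nBlockAlign = nChannels * wBitsPerSample / 8 (bits per byte)
    f.nBlockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    // nAvgBytesPerSec = nSamplesPerSec * nBlockAlign
    f.nAvgBytesPerSec = samplesPerSec * f.nBlockAlign;
    return f;
}
```

For 16-bit stereo data at 44.1 kHz, MakePcmFormat (2, 44100, 16) gives an nBlockAlign of 4 and an nAvgBytesPerSec of 176400.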

Reading the Waveform Audio File

        Now that we know how the file is structured, and know all about the WaveFormat structure, all we have to do is read the file in. The following code does just that.

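#include <windows.h>
#include <mmreg.h>
#include <cstring>
#include <string>
#include <fstream>

/*--------------------------------------------------------------------------------**
** This function takes in a filename, a pointer to a WAVEFORMATEX structure that  **
** will be set, and a reference to a char* databuf, which will be allocated with  **
** new[] and filled with the PCM data in the file filename. The caller must free  **
** databuf with delete[]. The return value is the size of the PCM data in bytes.  **
**                                                                                **
** On error it will throw a string with a proper error announcement               **
**--------------------------------------------------------------------------------*/

DWORD OpenWaveFile (const std::string &filename, WAVEFORMATEX *pwfx, char *&databuf){

    std::ifstream file;
    char csID[4] = {0};   // chunk ids are four bytes with no terminating null
    DWORD fsize = 0;      // DWORD is always four bytes, matching the size fields
    DWORD wfxsize = 0;
    DWORD datasize = 0;

    file.open (filename.c_str (), std::ios_base::in | std::ios_base::binary);
    if (!file){
        std::string s ("Unable to open file ");
        s += filename;
        throw s;
    }

    // csID is not null-terminated, so always build the string from exactly 4 bytes
    file.read (csID, 4);
    if (std::string (csID, 4) != "RIFF"){
        std::string s (" does not have a valid RIFF ID.");
        s = filename + s;
        throw s;
    }

    file.read ((char*)&fsize, 4);
    file.read (csID, 4);
    if (std::string (csID, 4) != "WAVE"){
        std::string s (" is not a valid WAVE form-type.");
        s = filename + s;
        throw s;
    }

    file.read (csID, 4);
    if (std::string (csID, 4) != "fmt "){
        std::string s (" does not have a valid wave-form chunk ID.");
        s = filename + s;
        throw s;
    }

    // The "fmt " chunk may be smaller or larger than WAVEFORMATEX, so clear the
    // structure first, never read past its end, and skip whatever is left over.
    file.read ((char*)&wfxsize, 4);
    DWORD wfxread = wfxsize < sizeof (WAVEFORMATEX) ? wfxsize : (DWORD)sizeof (WAVEFORMATEX);
    std::memset (pwfx, 0, sizeof (WAVEFORMATEX));
    file.read ((char*)pwfx, wfxread);
    file.seekg (wfxsize - wfxread, std::ios_base::cur);
    if (!file || pwfx->wFormatTag != WAVE_FORMAT_PCM){
        std::string s (" is not of type PCM wave format.");
        s = filename + s;
        throw s;
    }

    file.read (csID, 4);
    if (std::string (csID, 4) != "data"){
        std::string s (" does not have a valid data chunk ID.");
        s = filename + s;
        throw s;
    }

    file.read ((char*)&datasize, 4);
    databuf = new char [datasize];
    file.read (databuf, datasize);
    if (!file){
        // the file ended before all of the PCM data could be read
        delete [] databuf;
        databuf = NULL;
        std::string s (" does not contain all of its PCM data.");
        s = filename + s;
        throw s;
    }
    return datasize;
}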

Conclusion

        That's all there is to it. You now have a buffer with all your PCM data in it and a WaveFormat structure that tells you what you need to know about the PCM data. You can now take the PCM data and play it through the sound device.
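
If you want to inspect the samples rather than play them, the bytes in the buffer have to be combined into sample values. The sketch below assumes wBitsPerSample is 16, in which case each sample is a signed value stored low byte first. The function name DecodePcm16 is made up for this example. It also relies on the usual two's complement behaviour when converting to a signed 16-bit type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a buffer of 16-bit little-endian PCM bytes into signed samples.
// For stereo data the samples are interleaved: left, right, left, right, ...
std::vector<std::int16_t> DecodePcm16(const char *databuf, std::size_t datasize) {
    std::vector<std::int16_t> samples;
    samples.reserve(datasize / 2);
    for (std::size_t i = 0; i + 1 < datasize; i += 2) {
        // Go through unsigned char so that bytes above 0x7F are not sign-extended.
        std::uint16_t lo = static_cast<unsigned char>(databuf[i]);
        std::uint16_t hi = static_cast<unsigned char>(databuf[i + 1]);
        std::uint16_t bits = static_cast<std::uint16_t>(lo | (hi << 8));
        samples.push_back(static_cast<std::int16_t>(bits));
    }
    return samples;
}
```

Dividing the number of samples by nChannels gives the number of sample frames, and dividing that by nSamplesPerSec gives the length of the sound in seconds.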

Here is a list of all the main points in this article:
  • Waveform audio files use RIFF.
  • There are 3 "chunks" in a waveform audio file: the "RIFF", "fmt ", and "data" sections.
  • The WaveFormat structure contains all the PCM format information.
  • The "data" section contains all the PCM data.

References

Bradley Bargen and Peter Donnelly, Inside DirectX, Microsoft Press, 1998.


Published on: 3/21/2000 9:08:08 AM
