Every audio format makes a different trade-off between file size, audio quality, and compatibility. Whether you are archiving a studio recording, planning podcast storage, or estimating how many songs fit on a phone, understanding how file size is calculated helps you make the right format choice. This guide explains the math behind each format and the practical implications for recording, streaming, and storage.

Uncompressed vs. Compressed Audio

Audio files fall into three categories: uncompressed (WAV, AIFF), losslessly compressed (FLAC, Apple Lossless), and lossily compressed (MP3, AAC, OGG). Uncompressed files store raw PCM data: every sample value is preserved exactly. The formula is simple: size in bytes = sample_rate × bit_depth × channels × duration ÷ 8, with sample_rate in Hz, bit_depth in bits, and duration in seconds. A 3-minute stereo recording at 44.1 kHz / 16-bit takes 31,752,000 bytes (about 31.8 MB, plus a small file header) regardless of the actual audio content.
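The uncompressed-size formula translates directly into a few lines of Python. This is a minimal sketch: the function name pcm_size_bytes and its parameter names are illustrative choices, and it counts raw sample data only, ignoring the small header that a WAV or AIFF file adds.

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int, channels: int, duration_s: float) -> float:
    """Raw PCM payload size in bytes (container header not included)."""
    bits = sample_rate_hz * bit_depth * channels * duration_s
    return bits / 8


# One minute of CD-quality stereo audio (44.1 kHz, 16-bit, 2 channels).
one_minute = pcm_size_bytes(44_100, 16, 2, 60)
print(f"{one_minute:,.0f} bytes = {one_minute / 1_000_000:.2f} MB")
# prints: 10,584,000 bytes = 10.58 MB
```

Dividing by 1,000,000 gives decimal megabytes; dividing by 1,048,576 gives mebibytes, which is why different tools can report slightly different sizes for the same file.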

Lossless compression (FLAC) applies algorithms that reduce file size without discarding any audio information, like a ZIP file for audio. The compression ratio depends on the content: a sustained sine wave compresses dramatically, while dense broadband noise compresses little. A typical reduction is 40–55% compared to uncompressed WAV. Lossy compression (MP3, AAC, OGG) goes further by permanently discarding audio information that psychoacoustic models predict humans won't notice, primarily high frequencies and quiet sounds masked by louder ones. This allows 10:1 or greater compression ratios with minimal perceptible quality loss at high bitrates.
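The content dependence of lossless compression can be seen with a small experiment. The sketch below uses Python's general-purpose zlib compressor as a stand-in, not FLAC itself: FLAC uses audio-specific prediction and will give different numbers, but the qualitative contrast between a predictable tone and random noise should be similar. The 441 Hz tone, the one-second length, and the helper names are all assumptions made for illustration.

```python
import math
import random
import struct
import zlib

SAMPLE_RATE = 44_100  # samples per second
N = SAMPLE_RATE       # one second of mono 16-bit audio


def to_pcm16(samples):
    """Pack integer samples as little-endian signed 16-bit PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def compressed_fraction(pcm: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(pcm, 9)) / len(pcm)


# A sustained 441 Hz sine tone: highly predictable and repetitive.
sine = [round(20_000 * math.sin(2 * math.pi * 441 * n / SAMPLE_RATE)) for n in range(N)]

# Full-scale white noise: no structure for a lossless coder to exploit.
rng = random.Random(0)  # fixed seed so runs are repeatable
noise = [rng.randint(-32_768, 32_767) for _ in range(N)]

sine_fraction = compressed_fraction(to_pcm16(sine))
noise_fraction = compressed_fraction(to_pcm16(noise))
print(f"sine:  {sine_fraction:.1%} of original size")
print(f"noise: {noise_fraction:.1%} of original size")
```

The tone should shrink to a small fraction of its original size, while the noise stays at roughly 100%. Real music sits between these two extremes.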

Sample Rate and Bit Depth: What They Actually Affect

The sample rate determines the highest frequency that can be captured. According to the Nyquist theorem, a 44.1 kHz sample rate captures frequencies up to 22.05 kHz, slightly above the 20 kHz upper limit of human hearing. The common 48 kHz rate used in video production provides the same audible bandwidth with a small safety margin. The 96 kHz and 192 kHz rates used in professional recording don't provide audible benefit for playback, but they are useful in production because digital processing (equalization, pitch shifting) introduces artifacts that are easier to manage when working at a higher sample rate.
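The Nyquist relationship is simple enough to tabulate. The short sketch below prints the highest representable frequency for each sample rate mentioned above; the function name and the 20 kHz hearing-limit constant are illustrative, with the latter taken from the text.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


HEARING_LIMIT_HZ = 20_000  # nominal upper limit of human hearing, per the text

for rate in (44_100, 48_000, 96_000, 192_000):
    limit = nyquist_limit_hz(rate)
    margin = limit - HEARING_LIMIT_HZ
    print(f"{rate / 1000:>5.1f} kHz -> captures up to {limit / 1000:.2f} kHz "
          f"({margin / 1000:.2f} kHz above the hearing limit)")
```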

The bit depth controls dynamic range: the ratio between the loudest and quietest sounds a recording can contain. Each bit adds approximately 6 dB of dynamic range. 16-bit audio provides about 96 dB of dynamic range, which is enough for playback in even very quiet listening environments. 24-bit audio provides about 144 dB, well beyond the physical limits of any listening environment, but the extra headroom is useful in recording when input levels are uncertain. For distribution, 16-bit is the standard; 24-bit is an archival and production choice.
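The "6 dB per bit" rule follows from the fact that each extra bit doubles the number of available sample levels, and a doubling of amplitude is 20 × log10(2) ≈ 6.02 dB. A minimal sketch of that calculation, with an illustrative function name, gives the theoretical figure for ideal linear PCM. Real converters fall short of it.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bit_depth)."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# prints:
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```

The commonly quoted 96 dB and 144 dB figures come from rounding the per-bit value down to exactly 6 dB.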

Choosing the Right Format for Your Use Case

For recording and archival, WAV 24-bit at 48 kHz or 96 kHz is the professional standard. Storage cost is the only trade-off: a 2-hour session at 96 kHz / 24-bit stereo is about 4.1 GB (96,000 × 24 × 2 × 7,200 ÷ 8 ≈ 4.15 billion bytes). FLAC at the same settings reduces this to roughly 2.3 GB while decoding to bit-for-bit identical audio for future re-editing. Avoid MP3 or AAC for archival because every re-encode introduces additional quality loss.

For podcast distribution, mono MP3 at 128 kbps is a widely used setting. Voice intelligibility is excellent at this bitrate, and a one-hour episode is about 58 MB, small enough for fast downloads globally. Stereo adds no benefit for voice-only content, and encoding it at the same per-channel bitrate doubles the file size.
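For lossy formats, file size depends on the encoder bitrate rather than on sample rate and bit depth. The sketch below assumes constant-bitrate encoding, treats 1 kbps as 1,000 bits per second, and ignores metadata tags and frame headers. The function name encoded_size_bytes is illustrative.

```python
def encoded_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a constant-bitrate stream, ignoring tags and headers."""
    return bitrate_kbps * 1000 * duration_s / 8


episode = encoded_size_bytes(128, 60 * 60)  # one hour at 128 kbps
print(f"{episode / 1_000_000:.1f} MB")
# prints: 57.6 MB
```

Variable-bitrate encodes will differ, since their average bitrate depends on the content.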

For music streaming, AAC at 256 kbps is the format used by Apple Music and YouTube Music for their standard tier, while services like Tidal and Amazon Music HD stream FLAC at 44.1 kHz or higher for lossless subscribers. At 256 kbps, a 3-minute song is about 5.8 MB. The same song as uncompressed 44.1 kHz / 16-bit stereo PCM is about 31.8 MB, and a FLAC encode typically lands around 14 to 19 MB: roughly three times larger than the AAC file, with truly lossless quality.