Guidelines for adjusting the sound (audio) level when editing video

Published @2020-12-02 (Wednesday) Updated @2020-12-19 (Saturday)

When I produced a video to be shown outdoors the other day, I researched the standards for volume adjustment (level adjustment) in video editing.

The purpose of my research was to find out what levels to use in practice when combining human voices (dialogue and interview audio), background music, sound effects, and ambient sound in video editing. As I dug deeper, I found other things I needed to understand, so I'll summarize those as well.

Conclusion

Let's start with the conclusion. I think you can create a well-balanced video/audio work if you use the following range as a guide for editing.

Overall target level: -6dBFS to -10dBFS
Dialogue (conversation, narration, etc.): -8dBFS to -12dBFS
Music (BGM): -18dBFS to -30dBFS
Sound effects: -6dBFS to -10dBFS

However, the important thing is that the overall balance and the intention of the edit come across, so don't become so fixated on these numbers that you lose the contrast between loud and quiet passages.

And finally, you will need to make adjustments according to the regulations of the platform you are going to distribute it on.

This is also related to the final adjustment of the loudness value.

Therefore, I think the following is a good flow for editing.

Edit the individual audio tracks using the above guidelines
Adjust the final overall audio to the target level
Check the specifications of the delivery platform and adjust the loudness value
Check at the delivery location and on the delivery device

* And of course, take into account the client's intentions.

What are dB (decibel) and dBFS (decibels relative to full scale)?

dB (decibel) is a unit of ratio.

First, let's talk about units. When you edit video, the unit of volume you see is dB (decibel). I didn't understand this at first, but the unit expresses a relative value, not an absolute one.

According to Wikipedia (Japanese), it is as follows.

The decibel (symbol: dB) expresses a physical quantity as the common logarithm of its ratio to a reference quantity. (Omitted) In general terms, the "decibel" expresses a physical quantity such as sound pressure, power, or gain as a [level expression](https://ja.wikipedia.org/wiki/%E3%83%AC%E3%83%99%E3%83%AB%E8%A1%A8%E7%8F%BE "Level representation").

* Citation: Decibel - Wikipedia.
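To make "the logarithm of a ratio" concrete, here is a minimal Python sketch. The function names are my own, and it assumes amplitude-type quantities (sample values, voltage), for which the factor is 20; power ratios use a factor of 10 instead.

```python
import math

def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio (value / reference) to decibels."""
    return 20.0 * math.log10(ratio)

def db_to_amplitude_ratio(db: float) -> float:
    """Convert decibels back to an amplitude ratio."""
    return 10.0 ** (db / 20.0)

# Half the reference amplitude is roughly 6 dB down.
print(round(amplitude_ratio_to_db(0.5), 2))    # -6.02
# -20 dB means one tenth of the reference amplitude.
print(round(db_to_amplitude_ratio(-20.0), 3))  # 0.1
```

A ratio of exactly 1 gives 0 dB, which is why a decibel figure only has meaning once the reference is known.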

The unit (the bel) is named after Alexander Graham Bell, who is credited with inventing the telephone. "deci" is a prefix meaning one tenth. That is why the d is lower case and the B is upper case.

Let's not delve into this in detail and move on to dBFS.

The decibel used in digital editing is dBFS, a unit of measurement for the magnitude of a digital signal.

The "dB" shown on the level meters of video/audio editing software such as Premiere Pro, Audition, and Audacity is an abbreviation; it actually means dBFS.

dBFS stands for "decibels relative to full scale," which in Japanese is read as "dB full scale" or "decibel full scale."

dBFS expresses the magnitude of a digital signal. Since editing software only deals with digital signals, dB in that context can only mean dBFS, so the FS is left out.

So, from now on, unless otherwise supplemented in this article, you can read it as dB = dBFS.

The basic premise of sound editing is not to exceed 0dB.

Now, in this dBFS, the maximum value is assumed to be zero, which means that 0dB is the maximum value that can be expressed. That's why sound editors say, "Don't exceed 0 dB".

So, what happens if you exceed 0dB? The peaks of the waveform are cut off flat, which distorts the sound. This is called clipping.
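As a rough illustration of the 0dB ceiling, the following Python sketch measures the peak of a signal in dBFS and shows what hard clipping does to it. It assumes floating-point samples where an amplitude of 1.0 is full scale; the function names and the test signal are made up for this example.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples, where 1.0 is full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(peak)

def hard_clip(samples):
    """Limit every sample to [-1.0, 1.0], flattening anything beyond full scale."""
    return [max(-1.0, min(1.0, s)) for s in samples]

# One cycle of a sine wave boosted to 1.5 times full scale (hypothetical signal).
too_loud = [1.5 * math.sin(2 * math.pi * n / 16) for n in range(16)]
clipped = hard_clip(too_loud)

print(round(peak_dbfs(too_loud), 2))  # 3.52, above 0 dBFS
print(peak_dbfs(clipped))             # 0.0, the tops have been cut off
```

The clipped signal cannot go above 0 dBFS, so everything that was above it is lost and the waveform shape changes.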

Where should the overall level be set?

As mentioned in the conclusion at the beginning of this article, it is best to adjust the overall level to between -6dBFS and -10dBFS.

Where should I adjust the level of each sound?

I rarely adjust the whole mix from the start. First, adjust the individual sounds that make up the piece, and then adjust the piece as a whole.

The various sounds and the approximate values for each are as follows:

Voice (conversation, narration, etc.): -8dBFS to -12dBFS
Music (BGM): -18dBFS to -30dBFS
Sound effects: -6dBFS to -10dBFS

After editing each of the above, if the overall level does not reach -6dBFS to -10dBFS, adjust each of them again, or raise the overall level.
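The per-track adjustment can be sketched as a small calculation: given the peak you read off the level meter, how many dB of gain move it into the guideline range? This is only an illustration under my own assumptions; the dictionary keys and function name are hypothetical, and the ranges are the guideline values from this article.

```python
# Guideline peak ranges in dBFS (low, high), taken from the values listed above.
TARGET_RANGES = {
    "voice": (-12.0, -8.0),
    "music": (-30.0, -18.0),
    "sound_effects": (-10.0, -6.0),
}

def gain_to_reach_range(measured_peak_db: float, kind: str) -> float:
    """Return the gain in dB that moves a measured peak into the target range.

    Returns 0.0 when the peak is already inside the range.
    """
    low, high = TARGET_RANGES[kind]
    if measured_peak_db > high:
        return high - measured_peak_db  # negative: turn the track down
    if measured_peak_db < low:
        return low - measured_peak_db   # positive: turn the track up
    return 0.0

print(gain_to_reach_range(-3.0, "voice"))          # -5.0
print(gain_to_reach_range(-12.0, "music"))         # -6.0
print(gain_to_reach_range(-8.0, "sound_effects"))  # 0.0
```

In practice you would still judge the result by ear, since the numbers are only a guide.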

Finally, adjust according to your use.

The important thing is to check the specifications of the distribution platform and the intention of the client.

Whether it is outdoor distribution, TV, YouTube, or web advertising, you will need to make adjustments according to the final use.

Each of them may have its own rules and regulations, and the client may have their own wishes. I think it is important to confirm this in advance, and to actually test playback in the real environment (for outdoor display, play it on site if possible and check it).

Also, when distributing multiple video and audio works on the same platform, you need to make sure that the volume does not vary from work to work.

In this sense, it is important to clarify your ideal audio level.

[Example] Netflix specifications and recommended values

As an example of the specifications provided by a platform, let's look at the Netflix example.

Netflix seems to specify a reference level of -20dBFS and a maximum level (true peak) of -2dBFS, which is 18dB above that reference.

I guess the reason for this generous headroom is to leave room for the dynamic range of movies and dramas.

Do not exceed +18db (-2 dbfs) maximum level (true peak) over reference of -20 dbfs, achieved by peak limiting and not lowering the mix level

Citation: Netflix Sound Mix Specifications & Best Practices v1.1 - Netflix | Partner Help Center
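The arithmetic in that quote is simply reference level plus allowed headroom. A tiny Python sketch, using only the two numbers from the quote (the constant and function names are my own):

```python
REFERENCE_DBFS = -20.0         # reference level from the quoted spec
MAX_ABOVE_REFERENCE_DB = 18.0  # maximum true peak above the reference

# -20 dBFS + 18 dB = -2 dBFS
PEAK_CEILING_DBFS = REFERENCE_DBFS + MAX_ABOVE_REFERENCE_DB

def exceeds_peak_ceiling(true_peak_dbfs: float) -> bool:
    """Return True when a measured true peak is above the ceiling."""
    return true_peak_dbfs > PEAK_CEILING_DBFS

print(PEAK_CEILING_DBFS)           # -2.0
print(exceeds_peak_ceiling(-1.2))  # True
print(exceeds_peak_ceiling(-4.0))  # False
```

Note that the quote asks for peaks to be brought under the ceiling by peak limiting rather than by lowering the whole mix.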

Loudness: a reference measure used in recent years in the broadcast industry and elsewhere

We have been talking about dBFS so far, but starting with the broadcasting industry, it seems that the digital media industry, including the music industry, is now using a unit and value called loudness to balance the volume of works.

I heard that the standard value of loudness was actively discussed during the transition to digital terrestrial broadcasting.

Loudness is a standard that indicates the loudness of sound perceived by humans.

Earlier in this article, I mentioned that the decibel is a unit of ratio and not an absolute value. In other words, since the decibel is a relative value, a reference value is required. This reference value is called the reference level, and in digital audio editing, many devices set -16dBFS as 0VU.

There's another new unit.

VU stands for Volume Unit. A VU reading expresses the level of the electrical signal as a number, which is different from how human hearing perceives the sound.

So in digital editing the reference is basically -16dBFS = 0VU, but even with a reference value, the same reading may feel too loud or too quiet depending on the type of sound.

The "Biased DTM Dictionary" on the website "g200kg Music & Software", which comprehensively explains terms related to DTM (desktop music), explains "loudness" clearly, so I quote it here.

In recent years, a more accurate way to measure the loudness value, which is the loudness of sound perceived by humans, has been defined as a standard, and the meter to measure this is called a "loudness meter." In order to eliminate volume differences between programs in broadcasting and other fields, operational standards have been established to control the loudness value, and in Japan, these standards have been in operation since October 2012.

Citation: What is Loudness:Loudness | Biased DTM Dictionary - DTM / MIDI Terminology Meaning and Explanation | g200kg Music & Software.

In addition, the Japan Advertising Agencies Association's document "Application of the "Audio Level Operation Standards", the standard for carrying in TV commercial material," also discusses operation using loudness values.

As the human ear has frequency characteristics (e.g., high-pitched sounds are louder than low-pitched sounds), the actual perceived loudness of sound does not necessarily match the measured value of a VU meter. (Omitted) It is also possible to use a VU meter to measure the loudness of a sound, which takes into account the human hearing characteristics.

Citation: Regarding the application of the "Sound Level Operation Standards", the standard for carrying in TV commercial materials.

Loudness value units

There are several units for loudness, but LKFS and LUFS appear to be used interchangeably.

Let me quote the aforementioned "Biased DTM Dictionary" again.

The standards for "loudness" are "ITU-R BS.1770-2" and "EBU-R128", which are defined in Japan as "ARIB Technical Standard T032". The ITU-derived "LKFS" or EBU-derived "LUFS" is used as the unit for digital signal loudness, and although there have been some twists and turns in the revision of the standards, these two units are the same in the revised standards currently in use.

Citation: Loudness: What is Loudness? Meaning and explanation | g200kg Music & Software.

The recommended loudness value for TV broadcasting is -24 LKFS.

This seems to be determined by an international standard. The target level is set at -24 LKFS, and ±1 dB of that is acceptable.

Here, I cite data from the "Japan Science and Technology Information Dissemination and Distribution System" (J-STAGE), an electronic journal platform operated by Japan Science and Technology Agency (JST).

The target value of the average loudness value of a program is called the target loudness value, and this value is specified as -24 LKFS in the international exchange standard. The operationally acceptable range of the average loudness value of a TV program is the target loudness value ±1 dB. For programs where "creative policy requirements" are given the highest priority depending on the content of the program, it is possible to produce a program with a target loudness value lower than this standard.

Citation: Audio Level Management for TV Programs Using Loudness Measurement Method (J-STAGE).
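The rule in the quote above (target of -24 LKFS, acceptable within ±1 dB) is easy to express as a check. The sketch below assumes you already have an average loudness reading from a loudness meter; it does not measure loudness itself, and the names are hypothetical.

```python
TARGET_LKFS = -24.0  # target loudness value from the quoted standard
TOLERANCE_DB = 1.0   # operationally acceptable deviation

def within_broadcast_tolerance(measured_lkfs: float) -> bool:
    """Return True when the average loudness is inside target +/- tolerance."""
    return abs(measured_lkfs - TARGET_LKFS) <= TOLERANCE_DB

def correction_db(measured_lkfs: float) -> float:
    """Gain in dB that would move the measured loudness onto the target."""
    return TARGET_LKFS - measured_lkfs

print(within_broadcast_tolerance(-23.4))  # True
print(within_broadcast_tolerance(-21.0))  # False
print(correction_db(-21.0))               # -3.0
```

A program measured at -21 LKFS would therefore need to come down by about 3 dB overall.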

However, it seems that these standard values and acceptable ranges vary depending on the region or country.

Is -14LKFS generally the recommended loudness value for non-TV platforms and services?

The above was just the standard for TV programs.

So, what about video and music distribution platforms such as YouTube, Netflix, and Spotify?

As we will see later in the table, the standard seems to be generally around -14 LKFS.

Why is it larger than the "-24 LKFS" standard for TV programs?

This is just a guess, but the following may be the reason.

The TV remote control has a wide range of volume control (so a slightly lower standard is not a problem).
YouTube, Spotify, Nico Nico Douga, etc., whose volume is controlled on PCs, smartphones, and the like, have a narrower range of volume control than TV remote controls, so they are set louder in advance.

I wonder if the reason Netflix is so close to the TV standard at -27LKFS is that the service is often viewed on TVs, and the Netflix app is built into modern TVs and game consoles that are plugged into TVs.

I'm curious about the truth.

Loudness Normalization to Reduce Volume Variation

Have you ever been watching a YouTube video and had to adjust the volume each time because the volume varied from video to video?

Unlike TV shows, YouTube is a platform where anyone can post a video. As a result, the standard of volume varies from person to person, and some videos may be posted without paying attention to the volume.

This has led to the phenomenon of different volume levels for different videos.

Loudness normalization is a mechanism or function to solve this problem.

Loudness normalization is a process or function that determines a standard loudness value, and then adjusts audio data that is above or below the standard value.

Modern streaming services use loudness normalization to keep the volume somewhat constant.

Depending on the application of the streaming service, the loudness normalization function can be turned on or off at will, but if you don't have any particular intention, I think it's fine to leave it on. I can't think of any disadvantages to turning it on.

[Example] Target Loudness Values for Four Services

The following table summarizes the target loudness values of four major video and music distribution services. They seem to normalize their content based on these values.

Service Name	Target Loudness Value	Remarks
YouTube	-14LKFS(LUFS)	Inferred from information on the Internet, as there is no official data.
Spotify	-14LKFS(LUFS)	-
NicoNico Douga	-15LKFS(LUFS)	Videos whose overall volume is below the standard value are left unchanged.
Netflix	-27LKFS(LUFS)	-
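To see what loudness normalization would do to a finished file, here is a rough Python sketch that uses the target values from the table. It is my own illustration, not any service's actual algorithm. By default it assumes that quiet content is left untouched (the table only states this for NicoNico Douga); whether the other services turn quiet content up is unknown to me, so that is a switch. The measured value would come from a loudness meter.

```python
# Target loudness values in LKFS/LUFS, copied from the table above.
TARGETS_LUFS = {
    "YouTube": -14.0,
    "Spotify": -14.0,
    "NicoNico Douga": -15.0,
    "Netflix": -27.0,
}

def normalization_gain_db(measured_lufs: float, target_lufs: float,
                          boost_quiet: bool = False) -> float:
    """Gain in dB a normalizer would apply to reach the target.

    With boost_quiet=False (an assumption), content below the target
    is left as it is and the gain is 0.0.
    """
    gain = target_lufs - measured_lufs
    if gain > 0 and not boost_quiet:
        return 0.0
    return gain

# A master measured at -10 LUFS (hypothetical) would be turned down everywhere.
for service, target in TARGETS_LUFS.items():
    print(service, normalization_gain_db(-10.0, target))
```

For a master at -10 LUFS this prints -4.0 for YouTube and Spotify, -5.0 for NicoNico Douga, and -17.0 for Netflix, which shows how much level is given back to the platform if you mix too hot.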

Should you care about the loudness value in video editing?

You should.

As mentioned above, YouTube and other streaming services normalize audio to their own target loudness value, so content that exceeds the target is turned down automatically. Even so, it is probably best to edit with the loudness value in mind from the start.

There's no need to change the dB-based editing you've been doing.

The loudness value matters for the final output volume. So, during the edit itself, I think it's fine to work in dB as usual, referring to the audio level meter.

Therefore, the guideline values given in the conclusion at the beginning of this article still apply, because using them as a reference often gives a better balance of sound within a work.

Then, in the final adjustments, you can use the loudness value to check the overall audio level and adjust it according to the specifications of the platform.

How to adjust the loudness value in Premiere Pro or Audition

You might worry that your editing software can't show the loudness value. No need: Premiere Pro and Audition can both display and adjust it.

That's impressive.

Both of them provide an effect called "Loudness Radar", so you can use that.

The usage of each seems to be similar, but check out the official video and article below for how to apply and set it up.

How to apply the Loudness Radar effect in Adobe Audition.

Below is the official tutorial video.

How to apply the Loudness Radar effect in Adobe Premiere Pro.

Here is the official tutorial article. [Adobe Premiere Pro User Guide](https://helpx.adobe.com/premiere-pro/user-guide.html/premiere-pro/using/loudness-radar.ug.html#)

Here is the summary of the official article.

Open the Audio Track Mixer. This can be found by clicking "Window > Audio Track Mixer".
In the Audio Track Mixer panel that appears, select the audio track you want to apply the effect to, and from the drop-down menu in the "fx" area at the top, select "Special > Loudness Radar".
After the effect is applied, double-click on the effect name to show the loudness radar.
If you open the "Settings" tab of the Loudness Radar, you can edit the target loudness value.

Summary

Sound is one of the most important factors in the production of video and audio works.

It is important to set a target loudness value for each individual, team, or project, because it prevents the volume from varying from one production to the next.

Then, adjust the volume according to the final distribution platform, the distribution location and environment, and the client's intention.

[Aside] TV standards differ by 2dB between commercial broadcasters and NHK

The reference level (0VU) for broadcasters seems to be set as follows.

NHK: -18dBFS = 0VU
Japanese commercial broadcasters (The Japan Commercial Broadcasters Association): -20dBFS = 0VU

This means that there is a difference of about 2dB between NHK and commercial TV programs. Perhaps NHK sounds a little louder to you.
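The 2dB figure follows directly from the two reference levels. As a simplified Python sketch (names are my own), a VU reading can be mapped to dBFS by adding it to the broadcaster's reference. This assumes a steady test tone; a real VU meter averages the signal, so actual program material does not map this neatly.

```python
# dBFS level that corresponds to 0 VU, from the list above.
REFERENCE_DBFS_AT_0VU = {
    "NHK": -18.0,
    "commercial": -20.0,
}

def vu_to_dbfs(vu_reading: float, broadcaster: str) -> float:
    """Map a VU reading to dBFS for a steady tone, given the reference level."""
    return REFERENCE_DBFS_AT_0VU[broadcaster] + vu_reading

nhk = vu_to_dbfs(0.0, "NHK")
commercial = vu_to_dbfs(0.0, "commercial")

print(nhk)               # -18.0
print(commercial)        # -20.0
print(nhk - commercial)  # 2.0
```

So material that sits at 0 VU in both places ends up 2 dB higher on the digital scale at NHK.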

[Aside] With terrestrial digital broadcasting, the sound produced by the production studio reaches viewers as it is, without being compressed

According to the aforementioned article on J-STAGE, analog broadcasts had compressors and limiters applied not only during program editing but also while the program was being delivered to the home (during transmission). The reason was that if the audio level swung too much, it would cause noise in the video.

With digital broadcasting, on the other hand, this restriction has been removed, and the audio of TV programs can now be delivered to the home as it is. This has the merit of delivering the sound exactly as the producer intended, since no compressor or limiter is applied during transmission, but it also has the disadvantage of making variations in volume more noticeable if no standards are set.

This is one of the reasons why normalization of volume by loudness value has started to be considered.

There is surely much more I need to learn.

How do I do actual audio editing?

I'd like to put this in a separate article.