You usually don't need a script like the following, which I just finished writing. It makes sense for old video recordings whose volume varies considerably.
The audio is normalized second by second instead of all at once. The script caps the amplification at a maximum value (35 dB) so that silent passages are not turned into a din.
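The capping rule can be sketched on its own as a small helper. This is only an illustration: the function name `capped_gain` and its optional second argument are hypothetical and not part of the script, and unlike the script it assumes a segment should never be attenuated. It takes the `max_volume` reading of one segment (in dB) and prints the gain to apply.

```shell
# Hypothetical helper: turn a max_volume reading into the gain to apply,
# capped at a limit (35 dB by default, as in the script).
capped_gain() {
    local max_volume=$1      # e.g. "-41.3", the peak level of one segment in dB
    local limit=${2:-35}     # maximum amplification in dB
    awk -v m="$max_volume" -v l="$limit" 'BEGIN {
        g = 0 - m             # gain needed to bring the peak up to 0 dB
        if (g < 0) g = 0      # assumption: never attenuate a segment
        if (g > l) g = l      # cap so near-silence is not blown up
        printf "%.1f\n", g
    }'
}

capped_gain -12.4    # prints 12.4
capped_gain -60.2    # prints 35.0
```

A segment peaking at -12.4 dB is raised by the full 12.4 dB, while a nearly silent one at -60.2 dB only gets the 35 dB ceiling.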
For this script to work, you must have "ffmpeg" and "sox" installed. It also calls "bc" for the decimal comparison.
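If you want to check the prerequisites up front, something like the following might do. It is a minimal sketch: `missing_tools` is a hypothetical helper name, and the list of commands is my reading of what the script calls.

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools ffmpeg sox bc)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Missing required tools:" $missing >&2
fi
```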
Happy hacking!
#!/bin/bash

input=input.mkv
output=output.mp4
audio=audio.wav
newaudio=combined.wav

# extract the audio track as 44.1 kHz stereo WAV
ffmpeg -i "$input" -vn -ar 44100 -ac 2 "$audio"
# split the audio into one-second segments
ffmpeg -i "$audio" -f segment -segment_time 1 -c copy out%06d.wav

for f in out*.wav
do
    # detects volume in decibel
    MAX=$(ffmpeg -hide_banner -i "$f" -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | grep 'max_volume' | awk '{print $5}')

    # removes the minus sign (only if $MAX starts with a "-" (wildcard matching))
    if [[ $MAX == -* ]]; then MAX="${MAX:1}"; fi

    # set a maximum volume amplification
    if (( $(echo "$MAX > 35.0" | bc -l) )); then MAX="35.0"; fi

    echo "$f -> $MAX"
    ffmpeg -i "$f" -af "volume=${MAX}dB" "max$f"
done

# Before merging the audio files with sox, we need to raise the limit on open files so that all segments can be concatenated
ulimit -n 16384     # https://www.spinics.net/lists/sox-users/msg00167.html
sox maxout*.wav "$newaudio"

rm *out*.wav
rm "$audio"

# now we replace the old audio with the new audio (https://superuser.com/a/1137613)
ffmpeg -i "$input" -i "$newaudio" -c:v copy -map 0:v:0 -map 1:a:0 "$output"

rm "$newaudio"