Gentoo Forums
[SOLVED] How to Locate time at which peak volume occurs?
Skotlex
Guru


Joined: 13 Mar 2004
Posts: 306

Posted: Tue Nov 07, 2023 3:18 pm    Post subject: [SOLVED] How to Locate time at which peak volume occurs?

EDIT: The solution is to use ffmpeg rather than sox, because sox's silence filter is bugged. To extract the time of the peak from pretty much any file, first normalize the file, and then use ffmpeg:
Code:

> sox Lucky.flac normalized.flac gain -n -1
> ffmpeg -loglevel error -hide_banner -i normalized.flac -af silenceremove=start_periods=1:start_threshold=-1.01dB:window=0 -y peak.flac

peak.flac will contain the audio with everything before the peak removed, so the peak time in the original is `soxi -D Lucky.flac` minus `soxi -D peak.flac`. A threshold of -1.00dB works for a significant chunk of my library, but some files needed -1.01dB.
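The two commands and the duration subtraction can be wrapped into one function for scripting. This is only a minimal sketch: the function names `duration_diff` and `peak_time` are made up for illustration, the threshold is the -1.01dB value mentioned above, and it assumes sox, soxi, and ffmpeg are installed and can read the input format.

```shell
#!/bin/sh
# Sketch only: wraps the sox + ffmpeg commands from the post.
# duration_diff and peak_time are hypothetical helper names.

# Subtract two durations given in seconds and print the result.
duration_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.6f\n", a - b }'
}

# Print the time (in seconds) of the first peak in the file given as $1.
peak_time() {
    in=$1
    tmp=$(mktemp -d) || return 1
    # Normalize so the peak sits at -1 dB.
    sox "$in" "$tmp/normalized.flac" gain -n -1 &&
    # Trim the start up to the point where the level first exceeds the threshold.
    ffmpeg -loglevel error -hide_banner -i "$tmp/normalized.flac" \
        -af silenceremove=start_periods=1:start_threshold=-1.01dB:window=0 \
        -y "$tmp/peak.flac" &&
    # Peak time = original duration minus duration of the trimmed file.
    duration_diff "$(soxi -D "$in")" "$(soxi -D "$tmp/peak.flac")"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

After sourcing the script, `peak_time Lucky.flac` should print the number of seconds until the peak on stdout. The temporary directory keeps the intermediate files out of the music library.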

----

In brief, I need to determine the time at which the peak volume level occurs in an audio file. If there are multiple peaks with the "same" value, finding the first one is good enough.

I've reached a dead end because I don't understand the information the tools give me (and I've been unable to find anything online on how to do this), so I am asking for help here now. :)

My current approach is using sox/loudgain. Loudgain can print out the peak value of an audio file.

For instance:
Code:

loudgain Lucky.flac
Track: Lucky.flac
 Loudness:   -14.76 LUFS
 Range:        1.90 dB
 Peak:     0.656641 (-3.65 dBTP)
 Gain:        -3.24 dB


Using the Peak value (which has no unit and, from my research online, seems to be a fraction of digital full scale, stated in the range 0~1), I thought of using sox to trim "silence" from the beginning of the file until said peak is found. Sox's documentation on silence states:
Quote:

silence [-l] above-periods [duration threshold[d|%]
[below-periods duration threshold[d|%]]

Removes silence from the beginning, middle, or end of the audio. `Silence' is determined by a specified threshold.

The above-periods value is used to indicate if audio should be trimmed at the beginning of the audio. A value of zero indicates no
silence should be trimmed from the beginning. When specifying a non-zero above-periods, it trims audio up until it finds non-silence.
Normally, when trimming silence from beginning of audio the above-periods will be 1 but it can be increased to higher values to trim
all audio up to a specific count of non-silence periods. For example, if you had an audio file with two songs that each contained 2
seconds of silence before the song, you could specify an above-period of 2 to strip out both silence periods and the first song.

When above-periods is non-zero, you must also specify a duration and threshold. duration indicates the amount of time that non-silence
must be detected before it stops trimming audio. By increasing the duration, burst of noise can be treated as silence and trimmed off.

[...]

duration is a time specification with the peculiarity that a bare number is interpreted as a sample count, not as a number of seconds.
For specifying seconds, either use the t suffix (as in `2t') or specify minutes, too (as in `0:02').

threshold numbers may be suffixed with d to indicate the value is in decibels, or % to indicate a percentage of maximum value of the
sample value (0% specifies pure digital silence).


However, this is where I am stuck. Because when I use:
Code:

sox Lucky.flac Lucky2.flac silence 1 1 0.656

I get the whole song back. If I use
Code:

sox Lucky.flac Lucky2.flac silence 1 1 0.656%

I get the whole song back. If I use
Code:

sox Lucky.flac Lucky2.flac silence 1 1 65.6%

I get an empty song back.

Through trial and error, the maximum % from which I get a non-empty song is
Code:

sox Lucky.flac Lucky2.flac silence 1 1 25.3%

The other alternative is to find the decibel value for sox's threshold. Using the same approach as before, the highest two-decimal value which produces a non-empty file is:
Code:

sox Lucky.flac Lucky2.flac silence 1 1 -11.93d

(both give me a file of 1:50 duration when the original is 5:06, which allows me to reach my goal, since the peak value is roughly at timestamp 3:16 of the file)

Then, what's happening with these values given to me? How can
Code:

 Peak:     0.656641 (-3.65 dBTP)

be converted to 25.3% or -11.93d? :/
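For reference, a linear amplitude and a decibel figure are related by dB = 20 * log10(amplitude). A minimal sketch of that conversion follows (the helper names `lin_to_db` and `db_to_lin` are made up for illustration). It suggests that loudgain's two peak figures agree with each other, and that sox's 25.3% and -11.93d agree with each other, so the mismatch does not look like a unit conversion problem between % and d.

```shell
#!/bin/sh
# Sketch only: lin_to_db and db_to_lin are hypothetical helper names.

# Linear amplitude (0..1) to decibels: dB = 20 * log10(amplitude).
lin_to_db() {
    awk -v x="$1" 'BEGIN { printf "%.2f\n", 20 * log(x) / log(10) }'
}

# Decibels to linear amplitude: amplitude = 10^(dB / 20).
db_to_lin() {
    awk -v d="$1" 'BEGIN { printf "%.3f\n", exp(d / 20 * log(10)) }'
}

lin_to_db 0.656641   # loudgain's linear peak, compare with its -3.65 dBTP
lin_to_db 0.253      # sox's 25.3% threshold, compare with -11.93d
db_to_lin -11.93     # sox's dB threshold back to a linear fraction
```

The three calls should print -3.65, -11.94 and 0.253, so each tool is at least internally consistent.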

Trial & error (using successive numbers to converge to the answer) isn't a suitable solution since I need to apply this process to hundreds of files.

PS: Solutions using other tools are acceptable (just not java, most other languages are fine), as long as they can be used from a script: any non-GUI tool that prints the timestamp to stdout will do, or just the total number of seconds until the peak. The audio files I process are all either mp3, flac, ogg, or opus.


Last edited by Skotlex on Sat Nov 11, 2023 12:43 am; edited 1 time in total
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9693
Location: almost Mile High in the USA

Posted: Tue Nov 07, 2023 4:41 pm    Post subject:

One idea: you should "normalize" the file before searching for the peak. I'm sure there's a way in sox to louden a file so that the highest peak hits 100% (0dB); peak detection against that level should then be applicable to any file.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
Skotlex
Guru


Joined: 13 Mar 2004
Posts: 306

Posted: Tue Nov 07, 2023 10:44 pm    Post subject:

eccerr0r wrote:
One idea: you should "normalize" the file before searching for the peak. I'm sure there's a way in sox to louden a file so that the highest peak hits 100% (0dB); peak detection against that level should then be applicable to any file.


This sounded like a good idea, until I realized that... I think sox is bugged. Its behavior defies my understanding.

If I first normalize a file, the peak should be very near 0dB, and that I can achieve. Just in case, I normalize it to peak at -1dB (a common recommendation):
Code:

> sox --norm=-1  Lucky.flac Lucky2.flac
> loudgain Lucky2.flac -q

Track: Lucky2.flac
 Loudness:   -12.24 LUFS
 Range:        1.85 dB
 Peak:     0.892115 (-0.99 dBTP)
 Gain:        -5.76 dB


So yes, the peak is now at -1dB. However, when I try to use sox to clip silence, the maximum threshold which gives me a non-empty file isn't -1d, -0.99d, or even -2d, but... -9.27d:
Code:

> sox Lucky2.flac Lucky3.flac silence 1 1 -9.26d
> soxi -D Lucky3.flac
0.000000
> sox Lucky2.flac Lucky3.flac silence 1 1 -9.27d
> soxi -D Lucky3.flac
110.378896

Why is the expected right parameter off by an order of magnitude? O_O
Using a % parameter, the maximum value that gives me a file with length >0 is 34.40%
Code:

> sox Lucky2.flac Lucky3.flac silence 1 1 34.40%
> soxi -D Lucky3.flac
110.378875
> sox Lucky2.flac Lucky3.flac silence 1 1 34.41%
> soxi -D Lucky3.flac
0.000000

It really should be a value near 100%, yet... I don't understand.

I will probably file a bug report, and see whether I can use this strange mystic number of -9.27dB on other files. Maybe the bug is a consistent offset, in which case it will still work on most files. :x

Thanks for an idea that allowed me to make some sort of awkward progress!