This model searches the speech in an audio file for a given set of keywords and, for each keyword it finds, returns the timestamps at which that keyword occurs. This capability is especially useful for locating sections and entities of interest within audio and video recordings.
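
To make the shape of the result concrete, here is a minimal sketch of keyword-to-timestamp matching. It is not the model's implementation or API: the model works directly on audio, whereas this example assumes a hypothetical word-level transcript of `(word, start_sec, end_sec)` entries. The function name `find_keywords` and the sample data are invented for illustration, and only single-word keywords are handled.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one (word, start_sec, end_sec) entry per spoken word.
Word = Tuple[str, float, float]


def find_keywords(
    words: List[Word], keywords: List[str]
) -> Dict[str, List[Tuple[float, float]]]:
    """Map each keyword that occurs to the (start, end) times of its occurrences."""
    # Compare case-insensitively; keywords that never occur get no entry.
    wanted = {k.lower() for k in keywords}
    hits: Dict[str, List[Tuple[float, float]]] = {}
    for text, start, end in words:
        # Drop surrounding punctuation so "Paris," still matches "paris".
        token = text.lower().strip(".,!?;:\"'")
        if token in wanted:
            hits.setdefault(token, []).append((start, end))
    return hits


# Illustrative data only, not real model output.
transcript = [
    ("Welcome", 0.0, 0.4),
    ("to", 0.4, 0.5),
    ("Paris,", 0.5, 1.0),
    ("where", 1.0, 1.2),
    ("Paris", 1.2, 1.7),
    ("begins.", 1.7, 2.2),
]
result = find_keywords(transcript, ["Paris", "London"])
print(result)
```

Here the keyword "Paris" is reported with two time ranges, while "London" is absent from the result because it is never spoken, mirroring the behavior described above of returning timestamps only for keywords that are found.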

