Whilst testing the FaceFX SDK to extract word and phoneme timings from audio and text, we encountered a case where a word's timing is off by quite a bit. We were wondering whether this is a known issue and, if so, whether a fix is available or newer versions address it. The problem also occurs in the same way in FaceFX Studio.
In this case, for a recorded sentence "go I won't be long", the first word "go" is not identified at the correct time. Instead, it ends up as a very short word at the end of the pause leading up to the rest of the sentence.
We checked the wave file: it does not seem to be clipped at the start, and the word “go” is quite clearly audible.
We can send screenshots and audio; however, at the moment emails to your support and info email addresses bounce from our site.
This is a known issue related to the Fonix speech detection not initializing correctly. It normally happens when speech starts right away (which is common in game audio, since it is normally aggressively clipped).
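If it helps to check which of your files start with speech right away, here is a minimal Python sketch that measures how much near-silence precedes the first loud sample in a WAV file. It is not part of FaceFX; the function names, the amplitude threshold of 500, and the 50 ms cutoff are all assumptions chosen for illustration, and it only handles 16-bit PCM audio.

```python
import array
import sys
import wave


def leading_silence_seconds(path, threshold=500):
    """Seconds of audio before the first sample louder than `threshold`.

    Assumes 16-bit PCM WAV. `threshold` is an arbitrary illustrative value.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    # WAV data is little-endian; swap on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()
    for index, sample in enumerate(samples):
        if abs(sample) > threshold:
            # Convert the interleaved sample index to a frame index.
            return (index // channels) / rate
    # No sample exceeded the threshold: the whole file counts as silence.
    return (len(samples) // channels) / rate


def starts_immediately(path, max_lead=0.05):
    """True if audible content begins within `max_lead` seconds (assumed cutoff)."""
    return leading_silence_seconds(path) < max_lead
```

Files for which `starts_immediately` returns true are only candidates for the problem described above; whether they actually analyze poorly still needs to be checked in the analysis results.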
You can resolve the issue by setting the "a_detectspeech" variable to false, then reanalyzing the file:
set -n "a_detectspeech" -v "false"
You can also set this variable only when analyzing this file by setting the "detectspeech" variable in a .fxanalysis config file to "false":
https://facefx.com/documentation/2017/W267
This issue should be rare, and the speech detection (once initialized correctly) can improve results on other files. It is therefore recommended to keep speech detection turned on, find the few cases that analyze poorly as a result of the bug, and turn it off only for those. The batch summarizer plugin is useful for finding audio files that analyze poorly as a result of something like this.
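If you already have the word timings exported from the SDK, a simple heuristic can also flag the symptom described in the question: a very short first word that starts late and sits directly against the next word. The sketch below is a hypothetical Python check, not the batch summarizer plugin and not a FaceFX API. The `(text, start, end)` tuple layout and all three thresholds are assumptions you would need to adapt to your own data.

```python
from typing import List, Tuple

# Assumed layout: (text, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def first_word_looks_misplaced(
    words: List[Word],
    min_duration: float = 0.08,  # assumed: shorter than this is suspicious
    min_lead: float = 0.3,       # assumed: first word starting this late is suspicious
    max_gap: float = 0.05,       # assumed: first word is squeezed against the next one
) -> bool:
    """Flag a first word that is very short, starts late, and abuts the next word."""
    if len(words) < 2:
        return False
    _, start, end = words[0]
    next_start = words[1][1]
    too_short = (end - start) < min_duration
    starts_late = start >= min_lead
    abuts_next = (next_start - end) <= max_gap
    return too_short and starts_late and abuts_next


# Illustrative, made-up timings resembling the reported case.
suspect = [("go", 0.95, 1.0), ("I", 1.0, 1.2), ("won't", 1.2, 1.5)]
print(first_word_looks_misplaced(suspect))
```

Files flagged this way are good candidates for reanalysis with speech detection turned off.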