Hi!
I would like to detect an echo delay in the following scenario: two microphones record the same voice. The distance between them is arbitrary (no more than several meters) but known. Both soundtracks are synchronized and overlaid. I would like to recover the delay time between the two recordings. This task seems to be pretty simple with cepstrum analysis. I do the following (e.g. in Matlab):
real(ifft(log(abs(fft(x)))))
where x is the vector containing the two overlaid recordings from both mics. I just take the first/most obvious peak in the resulting cepstrum as the time delay in samples.
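To make the single-speaker case concrete, here is a minimal sketch in Python, using only the standard library so that it runs anywhere (the small radix-2 `fft` helper stands in for Matlab's `fft`). It assumes the overlaid signal can be modeled as x[n] = s[n] + g*s[n-D], and it uses the real cepstrum, i.e. the inverse FFT of the log magnitude spectrum. The names `real_cepstrum`, `echo_delay` and `min_q`, the white-noise test signal, and the echo gain of 0.6 are illustrative assumptions, not my actual recordings.

```python
import cmath
import math
import random


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two (unnormalized)."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(values):
    """Inverse FFT with the usual 1/N normalization."""
    n = len(values)
    return [v / n for v in fft(values, inverse=True)]


def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    # The small constant only guards against log(0) in an empty bin.
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(x)]
    return [v.real for v in ifft(log_mag)]


def echo_delay(x, min_q=10):
    """Quefrency (in samples) of the largest cepstral peak in [min_q, N/2)."""
    c = real_cepstrum(x)
    return max(range(min_q, len(x) // 2), key=lambda q: c[q])


# Synthetic test case (assumed values): white noise plus one delayed copy.
rng = random.Random(0)
N, D, gain = 1024, 50, 0.6
s = [rng.gauss(0.0, 1.0) for _ in range(N)]
x = [s[n] + (gain * s[n - D] if n >= D else 0.0) for n in range(N)]

estimated = echo_delay(x, min_q=10)
print("estimated delay in samples:", estimated)
```

The `min_q` cutoff skips the low-quefrency region, where the spectral envelope of real speech dominates. The search stops at N/2 because the real cepstrum is symmetric.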
My question: what about environments where several people speak at the same time, but at different locations? I would still like to isolate a single person by their distinct time delay. But I cannot find appropriate peaks in the cepstrum. There are many peaks, but they don't seem to relate to the actual time delays I would expect.
I need to have a better understanding of cepstrum analysis and how to read a cepstrum.
Is there anyone who could help me/point me to literature about echo delay identification?
Are there alternatives to cepstrum analysis?