Echo delay detection with cepstrum analysis?

Hi!

I would like to detect an echo delay in the following scenario: two microphones record the same voice. The distance between them is arbitrary (not more than several meters), but known. Both soundtracks are synchronized and overlaid. I would like to recover the delay time between the two recordings. This task seems to be pretty simple with cepstrum analysis. I do the following (e.g. in Matlab):

fft(log(fft(x)))

where x is the vector holding the overlaid recordings from both mics. I then take the first/most obvious peak in the resulting cepstrum as the time delay in samples.
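To make the procedure concrete, here is a minimal sketch in Python with NumPy. It rests on several assumptions: it uses the real cepstrum (inverse FFT of the log magnitude spectrum), which is not literally the expression above; it models "overlaid" as a plain sum; and it runs on synthetic white noise with a made-up delay and gain, not on my recordings. The function names are only illustrative.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(x)
    # Taking the magnitude discards the phase; the small constant avoids log(0).
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    return np.real(np.fft.ifft(log_mag))

def estimate_echo_delay(x, min_lag=1, max_lag=None):
    """Return the quefrency (in samples) of the largest cepstral peak."""
    cepstrum = real_cepstrum(x)
    if max_lag is None:
        max_lag = len(x) // 2  # the upper half mirrors the lower half
    return min_lag + int(np.argmax(cepstrum[min_lag:max_lag]))

# Synthetic signal (assumption): white noise plus a delayed, attenuated copy.
rng = np.random.default_rng(0)
n, true_delay, gain = 8192, 200, 0.6
s = rng.standard_normal(n)
x = s.copy()
x[true_delay:] += gain * s[:-true_delay]

# Skip the lowest quefrencies, which mostly describe the spectral envelope.
estimated = estimate_echo_delay(x, min_lag=20)
print(estimated)
```

With a single delayed copy the cepstrum should show a peak at the delay and smaller peaks of alternating sign at its multiples, which is why I pick the first prominent one.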

My question: what about environments where several people speak at the same time, but at different locations? I would still like to isolate a single person by his or her distinct time delay. But I cannot find appropriate peaks in the cepstrum. There are many peaks, but they don't seem to relate to the actual time delays I would expect.
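For clarity, this is how I work out the delays I would expect, sketched in Python. Everything specific here is an assumption for illustration: the speed of sound (343 m/s), the sample rate (44100 Hz), the 3 m mic spacing and the speaker positions are hypothetical, and each speaker is treated as a point source with a direct path only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)

def expected_delay_samples(source, mic_a, mic_b, sample_rate):
    """Delay of mic_b relative to mic_a, in samples, for a point source."""
    path_difference = math.dist(source, mic_b) - math.dist(source, mic_a)
    return path_difference / SPEED_OF_SOUND * sample_rate

# Hypothetical layout: mics 3 m apart on the x axis, coordinates in meters.
mic_a, mic_b = (0.0, 0.0), (3.0, 0.0)
sample_rate = 44100  # Hz, assumed

speaker_1 = (-1.0, 0.0)  # on the mic axis: full 3 m path difference
speaker_2 = (1.5, 2.0)   # equidistant from both mics: no path difference

delay_1 = expected_delay_samples(speaker_1, mic_a, mic_b, sample_rate)
delay_2 = expected_delay_samples(speaker_2, mic_a, mic_b, sample_rate)

# The path difference can never exceed the mic spacing, so this bounds the delay.
max_delay = math.dist(mic_a, mic_b) / SPEED_OF_SOUND * sample_rate

print(round(delay_1, 1), round(delay_2, 1), round(max_delay, 1))
```

Under these assumptions, any peak that really comes from the mic geometry should lie at or below max_delay samples, so peaks beyond that bound must have another cause.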

I need to have a better understanding of cepstrum analysis and how to read a cepstrum.

Is there anyone who could help me/point me to literature about echo delay identification?

Are there alternatives to cepstrum analysis?
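One alternative I am aware of is the autocorrelation of the overlaid signal, which should also peak at the delay. Below is a minimal sketch in Python with NumPy, not a tested solution for my recordings: the signal is synthetic white noise, the delay and gain are made up, the search range of 386 samples assumes a 3 m mic spacing at 44100 Hz, and the function names are illustrative.

```python
import numpy as np

def autocorrelation(x):
    """Linear autocorrelation for lags 0..len(x)-1, via a zero-padded FFT."""
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # power of two, at least 2n-1
    spectrum = np.fft.rfft(x, nfft)
    # Inverse transform of the power spectrum gives the autocorrelation.
    acf = np.fft.irfft(np.abs(spectrum) ** 2, nfft)
    return acf[:n]

def delay_from_autocorrelation(x, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation value in the range."""
    acf = autocorrelation(x)
    return min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))

# Synthetic signal (assumption): white noise plus a delayed, attenuated copy.
rng = np.random.default_rng(1)
n, true_delay, gain = 8192, 200, 0.6
s = rng.standard_normal(n)
x = s.copy()
x[true_delay:] += gain * s[:-true_delay]

# Search only up to the largest delay the assumed mic spacing allows.
acf_delay = delay_from_autocorrelation(x, min_lag=20, max_lag=386)
print(acf_delay)
```

I suspect this works less cleanly on real speech than on noise, because the pitch period of voiced speech is likely to add autocorrelation peaks of its own.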