Assume you are given some 20,000 to 100,000 points of raw (x, y, z) navigation data, previously logged on some platform, and are asked to clean up the data.

My first, naive idea is to use some Kalman filter. So far so good.

However, the data might contain glitches (jumps) or other inconsistencies that need to be sorted out before feeding them to the Kalman filter.

How would you go about detecting glitches, and maybe even doing a first-pass correction of such data? A manual check of 100,000 data points doesn't seem like a very tempting prospect...

"Rune Allnor" kirjoitti viestissä: snipped-for-privacy@m73g2000cwd.googlegroups.com...

I don't know about systems to exclude irregularities; I doubt there are any without some assumptions, e.g. a maximum distance between two consecutive points. I will stay tuned to hear about such methods.

However, if after getting rid of the exceptions (errors in the data) you want to reduce the number of points in your navigation data, you could apply my freeware GeoConv. With GeoConv you can reduce the number of points while keeping the shape of the track as close to the original as possible.

Anything that can be said to be a "glitch" or an "irregular feature". Some of the tracks seem to "jump" sideways on the order of 2Dx to 3Dx, where Dx is the along-track distance between measurements. Then they get back "on-track" within a variable number of samples, say 5 to 20 samples.

I have a couple of ideas about how to *detect* these things, for manual correction. But then, if there already exists a wheel, why re-invent it...

Sounds like you are witnessing innovation outliers in a small-order AR process (since the signal takes some time to get back on track after an outlier). You can try an m-estimate (robust) linear predictor, and threshold the prediction error for outlier detection. Are the outliers correlated in the three dimensions, e.g. if you have one in the x axis, do you also have one in the y axis?
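A minimal sketch of the predict-and-threshold idea, assuming NumPy. It uses a plain least-squares AR fit plus a MAD-based robust scale as a crude stand-in for a true m-estimate predictor; the function name and parameter values are illustrative, not from any particular library:

```python
import numpy as np

def ar_outliers(x, order=2, thresh=5.0):
    """Fit an AR(order) predictor by least squares and flag samples
    whose prediction error exceeds `thresh` robust standard deviations."""
    x = np.asarray(x, dtype=float)
    # Regression matrix: column k holds the signal delayed by k+1 samples.
    X = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    err = y - X @ coeffs
    # Robust scale of the residuals via the MAD (1.4826 = 1/0.6745).
    scale = 1.4826 * np.median(np.abs(err - np.median(err)))
    flags = np.zeros(len(x), dtype=bool)
    flags[order:] = np.abs(err) > thresh * scale
    return flags
```

Running this per axis and comparing the flag sets would also answer the cross-axis correlation question empirically.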

Alternatively, if you have to smooth the data afterwards in any case, you might be interested in the procedure described here:

G. Doblinger: "Adaptive Kalman Smoothing of AR Signals Disturbed by Impulses and Colored Noise", Proc. 1998 IEEE Symp. on Advances in Dig. Filt. and Sig. Proc., June 5-6, 1998.

The paper is available on the net.

AR modeling can also be useful for automatic correction (interpolation).

If you are talking about several non-plausible points, try checking against physical constraints like maximum velocity (1st derivative) or maximum acceleration/deceleration (2nd derivative). In any case, however, you need the time: either the delta-t between two points must always be the same, or you need tuples of {x,y,z,t}.
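Assuming {x,y,z,t} tuples are available, the constraint check might be sketched like this (the limits `v_max` and `a_max` are placeholders to be chosen for the actual platform):

```python
import numpy as np

def implausible(t, xyz, v_max=10.0, a_max=5.0):
    """Flag points whose implied speed or acceleration exceeds physical
    limits. t: (N,) timestamps [s], xyz: (N, 3) positions [m]."""
    t = np.asarray(t, dtype=float)
    xyz = np.asarray(xyz, dtype=float)
    dt = np.diff(t)
    v = np.diff(xyz, axis=0) / dt[:, None]    # 1st derivative (velocity)
    speed = np.linalg.norm(v, axis=1)
    a = np.diff(v, axis=0) / dt[1:, None]     # 2nd derivative (approximate)
    accel = np.linalg.norm(a, axis=1)
    bad = np.zeros(len(t), dtype=bool)
    bad[1:] |= speed > v_max                  # blame the later point of each jump
    bad[2:] |= accel > a_max
    return bad
```

Note that a single stray point trips both tests on its neighbours as well, so some post-processing of the flags is still needed.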

We'll see. My idea is a bit more simplistic/naive: to check deviations in the forward direction between time steps. Yours is probably more computationally efficient.

I don't know. I haven't had a very close look at the data yet.

Thanks! I guess there are one or two journal papers based on this one, if the idea turned out to be good.

I was thinking along the lines of Wiener filters for data smoothing, but I guess we are in the same ball park.

The ideas are the same - differencing is the simplest form of linear prediction (where you predict the future value to be equal to the current value). The first difference filter can be interpreted as the prediction error filter. In general, predictors can be made data adaptive. However, differencing is a good first guess for a linear predictor. As Marcel writes, you can even threshold the prediction error based on physical criteria.
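As a concrete illustration of differencing as the simplest prediction-error filter, with the threshold chosen from a physical criterion (the threshold value here is arbitrary):

```python
import numpy as np

def diff_outliers(x, thresh):
    """First difference as a prediction error: predict x[n] = x[n-1] and
    flag samples where the error exceeds a threshold, e.g. the largest
    physically plausible step per sample."""
    e = np.abs(np.diff(np.asarray(x, dtype=float)))
    flags = np.zeros(len(x), dtype=bool)
    flags[1:] = e > thresh
    return flags
```

A data-adaptive predictor replaces the fixed "predict the previous value" rule with fitted coefficients, but the thresholding step stays the same.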

No, mine is more costly - you have to compute the data adaptive predictor (usually a new one every couple of samples).

This may not be of immediate help to you, but there's a book titled "Detection of Abrupt Changes" online at [formatting link].

Regarding correction, your problem sounds similar to that of click removal in audio which has been treated here before, I think, and is commonly attacked with LPC extrapolation.
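A rough sketch of LPC extrapolation across a flagged gap, assuming NumPy. This is a plain least-squares AR fit on the samples before the gap, not the Burg/Levinson machinery a real click-removal system would use, and the function and parameter names are made up for illustration:

```python
import numpy as np

def lpc_extrapolate(x, bad_start, bad_len, order=4, context=50):
    """Replace a run of bad samples by AR (LPC) extrapolation from the
    `context` good samples immediately preceding the gap."""
    x = np.asarray(x, dtype=float).copy()
    seg = x[bad_start - context : bad_start]
    # Least-squares AR fit on the clean context segment.
    X = np.column_stack([seg[order - k - 1 : len(seg) - k - 1] for k in range(order)])
    y = seg[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Run the predictor forward across the gap, feeding back predictions.
    hist = list(seg[-order:])
    for i in range(bad_len):
        pred = sum(a[k] * hist[-1 - k] for k in range(order))
        x[bad_start + i] = pred
        hist.append(pred)
    return x
```

A fancier version would predict from both sides of the gap and cross-fade the two extrapolations.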

Eliminating "stray" waypoints requires knowledge of the situation.

If you are travelling by foot and taking trackpoints every minute, then a point that is more than 250 metres off the general direction of travel is an exception (running at 15 km/h for one minute). If it is an Olympic runner, then you need to increase that value.

If travelling on a bicycle, you need to know the conditions. If on a main road, a trackpoint doesn't need to be far off before it is an exception. But if travelling up a switchback on a mountain pass road in the Alps, then a lot of your points will be "off course" but still very valid for mapping your track.

If travelling by car on the same road, your trackpoints may not be taken at sufficiently close intervals to log some of the switchbacks, and you may log just a mostly straight road up the mountain with minor deviations between each point.

If you have 3 points, A, B and C, B can be considered an exception if the distance between B and the axis A-C is greater than some amount (the Aviation Formulary has the equations to calculate the distance between a point and an axis).

Now, if you have 4 points, A B C and D with B and C being odd, the above logic won't work because B will be within the limits of the A-C axis.
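In local planar coordinates, the three-point test reduces to a point-to-line distance (a flat-earth stand-in for the great-circle cross-track formula in the Aviation Formulary):

```python
import numpy as np

def offset_from_chord(A, B, C):
    """Perpendicular distance from B to the line through A and C, in
    local planar (x, y) coordinates."""
    A, B, C = (np.asarray(p, dtype=float) for p in (A, B, C))
    d = C - A
    e = B - A
    # Area of the parallelogram spanned by d and e, divided by the base length.
    return abs(d[0] * e[1] - d[1] * e[0]) / np.hypot(d[0], d[1])
```

For the four-point failure case, one workaround is to test B against an axis built from points already accepted as good, rather than from its raw neighbours.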

One thing you could do would be to compare the data with the Kalman filter output, and ignore it whenever there's too much difference. This causes problems if your system truly moves outside of your 'known good' band.

Another thing that works is to discount the data when it's glitchy, but not completely. You could have two state-evolution rules: one for when the data is 'bad' and another for when it's 'good'. This would result in the filter being influenced by genuinely bad data, but it would also result in the filter being able to find its way home if it were way off track.
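A toy scalar version of the discount-but-don't-reject idea, assuming a random-walk state model; the gate and inflation factors are illustrative choices, not standard values:

```python
import numpy as np

def kalman_1d_gated(z, q=1e-3, r=0.01, gate=3.0, inflate=100.0):
    """Scalar random-walk Kalman filter. When the innovation exceeds
    `gate` predicted standard deviations, the measurement noise is
    inflated rather than the sample rejected outright, so the filter
    can still find its way home after a genuine jump."""
    x, p = z[0], r
    out = np.empty(len(z))
    for i, zi in enumerate(z):
        p = p + q                                  # predict (random-walk state)
        nu = zi - x                                # innovation
        s = p + r                                  # predicted innovation variance
        ri = r * inflate if nu * nu > gate * gate * s else r
        k = p / (p + ri)                           # gain, with possibly inflated R
        x = x + k * nu
        p = (1.0 - k) * p
        out[i] = x
    return out
```

With hard rejection instead of inflation, a filter that has drifted far off track would throw away all the (good) data forever; inflation keeps a trickle of correction flowing.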

My understanding of Kalman filters is somewhat casual, but I do know that the classic construction assumes linear systems, Gaussian excitation noise, and quadratic cost functions. If your underlying system isn't linear, your noise isn't Gaussian, or your cost function isn't quadratic, then the optimal solution isn't linear anymore, so a plain ol' Kalman filter is ruled out -- search around on the term 'extended Kalman' for insight in that case.

This problem is the classical application for so-called "robust statistical methods".

1) Decide on a window length of n data points. This window will be used to scan through your data for outlier and glitch detection.

2) Get the first n points of data (in the case of your three-dimensional problem, cover each dimension on its own).

3) Compute the MEDIAN of the data (= the 50% PERCENTILE). If in doubt how to compute the MEDIAN, check the MATLAB help. The MEDIAN is a very good approximation to the mean value of your data, but it is completely insensitive to outliers and glitches as long as the number of outliers within your window is smaller than n/2. Note that you now have an outlier-insensitive measure for the mean of your data: the MEDIAN is the robust counterpart to the MEAN.

4) For all data within your window, compute the absolute value of the difference between the data and the median, and put the results of this operation into a buffer array.

5) Multiply all values in your buffer by a scaling constant of 1/0.6745.

6) Compute the MEDIAN of the data in your buffer. The MEDIAN of the buffer values is the so-called "MAD" (median absolute deviation) of the original values within your window. The MAD is the outlier-insensitive, robust counterpart to the ordinary standard deviation, which is highly sensitive to outliers. The scaling factor of 1/0.6745 makes the MAD directly comparable to the standard deviation for normally distributed values.

7) A rule of thumb in classical statistics is: if a data point is farther from the mean than 5 times the standard deviation, it is most probably an outlier. Now that you have robust values available, apply this rule to them: if a data point is more than 5 times the MAD away from the MEDIAN, it is most probably an outlier.

8) If an outlier is detected, be sure to remove it from all dimensions.

9) Shift your window by one and go to step 3.

10) Repeat for all data.
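The ten steps above amount to what is often called a Hampel filter. A direct (unoptimized) NumPy sketch, with the window length and threshold as the tunable parameters Ulrich describes:

```python
import numpy as np

def hampel(x, n=11, k=5.0):
    """Sliding-window MEDIAN/MAD outlier detector (Hampel filter): flag
    samples farther than k robust standard deviations from the window
    median. The factor 1.4826 = 1/0.6745 scales the MAD to match the
    standard deviation for normally distributed data."""
    x = np.asarray(x, dtype=float)
    half = n // 2
    flags = np.zeros(len(x), dtype=bool)
    for i in range(half, len(x) - half):
        w = x[i - half : i + half + 1]
        med = np.median(w)
        mad = 1.4826 * np.median(np.abs(w - med))
        if np.abs(x[i] - med) > k * mad:
            flags[i] = True
    return flags
```

One practical caveat: on data that is locally constant the MAD can be exactly zero, so production code usually adds a small floor to `mad`.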

It sounds as if a lot of computing power is necessary for this algorithm, and yes, indeed it is. Note that computing the MEDIAN involves sorting the data within your window into ascending order, and you need to compute a MEDIAN twice per window shift. Use a fast sorting algorithm!

I use the above method as my standard outlier-detection routine wherever I expect to be confronted with outliers, and it works very well. Note that the length of the window is a compromise between processing time and having enough values in the window to do meaningful statistics with. Note also that this algorithm detects outliers fairly well, but it is not well suited if your data contains inconsistencies like sudden steps (offsets that stay).

Regards

Ulrich

"Rune Allnor" schrieb im Newsbeitrag news: snipped-for-privacy@m73g2000cwd.googlegroups.com...

Right... things get messy when all the nice idealizations and assumptions turn out to be a bit too ideal... been there.

Let me know if you find one. All I have in my library (which I don't have access to for another couple of weeks) is an early-90s edition of Brown and Hwang.

You didn't say what kind of navigational system this is. I think that might affect the nature of what you see quite a bit.

I'm no nav system expert. I've simply had cause to use the output of nav systems for various purposes. Understanding the qualities of the nav data has been a somewhat low priority compared to my other tasks. That said.....

My experience with taking data from inertial navigation systems is that at first sight they look real glitchy. However, if you apply almost any simple LPF - say, a single pole LPF - with a suitable cut off frequency the data quietens down wonderfully. What appear at first sight to be glitches actually appear to be unbiased noise that simply has a very spiky quality.
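For reference, the single-pole LPF mentioned here is only a few lines; `alpha` sets the cutoff (smaller alpha means a lower cutoff and heavier smoothing) and would be tuned to the data rate:

```python
import numpy as np

def one_pole_lpf(x, alpha=0.1):
    """Single-pole low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
    Initialized to the first sample to avoid a startup transient."""
    y = np.empty(len(x))
    acc = float(x[0])
    for i, xi in enumerate(x):
        acc += alpha * (xi - acc)
        y[i] = acc
    return y
```

Note that this filter merely averages spiky noise down; unlike the median-based detectors above, it lets a large outlier bias nearby samples.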

No. The ideas were given to me here (sci.geo.satellite-nav) years ago while I was post-processing some track data I had accumulated during a bike trip through Australia. I built logic to simplify the number of points (remove those that do not define a change in direction, and remove those that are "stray").

Another thing to consider: if you are a large cargo ship operating in the St. Lawrence Seaway, a stray point may be really easy to spot, because a ship simply cannot turn on a dime. But if you are a canoe on a rapidly flowing river, you may catch an eddy current that quickly pushes you off course.

But for an aircraft, you need to really define the requirements and the handling of any glitch. Are you allowed to filter out data which will not get logged onto the flight recorder? The advantage of aviation is that you can correlate GPS with the primary instruments (INS and barometric altimeter). So if you see a glitch on the GPS but no glitch on the other systems, then the GPS probably experienced a glitch. But if it is visible on all 3 systems, then it is probably turbulence, etc.
