Glitch/inconsistency detection in navigation data

Rune Allnor · 2006-06-26T10:21:37+00:00

Hi folks.

Assume you are given some 20,000 to 100,000 points of raw (x,y,z) navigation data, as previously logged on some platform, and are asked to clean up the data.

My first, naive idea is to use some Kalman filter. So far so good. However, the data might contain glitches (jumps) or other inconsistencies that need to be sorted out before feeding them to the Kalman filter.

How would you go about detecting glitches, and maybe even doing a first-pass correction of such data? A manual check of 100,000 data points doesn't seem like a very tempting prospect...

Rune
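
One common way to combine the Kalman filter with glitch detection is innovation gating. The following is a minimal sketch, not something proposed in the thread: a one-axis constant-velocity Kalman filter that flags a sample as a glitch when its innovation exceeds a gate of a few standard deviations, and skips the measurement update for that sample. The function name `kalman_gate` and the parameters `q` (process noise), `r` (measurement noise variance) and `gate` are illustrative assumptions; real (x,y,z) data would call for a full state vector and tuned covariances.

```python
def kalman_gate(z, dt=1.0, q=0.01, r=1.0, gate=3.0):
    """Filter one coordinate with a constant-velocity Kalman filter.

    Returns (estimates, flags); flags[k] is True when sample k was
    rejected as a glitch by the innovation gate.
    Assumed (hypothetical) tuning: q = white-acceleration noise intensity,
    r = measurement noise variance, gate = threshold in standard deviations.
    """
    # State: position p and velocity v; covariance P = [[a, b], [b, c]].
    p, v = z[0], 0.0
    a, b, c = r, 0.0, 1.0
    estimates, flags = [p], [False]
    for zk in z[1:]:
        # Predict: p += v*dt and P = F P F^T + Q with F = [[1, dt], [0, 1]].
        p = p + v * dt
        a, b, c = (a + 2 * dt * b + dt * dt * c + q * dt**3 / 3,
                   b + dt * c + q * dt**2 / 2,
                   c + q * dt)
        # Innovation y and its variance S = H P H^T + R with H = [1, 0].
        y = zk - p
        s = a + r
        if y * y > gate * gate * s:
            # Glitch: keep the prediction and skip the measurement update.
            flags.append(True)
        else:
            kp, kv = a / s, b / s  # Kalman gain K = P H^T / S
            p, v = p + kp * y, v + kv * y
            # P = (I - K H) P, written out for the symmetric 2x2 case.
            a, b, c = (1 - kp) * a, (1 - kp) * b, c - kv * b
            flags.append(False)
        estimates.append(p)
    return estimates, flags
```

Because a rejected sample only gets the prediction step, the predicted covariance grows during a run of rejections, so the gate slowly widens and a genuine, persistent jump is eventually accepted instead of being rejected forever.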

JF Mezei

How would your method handle a situation where one is riding a bike at 30 km/h on a very flat road for 6 hours at a steady speed, and then, for one minute, goes down a kamikaze downhill reaching 80 km/h, and then rides up a hill at 10 km/h for 7 minutes? Would it "filter out" those readings of going down and then up the hill?

JF Mezei

Another aspect to consider:

When I processed my Australian track logs, the points (taken at 1-minute intervals) that were very near each other were in fact VERY significant, because they indicated a place where I had stopped (in some cases, I want those to stand out to show places where water was available).

So when simplifying the trackpoints to draw a route, those points can be removed, but you want to mark them as significant stops and possibly create a waypoint for each.
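
A minimal sketch of marking such stops, assuming planar (x, y) coordinates in metres and a fixed logging interval: consecutive points that stay within a radius of the first point of a run are grouped, and runs of at least `min_points` samples are reported as candidate stop waypoints at their centroid. The function name `find_stops` and the `radius` and `min_points` values are illustrative assumptions, not anything from the post.

```python
import math

def find_stops(points, radius=20.0, min_points=3):
    """Group consecutive (x, y) points that stay within `radius` of the
    run's first point; return (start, end, centroid) for each run of at
    least `min_points` samples. Indices are inclusive.
    radius and min_points are hypothetical defaults, to be tuned."""
    stops = []
    start = 0
    n = len(points)
    while start < n:
        x0, y0 = points[start]
        end = start
        # Extend the run while the next point is still near the run's anchor.
        while end + 1 < n and math.hypot(points[end + 1][0] - x0,
                                         points[end + 1][1] - y0) <= radius:
            end += 1
        if end - start + 1 >= min_points:
            run = points[start:end + 1]
            cx = sum(p[0] for p in run) / len(run)
            cy = sum(p[1] for p in run) / len(run)
            stops.append((start, end, (cx, cy)))
        start = end + 1
    return stops
```

The reported runs could be turned into waypoints, while the route-drawing simplification keeps only one point per run.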

So one must really be familiar with the activity and how the data was collected before selecting some logic to simplify a series of points.

Another aspect: many GPS units have an "auto" feature for writing track points to the log. Generally, track points are written when there is a change in speed or direction, and possibly at some regular interval. But if you do not know the exact logic the unit uses to decide whether a trackpoint is written, you cannot really decide how to remove trackpoints statistically.

Rune Allnor

Ulrich Bangert wrote: lots of interesting stuff.

Thanks. Sounds like something to look into. Processing speed is (as yet) insignificant if it can free up man-hours for other duties. Where I am right now, man-hours are expensive. If a computer needs 12 hours for this sort of job, then so be it, as long as it can be done while the human operator is off watch.

Rune

Steve Underwood

Related fun things happen in radar tracking. There you usually have reasonable x and y estimates (based on the r and theta you measure), and a weak z measurement (based on a very limited-accuracy angular measurement). You can track a target and predict based on what is a reasonable side-to-side turn rate. However, a fighter can roll over, then do a hard downward half loop and go back in the direction it came from. Your nicely filtered track basically stops abruptly and goes backwards, and your weak height information doesn't always help that much in tracking in 3D. Airliners are easy to track, but the interesting targets are a real pain.

This kind of filtering is like the tale I heard of the world's most accurate weather forecasts. I was told a DJ in the Caribbean spent years saying the weather would be whatever it was yesterday. Because the weather there has long periods of similar conditions punctuated by abrupt changes, his accuracy was very high, but his usefulness was zero. :-)

Regards, Steve

Ulrich Bangert

To Rune:

On a typical PC, with a window width of 100, I process 600,000 data points in 1-2 minutes, so it is not as slow as my first mail may have indicated. I use this algorithm in, for example, a freeware program named "Plotter". You can download "Plotter" from my homepage.


If you manage to load your data files with it (chances are...), you can immediately test the quality and the speed of the outlier detection.

To JF Mezei:

If you managed to figure out exactly what the algorithm does, you will have noticed that for detecting outliers only what is INSIDE the window is significant, nothing else. For that reason, if this algorithm is applied to the scenario you present, the first thing to say is that it does not matter at all whether you have been riding for 6, 12, 18 or any number of hours before you meet the hill. The algorithm is completely insensitive to that!

The window amounts to: "If you want to detect outliers, look only at values in the neighbourhood and decide what is normal and what is not for them." Please note also that your scenario raises the question of how to define an "outlier". Other people would perhaps think that the "hill scenario" IS indeed an outlier that should be removed, while you think it is very significant. Note that the algorithm can fit BOTH kinds of views by adapting the window length. If you make the window length greater than twice the "hill length", then the hill will be completely removed from the data. If you find that the hill is significant, then make the window length smaller than twice the "hill length"; in this case the hill will not be filtered out. By applying the rule "an event shorter than n/2 may be an outlier", YOU decide what is an outlier, not the algorithm.

I cannot accept your second objection: it is an outlier detection algorithm, not a biker's rest detection algorithm. But if you want to ask whether the rest will be detected as an outlier or not, the same rules apply as above: if the window length is set to a value such that the length of the braking action before the stop and the window length "match", then the stop will be recognized as a "normal" change in the data.
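
Ulrich's own algorithm is only outlined in this part of the thread, so the following is a hedged illustration rather than his method: a centred running median, one common windowed filter whose behaviour matches the n/2 rule above. With an odd window of n samples, an excursion of L samples is replaced by the surrounding level when n > 2L and survives otherwise. Samples that differ from the running median by more than a threshold are then flagged. The odd-window restriction, the truncated windows at the ends, and the names `running_median`, `flag_outliers` and `threshold` are assumptions made for this sketch.

```python
from statistics import median

def running_median(x, n):
    """Centred running median with odd window length n (an assumption of
    this sketch); the window is truncated near the ends of the series."""
    if n % 2 == 0:
        raise ValueError("window length must be odd")
    h = n // 2
    return [median(x[max(0, i - h):i + h + 1]) for i in range(len(x))]

def flag_outliers(x, n, threshold):
    """Flag samples deviating from the running median by more than
    `threshold` (a hypothetical, data-dependent tuning parameter)."""
    m = running_median(x, n)
    return [abs(xi - mi) > threshold for xi, mi in zip(x, m)]
```

Applied to a speed series of steady 30 km/h with a 3-sample burst at 80 km/h, an 11-sample window flattens the burst and flags it, while a 5-sample window keeps it. Only samples within n/2 of a point influence its median, so the length of the steady stretch before the hill makes no difference.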

Regards Ulrich

"Rune Allnor" wrote in message news: snipped-for-privacy@p79g2000cwp.googlegroups.com...

Rune Allnor

I'll definitely have a look into this. Your first post indicated that you have programmed these things in Matlab? If so, there is speed-up potential here. I usually get a speed-up on the order of 10-50x when I port from Matlab to C or C++.

Rune

JF Mezei

OK, fair enough. But that still leaves the requirement that the user know the type of data he has to process, the types of irregularities that must be retained, and those that can be removed, because this will be needed to decide on the window size. And one also needs to know how the data was collected.

Say on a long straight road, a car turns off and drives 100m to a water hole/pump. With periodic trackpoint recording, you could have a couple of stray points. With "auto" track recording, chances are very good that the GPS would record a point at the turnoff, one point at the stop for water, and again a point once the car gets back to main road and turns back into the normal direction.

Now, both would have a couple of stray points from a purely "mathematical" point of view. But in the second case, a human could more clearly see a path away from road and back to the road at the same intersection to resume course.

So one must really understand the "event" as well as how the data was recorded for that event before starting to process such data and eliminate points judged to be "bad".

Ulrich Bangert

Rune,

As a dedicated follower of Pascal, I program in Borland Delphi, which produces native code that I do not suspect to be significantly slower than C/C++-generated code. But over the years I have found that the Matlab help system gives me information about mathematical topics at exactly the level that seems to suit me; that's why I pointed to it. If Plotter does not read your files, then (in case they are ASCII) send me a few lines of them. I am very interested in making my file-reading routines as universal as possible, so every no-go is an object of interest.

Regards Ulrich

"Rune Allnor" wrote in message news: snipped-for-privacy@d56g2000cwd.googlegroups.com...

Ulrich Bangert

Hello JF Mezei,

Agreed!

I am not sure if I interpret the term "auto track recording" the right way. Perhaps it is even a "standard" term in navigation that I am not aware of (I have looked at the question of outlier detection purely from a mathematical point of view). But if it is some kind of "event-driven" track recording, you are of course right that the proposed algorithm cannot handle data acquired in this way, because some front-end entity has already decided what an event is and what is not, and has failed to acquire the "surrounding data" that the algorithm needs.

Regards Ulrich

"JF Mezei" wrote in message news: snipped-for-privacy@teksavvy.com...

Jerry Avins

...

One must always understand data in order to analyze and interpret it meaningfully. Believing otherwise is like believing that someone can manage a business without understanding its nature.

Jerry

Mogens Beltoft

I read somewhere that some GPS units use a boundary and time check when recording track points in auto mode.

It went something like this:

If the newly sampled track point n lies outside the "road" defined by the line from track point n-2 to n-1 plus a margin on each side, or the unit has not recorded a track point for "this long", then record track point n.
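
The rule as Mogens recalls it can be sketched as follows. This is a guess at the rule as described, not documented GPS firmware behaviour. The sketch assumes planar coordinates and time in seconds, treats n-1 and n-2 as the last two *recorded* points, and measures the perpendicular distance from the new fix to the line through them. The function name `auto_record` and the `margin` (corridor half-width) and `max_gap` ("this long") values are hypothetical.

```python
import math

def auto_record(fixes, margin=10.0, max_gap=60.0):
    """Decimate (t, x, y) fixes with a corridor-plus-timeout rule.

    A fix is recorded when it lies more than `margin` from the line
    through the last two recorded points, or when at least `max_gap`
    seconds have passed since the last recorded point.
    margin and max_gap are hypothetical values, not taken from any unit.
    """
    recorded = list(fixes[:2])  # always keep the first two fixes
    for t, x, y in fixes[2:]:
        (_, x2, y2), (t1, x1, y1) = recorded[-2], recorded[-1]
        dx, dy = x1 - x2, y1 - y2
        length = math.hypot(dx, dy)
        if length == 0.0:
            # Degenerate "road": fall back to distance from the last point.
            off_road = math.hypot(x - x1, y - y1) > margin
        else:
            # Perpendicular distance from (x, y) to the line n-2 -> n-1.
            off_road = abs(dx * (y - y2) - dy * (x - x2)) / length > margin
        if off_road or t - t1 >= max_gap:
            recorded.append((t, x, y))
    return recorded
```

On a straight, evenly sampled track this keeps only the first two fixes plus one per timeout, and a 90-degree turn produces a couple of points just after the corner. Because the distance is measured to the infinite line, a reversal straight back along the same line would only be caught by the timeout.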

/Mogens

JF Mezei

A change in speed also causes a track point to be recorded on some Garmin units, and I think a change in heading does too. I don't think Garmin ever documented the algorithm.

Steve Underwood

Tell that to an MBA. :-)

You are quite right. When I asked what kind of nav system this is, there was no response. All the discussion has been about hypothetical something or others, rather than real world improvement of specific problems in the data.

Steve

Rune Allnor

My apologies for that, Steve. These *are* real-world data from real-world systems, but as I don't know exactly where the limits go for corporate "hush-hush" I'll rather play it safe for now. And, of course, I can guess but I don't necessarily know the important details.

No offence to you or anyone else. I saw a way of doing things and I wanted to know what the alternatives, preferably commercially available ones, are. As there seems to be no canned solution available, I'll probably have to program some Kalman filters myself and play with them until I get a sense of how these things work and how to incorporate the various ideas.

Rune
