(This is partially in response to Suresh's wonderful series on clustering.)

Recently, my advisor (Dave Mount) and I have been kicking around the idea of how to define the problem of clustering over time. Imagine birds traveling in a flock from Canada to Mexico, and suppose that you are able to perfectly observe their locations over time (of course, for any practical use we'd need to come up with a better model, but for now...). Given all these locations, you'd like to be able to identify the flock as a cluster. If this flock crosses another flock (say, stupidly traveling from California to Maryland), you'd like to be able to identify these as two separate flocks, even if the "current" time is the one at which they're all in the same place.

Now, what about if the birds' pattern is a bit more complicated... Suppose that instead of staying nicely within the same approximate radius as they fly, the radius of the cluster varies drastically. Perhaps all the birds fly away from each other for some amount of time, then converge back to a central point, then fly away again, etc. Or perhaps there are many flocks all flying from Canada to Mexico that follow parallel linear paths, but spaced many miles apart. Are these clusters that we want to identify?

These are certainly questions that biologists are interested in (understanding infections over time), as well as folks in data mining and statistical modeling (general ideas about changing data over time). I've seen algorithmic approaches about clustering data streams (Cormode, Muthukrishnan, and Zhuang) or kinetic data structures models that maintain a clustering snapshot (Gao, Guibas, and Nguyen, and my own work) and database papers about essentially this question (Kalnis, Mamoulis, and Bakiras), but haven't seen algorithmic approaches to this question. So, I ask the internet, are there any papers I should know about? Anything else I should be thinking about?

Subscribe to:
Post Comments (Atom)

## 10 comments:

You want to google for "clustering time series". I believe the data-mining community has looked at such problems for a while.

--a

You may find this paper from KDD 2007 relevant:

http://www.cs.uic.edu/~tanyabw/research/pubs/TantipathananandhEtal_NetworkCommunities07.pdf

Great question ! You should look at the general body of work of Dimitri Gunopoulos and George Kollios. (UCR and BU resp.).

Clustering time series comes close to what you want, and there's always time warping, and the frechet distance etc. However, in your model of birds swarming and dispersing, I suspect even these will fail.

In the realm of motion tracking, there's been work on tracking people in crowds, which i guess is a slightly different problem (maybe the input to yours?). But the reason I bring that up is that the "tracks" that you might want to cluster are represented probabilistically, as clouds of uncertainty, and that might influence how you choose to the model the problem.

Depending on what kind of data you have you might be able to use ideas from motion subspace clustering to achieve this. Some of them require the number of motions (swarm motions here, not individual bird motions) to be given.

This might be of relevance to you http://perception.csl.illinois.edu/coding/motion/ . It also has a decent summary of the work done in this domain.

It uses some ideas from sparse representations to take care of noise and outliers in the data too. This might be useful if you are clustering real time series data.

The only problem I see in the above work is that they aren't as concerned about the local spatial clustering of points. They are only interested in motion subspaces, therefore two flocks of spatial separate birds flying in same direction might end up being clustered as a single motion. However, you could always use this information to your advantage as it maybe interesting to note that some flocks have similar flying patterns. You can also follow up this clustering with a separate spatial clustering using meanshift or other techniques to get individual flocks.

Hope this helps.

There is a lot going on in this area I happen to be at a workshop that Marc van Kreveld organized with geometer Joachim Gudmundsson and geographer Patrick Laube, who each have several papers on patterns in flocks. There was a Dagstuhl seminar in Nov'08 and will be another in Dec '10.

You might also want to look at "tracking multiple object".

I found some interesting videos here (http://www.cs.washington.edu/ai/Mobile_Robotics/mcl/).

This is a shameless plug on my own work, presented in ICPR 2008:

V. Hautamäki, P. Nykänen and P. Fränti, "Time-series Clustering by Approximate Prototypes", 19th International Conference on Pattern Recognition (ICPR 2008), Tampa, Florida, USA, December, 2008

http://cs.joensuu.fi/~villeh/time-series-clustering.pdf

So here the idea of clustering different bird flight trajectories could be used.

I tried to design a local search clustering method for non-equal length time-series data. One of the main problems is that, M-step or (centroid step in k-means) turns out to be difficult. I have no proof yet, but it is analogous to multiple sequence alignment (which is NP-complete, for a wide variety of distance measures). Now, of course we have data vectors from R^d and not from finite field, so situation might be a little bit different.

Any ideas are welcome. If somebody is interested in off-line discussion on these topics with me, you can reach me in my work email: vishv@i2r.a-star.edu.sg

Thanks folks for all the great suggestions!

What you are looking for is "dynamic communities" in social networks:

C. Tantipathananandh, T. Y. Berger-Wolf

“Algorithms for Identifying Dynamic Communities”

Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 2009, Paris, France

http://compbio.cs.uic.edu/~tanya/research/pubs/TantipathananandhBerger-Wolf_CommunitiesAppx09.pdf

Code available here:

http://compbio.cs.uic.edu/~chayant/commdy/

See also Falkowski's PhD thesis:

T. Falkowski. Community Analysis in Dynamic Social Networks. Dissertation,

University Magdeburg, 2009.

http://deposit.ddb.de/cgi-bin/dokserv?idn=995040419&dok_var=d1&dok_ext=pdf&filename=995040419.pdf

All the suggestions above are great. I would like to point out that the physics community have been working on this problem too, see for example section XIII of this survey or the reference of this paper.

Post a Comment