(This is partially in response to Suresh's wonderful series on clustering.)
Recently, my advisor (Dave Mount) and I have been kicking around the idea of how to define the problem of clustering over time. Imagine birds traveling in a flock from Canada to Mexico, and suppose that you are able to perfectly observe their locations over time (of course, for any practical use we'd need to come up with a better model, but for now...). Given all these locations, you'd like to be able to identify the flock as a cluster. If this flock crosses another flock (say, stupidly traveling from California to Maryland), you'd like to be able to identify these as two separate flocks, even if the "current" time is the one at which they're all in the same place.
Now, what about if the birds' pattern is a bit more complicated... Suppose that instead of staying nicely within the same approximate radius as they fly, the radius of the cluster varies drastically. Perhaps all the birds fly away from each other for some amount of time, then converge back to a central point, then fly away again, etc. Or perhaps there are many flocks all flying from Canada to Mexico that follow parallel linear paths, but spaced many miles apart. Are these clusters that we want to identify?
These are certainly questions that biologists are interested in (understanding infections over time), as well as folks in data mining and statistical modeling (general ideas about changing data over time). I've seen algorithmic approaches about clustering data streams (Cormode, Muthukrishnan, and Zhuang) or kinetic data structures models that maintain a clustering snapshot (Gao, Guibas, and Nguyen, and my own work) and database papers about essentially this question (Kalnis, Mamoulis, and Bakiras), but haven't seen algorithmic approaches to this question. So, I ask the internet, are there any papers I should know about? Anything else I should be thinking about?
Linkage for the new school year
2 hours ago