Educational slides on TRACLUS, an algorithm for clustering trajectory data created by Jae-Gil Lee, Jiawei Han and Kyu-Young Whang, published at SIGMOD’07.
http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf
Trajectory clustering - Traclus Algorithm
1. Trajectory Clustering
BASED ON: TRAJECTORY CLUSTERING: A PARTITION-AND-GROUP
FRAMEWORK
EDITED BY: IVAN SANCHEZ
BY: JAE-GIL LEE
JIAWEI HAN
KYU-YOUNG WHANG
2. Objective
To group similar trajectories together (cluster).
A trajectory is a sequence of multidimensional points Tr = p1, p2, p3, …, pn.
A point is a d-dimensional entity.
Most approaches consider only complete trajectories, thus missing valuable
information on common subtrajectories.
Input: Set of trajectories S = (Tr1, Tr2, Tr3, …, Tri, …, TrnumTraj)
Output: Set of clusters C = (C1, C2, …, CnumClusters), where each cluster contains ε or
more trajectories.
◦ ε is a threshold that determines the minimum number of trajectories needed to form a cluster.
◦ Each cluster is composed of a set of trajectories, e.g. C1 = (Tr3, Tr9, …, Trc1max).
3. Approaches
DBSCAN
◦ Uses density clustering
◦ Works only on entire trajectories
Partition and Group
◦ Also uses density-based clustering (helps to
discover clusters of arbitrary shape and to
filter out noise and outliers).
◦ Can discover common subtrajectories.
4. Partition and Group Framework
Two phases: partitioning and grouping.
Additionally calculates a representative trajectory per cluster.
Discover Common Subtrajectories
TRACLUS Algorithm.
◦ Partition trajectories into segments. O(n)
◦ Where n is the number of points in a trajectory.
◦ Group similar segments together (clustering). O(n log n)
◦ Where n is the number of segments
◦ Calculate representative trajectory per cluster. O(n)
◦ Where n is the number of trajectories.
A trajectory can belong to multiple clusters.
7. Partition Phase
Partition each trajectory into a set of line segments.
A trajectory partition is a line segment pipj, where i < j and both points belong to the same
trajectory.
The resulting line segments are later grouped by similarity.
This makes it possible to find common subtrajectories.
All segments from all trajectories are inserted into a common set D.
Time complexity: O(n), where n is the number of points in a trajectory.
8. How to partition a trajectory?
Characteristic points: points where the behavior of the trajectory changes rapidly.
From a trajectory Tr = p1, p2, p3, …, pj, …, plen, determine a set of characteristic points
{pc1, pc2, pc3, …, pcPart}.
The trajectory is partitioned at every characteristic point, and each partition is represented by a
line segment between two consecutive characteristic points.
Line segment = trajectory partition.
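The step from characteristic points to line segments can be sketched in a few lines of Python. This is only an illustration: the function name `partition_at` and the hand-picked indices in `char_idx` are hypothetical, and choosing the characteristic points themselves is a separate problem.

```python
def partition_at(trajectory, char_idx):
    """Join consecutive characteristic points into line segments.

    trajectory: list of (x, y) points.
    char_idx:   sorted indices of the characteristic points (assumed given).
    """
    return [(trajectory[a], trajectory[b])
            for a, b in zip(char_idx, char_idx[1:])]


# An L-shaped trajectory: it runs right, then turns upward at (2, 0).
trajectory = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
char_idx = [0, 2, 4]  # hypothetical choice: start point, corner, end point
segments = partition_at(trajectory, char_idx)
print(segments)
```

Five points are reduced to two line segments, one per straight run of the trajectory.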
9. How to optimally partition a trajectory?
Properties:
◦ Preciseness: The difference between a trajectory and the set of its trajectory partitions should be as small as
possible.
◦ Conciseness: The number of trajectory partitions should be as small as possible.
Balance preciseness and conciseness using MDL (minimum description length).
The best hypothesis H to explain the data D is the one that minimizes the sum of L(H) and L(D|H).
◦ L(H): Sum of the (log2) lengths of all trajectory partitions. Measures conciseness.
◦ L(D|H): Sum of the (log2) differences between a trajectory and the set of its trajectory partitions. Measures
preciseness.
◦ Finding the exact optimum can be costly, so it is approximated by a local optimum: the longest partition pipj such that MDLpart(pi,pk) <= MDLnopart(pi,pk) for every k with i < k <= j.
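The two cost terms can be written out as follows. This follows the formulation in the TRACLUS paper as best it can be reconstructed here. The symbols are not defined on the slides: len(·) is the Euclidean length of a segment, d⊥ and dθ are the perpendicular and angle distances of the distance measure, and Part is the number of characteristic points.

```latex
L(H) = \sum_{j=1}^{Part-1} \log_2\!\big( len(p_{c_j} p_{c_{j+1}}) \big)

L(D \mid H) = \sum_{j=1}^{Part-1} \; \sum_{k=c_j}^{c_{j+1}-1}
  \Big[ \log_2 d_\perp\big(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1}\big)
      + \log_2 d_\theta\big(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1}\big) \Big]
```

L(H) grows with the number of partitions, while L(D|H) grows when a partition strays from the original points it replaces, so minimizing the sum trades one against the other.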
Time Complexity O(n).
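A minimal Python sketch of the approximate (greedy) partitioning is given below. It makes several assumptions that are not in the slides: points are 2-D tuples, logarithm arguments are clamped to 1 so that zero distances cost 0 bits, and each original segment is projected onto the candidate partition. The helper names (`mdl_part`, `mdl_nopart`, `characteristic_points`) are illustrative.

```python
import math


def _log2(x):
    # Assumption: clamp to 1 so that zero or sub-unit values cost 0 bits.
    return math.log2(max(x, 1.0))


def _perp(si, ei, sj, ej):
    """Perpendicular distance: Lehmer mean (order 2) of the two offsets."""
    vx, vy = ei[0] - si[0], ei[1] - si[1]
    vv = vx * vx + vy * vy

    def offset(p):
        if vv == 0:  # degenerate candidate partition (start == end)
            return math.dist(p, si)
        t = ((p[0] - si[0]) * vx + (p[1] - si[1]) * vy) / vv
        return math.dist(p, (si[0] + t * vx, si[1] + t * vy))

    l1, l2 = offset(sj), offset(ej)
    return 0.0 if l1 + l2 == 0 else (l1 * l1 + l2 * l2) / (l1 + l2)


def _angle(si, ei, sj, ej):
    """Angle distance: |Lj| * sin(theta), or |Lj| when theta >= 90 degrees."""
    vx, vy = ei[0] - si[0], ei[1] - si[1]
    wx, wy = ej[0] - sj[0], ej[1] - sj[1]
    lv, lw = math.hypot(vx, vy), math.hypot(wx, wy)
    if lv == 0 or lw == 0:
        return 0.0
    cos_t = max(-1.0, min(1.0, (vx * wx + vy * wy) / (lv * lw)))
    return lw if cos_t <= 0 else lw * math.sqrt(1 - cos_t * cos_t)


def mdl_part(tr, i, j):
    """Cost if tr[i] and tr[j] are the only characteristic points in between."""
    cost = _log2(math.dist(tr[i], tr[j]))  # L(H)
    for k in range(i, j):                  # L(D|H)
        cost += _log2(_perp(tr[i], tr[j], tr[k], tr[k + 1]))
        cost += _log2(_angle(tr[i], tr[j], tr[k], tr[k + 1]))
    return cost


def mdl_nopart(tr, i, j):
    """Cost of keeping the original points between i and j (L(D|H) = 0)."""
    return sum(_log2(math.dist(tr[k], tr[k + 1])) for k in range(i, j))


def characteristic_points(tr):
    """Greedy single pass: indices of the approximate characteristic points."""
    cp = [0]
    start, length = 0, 1
    while start + length < len(tr):
        curr = start + length
        if mdl_part(tr, start, curr) > mdl_nopart(tr, start, curr):
            cp.append(curr - 1)  # partition at the previous point
            start, length = curr - 1, 1
        else:
            length += 1
    cp.append(len(tr) - 1)
    return cp


trajectory = [(0, 0), (10, 0), (20, 0), (30, 0), (30, 10), (30, 20), (30, 30)]
print(characteristic_points(trajectory))  # indices of start, corner and end
```

The candidate partition keeps growing while partitioning is no more expensive than keeping the original points. As soon as that fails, the previous point becomes a characteristic point and the scan restarts from it.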
11. Distance Measure
Based on the projection of the points of one segment onto the other.
3 components:
◦ Perpendicular distance: Lehmer mean of order 2 of the two perpendicular offsets between two line segments.
◦ Each offset is the Euclidean distance between an endpoint of the shorter segment and its projection onto the
longer segment.
◦ Parallel distance: The minimum distance between the projected points and the endpoints of the segment onto which the
projection was made.
◦ Angle distance: Based on the smaller intersecting angle between the segments. Helps to compare trajectories that have a direction.
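The three components can be written as formulas. The notation is taken from the TRACLUS paper as best it can be reconstructed here, and none of these symbols appear on the slides. Li is the longer segment and Lj the shorter one. l⊥1 and l⊥2 are the distances from the endpoints of Lj to their projections onto Li. l∥1 and l∥2 are the distances from each projected point to the nearer endpoint of Li. θ is the smaller intersecting angle.

```latex
d_\perp(L_i, L_j) = \frac{l_{\perp 1}^{2} + l_{\perp 2}^{2}}{l_{\perp 1} + l_{\perp 2}}

d_\parallel(L_i, L_j) = \min\left(l_{\parallel 1},\, l_{\parallel 2}\right)

d_\theta(L_i, L_j) =
\begin{cases}
  \lVert L_j \rVert \cdot \sin\theta, & 0^\circ \le \theta < 90^\circ \\
  \lVert L_j \rVert,                  & 90^\circ \le \theta \le 180^\circ
\end{cases}
```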
The distance measure can be easily calculated with vector operations.
The overall distance between two segments is the weighted sum of the three components (all weights default to 1).
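As a rough sketch, the distance between two segments could be computed with plain vector arithmetic as shown below. It assumes 2-D segments of nonzero length given as `(start, end)` pairs of `(x, y)` tuples. The function names and the weight parameters `w_perp`, `w_par` and `w_angle` are illustrative, not taken from the slides.

```python
import math


def _project(p, s, e):
    """Project point p onto the infinite line through s and e."""
    vx, vy = e[0] - s[0], e[1] - s[1]
    t = ((p[0] - s[0]) * vx + (p[1] - s[1]) * vy) / (vx * vx + vy * vy)
    return (s[0] + t * vx, s[1] + t * vy)


def distance_components(a, b):
    """Return (perpendicular, parallel, angle) distances for segments a, b."""
    # The longer segment (Li) is the projection target, the shorter is Lj.
    (si, ei), (sj, ej) = (a, b) if math.dist(*a) >= math.dist(*b) else (b, a)
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)

    # Perpendicular: Lehmer mean of order 2 of the two offsets.
    l1, l2 = math.dist(sj, ps), math.dist(ej, pe)
    d_perp = 0.0 if l1 + l2 == 0 else (l1 * l1 + l2 * l2) / (l1 + l2)

    # Parallel: smallest distance from a projected point to an endpoint of Li.
    d_par = min(math.dist(ps, si), math.dist(ps, ei),
                math.dist(pe, si), math.dist(pe, ei))

    # Angle: |Lj| * sin(theta), or |Lj| when theta is 90 degrees or more.
    len_i, len_j = math.dist(si, ei), math.dist(sj, ej)
    dot = (ei[0] - si[0]) * (ej[0] - sj[0]) + (ei[1] - si[1]) * (ej[1] - sj[1])
    cos_t = max(-1.0, min(1.0, dot / (len_i * len_j)))
    d_angle = len_j if cos_t <= 0 else len_j * math.sqrt(1 - cos_t * cos_t)
    return d_perp, d_par, d_angle


def segment_distance(a, b, w_perp=1.0, w_par=1.0, w_angle=1.0):
    """Weighted sum of the three components."""
    d_perp, d_par, d_angle = distance_components(a, b)
    return w_perp * d_perp + w_par * d_par + w_angle * d_angle


long_seg = ((0, 0), (10, 0))
parallel_seg = ((2, 1), (6, 1))   # same direction, offset by 1
crossing_seg = ((5, 1), (5, 4))   # at a right angle to long_seg
print(distance_components(long_seg, parallel_seg))
print(distance_components(long_seg, crossing_seg))
```

The parallel neighbor has no angle distance, while the crossing segment is penalized by its full length. This is how the measure separates segments that are close but head in different directions.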
13. Clustering Phase
Line segments of the same cluster are close to each other according to a distance measure.
Use density-based clustering, as in DBSCAN.
Let D be the set of all line segments.
15. Clustering Algorithm
Two parameters:
◦ ε: Size of the neighborhood of a segment.
◦ MinLns: Minimum number of line segments.
The trajectory cardinality limits the maximum number of clusters.
Turns a set of segments D into a set of clusters O.
Complexity:
◦ O(n log n): where n is the number of segments, using a spatial index.
◦ O(n²): for a number of dimensions >= 2.
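The grouping step can be sketched as a DBSCAN-style expansion over line segments, as below. This is a simplified O(n²) version without a spatial index. The distance function is passed in as a parameter, and the demo uses a midpoint distance purely as a stand-in for the three-component measure. The final loop assumes that clusters whose segments come from fewer than MinLns distinct trajectories are discarded. All names are illustrative.

```python
import math
from collections import deque

NOISE = -1


def cluster_segments(segments, traj_ids, dist, eps, min_lns):
    """Label each segment with a cluster id, or NOISE.

    segments: list of segments; traj_ids[i] is the trajectory of segments[i].
    dist:     callable giving the distance between two segments.
    """
    n = len(segments)
    labels = [None] * n

    def neighbors(i):
        # Brute-force eps-neighborhood (includes i itself).
        return [j for j in range(n) if dist(segments[i], segments[j]) <= eps]

    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_lns:
            labels[i] = NOISE
            continue
        labels[i] = cid
        queue = deque(j for j in nb if j != i)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cid  # border segment: joins but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb_j = neighbors(j)
            if len(nb_j) >= min_lns:  # core segment: keep expanding
                queue.extend(nb_j)
        cid += 1

    # Trajectory cardinality check (assumed threshold: min_lns).
    for c in range(cid):
        members = [i for i in range(n) if labels[i] == c]
        if len({traj_ids[i] for i in members}) < min_lns:
            for i in members:
                labels[i] = NOISE
    return labels


def midpoint_distance(a, b):
    """Stand-in distance for the demo only, not the TRACLUS measure."""
    ma = ((a[0][0] + a[1][0]) / 2, (a[0][1] + a[1][1]) / 2)
    mb = ((b[0][0] + b[1][0]) / 2, (b[0][1] + b[1][1]) / 2)
    return math.dist(ma, mb)


segments = [((0, 0), (4, 0)), ((0, 1), (4, 1)), ((0, 2), (4, 2)),
            ((50, 50), (54, 50))]
labels = cluster_segments(segments, [0, 1, 2, 3], midpoint_distance,
                          eps=2.0, min_lns=3)
print(labels)
```

The three nearby segments, each from a different trajectory, form one cluster and the distant segment is left as noise. If all three came from the same trajectory, the cardinality check would discard the cluster.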
18. Representative Trajectories
An imaginary trajectory obtained from a cluster.
Like a regular trajectory, a representative trajectory is a sequence of points.
The representative trajectory indicates the major behavior of the segments of a cluster.
Representative trajectory = Common subtrajectory.
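One way to obtain such a trajectory is a sweep along the average direction of the cluster. The sketch below assumes this approach, as described in the TRACLUS paper, for 2-D segments. It rotates the axes so that x runs along the average direction, sweeps over the segment endpoints, and averages the y values wherever at least `min_lns` segments overlap. The smoothing parameter `gamma` (minimum spacing between output points) and all function names are assumptions not found in the slides.

```python
import math


def _y_at(seg, x):
    """y value of a rotated segment at position x (linear interpolation)."""
    (x1, y1), (x2, y2) = seg
    if x2 == x1:
        return (y1 + y2) / 2
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def representative_trajectory(segments, min_lns, gamma):
    """Sweep along the average direction and average the overlapping segments."""
    # Average direction vector of the cluster.
    vx = sum(e[0] - s[0] for s, e in segments)
    vy = sum(e[1] - s[1] for s, e in segments)
    phi = math.atan2(vy, vx)
    c, s_ = math.cos(phi), math.sin(phi)

    def rot(p):    # rotate so that the x axis follows the average direction
        return (p[0] * c + p[1] * s_, -p[0] * s_ + p[1] * c)

    def unrot(p):  # inverse rotation, back to the original axes
        return (p[0] * c - p[1] * s_, p[0] * s_ + p[1] * c)

    # Each rotated segment is stored with its left endpoint first.
    rotated = [tuple(sorted((rot(s), rot(e)))) for s, e in segments]
    xs = sorted(x for seg in rotated for (x, _) in seg)

    rep, prev_x = [], None
    for x in xs:
        hit = [seg for seg in rotated if seg[0][0] <= x <= seg[1][0]]
        if len(hit) < min_lns:
            continue  # too few segments here to be representative
        if prev_x is not None and x - prev_x < gamma:
            continue  # too close to the previous output point
        avg_y = sum(_y_at(seg, x) for seg in hit) / len(hit)
        rep.append(unrot((x, avg_y)))
        prev_x = x
    return rep


cluster = [((0, 0), (10, 0)), ((0, 2), (10, 2)), ((2, 4), (8, 4))]
rep = representative_trajectory(cluster, min_lns=3, gamma=1.0)
print(rep)
```

In this example all three segments overlap only between x = 2 and x = 8, so the result is a short trajectory through the middle of that stretch.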