Skip to Content
Support Palestine 🍉  ⬩  Know what's happening

Lab 10: Convex Hull, KD-Tree, and Streaming Algorithms

This lab focuses on two major themes in modern computing — computational geometry and algorithms for large data sets. You’ll implement foundational algorithms that are widely used in fields like GIS, computer graphics, spatial databases, and big data analytics.

Introduction

Computational geometry provides the algorithmic tools needed to solve spatial problems, while large data sets require scalable, memory-efficient strategies. Mastering these domains prepares students to handle challenges in areas like:

  • Map-based applications and spatial search
  • Data analytics at scale
  • Robotics and computer vision
  • Scientific simulations and real-time monitoring

In this lab, we implement:

  1. Convex Hull (2D) — the smallest polygon enclosing a set of points.
  2. Range Search using KD-Tree — efficient spatial querying.
  3. Streaming Algorithms for Large Data — approximate solutions using limited memory.

Geometry and Big Data Algorithms

1. Convex Hull using Graham Scan

Concept: Given a set of 2D points, the convex hull is the minimal convex boundary containing all points.

import matplotlib.pyplot as plt def graham_scan(points): points = sorted(points) def cross(o, a, b): return (a[0] - o[0])*(b[1] - o[1]) - (a[1] - o[1])*(b[0] - o[0]) lower, upper = [], [] for p in points: while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0: lower.pop() lower.append(p) for p in reversed(points): while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0: upper.pop() upper.append(p) return lower[:-1] + upper[:-1]

2. 2D Range Search with KD-Tree

from scipy.spatial import KDTree def build_kd_tree(points): return KDTree(points) def range_query(kd_tree, query_point, radius): return kd_tree.query_ball_point(query_point, radius)

3. Count Distinct Elements in Streaming Data (using HyperLogLog-style estimation)

import hashlib import random def simulate_stream(n=100000): return [random.randint(0, 10**6) for _ in range(n)] def hash_value(x): return int(hashlib.md5(str(x).encode()).hexdigest(), 16) def estimate_cardinality(stream, num_buckets=64): max_trailing_zeros = [0]*num_buckets for val in stream: h = hash_value(val) bucket = h % num_buckets zeros = len(bin(h >> 6)) - len(bin((h >> 6).rstrip('0'))) max_trailing_zeros[bucket] = max(max_trailing_zeros[bucket], zeros) avg = sum(max_trailing_zeros) / len(max_trailing_zeros) return int(2**avg * num_buckets * 0.77351)

Example Usage

Click to expand

# 1. Convex Hull points = [(random.randint(0, 100), random.randint(0, 100)) for _ in range(30)] hull = graham_scan(points) # 2. KD-Tree Range Query tree = build_kd_tree(points) neighbors = range_query(tree, (50, 50), 20) # 3. Streaming Approximation stream_data = simulate_stream() approx_unique = estimate_cardinality(stream_data) print("Convex Hull Points:", hull) print("Neighbors in range:", neighbors) print("Estimated Unique Elements:", approx_unique)

Visualization

Click to expand

def plot_convex_hull(points, hull): xs, ys = zip(*points) hx, hy = zip(*(hull + [hull[0]])) # Close the polygon plt.figure(figsize=(7, 7)) plt.scatter(xs, ys, label="Points") plt.plot(hx, hy, 'r-', label="Convex Hull") plt.title("Convex Hull (Graham Scan)") plt.legend() plt.show() plot_convex_hull(points, hull)

Time and Space Complexity

AlgorithmTime Complexity
Convex Hull (Graham Scan)O(nlogn)O(n \log n)
KD-Tree ConstructionO(nlogn)O(n \log n)
KD-Tree Range QueryO(logn+k)O(\log n + k)
Cardinality EstimationO(n)O(n)

Applications

  1. Convex Hull:

    • Geographic boundary detection
    • Collision detection in games
    • Object shape recognition
  2. KD-Tree Search:

    • Nearest neighbor in spatial datasets
    • Robotics and motion planning
    • Location-based services
  3. Streaming Algorithms:

    • Real-time analytics on sensor or clickstream data
    • Network traffic analysis
    • Social media trend monitoring

Conclusion

This lab bridges classic geometry algorithms with scalable data techniques. From determining geometric boundaries to handling millions of data points efficiently, these tools are foundational to both theoretical and applied computing.

References

  1. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C.
    Introduction to Algorithms, 3rd/4th Edition, MIT Press.
  1. Scalable Data Mining – Stanford CS246


💡

Some of my latest posts on LinkedIn

Last updated on