Semantic Currents
Data visualization mapping 44 years of NYU ITP thesis projects through semantic similarity—2,744 projects clustered by meaning rather than medium, simulated as Physarum slime mold forming pathways between discourse communities.
Overview
A computational retrospective of NYU ITP’s entire thesis archive (1981-2025)—2,744 student projects analyzed through semantic similarity and visualized as a living network. The left side shows all projects clustered by meaning, mapped two-dimensionally and grouped by themes and topics regardless of medium. The right side highlights bridge projects: theses that actively create pathways between different discourse communities, seeding connections across ITP’s massively intertwined 44-year conversation about technology.
The visualization uses Physarum (slime mold) simulation, where projects that span multiple conceptual territories act as nutrient sources, growing organic pathways between clusters. This reveals how certain works function as conceptual bridges in the broader network of ideas, curiosity, and technological exploration.
How It Was Made
Built a five-stage data pipeline that turns messy archival JSON files into semantic vectors:
Stage 1: Data Cleaning & Validation
- Pydantic schemas enforced data types, handled missing fields, cleaned HTML entities and inconsistent formatting
- Extracted temporal context (project year) from filenames
- Created a “clean contract”: a guaranteed, predictable format for all downstream processing
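Below is a minimal sketch of the Stage 1 cleaning step. It uses a stdlib dataclass as a stand-in for the Pydantic schema. The field names (title, pitch, description, keywords) are hypothetical and not taken from the actual archive, and so is the assumption that each filename contains the year as a four-digit token.

```python
import html
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumption: the project year is a standalone four-digit token (1980-2029) in the filename.
YEAR_RE = re.compile(r"(?<!\d)(19[89]\d|20[0-2]\d)(?!\d)")


@dataclass
class ThesisRecord:
    """Stdlib stand-in for a Pydantic model: every field has a predictable type and default."""
    title: str
    pitch: str = ""
    description: str = ""
    keywords: list[str] = field(default_factory=list)
    year: Optional[int] = None


def clean_text(value) -> str:
    """Decode HTML entities, drop leftover tags, and collapse whitespace; None becomes ""."""
    if value is None:
        return ""
    text = html.unescape(str(value))
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def year_from_filename(filename: str) -> Optional[int]:
    match = YEAR_RE.search(filename)
    return int(match.group(1)) if match else None


def clean_record(raw: dict, filename: str) -> ThesisRecord:
    keywords = raw.get("keywords") or []
    if isinstance(keywords, str):  # keywords sometimes arrive as one comma-separated string
        keywords = keywords.split(",")
    cleaned_keywords = [clean_text(k) for k in keywords]
    return ThesisRecord(
        title=clean_text(raw.get("title")),
        pitch=clean_text(raw.get("pitch")),
        description=clean_text(raw.get("description")),
        keywords=[k for k in cleaned_keywords if k],  # drop empty entries
        year=year_from_filename(filename),
    )
```

Every record leaves this step with the same shape, whatever was missing or malformed in the source file. That uniformity is what the “clean contract” promises to later stages.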
Stage 2: Thematic Analysis
- Sent each project’s title and description to GPT-4-nano with carefully engineered prompts
- Forced chain-of-thought reasoning with strong negative constraints (“DO NOT list technologies, generic mediums”)
- Extracted 5-7 high-level conceptual themes per project (e.g., “algorithmic identity,” “digital intimacy”)
- Used structured Pydantic response models for reliability, eliminating JSON parsing failures
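The Stage 2 constraints can be shown without calling the API. This sketch pairs a hypothetical prompt with a validator that plays the role of the structured response model. The prompt wording, the reasoning/themes field names, and the small blocklist of mediums are all illustrative assumptions. The project's real prompts are not reproduced here.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt wording illustrating chain-of-thought plus negative constraints.
SYSTEM_PROMPT = (
    "First reason step by step about what this thesis is conceptually about. "
    "Then list 5-7 high-level conceptual themes. "
    "DO NOT list technologies, tools, or generic mediums."
)

# Hypothetical post-hoc guard in case a medium slips through despite the prompt.
BANNED_TERMS = {"arduino", "p5.js", "installation", "video", "app", "website", "vr"}


@dataclass
class ThemeResponse:
    reasoning: str      # chain-of-thought field, requested before the themes
    themes: list[str]   # 5-7 conceptual themes


def build_user_prompt(title: str, description: str) -> str:
    return f"Title: {title}\nDescription: {description}"


def parse_theme_response(raw_json: str) -> ThemeResponse:
    """Validate a model reply the way a structured response model would."""
    data = json.loads(raw_json)
    themes = [t.strip().lower() for t in data["themes"] if t.strip()]
    themes = [t for t in themes if t not in BANNED_TERMS]
    if not 5 <= len(themes) <= 7:
        raise ValueError(f"expected 5-7 conceptual themes, got {len(themes)}")
    return ThemeResponse(reasoning=str(data["reasoning"]), themes=themes)
```

In the real pipeline, a Pydantic model with the same two fields would replace the hand-written parser. The official OpenAI Python SDK's structured-output helpers accept such a model directly.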
Stage 3: Semantic Document Construction
- The critical strategic decision: what to embed?
- Constructed “Semantic Documents” combining AI-generated themes + project title/pitch + original keywords
- Intentionally excluded the long description and background fields: the themes are purified signal, while the full text would mostly add noise
- Sent the documents in batches to OpenAI’s text-embedding-3-large model to obtain 3072-dimensional vectors
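Here is a sketch of how a Semantic Document might be assembled and batched. The field order, labels, and batch size are assumptions. The only fixed points from the pipeline are which fields go in (themes, title/pitch, keywords) and which stay out (description, background).

```python
from typing import Iterator


def build_semantic_document(title: str, pitch: str, themes: list[str], keywords: list[str]) -> str:
    """Themes first, then title/pitch, then the original keywords.

    Description and background fields are deliberately never passed in.
    """
    lines = [f"Themes: {'; '.join(themes)}", f"Title: {title}"]
    if pitch:
        lines.append(f"Pitch: {pitch}")
    if keywords:
        lines.append(f"Keywords: {', '.join(keywords)}")
    return "\n".join(lines)


def chunk(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` documents."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch could then go to the official OpenAI Python SDK as `client.embeddings.create(model="text-embedding-3-large", input=batch)`. Each vector is read from `response.data[i].embedding`, in the same order as the inputs.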
Stage 4: Clustering & Dimensionality Reduction
- Applied HDBSCAN clustering to the full high-dimensional embeddings rather than to the 2D projections, preserving the full semantic information
- Used UMAP to create 2D visualization coordinates
- Chose a low min_cluster_size of 3, an artistic decision to create more food sources for the Physarum simulation, resulting in more intricate organic networks
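In this pipeline the clusters become Physarum food sources. Assume the HDBSCAN labels were computed on the full embeddings, for example with scikit-learn's `HDBSCAN(min_cluster_size=3).fit_predict(embeddings)`. Assume also that the 2D coordinates came from umap-learn's `UMAP(n_components=2).fit_transform(embeddings)`. Under those assumptions, a hypothetical helper could place one food source at each cluster's 2D centroid. HDBSCAN labels unclustered points -1, so the helper skips them. Placing food at the centroid is itself an assumption.

```python
from collections import defaultdict

NOISE = -1  # HDBSCAN's label for points that belong to no cluster


def food_sources(labels: list[int], coords: list[tuple[float, float]]) -> dict[int, tuple[float, float]]:
    """Place one food source at the 2D (UMAP) centroid of each HDBSCAN cluster.

    The labels come from clustering the full embeddings; the 2D coordinates
    are used only to decide where each food source sits on the canvas.
    """
    sums: dict[int, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])  # [sum_x, sum_y, count]
    for label, (x, y) in zip(labels, coords):
        if label == NOISE:
            continue
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
```

A smaller min_cluster_size lets HDBSCAN keep more, smaller clusters. With this helper, that means more food sources, which is the effect the low setting is chosen for.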
Stage 5: Archive Creation
- Generated a complete thesis_analysis.json including the full embedding vectors
- Future-proofed: anyone with this file can re-cluster or perform new semantic analysis without API calls
- Transformed the output from a simple data file into a durable research asset
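Because the archive stores the raw embedding vectors, new analyses need no API calls. The following minimal sketch assumes a hypothetical layout of `{"projects": [{"title": ..., "embedding": [...]}, ...]}`; the real file's schema may differ. It writes the archive, then runs a nearest-neighbor query over it.

```python
import json
import math
from pathlib import Path


def write_archive(path: Path, projects: list[dict]) -> None:
    """Write every project record, embedding vector included, to one JSON file."""
    path.write_text(json.dumps({"projects": projects}), encoding="utf-8")


def load_embeddings(path: Path) -> tuple[list[str], list[list[float]]]:
    data = json.loads(path.read_text(encoding="utf-8"))
    titles = [p["title"] for p in data["projects"]]
    vectors = [p["embedding"] for p in data["projects"]]
    return titles, vectors


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(path: Path, title: str, k: int = 5) -> list[tuple[str, float]]:
    """Rank other projects by cosine similarity to `title`, using only the stored vectors."""
    titles, vectors = load_embeddings(path)
    query = vectors[titles.index(title)]
    scored = [(t, cosine(query, v)) for t, v in zip(titles, vectors) if t != title]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The same loaded vectors could be handed straight to a clustering library with different parameters. This is the re-clustering path the archive is meant to support.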
Technical Approach
The Physarum simulation creates pathways between semantic clusters based on projects that bridge multiple discourse communities. Some theses naturally span conceptual territories—these become nutrient sources in the simulation, growing organic connection networks that reveal ITP’s shifting ideals and conversations.
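Pathway growth of this kind can be sketched with the agent-based Physarum model popularized by Jeff Jones. In that model, agents sense the trail ahead at three angles, turn toward the strongest reading, move, and deposit trail. Meanwhile the trail map diffuses and decays each step. What follows is a small stdlib sketch, not the exhibited code. The grid size, every parameter value, and the choice of food positions (for example cluster centroids or bridge-project coordinates) are assumptions.

```python
import math
import random


class Physarum:
    """Small agent-based Physarum sketch on a wrap-around grid (illustrative parameters)."""

    def __init__(self, width, height, n_agents, food, seed=0, sensor_angle=math.pi / 4,
                 sensor_dist=3.0, turn=math.pi / 4, step=1.0, deposit=1.0,
                 decay=0.9, food_strength=5.0):
        self.w, self.h = width, height
        self.rng = random.Random(seed)
        self.trail = [[0.0] * width for _ in range(height)]
        self.food = food  # (x, y) cells, e.g. cluster centroids or bridge projects
        self.agents = [[self.rng.uniform(0, width), self.rng.uniform(0, height),
                        self.rng.uniform(0, 2 * math.pi)] for _ in range(n_agents)]
        self.sa, self.sd, self.turn, self.step_len = sensor_angle, sensor_dist, turn, step
        self.deposit, self.decay, self.food_strength = deposit, decay, food_strength

    def cell(self, x, y):
        """Map continuous coordinates to a (row, col) grid cell, wrapping at the edges."""
        return math.floor(y) % self.h, math.floor(x) % self.w

    def sense(self, x, y, heading):
        row, col = self.cell(x + self.sd * math.cos(heading), y + self.sd * math.sin(heading))
        return self.trail[row][col]

    def update(self):
        for agent in self.agents:
            x, y, a = agent
            left = self.sense(x, y, a - self.sa)
            front = self.sense(x, y, a)
            right = self.sense(x, y, a + self.sa)
            if front > left and front > right:
                pass  # strongest trail straight ahead: keep heading
            elif front < left and front < right:
                a += self.rng.choice((-self.turn, self.turn))  # both sides stronger: turn randomly
            elif left > right:
                a -= self.turn
            elif right > left:
                a += self.turn
            x = (x + self.step_len * math.cos(a)) % self.w
            y = (y + self.step_len * math.sin(a)) % self.h
            agent[:] = [x, y, a]
            row, col = self.cell(x, y)
            self.trail[row][col] += self.deposit
        for fx, fy in self.food:  # food sources keep emitting attractant
            row, col = self.cell(fx, fy)
            self.trail[row][col] += self.food_strength
        self.diffuse_and_decay()

    def diffuse_and_decay(self):
        """3x3 mean blur followed by multiplicative decay."""
        h, w, t = self.h, self.w, self.trail
        self.trail = [[self.decay * sum(t[(r + dr) % h][(c + dc) % w]
                                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
                       for c in range(w)] for r in range(h)]
```

Raising decay toward 1 keeps trails alive longer. Adding more food sources gives the agents more targets to connect, so the network grows more branches.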
The algorithm identifies projects that “belong all over”—works that could legitimately exist in multiple thematic clusters. These bridge projects seed pathways between different areas of inquiry, visualizing how technological discourse at ITP isn’t siloed but massively intertwined.
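The text does not say how “belonging all over” is measured. What follows is one plausible sketch, offered as an assumption rather than the project's actual method. Each project's embedding is compared with every cluster centroid by cosine similarity. A project is flagged as a bridge when two or more clusters score above a threshold and fall within a small margin of its best match. The threshold and margin values are hypothetical.

```python
import math

NOISE = -1  # HDBSCAN label for unclustered points


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_centroids(vectors: list[list[float]], labels: list[int]) -> dict[int, list[float]]:
    """Mean embedding of each cluster, computed in the full embedding space."""
    groups: dict[int, list[list[float]]] = {}
    for vec, label in zip(vectors, labels):
        if label != NOISE:
            groups.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vs) for col in zip(*vs)] for label, vs in groups.items()}


def find_bridges(vectors, labels, threshold=0.5, margin=0.05) -> dict[int, list[int]]:
    """Map project index -> clusters it fits almost equally well (hypothetical criterion)."""
    centroids = cluster_centroids(vectors, labels)
    if not centroids:
        return {}
    bridges = {}
    for i, vec in enumerate(vectors):
        scores = {label: cosine(vec, c) for label, c in centroids.items()}
        best = max(scores.values())
        fits = sorted(l for l, s in scores.items() if s >= threshold and best - s <= margin)
        if len(fits) >= 2:
            bridges[i] = fits
    return bridges
```

Noise points (label -1) are still scored here. This matters because a project that sits between clusters can end up unassigned by HDBSCAN.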
The left visualization shows the semantic landscape—all 2,744 projects positioned by meaning. The right shows the emergent network—which projects actively create connections between communities of practice and thought.
Context
Created for the “Mutable Molds” group exhibition at Clive Davis Gallery, NYU (August 15-31, 2025), part of the NYU IMA Low Res 2024-2025 Residents show exploring “shifting systems—semantic, perceptual, and ecological.”
This was my initial foray into working with embeddings, word vectors, and NLP-driven systems—research that became the foundation for ongoing Semantic Garden work exploring embeddings as environmental forces in autonomous systems. The project demonstrates how algorithmic analysis can reveal hidden structures in large cultural archives, making visible the conceptual bridges that connect seemingly disparate areas of technological exploration.