Friday, December 12, 2025

Bending Light: Simulating General Relativity in the Browser with Raw WebGL

As software engineers, we often work within the comfortable constraints of Euclidean geometry: grid layouts, vector positions, and linear interpolations. But what happens when the coordinate system itself is warped?

Recently, I built a real-time visualization of a Schwarzschild black hole using raw WebGL and JavaScript. The goal wasn't just to create a pretty image, but to solve a complex rendering problem: How do you perform ray casting when light rays don't travel in straight lines?

This project explores the intersection of high-performance graphics, general relativity, and orbital mechanics, all running at 60 FPS in a standard web browser.

The Challenge: Non-Euclidean Rendering

Standard 3D rendering (rasterization or standard ray tracing) assumes light travels linearly from a source to the camera. In the vicinity of a black hole, extreme gravity bends spacetime. Light follows geodesics—curves defined by the spacetime metric.

To visualize this, we cannot use standard polygon rasterization. Instead, we must use Ray Marching within a Fragment Shader, solving each pixel's light path mathematically in real time.

Architecture: The "No-Framework" Approach

While libraries like Three.js are excellent for production 3D apps, I chose the raw WebGL API for this simulation.

Why?

  1. Performance Control: I needed direct control over the GLSL rendering pipeline without overhead.

  2. First-Principles Engineering: Understanding the low-level buffer binding and shader compilation ensures we aren't relying on "black box" abstractions for critical math.

  3. Portability: The entire engine runs in a single HTML file with zero dependencies (a minimal setup sketch follows this list).
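
To ground point 3, here is roughly what that zero-dependency setup looks like: compile a vertex/fragment pair and draw one oversized triangle so the fragment shader runs once per pixel. This is a generic sketch, not the project's code; vertexSrc, fragmentSrc, and the aPos attribute are assumed names.

// Minimal raw-WebGL bootstrap: one full-screen triangle, all interesting work in the fragment shader.
const gl = document.querySelector("canvas").getContext("webgl");

function compile(type, src) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, src);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) throw new Error(gl.getShaderInfoLog(shader));
  return shader;
}

const prog = gl.createProgram();
gl.attachShader(prog, compile(gl.VERTEX_SHADER, vertexSrc));     // vertexSrc / fragmentSrc:
gl.attachShader(prog, compile(gl.FRAGMENT_SHADER, fragmentSrc)); // GLSL source strings defined elsewhere
gl.linkProgram(prog);
gl.useProgram(prog);

// A single triangle larger than the viewport covers every pixel without a second draw call.
gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([-1, -1, 3, -1, -1, 3]), gl.STATIC_DRAW);
const aPos = gl.getAttribLocation(prog, "aPos");
gl.enableVertexAttribArray(aPos);
gl.vertexAttribPointer(aPos, 2, gl.FLOAT, false, 0, 0);
gl.drawArrays(gl.TRIANGLES, 0, 3);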

The Physics Stack

1. Relativistic Ray Marching (The Shader)

The core logic lives in the Fragment Shader. For every pixel on the screen, we fire a ray. However, instead of a simple vector addition (pos += dir * step), we apply a gravitational deflection force at every step of the march.

We approximate the Schwarzschild metric by modifying the ray's direction vector ($\vec{D}$) based on its distance ($r$) from the singularity:

// Inside the Ray Marching Loop
float stepSize = 0.08 * distToCenter; // Adaptive stepping
vec3 gravityForce = -normalize(rayPos) * (1.5 / (distToCenter * distToCenter));

// Bend the light
rayDir = normalize(rayDir + gravityForce * stepSize * 0.5);
rayPos += rayDir * stepSize;

Optimization Strategy: Note the stepSize. I implemented adaptive ray marching. We take large steps when the photon is far from the black hole (low compute cost) and micro-steps when close to the event horizon (high precision required). This keeps the loop iteration count low (~120 passes) while maintaining visual fidelity.
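
For context, here is roughly how that inner body sits inside the full march, written the way a single-file WebGL app stores its fragment shader: as a GLSL string in JavaScript. This is a sketch under assumptions, not the project's exact shader; MAX_STEPS reflects the ~120-pass budget above, while R_S, ESCAPE_RADIUS, and sampleBackground are illustrative names.

// Sketch only: the marching loop skeleton around the step shown above.
const marchLoopGLSL = `
  const int   MAX_STEPS     = 120;   // matches the ~120-pass budget
  const float R_S           = 1.0;   // Schwarzschild radius in scene units (assumed)
  const float ESCAPE_RADIUS = 40.0;  // beyond this, treat the ray as unbent (assumed)

  vec3 march(vec3 rayPos, vec3 rayDir) {
    for (int i = 0; i < MAX_STEPS; i++) {
      float distToCenter = length(rayPos);
      if (distToCenter < R_S)           return vec3(0.0);                // captured: the shadow
      if (distToCenter > ESCAPE_RADIUS) return sampleBackground(rayDir); // escaped: starfield

      float stepSize = 0.08 * distToCenter; // adaptive: coarse far away, fine near the horizon
      vec3 gravityForce = -normalize(rayPos) * (1.5 / (distToCenter * distToCenter));
      rayDir = normalize(rayDir + gravityForce * stepSize * 0.5);
      rayPos += rayDir * stepSize;
    }
    return vec3(0.0); // step budget exhausted: treat as captured
  }
`;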

2. The Event Horizon & Accretion Disk

We define the Schwarzschild Radius ($R_s$).

  • If a ray's distance drops below $R_s$, it is trapped. The pixel returns black (the shadow).

  • If the ray intersects the equatorial plane ($y \approx 0$) within specific radii, we render the Accretion Disk.

To simulate the Doppler Beaming effect (where the disk looks brighter on the side moving toward the camera), I calculated the dot product of the disk's rotational velocity and the ray direction.
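
Concretely, the beaming factor can be as small as one dot product per disk hit. The sketch below (again an embedded GLSL string) assumes the disk material moves tangentially in the equatorial plane; the 0.7 strength constant is illustrative rather than the project's value.

const diskShadingGLSL = `
  // Brighten disk samples whose local velocity points toward the camera.
  vec3 shadeDiskHit(vec3 hitPos, vec3 rayDir, vec3 baseColor) {
    vec3 radial  = normalize(vec3(hitPos.x, 0.0, hitPos.z));
    vec3 diskVel = normalize(cross(vec3(0.0, 1.0, 0.0), radial)); // tangential orbital motion
    float beaming = 1.0 + 0.7 * dot(diskVel, -rayDir);            // toward the camera => brighter
    return baseColor * max(beaming, 0.0);
  }
`;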

3. Keplerian Orbital Mechanics (The JavaScript)

The background stars aren't just animating on a linear path. They follow Kepler’s 3rd Law of Planetary Motion ($T^2 \propto r^3$).

I engineered the JavaScript layer to handle the state management of the celestial bodies:

  • Blue Giant: Far orbit, period set to exactly 10.0s.

  • Red Dwarf: Near orbit.

  • Math: I calculated the inner star's period dynamically based on the ratio of the semi-major axes to ensure physical plausibility.

// Keplerian Ratio Preserved
const r2 = 7.5;
const ratio = Math.pow(r2 / r1, 1.5); // T^2 proportional to r^3
const T2 = T1 * ratio;
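
As a usage sketch (the helper function and its [x, 0, z] orbital plane are my assumptions, not lifted from the project), the derived period then drives a plain circular-orbit update each frame:

// Hypothetical per-frame helper: the angle grows linearly so one revolution takes `period` seconds.
function starPosition(tSeconds, radius, period) {
  const angle = (2 * Math.PI * tSeconds) / period;
  return [radius * Math.cos(angle), 0, radius * Math.sin(angle)];
}

// e.g. innerStarPos = starPosition(elapsed, r2, T2); the far star uses its 10.0 s period T1.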

Procedural Generation: Noise without Textures

Loading external textures introduces HTTP requests and cross-origin issues. To keep the architecture monolithic and fast, I generated the starfield and galaxy band procedurally using GLSL hash functions (pseudo-random noise).

By manipulating the frequency and amplitude of the noise, I created a "soft" nebula effect that adds depth without the GPU cost of high-res texture sampling.
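
For reference, here is a representative hash-plus-value-noise pair of the kind this approach relies on, embedded as a GLSL string; the exact constants and function names are illustrative rather than the project's.

const noiseGLSL = `
  // Cheap 2D hash: a deterministic pseudo-random value per lattice point, no texture lookups.
  float hash21(vec2 p) {
    p = fract(p * vec2(123.34, 456.21));
    p += dot(p, p + 45.32);
    return fract(p.x * p.y);
  }

  // Smoothly interpolated value noise built on the hash.
  float valueNoise(vec2 p) {
    vec2 i = floor(p), f = fract(p);
    vec2 u = f * f * (3.0 - 2.0 * f); // smoothstep-style fade
    float a = hash21(i);
    float b = hash21(i + vec2(1.0, 0.0));
    float c = hash21(i + vec2(0.0, 1.0));
    float d = hash21(i + vec2(1.0, 1.0));
    return mix(mix(a, b, u.x), mix(c, d, u.x), u.y);
  }

  // Layer a few octaves (rising frequency, falling amplitude) for the soft nebula band.
  float fbm(vec2 p) {
    float sum = 0.0, amp = 0.5;
    for (int i = 0; i < 4; i++) {
      sum += amp * valueNoise(p);
      p *= 2.0;
      amp *= 0.5;
    }
    return sum;
  }
`;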

UX: Spherical Camera System

A visualization is only useful if it can be explored. I implemented a custom camera system based on Spherical Coordinates ($\rho, \theta, \phi$) rather than Cartesian vectors. This prevents "gimbal lock" and allows the user to orbit the singularity smoothly using mouse drags or touch gestures.
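
A minimal version of that conversion (names and the y-up convention are my assumptions) is just the standard spherical-to-Cartesian mapping, evaluated each frame before the camera uniforms are uploaded:

// rho = orbit radius (the "Zoom" value), theta = azimuth, phi = polar angle from the y axis.
function cameraPosition(rho, theta, phi) {
  return [
    rho * Math.sin(phi) * Math.cos(theta),
    rho * Math.cos(phi),
    rho * Math.sin(phi) * Math.sin(theta),
  ];
}

// Mouse drag / touch updates theta and phi; the wheel or pinch gesture scales rho.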

The state is logged to the console for debugging specific views:

Camera Pos: [-3.34, 0.10, -12.06], Zoom (Radius): 12.51

Conclusion

This project demonstrates that modern browsers are capable of heavy scientific visualization if we optimize the pipeline correctly. By combining physics-based rendering, adaptive algorithms, and low-level WebGL, we can simulate general relativity in real-time on consumer hardware.

Thursday, December 11, 2025

Central Limit Theorem visualization using JS

Realistic Galton Board

Physics-based collisions demonstrating the Central Limit Theorem
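
The statistical core of the demo fits in a few lines: each ball's final bin is the sum of many independent left/right deflections, so the histogram of bins approaches a bell curve. The sketch below shows only that idea, not the physics-based collisions themselves; the names are mine.

// Each ball takes `rows` independent ±1 bounces; the sum is binomial, which the CLT
// says converges toward a Gaussian as `rows` grows.
function dropBall(rows) {
  let offset = 0;
  for (let i = 0; i < rows; i++) offset += Math.random() < 0.5 ? -1 : 1;
  return offset;
}

const counts = new Map();
for (let n = 0; n < 10000; n++) {
  const bin = dropBall(12);
  counts.set(bin, (counts.get(bin) || 0) + 1);
}
// `counts` now approximates a bell curve centered on 0.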


Heisenberg Uncertainty Principle Demo

Sliders: Position Certainty 50% · Momentum Certainty 50%

Δx · Δp ≥ ℏ/2

Drag the slider to observe the trade-off (a minimal sketch of the coupling follows the list):

  • Right: We clamp down on Position. The dot becomes sharp (we know where it is), but its Velocity becomes chaotic and unpredictable.
  • Left: We measure Momentum. The movement becomes smooth and predictable, but the particle expands into a fuzzy cloud (we don't know exactly where it is).
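
Behind the animation, a demo like this only has to keep the product of the two spreads pinned at a floor. A minimal sketch of that coupling, with an illustrative floor constant and names of my own rather than the demo's actual code:

// slider in [0, 1]: 1 = clamp down on position (right), 0 = measure momentum (left).
const FLOOR = 1.0;                       // stands in for ħ/2 in arbitrary demo units
function spreads(slider) {
  const minSigma = 0.05, maxSigma = 2.0;
  const sigmaX = maxSigma + (minSigma - maxSigma) * slider; // position spread shrinks to the right
  const sigmaP = FLOOR / sigmaX;                            // so momentum spread must grow
  return { sigmaX, sigmaP };             // sigmaX * sigmaP === FLOOR for every slider value
}

// Each frame: blur the dot's drawn position by sigmaX and jitter its velocity by sigmaP.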

My notes on Designing Data-Intensive Applications

1) Reliable, Scalable, and Maintainable Applications

DDIA opens by reframing reliability as “works correctly under adversity,” not “never fails.” It classifies faults (hardware, software, and—most common—human), argues for fault tolerance over fault prevention, and stresses measuring what matters: load (domain-specific parameters) and performance (especially latency percentiles and tail latency). It distinguishes scalability strategies (vertical vs. horizontal, elasticity) and frames maintainability as a first-class goal grounded in operability, simplicity, and evolvability. This lens anchors every technology choice in the book. 

Most surprising. How strongly the chapter elevates operability—tooling, visibility, and safe procedures—as part of system design, not an afterthought.

Biggest learning. Design APIs and dataflows to limit blast radius of human error; it’s the dominant failure mode in production. 


2) Data Models and Query Languages

The chapter compares relational, document, and graph models and shows how model choice shapes thinking, queryability, and change over time. It revisits NoSQL’s origins (scale, schema flexibility, special queries) and shows the later convergence (RDBMS with JSON; some document stores supporting joins). It contrasts declarative vs. imperative query styles (SQL, Cypher, SPARQL, Datalog vs. Gremlin/MapReduce), emphasizing that declarative queries enable richer, engine-level optimization. 

Most surprising. Datalog’s expressiveness and composability—rarely used in industry, but a clean mental model for complex relationships. 

Biggest learning. Pick the model that minimizes mismatch with access patterns; you can combine models, but every bridge you build becomes a maintenance burden.


3) Storage and Retrieval

Under the hood, databases rely on data structures like hash indexes, B-trees, and LSM-trees. B-trees give predictable point/range lookups with in-place updates; LSM-trees trade read amplification for high write throughput via compaction (SSTables). The chapter also separates OLTP vs. analytics: data warehouses, star/snowflake schemas, columnar storage, compression, sort orders, and materialized aggregates—each optimized for large scans and aggregations. The practical takeaway: your index and storage layout are workload contracts.

Most surprising. How dramatically compaction strategies (e.g., LSM) shift write/read trade-offs; “fast writes” are never free.

Biggest learning. Treat columnar storage and sort order as first-class design levers for analytics; they often beat “more CPU” by orders of magnitude.
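
Returning to the write path above, a toy LSM sketch makes the trade-off concrete: writes land in an in-memory table, which is flushed as an immutable sorted segment; reads check memory first, then segments newest-to-oldest. This ignores WALs, bloom filters, and compaction, and the names are mine:

const memtable = new Map();   // recent writes, held in memory
const segments = [];          // flushed, immutable, sorted "SSTables" (newest last)

function put(key, value) {
  memtable.set(key, value);
  if (memtable.size >= 1000) {                                   // arbitrary flush threshold
    segments.push([...memtable.entries()].sort(([a], [b]) => (a < b ? -1 : 1)));
    memtable.clear();                                            // fast writes: append-only flushes
  }
}

function get(key) {
  if (memtable.has(key)) return memtable.get(key);
  for (let i = segments.length - 1; i >= 0; i--) {               // read amplification: scan newest first
    const hit = segments[i].find(([k]) => k === key);            // real stores binary-search / bloom-filter
    if (hit) return hit[1];
  }
  return undefined;
}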


4) Encoding and Evolution

Serialization is a compatibility contract: JSON/XML vs. binary formats like Protocol Buffers, Thrift, and Avro. DDIA explains schema evolution—backward/forward compatibility, field defaults, and name-based matching (not position) in Avro—and compares dataflow modes: via databases (schemas), services (RPC/REST), and asynchronous logs/queues (message passing, change data capture). The chapter’s design lens is “how do we change in production without breaking consumers?”

Most surprising. Practical details like Avro’s reader/writer schema negotiation and the hidden 33% size overhead of Base64 that many pipelines forget.

Biggest learning. Treat schemas like APIs and roll out change with compatibility plans across every hop in your dataflow—DBs, services, and streams.
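
The 33% figure falls straight out of the encoding: every 3 input bytes become 4 output characters, so the encoded form is about 4/3 of the original size. A quick browser-console check (btoa is the standard Base64 encoder):

const raw = new Uint8Array(300);                 // 300 bytes of binary payload
const b64 = btoa(String.fromCharCode(...raw));   // encode as Base64 text
console.log(raw.length, b64.length);             // 300 400  → ~33% larger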


5) Replication

Why replicate? Availability, locality, throughput. DDIA surveys leader–follower (sync/async), multi-leader, and leaderless (quorum-based) designs, then digs into the pain: replication lag and read phenomena (read-your-writes, monotonic reads, consistent prefix), write conflicts, failover safety, hinted handoff, and concurrent write resolution. Each topology optimizes a different corner of the CAP space and operational reality. The headline: choose replication strategy to match write patterns, failure semantics, and geo constraints—and be explicit about read guarantees.

Most surprising. Quorum reads/writes do not imply linearizability under partitions and delays—your “consistent” system may still return surprising results.

Biggest learning. Specify user-visible guarantees (RYW/monotonic/consistent prefix) and instrument them; otherwise, lag will surface as correctness bugs.
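
One piece of this is mechanical enough to write down: a strict quorum only requires every read set and write set to overlap in at least one replica. As the chapter stresses, passing this check still does not make the system linearizable once delays, retries, and sloppy quorums enter the picture. A sketch with invented names:

// n replicas, w write acknowledgements required, r read responses required.
function isStrictQuorum(n, w, r) {
  return w + r > n;   // read and write sets must intersect in at least one node
}

isStrictQuorum(3, 2, 2); // true  — every read overlaps the latest acknowledged write
isStrictQuorum(3, 1, 1); // false — reads may miss recent writes entirely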


6) Partitioning

Partition (shard) to scale writes and storage: by key-range (good for range scans, vulnerable to hotspots) or hash (great balance, sacrifices locality). Then reconcile secondary indexes across partitions (by document vs. by term), plan for skew mitigation, and decide on rebalancing (automatic vs. manual) and request routing. Parallel queries traverse shards; operational playbooks must keep routing tables and ownership consistent during moves. Partitioning is where data model meets operability.

Most surprising. The operational complexity of global secondary indexes across shards—fast reads now create hard rebalancing problems later.

Biggest learning. Design hot-key mitigation (key salting, time-bucketing, or write fan-in) on day one; you will need it.
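
Key salting in miniature: append a small random suffix to a known-hot key so its writes fan out across several partitions, accepting that reads must fan back in and merge. Names and the bucket count are illustrative:

const SALT_BUCKETS = 8;   // how many partitions a hot key is spread across

function saltedWriteKey(hotKey) {
  const salt = Math.floor(Math.random() * SALT_BUCKETS);
  return `${hotKey}#${salt}`;                    // writes scatter over 8 distinct partition keys
}

function saltedReadKeys(hotKey) {
  // Readers must now query every salted variant and merge the results.
  return Array.from({ length: SALT_BUCKETS }, (_, i) => `${hotKey}#${i}`);
}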


7) Transactions

ACID is not one thing; it is a family of behaviors, and many systems ship with weak isolation by default. DDIA walks through anomalies (dirty reads/writes, lost updates, write skew, phantoms) and the isolation continuum: Read Committed, Snapshot Isolation, Repeatable Read, and full Serializability—implemented via 2PL or Serializable Snapshot Isolation (SSI). It also covers two-phase commit (2PC) and its realities. The advice: pick isolation deliberately for each workload; measure, don’t assume.

Most surprising. SQL isolation names are historically inconsistent; “Repeatable Read” may be snapshot isolation, not serializable, depending on the engine.

Biggest learning. SSI provides a pragmatic path to serializable behavior without the worst 2PL downsides—if you accept aborts and tune for them.


8) The Trouble with Distributed Systems

Distributed systems fail partially: networks drop, reorder, and delay messages; processes pause (GC); clocks drift; timeouts are ambiguous; Byzantine behavior exists. You cannot know if a timed-out request was processed. DDIA demystifies clocks (time-of-day vs. monotonic), fault detection limits, and why synchronized clocks are shaky foundations. The mindset shift: design for gray areas with idempotency, retries, and explicit uncertainty handling.

Most surprising. “Timeouts don’t tell you what happened”—even mature teams forget this in write paths and background jobs.

Biggest learning. Build protocols around idempotence, fencing tokens, and compensations; correctness must survive duplicate or reordered messages.
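
A fencing token in miniature: the lock service issues a monotonically increasing token with each lease, and the storage side rejects any write carrying an older token than it has already seen, which neutralizes a client that resumed after a long pause still believing it holds the lock. A sketch with invented names:

let highestSeenToken = 0;   // maintained by the storage service, not by lock holders

function acceptWrite(token, applyWrite) {
  if (token < highestSeenToken) return false;   // stale lease holder: reject the write
  highestSeenToken = token;
  applyWrite();                                 // safe: this writer holds the newest known lease
  return true;
}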


9) Consistency and Consensus

This chapter untangles consistency models: linearizability (single-copy illusion), causal ordering, and total order broadcast; then connects them to distributed transactions (2PC) and fault-tolerant consensus (e.g., Paxos/Raft) and coordination services (ZooKeeper/etcd). It emphasizes the cost of linearizability and when you actually need it (e.g., uniqueness constraints, leader election). The core is choosing the weakest model that preserves invariants.

Most surprising. Quorum configurations can still be non-linearizable; ordering and causality matter as much as “how many nodes agreed.”

Biggest learning. Externalize coordination to a dedicated, well-understood service; don’t reinvent ad-hoc consensus inside your business logic.


10) Batch Processing

DDIA traces batch from Unix pipes to MapReduce to DAG engines (Spark, Tez, Flink): move compute to data; materialize intermediate state; join at scale (reduce-side vs. map-side); and design for skew. The point is not fetishizing frameworks but understanding when a recomputation model is simpler and more correct. Batch complements services and streams by providing determinism, reprocessing, and backfills over immutable logs. 

Most surprising. Many “real-time” features are healthier as overnight recomputations with well-tracked freshness SLAs. 

Biggest learning. Make the log your source of truth; batch is how you rebuild correct derived state when models or code change.


11) Stream Processing

Streams turn events into state via partitioned logs, CDC, and event sourcing. The chapter explains processing time vs. event time, watermarks, windowing, stateful operators, stream joins, and fault tolerance via checkpoints and idempotence. Systems like Kafka + stream processors keep services and analytics in sync by replaying rather than mutating state in place. The broader message: unify batch and stream around the same log. 

Most surprising. “Exactly once” is a protocol property (idempotence + atomic commit/checkpointing), not magic—understand where duplicates are eliminated.

Biggest learning. Embrace immutable event logs and derive all read models from them; recovery becomes replay, not surgery.
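
The "protocol property" point is easiest to see from the consumer's side: duplicates disappear because processing is made idempotent (for example, by remembering event IDs) and because state and consumption offsets are committed together. A toy deduplicating handler, with names of my own:

const processedIds = new Set();   // in real systems, persisted atomically with the derived state

function handleEvent(event, applyToReadModel) {
  if (processedIds.has(event.id)) return;   // replayed or duplicated delivery: a no-op
  applyToReadModel(event);                  // update the derived view
  processedIds.add(event.id);               // must commit together with the update above
}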


12) The Future of Data Systems

The finale argues for derived data and unbundled databases: compose specialized stores (OLTP, search, analytics, cache) with dataflow—CDC, validation, constraints, and observability—to maintain correctness across systems. It emphasizes end-to-end arguments for guarantees, trust-but-verify checks, and the ethics of data use (timeliness, integrity, privacy). The north star is building applications around dataflow instead of around databases, so recomputation and evolution are routine.

Most surprising. Treating the database as just another consumer of the log reframes integration and unlocks simpler, auditable architectures.

Biggest learning. Product correctness is a system property across boundaries; codify constraints and verification in the dataflow—not only in one database.


Closing takeaways

  • Make guarantees explicit (isolation level, read semantics, freshness) and monitor them like SLOs, not folklore. 

  • Choose the weakest consistency that preserves invariants; spend linearizability budget only where invariant violations are existential.

  • Architect around an immutable log; let batch and stream recompute derived views safely.