ClusterPilot is live, and a kernel got 25x faster


Hey friends,

Big fortnight. A tool I had been putting off building for years shipped to PyPI, and a GPU kernel I thought was fine turned out to be reading mostly zeros.

What I’ve Been Up To

ClusterPilot is out. I built it because I was genuinely tired of the loop: write a SLURM script, rsync files, refresh squeue, wait, repeat. If you work on HPC clusters, you know the loop. It is a terminal app: you describe your job in plain language, and it generates a cluster-aware SLURM script using AI, handles upload, submission, and monitoring, and syncs results back when the job finishes. Logs update in real time over SSH. Optional push notifications mean you can close the laptop and actually walk away. It is MIT-licensed, free to self-host with your own API key, and pip install clusterpilot works. I have also set up a Fly.io instance with a capped key if you want to try it without any setup. The conda-forge package is in review.

On the PhD side: my Monte Carlo kernel was faithfully multiplying all 216 entries of a coupling matrix per spin flip, for a cubic lattice where exactly 6 of those entries are nonzero. The kernel did not know that. And the larger the system, the worse it gets. Swapping in a precomputed sparse neighbour list cut the work to just the 6 values that actually matter: 25.8x speedup, bit-identical outputs under deterministic RNG, and a run that used to take 7 hours now finishes in 16 minutes. The dense path stays as a fallback for long-range systems. Obvious in retrospect. Most things are.
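The change is easy to sketch. Here is a minimal toy version in NumPy, not my actual kernel: the lattice size, the J=1 coupling, and all the names are illustrative, but the shape of the fix is the same. On a 6x6x6 lattice the dense row has exactly 216 entries, of which 6 are nonzero.

```python
import numpy as np

L = 6            # lattice side; N = L**3 = 216 sites, matching the dense row length
N = L ** 3
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=N)

def site(x, y, z):
    """Flatten 3D coordinates (with periodic boundaries) to a site index."""
    return ((x % L) * L + (y % L)) * L + (z % L)

# Precomputed neighbour list: the 6 nearest neighbours of each site,
# built once before the Monte Carlo loop starts.
neighbours = np.empty((N, 6), dtype=np.int64)
for x in range(L):
    for y in range(L):
        for z in range(L):
            i = site(x, y, z)
            neighbours[i] = [site(x + 1, y, z), site(x - 1, y, z),
                             site(x, y + 1, z), site(x, y - 1, z),
                             site(x, y, z + 1), site(x, y, z - 1)]

# Dense coupling matrix: 1.0 on neighbour entries, zero everywhere else.
J = np.zeros((N, N))
for i in range(N):
    J[i, neighbours[i]] = 1.0

i = 17  # candidate spin to flip

# Dense path: reads all 216 entries of row i, most of them zeros.
field_dense = J[i] @ spins

# Sparse path: reads exactly the 6 values that matter.
field_sparse = spins[neighbours[i]].sum()

assert field_dense == field_sparse
```

Both paths give the same local field, so accept/reject decisions, and therefore trajectories under a fixed RNG, are unchanged; the sparse path just skips the 210 zero multiplications. The 25.8x number came from the real kernel, not this sketch.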

Also rebuilt juliafrank.net. Claude Sonnet 4.6 handled the mockup and design decisions; I did the CSS and WordPress work by hand. That split turned out to be more satisfying than fully automating it. Having someone walk you through the reasoning means you actually understand what you built. Custom CSS stops feeling intimidating once you are the one making the choices.

Worth Reading

cuTile.jl Brings NVIDIA CUDA Tile-Based Programming to Julia
Tim Besard and Keno Fischer, the people behind CUDA.jl, have wrapped CUDA 13.1’s tile-based kernel abstraction for Julia. The idea is that managing individual threads and memory hierarchies by hand is a level of detail most kernels should not require. I had not come across cuTile.jl before and I am genuinely curious whether it would simplify the kind of kernel work I have been doing. Worth a look if you write CUDA kernels in Julia.

Chipmunk: GPU Kernel Optimisations, Part III
A detailed walkthrough of why naive sparsity does not automatically win on GPUs, and what you have to do to your data layout for it to pay off. The authors get a 9.3x speedup by rethinking how sparse data is packed in memory. I found this after writing up the neighbour list result above and it felt uncomfortably relevant. If the 25x number made you curious about the underlying mechanism, this is the explanation.

How I use Obsidian for academic work
I use Obsidian for my PhD vault and content work, and I am always curious how other people set theirs up, especially when they are researchers rather than productivity enthusiasts. This one is by a PhD researcher in AI: three plugins, typed links, a downloadable template, and a system that has been working for five years. No methodology, no philosophy. Just someone in a similar field showing their actual setup.

Quick Thought

I fed several months of Claude chats about my PhD into Obsidian, then passed the notes into NotebookLM to make flashcards. The flashcards surfaced gaps I had not noticed. Conversations in chats feel productive in the moment, but recall does not let you fake it the same way: when you have to retrieve something from memory, you find out quickly what actually stuck. It has also, somewhat accidentally, been useful for shaping the PhD proposal I need to present in a few weeks.


Until next time, Julia


You’re getting this because you signed up at juliafrank.net. If this is not your thing any more, you can unsubscribe below.

3-20 Brandt Street, Unit 124, Steinbach, MB R5G 1Y2
Unsubscribe · Preferences

