Inference Engineer
San Francisco, CA · In-person · Full-time
We run vision models over every frame of video, in real time. That only works if the inference stack is unreasonably fast. We need someone who finds that problem fun.
What we need from you
You're the kind of engineer who reads a model architecture and immediately sees where it'll stall. You think about memory bandwidth the way most people think about lunch. You've written CUDA kernels that actually shipped, and you know the difference between a benchmark win and a production win.
Our customers deploy on their own hardware, which means we can't just optimize for one target. NVIDIA GPUs are the primary focus, but Apple Silicon, Trainium, and TPUs all matter. Deep knowledge of one or two is plenty to start.
You should probably apply if:
- You've made production inference workloads meaningfully faster, not just turned a benchmark green
- You know why your CUDA kernel is slow before you open Nsight
- You've used TensorRT, Core ML, XLA, or Neuron SDK in anger, and you can pick up whichever one you haven't
- You have opinions about when to fuse kernels and when not to
- You've shipped things you had to support on hardware you didn't choose
You should definitely not apply if:
- Your optimization strategy is “quantize it and ship it”
- You want to own a single platform and never touch anything else
- Remote work is non-negotiable (we're in-person in San Francisco because building hard things is a team sport)
What you'll actually do
- Own Moondream's inference performance end-to-end, from profiling to custom kernels to production deployment
- Build optimized backends across GPU, Apple Silicon, Trainium, and TPU targets
- Work with the research team so new architectures are fast by design, not as an afterthought
- Probably get into a heated argument about whether a 3% throughput regression is worth a cleaner abstraction (it isn't)
Details
- Location: San Francisco, CA (in-person)
- Stack: CUDA, C++, Python, and platform-specific toolchains (TensorRT, Core ML, XLA, Neuron SDK) as needed
- Compensation: $230k–$300k + meaningful equity
- Benefits: Health, dental, vision, paid parental leave, relocation support
Apply
Send your resume to hiring@moondream.ai