Inferact · Remote

Member of Technical Staff – Exceptional Generalist

Description

11 - 50 employees

Founded 2025

🤖 Artificial Intelligence

Artificial Intelligence • B2B • Enterprise

Inferact is a startup founded by the creators and core maintainers of vLLM, the leading open-source LLM inference engine. The company aims to accelerate AI progress by making model inference cheaper and faster, expanding vLLM's performance and support for emerging architectures and accelerator hardware. Inferact combines deep expertise at the intersection of models and hardware to provide inference infrastructure used by research labs, hyperscalers, and startups, while continuing to develop and contribute optimizations back to the open-source community.

Posted 6 hours ago


📋 Description

• This is a globally remote opportunity.
• We're seeking exceptional generalist engineers who can work across the entire vLLM stack, from low-level GPU kernels to high-level distributed systems.
• This role is designed for self-directed, autonomous individuals who can identify the highest-leverage problems and solve them end-to-end without constant guidance.
• You'll work asynchronously with our San Francisco headquarters while maintaining full ownership of critical infrastructure.
• You might be optimizing CUDA kernels one week, debugging distributed orchestration systems the next, and implementing new model architectures the week after.
• The work you do will directly impact how the world runs AI inference.
• Potential focus areas include:
  - Inference Runtime: Push the boundaries of LLM and diffusion model serving.
  - Kernel Engineering: Write the low-level kernels and optimizations.
  - Performance & Scale: Build distributed systems that power inference at global scale.
  - Cloud Orchestration: Build the operational backbone for cluster management, deployment automation, and production monitoring.

🎯 Requirements

• Bachelor's degree or equivalent experience in computer science, engineering, or a similar field
• Demonstrated ability to work autonomously and drive projects to completion without close supervision
• Excellent asynchronous communication skills and the ability to collaborate effectively across time zones
• Strong track record of shipping high-impact work in complex technical environments
• Deep expertise in at least one of: systems programming, GPU/accelerator programming, distributed systems, or ML infrastructure
• Technical depth (strong in at least two):
  - CUDA kernels or equivalent (Triton, TileLang, Pallas) with a deep understanding of GPU architecture
  - High-performance distributed systems in Rust, Go, or C++
  - Python with PyTorch internals and LLM inference systems (vLLM, TensorRT-LLM, SGLang)
  - Kubernetes, container orchestration, and infrastructure-as-code at scale
  - Transformer architectures

Skills

Python • Go • Rust • C++ • Kubernetes • PyTorch • LLM • Machine Learning • Artificial Intelligence