Principal Systems Engineer (C++/CUDA) - PT Freelance - Americas/EMEA
Description
Role Overview
We are building the next generation of AI compute. Our technology breaks the "Memory Wall" by fusing software-level semantic compression with hardware-level memory tiering. We are looking for a systems-level visionary to help us turn theoretical silicon exploits into enterprise-ready production code on NVIDIA and AMD hardware.
Responsibilities
- Develop and optimize high-performance CUDA kernels and C++ modules that manage massive-scale memory architectures (HBM, DDR5).
- Architect and implement zero-copy memory-tiering solutions that allow consumer and enterprise GPUs to process multi-million token context windows.
- Collaborate with AI agents to rapidly research, prototype, and refine hardware-level optimizations for NVIDIA and AMD silicon.
- Lead the transition of academic-level silicon hacks into stable, fault-tolerant, and production-ready enterprise software.
Required Skills
- Mastery of modern C++ (C++20/23) and CUDA.
- Deep understanding of memory and interconnect hardware (HBM3, GDDR7, PCIe Gen 5, DMA).
- Experience in compiler design or high-performance computing (HPC) for AI/LLM workloads.
- A "Hardware-First" mindset: you understand that software is limited by the laws of silicon and electricity.
- Expertise in using AI-agentic tools to accelerate complex systems-engineering workflows.
Nice to Have
- Familiarity with advanced memory management techniques.
- Experience collaborating with hardware design teams.
Skills
cplusplus, AI, C++, cpp, LLM