Arc - Multiple locations

Principal Systems Engineer (C++/CUDA) - PT Freelance - Americas/EMEA

Project-Based

Description

Role Overview

We are building the next generation of AI compute. Our technology breaks the "Memory Wall" by fusing software-level semantic compression with hardware-level memory tiering. We are looking for a systems-level visionary to help us turn theoretical silicon exploits into enterprise-ready production code on NVIDIA and AMD hardware.

Responsibilities

  • Develop and optimize high-performance CUDA kernels and C++ modules that manage massive-scale memory architectures (HBM, DDR5).
  • Architect and implement zero-copy memory-tiering solutions that allow consumer and enterprise GPUs to process multi-million token context windows.
  • Collaborate with AI agents to rapidly research, prototype, and refine hardware-level optimizations for NVIDIA and AMD silicon.
  • Lead the transition of academic-level silicon hacks into stable, fault-tolerant, and production-ready enterprise software.

Required Skills

  • Mastery of modern C++ (C++20/23) and CUDA.
  • Deep understanding of how memory and interconnect hardware behaves at the physical level (HBM3, GDDR7, PCIe Gen 5, DMA).
  • Experience in compiler design or high-performance computing (HPC) for AI/LLM workloads.
  • A "Hardware-First" mindset—you understand that software is limited by the laws of silicon and electricity.
  • Expertise in using agentic AI tools to accelerate complex systems-engineering workflows.

Nice to Have

  • Familiarity with advanced memory management techniques.
  • Experience collaborating with hardware design teams.

Skills

cplusplus, AI, C++, cpp, LLM