Arc

Multiple locations

Principal Systems Engineer (C++/CUDA) - PT Freelance - Americas/EMEA

Posted 2 days ago

Project-Based

Description

Role Overview

We are building the next generation of AI compute. Our technology breaks the "Memory Wall" by fusing software-level semantic compression with hardware-level memory tiering. We are looking for a systems-level visionary to help us turn theoretical silicon exploits into enterprise-ready production code on NVIDIA and AMD hardware.

Responsibilities

Develop and optimize high-performance CUDA kernels and C++ modules that manage massive-scale memory architectures (HBM, DDR5).
Architect and implement zero-copy memory-tiering solutions that allow consumer and enterprise GPUs to process multi-million token context windows.
Collaborate with AI agents to rapidly research, prototype, and refine hardware-level optimizations for NVIDIA and AMD silicon.
Lead the transition of academic-level silicon hacks into stable, fault-tolerant, and production-ready enterprise software.

Required Skills

Mastery of modern C++ (C++20/23) and CUDA.
Deep understanding of memory and interconnect hardware (HBM3, GDDR7, PCIe Gen 5, DMA).
Experience in compiler design or high-performance computing (HPC) for AI/LLM workloads.
A "Hardware-First" mindset: you understand that software is limited by the laws of silicon and electricity.
Expertise in using agentic AI tools to accelerate complex systems-engineering workflows.

Nice to Have

Familiarity with advanced memory management techniques.
Experience collaborating with hardware design teams.

Skills

cplusplus, AI, C++, cpp, LLM