Location: Yokne'am

Manager, AI Networking Performance Research and Analysis

Project-Based

Description

Posted 20 hours ago by a confidential company.

- Lead performance research and evaluation of advanced networking technologies supporting AI workloads, including LLM training and inference at supercomputing scale.
- Define end-to-end performance test plans and methodology for next-generation networking HW and networking technologies, including performance expectations and target KPIs.
- Drive benchmarking, profiling, reporting, and deep performance characterization of networking workloads and offload features.
- Collaborate closely with simulation, architecture, chip-design, firmware, and software teams to assess performance tradeoffs and identify bottlenecks.
- Perform deep root cause analysis (RCA) for performance gaps and stability issues, and drive cross-team mitigation plans.
- Develop and enhance performance analysis tools, automation frameworks, and scalable methodologies for cluster-level performance evaluation.
- Own performance observability efforts, including telemetry pipelines, dashboards, and job-level performance analytics.

Requirements:

What we need to see:
- B.Sc. in Computer Science or Software Engineering.
- 5+ years of experience with high-performance networking technologies (RDMA, storage, security, OVS, MPI).
- 3+ years as an engineering team manager.
- Demonstrated performance analysis skills and methodologies.
- Experience with cluster-level performance, telemetry, NICs, DPUs, switches, and GPUs.
- Fast self-learning capabilities with strong analytical and problem-solving skills.
- Programming languages: Python, Bash, and C/C++.
- Experience with Linux OS distros.
- Team player and leader with good communication and interpersonal skills.

Ways to stand out from the crowd:
- Deep system-level architecture knowledge (Intel/AMD/Arm CPUs, NVIDIA GPUs, HCA/DPU architecture, memory subsystems, PCIe, storage, NVLink).
- Strong expertise in RDMA networking performance and AI communication stacks (e.g., NCCL).
- Proven experience analysing AI workload communication patterns and benchmarking distributed LLM training workloads at scale.
- Experience designing telemetry frameworks, monitoring pipelines, and performance dashboards for large clusters.
- Familiarity with modern AI tooling, including performance-driven agents, automation pipelines, and RAG-based applications.

This position is open to all candidates.

Skills

Security, Linux, Python, Bash
