Location: Tel Aviv-Yafo

Senior AI Researcher

Project-Based

Description

Confidential company

We are seeking a senior AI Researcher to join our R&D group and lead the frontier of large-scale LLM optimization. You will focus on maximizing the performance, scalability, and efficiency of LLM training and inference across massive GPU clusters, bridging deep learning research, distributed systems design, and hardware-aware optimization.

At our company, we treat AI performance as a systems problem. Just as we reinvented networking through disaggregation and software-defined scale, we're applying the same philosophy to AI infrastructure. Your work will directly influence how large models are deployed, scaled, and optimized across high-density compute environments.

Key Responsibilities

● Conduct cutting-edge research in artificial intelligence and machine learning, from problem formulation to experimental validation.
● Research, design, implement, and evaluate novel algorithms, models, optimization strategies, and architectures across areas of large-scale LLM training and inference (e.g., tensor/pipeline/expert parallelism, quantization, prefill/decode disaggregation, GPU communication optimization).
● Translate research ideas into working prototypes and production-ready solutions.
● Stay up to date with state-of-the-art research, frameworks, and emerging trends in the AI ecosystem.
● Publish research findings internally and externally (papers, technical reports, blog posts, or patents) and present results to internal and external technical audiences.
● Collaborate closely with engineers, product teams, and other researchers to align research with real-world impact.
● Profile distributed training and inference pipelines, identifying algorithmic, memory, and scheduling inefficiencies to inform technical decision-making and long-term research roadmaps.
● Validate research through measurable impact: higher throughput, better FLOPS utilization, improved convergence efficiency, or reduced compute cost.

Requirements

● Strong foundation in machine learning, deep learning, and statistical modeling.
● Deep understanding of deep learning internals: transformer architectures, distributed training paradigms, precision scaling, and optimizer behavior.
● Proven hands-on experience training or deploying LLMs on multi-GPU and/or multi-node clusters.
● Ability to read, understand, and critically evaluate academic research papers, and a demonstrated ability to translate theoretical ideas into practical, production-level performance improvements.
● Strong problem-solving skills and the ability to work independently on open-ended research problems.
● Clear written and verbal communication skills in English.

Optional Qualifications

● MSc or PhD in Computer Science, Electrical Engineering, Mathematics, or a related quantitative field.
● Strong mathematical background, including linear algebra, probability, and optimization.
● Strong grasp of parallel and distributed systems principles, including communication collectives, load balancing, and scaling bottlenecks.
● Proficiency with frameworks like DeepSpeed, Megatron-LM, NeMo, vLLM, SGLang, or equivalent large-scale training ecosystems.
● Understanding of CUDA, Triton, or low-level GPU kernel development, and experience profiling large models across multi-node GPU systems.

This position is open to all candidates.

Skills

Machine Learning, Deep Learning
