Senior Software Engineer - Perf and Benchmarking

  • CoreWeave (Expired)
  • San Francisco, California
  • Full Time

Job Description


Requirements
  • 3-5 years of experience building distributed systems, high-performance computing components, or cloud services
  • Strong programming skills in Python or Go (C++ a plus) with an understanding of networked systems and performance fundamentals
  • Hands-on experience with Kubernetes in production environments, plus familiarity with CI/CD and observability tools (e.g., Prometheus, Grafana, OpenTelemetry)
  • Exposure to performance-critical GPU systems (CUDA, NCCL, NVLink/PCIe, memory bandwidth) or model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM)
  • Effective communicator comfortable working cross-functionally
  • (Desirable) Experience with time-series databases, LSM-based storage engines, or custom data pipelines
  • (Desirable) Familiarity with MLPerf or other large-scale benchmarking frameworks
  • (Desirable) Contributions to OSS projects such as llm-d, vLLM or PyTorch
  • (Desirable) Exposure to benchmarking GPU clusters or multi-region environments
  • (Desirable) Background working with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies
What the job involves
  • We're looking for a Senior Engineer for CoreWeave's Benchmarking & Performance team
  • You will play an integral part in our planet-scale performance data warehouse: ingesting, storing, transforming, and analyzing performance events from all the data centers across our global infrastructure
  • You will also help us achieve industry-leading results in end-to-end performance benchmarking publications such as MLPerf
  • You will be an owner who leads designs, raises engineering standards, and delivers measurable improvements to latency, throughput, and reliability across multiple services
  • You'll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native platform and meet strict P99 SLAs at scale
  • Develop and enhance Kubernetes-native benchmarking services that measure latency, throughput, jitter, and cost-per-request across CoreWeave's compute stack
  • Contribute to implementing and maintaining benchmarking workflows for end-to-end MLPerf Training and Inference runs, including workload setup, cluster configuration, and result validation
  • Participate in design discussions and contribute to architecture decisions within the team
  • Break down engineering tasks into clear milestones and deliver reliable, high-quality code
  • Collaborate with teammates to maintain reproducible, well-documented benchmarking processes
  • Provide constructive code reviews and share best practices with peers
  • Mentor junior engineers; review cross-team designs and elevate coding/testing standards
  • Help ensure reproducible, well-documented benchmarking processes
Job ID: 523557660
Originally Posted on: 6/4/2026
