Senior Software Engineer - Perf and Benchmarking

CoreWeave
San Francisco, California
Full Time

This job ad was removed 11 hours ago.

Job Description

Requirements

3-5 years of experience building distributed systems, high-performance computing components, or cloud services
Strong programming skills in Python or Go (C++ a plus) with an understanding of networked systems and performance fundamentals
Hands-on experience with Kubernetes in production environments, plus familiarity with CI/CD and observability tools (e.g., Prometheus, Grafana, OpenTelemetry)
Exposure to performance-critical GPU systems (CUDA, NCCL, NVLink/PCIe, memory bandwidth) or model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM)
Effective communicator comfortable working cross-functionally
(Desirable) Experience with time-series databases, LSM-based storage engines, or custom data pipelines
(Desirable) Familiarity with MLPerf or other large-scale benchmarking frameworks
(Desirable) Contributions to OSS projects such as llm-d, vLLM or PyTorch
(Desirable) Exposure to benchmarking GPU clusters or multi-region environments
(Desirable) Background working with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies

What the job involves

We're looking for a Senior Engineer for CoreWeave's Benchmarking & Performance team
You will play an integral part in our planet-scale performance data warehouse: ingesting, storing, transforming, and analyzing performance events from all the data centers across our global infrastructure
You will also help us achieve industry-leading results in published end-to-end performance benchmarks such as MLPerf
You will be an owner who leads designs, raises engineering standards, and delivers measurable improvements to latency, throughput, and reliability across multiple services
You'll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native platform and meet strict P99 SLAs at scale
Develop and enhance Kubernetes-native benchmarking services that measure latency, throughput, jitter, and cost-per-request across CoreWeave's compute stack
Contribute to implementing and maintaining benchmarking workflows for end-to-end MLPerf Training and Inference runs, including workload setup, cluster configuration, and result validation
Participate in design discussions and contribute to architecture decisions within the team
Break down engineering tasks into clear milestones and deliver reliable, high-quality code
Collaborate with teammates to maintain reproducible, well-documented benchmarking processes
Provide constructive code reviews and share best practices with peers
Mentor junior engineers; review cross-team designs and elevate coding/testing standards
