Senior Software Engineer - Perf and Benchmarking
- CoreWeave
- San Francisco, California
- Full Time
Job Description
Requirements
- 3-5 years of experience building distributed systems, high-performance computing components, or cloud services
- Strong programming skills in Python or Go (C++ a plus), with an understanding of networked systems and performance fundamentals
- Hands-on experience with Kubernetes in production environments, plus familiarity with CI/CD and observability tools (e.g., Prometheus, Grafana, OpenTelemetry)
- Exposure to performance-critical GPU systems (CUDA, NCCL, NVLink/PCIe, memory bandwidth) or model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM)
- Effective communicator who is comfortable working cross-functionally
- (Desirable) Experience with time-series databases, LSM-based storage engines, or custom data pipelines
- (Desirable) Familiarity with MLPerf or other large-scale benchmarking frameworks
- (Desirable) Contributions to OSS projects such as llm-d, vLLM, or PyTorch
- (Desirable) Exposure to benchmarking GPU clusters or multi-region environments
- (Desirable) Background working with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies
- We're looking for a Senior Engineer for CoreWeave's Benchmarking & Performance team
- You will play an integral role in our planet-scale performance data warehouse: ingesting, storing, transforming, and analyzing performance events from all the data centers across our global infrastructure
- You will also help us achieve industry-leading results in end-to-end performance benchmark publications such as MLPerf
- You will be an owner who leads designs, raises engineering standards, and delivers measurable improvements to latency, throughput, and reliability across multiple services
- You'll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native platform and meet strict P99 SLAs at scale
- Develop and enhance Kubernetes-native benchmarking services that measure latency, throughput, jitter, and cost-per-request across CoreWeave's compute stack
- Contribute to implementing and maintaining benchmarking workflows for end-to-end MLPerf Training and Inference runs, including workload setup, cluster configuration, and result validation
- Participate in design discussions and contribute to architecture decisions within the team
- Break down engineering tasks into clear milestones and deliver reliable, high-quality code
- Collaborate with teammates to maintain reproducible, well-documented benchmarking processes
- Provide constructive code reviews and share best practices with peers
- Mentor junior engineers; review cross-team designs and elevate coding/testing standards
- Help ensure reproducible, well-documented benchmarking processes
Job ID: 523557660
Originally Posted on: 6/4/2026