ML Infrastructure Engineer
Menlo Park, CA | On-Site | Full-Time/Direct Hire
Client Opportunity | Through Phizenix
Phizenix, a certified minority and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models built for faster generation, multimodal integration, and scalable enterprise deployment.
We're looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You'll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.
Responsibilities
Design and manage distributed infrastructure for ML training at scale
Optimize model serving systems for low-latency inference
Build automated pipelines for data processing, model training, and deployment
Implement observability tools to monitor performance in production
Maximize resource utilization across GPU clusters and cloud environments
Translate research requirements into robust, scalable system designs
Must-Haves
MS or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
Strong foundation in software engineering, systems design, and distributed systems
Experience with cloud platforms (AWS, GCP, or Azure)
Proficient in Python and at least one systems-level language (C++/Rust/Go)
Hands-on experience with Docker, Kubernetes, and CI/CD workflows
Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
Understanding of GPU programming and high-performance infrastructure
Nice-to-Haves
Experience with large-scale ML training clusters and GPU orchestration
Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
Experience with distributed training strategies (e.g., data/model/pipeline parallelism)
Familiarity with orchestration tools like Kubeflow or Airflow
Background in performance tuning, system profiling, and MLOps best practices
At Phizenix, we're committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let's build the future together.
$180,000 – $200,000 USD
Employers have access to artificial intelligence (AI) language tools that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.
Dice Id: 91165417
Position Id: ...
Job ID: 480288474
Originally Posted on: 6/7/2025