LLM Inference / AI Infrastructure Engineer

Apex 2000
Charlotte, North Carolina
Full Time

Location: Charlotte, NC
Duration: 9-12 Months

JD:
vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference Optimization, Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ, Tensor Parallelism, Kubernetes ML Serving, KServe, OpenShift AI, Helm / Operators, GPU Orchestration, Run:AI, Performance Benchmarking, CUDA / NCCL / MIG, Prometheus / Grafana, ML Observability

Skills sanity check: Have you worked on the NVIDIA H200? If yes, chances are you know all of the above skills.

Job ID: 522491901

Originally Posted on: 5/26/2026
