LLM Inference / AI Infrastructure Engineer
Location: Charlotte, NC
Duration: 9-12 Months
JD:
vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference Optimization, Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ, Tensor Parallelism, Kubernetes ML Serving, KServe, OpenShift AI, Helm / Operators, GPU Orchestration, Run:AI, Performance Benchmarking, CUDA / NCCL / MIG, Prometheus / Grafana, ML Observability
Skills sanity check: Have you worked on the NVIDIA H200? If yes, chances are you know all of the above skills.
Job ID: 522491901
Originally Posted on: 5/26/2026