LLM Inference / AI Infrastructure Engineer

  • Apex 2000
  • Charlotte, North Carolina
  • Full Time

Location: Charlotte, NC
Duration: 9-12 months

Job Description (required skills):
  • vLLM
  • TensorRT-LLM
  • Triton Inference Server
  • SGLang
  • Inference optimization
  • Continuous batching
  • Speculative decoding
  • KV cache / prefix caching
  • FP8 / AWQ / GPTQ quantization
  • Tensor parallelism
  • Kubernetes ML serving
  • KServe
  • OpenShift AI
  • Helm / Operators
  • GPU orchestration
  • Run:AI
  • Performance benchmarking
  • CUDA / NCCL / MIG
  • Prometheus / Grafana
  • ML observability

Skills sanity check: Have you worked with NVIDIA H200 GPUs? If so, you are likely familiar with all of the skills above.

Job ID: 522491901
Originally Posted on: 5/26/2026
