On-prem Platform Engineer

TekGlobal
Charlotte, North Carolina
Full Time

Role: On-prem Platform Engineer

Location: Charlotte, NC

Key Skills:

Must-Have Skills (Mandatory Keywords)

LLM Inference & Optimization

vLLM, TensorRT-LLM, Triton Inference Server, SGLang
Inference optimization techniques:

Continuous batching
Speculative decoding
KV cache / Prefix caching

Model optimization:

FP8, AWQ, GPTQ

Distributed & GPU Systems

Tensor parallelism and large model scaling
CUDA, NCCL, GPU architecture
GPU partitioning & optimization (MIG)

Kubernetes & ML Serving

Kubernetes-based ML serving platforms
KServe, OpenShift AI
Helm charts, Operators, platform automation

GPU Orchestration

Run:AI or similar GPU scheduling/orchestration platforms
Multi-tenant GPU workload management

Platform Engineering

Experience building internal AI/ML platforms (on-prem or hybrid)
Strong automation and system design mindset

Observability & Performance

Prometheus, Grafana
ML observability (model latency, throughput, drift, resource utilization)
Performance benchmarking and tuning

Good to Have / Preferred Skills

Experience with LLMOps / GenAI pipelines
Exposure to hybrid cloud (on-prem + Google Cloud Platform/Azure integration)
Familiarity with Inferentia / alternative accelerators
Knowledge of service mesh / networking in GPU clusters

Responsibilities:

· Build, configure, and operate on-prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.

· Design and optimize high-performance inference stacks using vLLM, TensorRT-LLM, Triton Inference Server, SGLang, and advanced techniques (continuous batching, speculative decoding, KV caching).

· Manage GPU orchestration and capacity using Run:AI, MIG, CUDA/NCCL, and tensor parallelism to maximize utilization and throughput.

· Deploy and operate Kubernetes ML serving frameworks (KServe, Helm, Operators) for scalable, reliable model serving.

· Drive inference optimization and benchmarking, leveraging FP8, AWQ, GPTQ, and performance tools such as GuideLLM and Locust.

· Implement observability and ML monitoring using Prometheus, Grafana, and Arize AI to ensure SLA/SLO compliance for GenAI services.

· Collaborate with ML and research teams to onboard new models, tune inference performance, and productionize GenAI use cases.

Job ID: 523336680

Originally Posted on: 6/2/2026
