Senior Azure Platform Engineer
- HCM Staffing and Consulting
- Jersey City, New Jersey
- Full Time
We are placing a Senior Azure Platform Engineer with a centralized Platform Engineering team at one of the world's largest financial institutions. The team owns the internal Azure platform that application teams across the bank depend on, and it is in a critical phase: scaling toward enterprise-wide General Availability while expanding into multi-cloud.
You'll build, harden, and operate real production infrastructure. You'll own incidents, run root cause analysis, and improve the platform so the same failure doesn't happen twice.
What you'll do
Own Terraform-based infrastructure across multi-environment and multi-subscription Azure setups: state management, drift detection, remediation, and module design
Operate AKS clusters in production: pod and node troubleshooting, scaling, ingress issues, cluster upgrades, and incident response
Implement and enforce Azure Policy at management group and subscription scope, including deny and audit effects and active remediation
Design and maintain platform security controls: Managed Identity, RBAC at control and data plane, Key Vault, Entra ID, and secure service-to-service communication
Own production incidents end-to-end: triage, root cause analysis, resolution, and prevention
Build observability into the platform: logging strategy, alerting, container monitoring, and AKS diagnostic tooling
Partner with application teams on platform onboarding, automation patterns, and best practices
Contribute to platform hardening and standardization as the platform scales to support more teams
What you bring
Terraform - Remote state management, state locking, drift detection and remediation, multi-environment module design, recovery and import scenarios. This is the highest-priority requirement.
AKS / Kubernetes - Production cluster operations: pod and node troubleshooting, resource exhaustion, ingress, rollbacks, cluster upgrades. You need operational stories, not theory.
Azure Policy & governance - Hands-on policy enforcement (deny, audit, modify) at management group and subscription scope. Remediation tasks and compliance reporting.
Security & identity - Managed Identity (system vs. user-assigned), RBAC at control and data plane levels, Key Vault, Entra ID, JWT/OAuth, secure inter-service communication.
Observability & RCA - Log Analytics, Azure Monitor, Prometheus, Grafana, Splunk, or ELK. Full incident triage from symptom to resolution to prevention.
Azure platform services - AKS, App Services, Azure Functions, Storage, VNets, Private Endpoints, NSGs, Azure Firewall. Hands-on production experience.
API Management - Policy configuration: throttling, rate limiting, auth enforcement, request/response transformations, observability integration.