Sr. DevOps Engineer - AI and Site Reliability Engineering

  • Oregon Employment Department
  • Salem, Oregon
  • Full Time

Our company

At Teradata, we believe that people thrive when empowered with better information. Teradata Autonomous Knowledge Platform activates enterprise intelligence by unifying data, knowledge and business context to achieve tangible outcomes. With Teradata, organizations can provide agents with full context for impact when it matters. Our solution lets businesses connect and scale on premises, in the cloud, or through a hybrid approach. Teradata delivers real business value with AI.

What You'll Do

  1. Working on a team of professionals, you will design, implement, test, deploy, administer, and continually improve software solutions to ensure system reliability and availability, mitigate operational risks, track system health, and improve mean-time-to-discover and mean-time-to-respond for operational issues.

  2. You will help lead chaos engineering efforts in a production-like environment, exposing systems to simulations of real-world turbulence with the objective of identifying and quantifying operational weaknesses and developing remediation strategies.

  3. You will leverage modern AI technologies, including large language models, machine learning, and agentic systems, both to increase the operational efficiency of the team and to measure and improve the reliability, scalability, observability, supportability, and performance of Teradata software.

  4. You will become a subject-matter expert in the production deployment and upgrade of Teradata software and of the full software stack it relies on, from the network layer to the observability tooling.

Who You'll Work With

  1. You'll work on a globally distributed team of other DevOps professionals, with engineers focused on site reliability engineering and observability.

  2. You'll work closely with product engineering and cloud operations personnel to understand operational requirements and identify and remediate operational deficits.

  3. You'll work with security and compliance teams to help provide the evidence necessary to meet Teradata's compliance obligations.

  4. You'll report to a Sr. Manager, Site Reliability Engineering.

What Makes You A Qualified Candidate

  1. Bachelor's degree or equivalent in computer science or a related field; master's degree or equivalent preferred.

  2. 4+ years of industry experience.

  3. Experience with at least one major cloud service provider (AWS, Azure, and/or Google Cloud), preferably all three. CSP developer or architect certifications preferred.

  4. Experience building and deploying complex software solutions to significant operational problems. Proficiency with at least one modern programming language such as Python, and with a modern source control tool, preferably Git.

  5. Familiarity with machine learning libraries such as TensorFlow and scikit-learn.

  6. Experience building and deploying AI systems via cloud-based generative AI and agentic AI platforms such as AWS Bedrock, AWS SageMaker, Azure AI Foundry, Google Vertex AI, and Google Agentspace.

  7. Experience with at least one modern defect tracking tool, preferably Jira.

  8. Experience with an infrastructure-as-code (IaC) cloud provisioning tool, preferably Terraform, and with a configuration management tool such as Ansible or Puppet.

  9. Experience with Grafana or an equivalent observability tool.

  10. Experience with a build/deployment automation tool such as Jenkins or Bamboo.

  11. Familiarity with both SQL and NoSQL databases, and the use cases for each.

  12. Experience administering Linux-based systems.

What You'll Bring

  1. 4+ years of experience in the software industry in a devops or site rel...

Job ID: 523604911
Originally Posted on: 6/4/2026
