AWS Cloud Data Engineer

  • Xoriant Corporation
  • Boston, Massachusetts
  • Full Time

Role: AWS Cloud Data Engineer

Location: Boston, MA (Hybrid Onsite)

Industry: Financial Services

About the Role

We are seeking a highly skilled Cloud Data Engineer to design, build, and optimize a modern, scalable Legal Data Lakehouse platform. Operating within State Street's Global Technology Services, you will apply deep knowledge of the AWS cloud services suite, combined with high-performance Databricks capabilities, to ingest, model, and secure complex enterprise data (including contracts, litigation matters, eDiscovery datasets, and global regulatory feeds).

This role is central to establishing a single, highly governed, audit-ready source of truth that powers critical legal operations, compliance analytics, and emerging generative AI/ML use cases across our global footprint.

Key Responsibilities:

  1. Data Lakehouse Engineering & Architecture
  • Design, build, and maintain enterprise-grade, custom data pipelines utilizing Databricks (PySpark, Spark SQL, and Scala) on AWS infrastructure.
  • Implement and manage a multi-layered Lakehouse architecture (Bronze, Silver, and Gold zones) to curate unstructured contract text, semi-structured logs, and highly structured transactional tables.
  • Architect robust end-to-end data ingestion frameworks supporting high-throughput batch and near real-time data flows from on-premises systems and third-party legal platforms.
  2. Cloud Infrastructure & Platform Optimization
  • Utilize the broad suite of AWS services (including but not limited to S3, Lambda, Glue, EMR, Athena, EC2, and CloudWatch) to support and optimize distributed storage and compute infrastructure.
  • Conduct advanced performance tuning on large-scale Apache Spark workloads, optimizing partitioning, indexing, caching strategies, and Databricks cluster utilization to manage cloud run costs efficiently.
  • Automate deployment configurations, orchestrate multi-dependency workflows (via Databricks Jobs/Workflows, Airflow, or Autosys), and build containerized solutions using Docker.
  3. Data Governance, Security & Compliance
  • Enforce strict, fine-grained access controls, row/column-level security, and data classification strategies using Databricks Unity Catalog integrated with AWS IAM and enterprise identity providers.
  • Ensure all data pipelines and lakehouse layers remain strictly compliant with global data privacy regulations (e.g., GDPR) and rigid internal financial audit standards.
  • Implement end-to-end data lineage tracking, validation frameworks, and automated reconciliation routines to preserve absolute data integrity for legal and regulatory reporting.
  4. Downstream Integration & Innovation
  • Collaborate with business analysts and legal operations to expose curated datasets via secure APIs and optimized connectors.
  • Enable seamless consumption of financial and legal analytics through integration with visualization tools like Power BI or automation platforms (Power Apps / Power Automate).
  • Support data readiness for advanced AI/ML models, contract intelligence tools, and eDiscovery search workflows.

Required Skills & Qualifications

Core Technical Skills:

  • Databricks & Spark: 3+ years of deep, hands-on experience building, scheduling, and debugging data pipelines on Databricks utilizing PySpark, Scala, or Spark SQL.
  • AWS Cloud Suite: Extensive knowledge of AWS core services, with deep familiarity across object storage (S3), serverless compute (Lambda), data cataloging/ETL (Glue), access management (IAM), and encryption (KMS).
  • Data Modeling: Strong proficiency in relational database design, data warehousing structures, schema evolution, and performance tuning techniques, including work with open table formats (e.g., Delta Lake, Apache Iceberg).
  • Programming & Scripting: Strong coding skills in Python and advanced SQL are mandatory.
  • CI/CD & DevOps: Proven familiarity with version control (Git) and standard automated deployment workflows.

Domain & Professional Value-Adds:

  • Regulated Industries: Experience in Financial Services, Asset Management, or handling highly sensitive, audit-driven data environments is highly preferred.
  • Legal Data Concepts: Familiarity with legal data constructs such as contract clauses, corporate matter management, or metadata extraction is a significant advantage.
  • Ownership Mindset: Excellent communication skills, with a track record of collaborating across global, distributed engineering and business architecture teams.

Job ID: 523336370
Originally Posted on: 6/2/2026
