About the Role
As a Senior Data Engineer, you will lead the development and ownership of domain data products, including batch, streaming and artificial intelligence/machine learning (AI/ML) feature pipelines. You will drive design decisions that improve data reliability, performance and governance maturity while standardizing patterns that scale across teams. You will partner cross-functionally to enable analytics, ML and GenAI use cases with trusted data.
What You'll Do
Design, build and maintain batch, streaming and real-time AI feature pipelines to extract data from diverse source systems and producers (application programming interfaces (APIs), events, databases, files), ensuring efficient ingestion, transformation and publishing
Design, refine and implement scalable data models, semantic layers and data contracts to promote consistency, reuse and accessibility
Own the end-to-end data product lifecycle for the domain. Define and maintain data contracts, including service level agreements (SLAs), schema expectations, quality metrics and consumer ownership, to ensure a reliable and trustworthy experience
Partner with cross-functional teams to co-design scalable data solutions that meet business needs and clearly define the boundaries between data pipeline responsibilities and model-building activities
Develop automated workflows and continuous integration/continuous deployment (CI/CD) pipelines using tools such as Airflow, Apache Spark and Python to drive reliability and faster delivery
Implement validation, observability and evaluation frameworks that ensure accuracy, lineage and timeliness across data pipelines and large language model (LLM) outputs
Apply and enforce governance, privacy and compliance standards (GDPR, PCI DSS, CCPA), ensuring data security and traceability
Partner with cross-functional teams to translate business needs into technical data solutions that scale across domains
Drive performance tuning, automation and adoption of AI-powered data tools to enhance data platform efficiency
Mentor data engineers and champion best practices for maintainable, governed and reusable data assets
Own cost and performance tradeoffs for domain data products: monitor compute usage, storage growth and unit cost, and implement optimizations that reduce spend while meeting SLAs
Additional tasks may be assigned
What Skills You Have
Required
4+ years designing, building and optimizing data pipelines and models in production, ideally within large-scale cloud environments
Proficiency in SQL and Python (or Scala) for data development, testing and automation
Preferred
Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering or a related field
Experience with Apache Spark (or equivalent) for large-scale data processing and performance optimization
Experience using Airflow/Cloud Composer/Dagster for orchestration, transformation and CI/CD pipelines
Experience with cloud warehouses/lakes (BigQuery, Redshift, Snowflake) and object storage
Experience designing and optimizing streaming pipelines using Kafka, Pub/Sub or Spark
Strong understanding of dimensional modeling, normalization and schema design for analytics and GenAI integration into data products
Experience with data testing, lineage, monitoring and observability frameworks to ensure data integrity and reliability