Lead PySpark Developer Architect
Job Description:
We are seeking a highly skilled and experienced Lead PySpark Developer Architect to join our dynamic data engineering team. In this role, the candidate will lead the design, development, and optimization of large-scale, fault-tolerant data processing systems, leveraging deep expertise in PySpark and distributed computing to architect innovative solutions that enable efficient data pipelines and analytics.
Responsibilities:
Lead the development and architecture of scalable data processing systems using PySpark.
Design and implement efficient and reliable data pipelines, data lakes, and ETL workflows.
Fine-tune Spark applications for optimal performance, including configuration tuning, memory management, and resource allocation.
Collaborate with data engineers, data scientists, and stakeholders to understand data processing requirements and deliver robust solutions.
Manage and optimize Spark clusters, ensuring high availability and performance, utilizing tools like Kubernetes, YARN, and Mesos.
Work with big data storage solutions such as HDFS, S3, Parquet, and ORC to manage data storage and retrieval efficiently.
Utilize Spark SQL, DataFrames, and Dataset APIs to perform complex data transformations and analytics.
Apply best practices in distributed computing principles and stay current with the latest technologies and trends in big data processing.
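The tuning responsibilities above (executor sizing, memory management, shuffle parallelism) are typically expressed as spark-submit configuration. The following is a minimal, illustrative fragment; the script name and all values are placeholder assumptions and would be sized to the actual cluster and workload:

```shell
# Illustrative spark-submit tuning fragment. Values are placeholders,
# not recommendations for any particular cluster; etl_job.py is a
# hypothetical pipeline script.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 20 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.dynamicAllocation.enabled=true \
  etl_job.py
```

Each flag maps to a tuning lever named in the responsibilities: executor count, cores, and memory control resource allocation, while spark.sql.shuffle.partitions and spark.memory.fraction govern shuffle parallelism and the execution/storage memory split.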
Requirements:
10+ years of experience as a Lead Spark Developer, Data Engineer, or similar role, with extensive hands-on PySpark experience.
Strong proficiency in Python and Spark APIs.
Deep understanding of distributed computing principles, architectures, and best practices.
Expertise in designing and developing fault-tolerant and scalable data processing systems.
Strong skills in tuning Spark applications, including configuration, memory, and resource management.
Experience with cluster management tools such as Kubernetes, YARN, or Mesos.
Practical knowledge of big data storage solutions including HDFS, S3, and formats like Parquet and ORC.
Demonstrated ability to design and implement efficient data pipelines and data lakes.
Excellent problem-solving, communication, and collaboration skills.
Employers have access to artificial intelligence language tools (AI) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.
Dice Id:
10424603
Position Id:
8680307
Job ID: 483018259
Originally Posted on: 6/27/2025