Lead PySpark Developer Architect
Job Description:
We are seeking a highly skilled and experienced Lead PySpark Developer Architect to join our dynamic data engineering team. In this role, the candidate will lead the design, development, and optimization of large-scale, fault-tolerant data processing systems, leveraging deep expertise in PySpark and distributed computing to architect innovative solutions that enable efficient data pipelines and analytics.
Responsibilities:
Lead the development and architecture of scalable data processing systems using PySpark.
Design and implement efficient and reliable data pipelines, data lakes, and ETL workflows.
Fine-tune Spark applications for optimal performance, including configuration tuning, memory management, and resource allocation.
Collaborate with data engineers, data scientists, and stakeholders to understand data processing requirements and deliver robust solutions.
Manage and optimize Spark clusters, ensuring high availability and performance, utilizing tools like Kubernetes, YARN, and Mesos.
Work with big data storage solutions such as HDFS, S3, Parquet, and ORC to manage data storage and retrieval efficiently.
Utilize Spark SQL, DataFrames, and Dataset APIs to perform complex data transformations and analytics.
Apply best practices in distributed computing principles and stay current with the latest technologies and trends in big data processing.
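The tuning responsibilities above (executor sizing, memory management, shuffle parallelism) are typically expressed as spark-submit configuration. The following is a minimal, illustrative fragment; the script name and all values are placeholder assumptions and would be sized to the actual cluster and workload:

```shell
# Illustrative spark-submit tuning fragment. Values are placeholders,
# not recommendations for any particular cluster; etl_job.py is a
# hypothetical pipeline script.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 20 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.dynamicAllocation.enabled=true \
  etl_job.py
```

Each flag maps to a tuning lever named in the responsibilities: executor count, cores, and memory control resource allocation, while spark.sql.shuffle.partitions and spark.memory.fraction govern shuffle parallelism and the execution/storage memory split.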
Requirements:
10+ years of experience as a Lead Spark Developer, Data Engineer, or similar role, with extensive hands-on PySpark experience.
Strong proficiency in Python and Spark APIs.
Deep understanding of distributed computing principles, architectures, and best practices.
Expertise in designing and developing fault-tolerant and scalable data processing systems.
Strong skills in tuning Spark applications, including configuration, memory, and resource management.
Experience with cluster management tools such as Kubernetes, YARN, or Mesos.
Practical knowledge of big data storage solutions including HDFS, S3, and formats like Parquet and ORC.
Demonstrated ability to design and implement efficient data pipelines and data lakes.
Excellent problem-solving, communication, and collaboration skills.
Employers have access to artificial intelligence language tools (AI) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.
Dice Id:
10424603
Position Id:
8680307
Job ID: 483018259
Originally Posted on: 6/27/2025