Required Qualifications/Skills/Experience
Experience Level: 5 to 8+ years of hands-on data engineering and software development experience, preferably within a large enterprise or financial services environment.
Programming Languages: Strong proficiency in Scala and Python (PySpark) for data processing.
Hadoop/HDFS Expertise:
- In-depth understanding of HDFS architecture, data storage, and fault-tolerance mechanisms.
- Experience with HDFS commands and administration.
Distributed Systems:
- Fundamental understanding of the MapReduce programming paradigm, even if primary development is in Spark/Flink.
- Knowledge of ZooKeeper for distributed coordination services.
Spark/Flink:
- Proven track record of deploying production-level Apache Spark (or Flink) applications.
Onsite Requirement:
- Must be willing and able to work onsite at the Tampa, FL office 3 days a week.
Preferred Qualifications/Skills/Experience
Experience with workflow orchestration tools (e.g., Apache Airflow, Oozie).
Familiarity with streaming technologies such as Apache Kafka.
Knowledge of modern cloud data platforms (AWS, GCP, or Snowflake), as enterprise environments continue to modernize.
Understanding of CI/CD pipelines and automated testing in a Big Data environment.
We are seeking a highly skilled C12 PySpark/Scala Developer to join our data engineering team.
The ideal candidate will have deep hands-on experience in the Hadoop ecosystem, specializing in distributed systems, data storage, and high-performance processing pipelines.
In this role, you will be responsible for developing scalable data solutions using PySpark and Scala, while maintaining a strong foundational knowledge of legacy and modern distributed architectures.
You will work closely with cross-functional teams to design, build, and optimize data pipelines that handle massive volumes of critical financial data.
Job Duties
Pipeline Development:
- Design, develop, and deploy highly scalable Big Data pipelines using PySpark and Scala to process large-scale datasets.
Hadoop Ecosystem Management:
- Apply an in-depth understanding of HDFS architecture, data storage, and fault-tolerance mechanisms to optimize data reliability and accessibility.
System Administration & Coordination:
- Run HDFS administration commands and use ZooKeeper for distributed coordination and cluster management.
MapReduce Integration:
- Apply a fundamental understanding of the MapReduce programming paradigm to optimize workloads and integrate MapReduce-based processing with primary development in Spark/Flink.
Code Quality & Optimization:
- Write clean, efficient, and well-documented code.
- Conduct performance tuning and troubleshoot bottlenecks in distributed data processing jobs.
Collaboration:
- Work alongside data architects, business analysts, and downstream consumers to ensure data solutions meet strict business requirements and regulatory compliance standards.
- **Only those lawfully authorized to work in the designated country associated with the position will be considered.**
- **Please note that all position start dates and durations are estimates and may be reduced or lengthened based upon a client's business needs and requirements.**
Job ID: 523509116
Originally Posted on: 6/3/2026