Overview

Do you have a passion for big data, analytics, and cloud computing? Do you thrive in a fast-paced environment? Are you passionate about building teams and leading the data-driven digital transformation of Fortune 500 companies?

Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Our Client

Our client is building the world’s largest real-time finding network.

At the core of Data Science and Engineering, our mission is to enable our client to make smart data-backed decisions. Join our team to help build scalable data solutions for cross-functional teams by optimizing data collection and data flow.
We are looking for a passionate engineer who can take complete ownership of the company's end-to-end data lifecycle: building scalable ETL solutions, performing data modeling, and improving overall data quality and system reliability. You will join a team that applies cutting-edge real-time and exploratory data analysis to make "everything" smart and location-enabled.

Responsibilities

  • Leverage Spark distributed computing to design, implement, and manage scalable ETL data processing platforms
  • Propose, architect, and build algorithms for real-time data processing
  • Own the end-to-end data flow; extend our data pipelines by collecting, storing, processing, and transforming large data sets
  • Collaborate with cross-functional teams to understand data requirements; apply best practices to deliver reliable, consumable data
  • Propose and implement data-acquisition strategies that foster new insights

Requirements

  • 5+ years of combined experience as a data and/or backend engineer
  • Proficient with Python, Spark, AWS, and Spark Streaming
  • Deep knowledge of data mining techniques, SQL, relational databases, and NoSQL databases
  • Delight in taming messy, unstructured data and producing data that is clean and usable
  • You know object-oriented design, have an interest in functional programming, and understand basic statistics
  • You have worked with geospatial data and understand the challenges of storing and leveraging it in near real time
  • You love exploring the Apache ecosystem and have a proven ability to adapt to new technologies
  • You enjoy working with a highly focused team at a lean startup
  • You have good communication skills and can work independently

Nice to Have

  • Experience with Databricks/Airflow
  • Experience with Kinesis/Kafka/ELK stack
  • Bachelor's/Master's degree in Computer Science, Software Engineering, Mathematics, or equivalent experience

Tagged as: Airflow, Apache Spark, AWS, Databricks, ELK stack, Kafka, Kinesis, NoSQL, Python, Spark Streaming


About Highering AI

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.