Who you are
You are a hands-on, dynamic individual with 3 to 6 years of deep Spark/Scala streaming data development experience, and you are excited to lead our backend product development. You want to grow your skills at an exciting, nimble, responsive company. You are ready to put in the effort and time to get to the next level. You are an adventurer and a team player. Whether it’s with your team, C-level executives or prospects, you can adapt to any situation.
What you will bring to the team and the company
What you will be doing
You will be in a lead development role for Data Sentinel, a high-growth software company that has developed a sensitive information intelligence platform that helps businesses identify, inventory, categorize, track and trace sensitive data within the enterprise. We help companies know exactly what is in their data, no matter the source, the location, the type of data, or the scale. Our technology runs persistently within the business, constantly measuring data usage and placement against policies. We then trigger remediation actions, lowering risk while delivering compliant, governed and correct data back to the business.
You will be part of a product development team, reporting to the SVP of Engineering, with the goal of finding innovative solutions for reading and processing vast amounts of raw data from various systems and in various formats using Spark. This involves building advanced data pipelines that will be embedded into our product.
Responsibilities
- Design & develop Scala/Spark processes for data discovery
- Produce unit tests for Spark transformations and helper methods
- Write Scaladoc-style documentation for all code
- Design data processing pipelines
Skills Required
- Scala (with a focus on the functional programming paradigm)
- Apache Spark 2.x
- Apache Spark RDD API
- Apache Spark SQL DataFrame API
- Apache Spark Streaming API
- Containerization experience (Docker & Kubernetes)
- Spark query tuning and performance optimization
- SQL database integration (Microsoft SQL Server, Oracle, Postgres, MySQL, etc.)
- Experience working with HDFS, S3, Cassandra, and/or DynamoDB
- Experience with document processing under Spark Streaming
- Experience with Kafka & ZooKeeper
- Understanding of distributed systems