Location: Kyiv, Ukraine
Company: Our customer serves major brands and advertisers who rely on it for high-quality consumer data obtainable only through innovative mobile research technology. More than 2 million users have downloaded the application. Users enjoy the app's smooth performance and respond with the honest, timely consumer feedback that clients need for smart decision-making in the smartphone era.
The role involves developing data lakes and ETL pipelines that collect and transform data, implementing Spark streaming for time series aggregation, enhancing existing data streams built on AWS Kinesis, improving automated tests and deployment tools for the entire data pipeline, and writing well-encapsulated code that is easy to reuse and distribute via Artifactory.
5-7 years of experience with big data technologies, including Apache Spark, HDFS, Hive, YARN and S3;
Experience creating and managing a Hive metastore;
Expert knowledge of Java and Scala; excellent OOP skills;
Experience with Artifactory;
Expert in SQL, database normalization, and query optimization;
3-5 years of MySQL experience;
3-5 years of experience with MongoDB or a similar NoSQL store;
1-3 years of Docker experience; Kubernetes experience is a plus;
Experience using Git repositories in a team environment; knowledge of Gitflow is a plus;
Experience with the Databricks platform or AWS EMR is a plus;
Ability to learn quickly and eagerness to expand your knowledge base;
Flexibility and the ability to respond quickly to sudden needs and requests;
Ability to work collaboratively in a team where everyone's input is valued;
Education in computer science.
Databricks platform experience;
AWS EMR platform experience;
Advanced understanding of Spark streaming;
Extensive experience managing production data pipelines and data lakes.
Maintain existing ETLs written in Scala using Apache Spark;
Manage deployment of existing ETLs to AWS Glue;
Implement improvements to the existing data pipeline;
Create new ETLs based on specifications from our operations and engineering teams;
Create automation tools to help with deployment and scheduling of AWS Glue jobs;
Improve unit tests and test automation;
Implement Spark streaming for windowed time series data to avoid bulk reprocessing;
Create reusable components and distribute them via Artifactory, then refactor existing parts of our Scala stack to use these components.
Great office location, 3 minutes from Pecherska metro station, with excellent working conditions;
Flexible working hours and the possibility of remote work upon request;
Comfortable workplace;
Medical insurance after the trial period;
Possibility of business trips to the US;
24 paid vacation days per year;
5 paid sick leave days per year.