Job Description


    Job Title: Big Data - Hadoop
    Work Location: Hopkins, MN (100% REMOTE)
     

    Job Description:

    We are seeking a skilled Hadoop Developer to help build and maintain large-scale data storage and processing infrastructure. The role requires strong knowledge of the Hadoop ecosystem and the ability to write efficient software using the Hadoop API. You will design efficient data storage solutions and process massive datasets using Hadoop and its ecosystem of tools, and your expertise in big data frameworks will help drive the development and maintenance of cutting-edge data infrastructure that supports our business objectives.

    Responsibilities:

    • Write software that interacts with HDFS (Hadoop Distributed File System) and MapReduce for data processing.
    • Analyze business and technical requirements and evaluate existing data solutions to implement the best approaches.
    • Build, operate, monitor, and troubleshoot Hadoop infrastructure to ensure performance and stability.
    • Create tools and libraries, and maintain processes, that allow other engineers to access data and write their own MapReduce programs.
    • Develop comprehensive documentation and operational playbooks to manage the Hadoop infrastructure efficiently.
    • Evaluate and leverage hosted cloud solutions such as AWS, Google Cloud, or Azure to optimize data storage and processing.
    • Design and implement scalable, maintainable ETL (Extract, Transform, Load) pipelines for efficient data processing.
    • Understand and apply Hadoop’s security mechanisms, ensuring secure access to data and infrastructure.
    • Develop solutions to ingest data from various sources into the Hadoop ecosystem for processing and storage. 
       

    Preferred Skills:

    • Experience with Hadoop-as-a-Service platforms like AWS EMR or Google Dataproc.
    • Knowledge of Kafka, Spark Streaming, and other real-time processing tools.
    • Experience with DevOps practices and tools such as Docker.
       

    Professional Experience:

    • Prior experience working as a Hadoop Developer or Big Data Engineer, with a strong understanding of big data frameworks and technologies.
    • Comprehensive knowledge of the Hadoop ecosystem and its key components, including architecture, functionality, and best practices.
    • In-depth expertise in working with Hive, HBase, and Pig for data storage, querying, and processing.
    • Familiarity with MapReduce and Pig Latin scripts to write efficient, distributed data processing jobs.
    • Knowledge of back-end programming languages and platforms such as JavaScript and Node.js, along with Object-Oriented Analysis and Design (OOAD).
    • Practical experience with data loading tools like Sqoop and Flume to manage data integration between Hadoop and external systems.
    • Strong analytical thinking and problem-solving skills to troubleshoot issues and optimize data processing.
    • Excellent project management skills with the ability to handle tasks efficiently, along with strong communication skills for collaborating across teams.
       

    Skills & Qualifications:

    • A Bachelor’s degree in Software Engineering, Computer Science, or a related field.
    • Solid understanding of the JVM runtime and the Java programming language, and ideally another JVM-based language; familiarity with specific Java versions is a plus.
    • Strong knowledge of algorithmic complexity and its application in optimizing data processing tasks.
    • Ability to make informed trade-offs when designing and working with distributed systems.
    • Experience with software engineering best practices to develop maintainable, scalable, and robust applications.
    • Prior experience working with a Hadoop distribution, enabling effective infrastructure and software development.
    • Familiarity with frameworks such as Apache Spark for large-scale data processing and computation.
    • Experience with tools like HBase, Kafka, ZooKeeper, or other Apache software for distributed computing and data management.
    • Deep understanding of Linux systems, including their operation, networking, and security.
    • Knowledge of how to efficiently move large datasets across different platforms and systems.
       

    This role offers an exciting opportunity to work with cutting-edge Hadoop technologies, creating impactful solutions for large-scale data challenges. If you're experienced in distributed systems and big data, this could be a perfect fit for you!