Job description

    Contract Position
     

    Job Title: Azure SRE

    Work Location: Houston, TX
     

    Job Description

    Are you eager to thrive in a dynamic, global setting where your ideas are valued and your contributions recognized? We are seeking an Azure Site Reliability Engineer to build scalable, highly reliable systems. The Azure SRE also spends time on operational activities such as problem-solving, documentation, and system management.

    • Deploying, monitoring, and supporting applications within a multi-tenant Kubernetes environment.
    • Enhancing the overall stability of our systems through effective monitoring, change control management, and the implementation of best practices to maintain production environments.
    • Supporting GitLab Pipelines and Terraform to manage Infrastructure as Code (IaC) across multiple Microsoft Azure resources and services.
    • Identifying opportunities for automation and troubleshooting user-reported issues.
    • Implementing migrations, upgrades, and patches across all environments.
    • Engaging effectively with business and development teams at all levels.
    • Diagnosing the root causes of application errors and escalating significant concerns as needed.
    • Collaborating with cross-functional teams to address complex challenges involving databases, UNIX, cloud technologies, etc.
    • Providing occasional on-call or weekend support for critical activities outside of regular hours.
       

    Role and Responsibilities

    • Develop and define a comprehensive cloud strategy tailored to the client's enterprise, considering industry standards, business objectives, and IT requirements.
    • Provide leadership in cloud infrastructure support, focusing on security best practices, privileged access management, role-based access control (RBAC), and policy implementation.
    • Conduct Cloud Production Readiness Assessments and recommend solutions to help establish secure, scalable cloud infrastructure with deployed applications.
    • Create future-state architecture blueprints for the client's enterprise.
    • Design cloud infrastructure environments, primarily on AWS and Azure, while also incorporating multi-cloud solutions from platforms like Oracle and Google Cloud.
    • Build secure cloud environments following the best practices of well-architected security frameworks.
    • Perform risk analysis for cloud migrations, identifying potential impacts, risks, and corresponding mitigation strategies.
    • Drive automation through DevOps tools and methodologies to streamline processes.
       

    Qualifications

    • Bachelor’s degree in Computer Science with 5+ years of experience in an enterprise environment.
    • A Master’s degree in Computer Science or related fields is desirable.
    • Relevant AWS or Azure certification is required.
       

    Experience

    • At least 4 years of experience as a cloud migration or support consultant.
    • Practical experience with two or more automation tools, such as Terraform, Ansible, CloudFormation, or ARM Templates.
    • Proficiency in Unix and PowerShell scripting.
    • Prior experience as a Site Reliability Engineer (SRE) for large-scale AWS or Azure Cloud projects is essential.
    • Experience working in agile development environments and with project management practices.
    • Familiarity with cloud frameworks, containerization, and Kubernetes services.
    • Strong understanding of cloud infrastructure, compute stacks (Linux/Windows), management tools, and network integration.
    • Excellent written and verbal communication skills in English.
    • Extensive experience in all aspects of cloud computing, including infrastructure, storage, platforms, and data services.
    • Proven track record in architecting, designing, and implementing complex, highly available, and scalable cloud solutions.
    • Deep knowledge of cloud best practices and hands-on experience in cloud migration, design, implementation, and ongoing Day 2 operations.
       

    Additional Skills

    • A bachelor's degree or, ideally, 5+ years of experience in Information Technology, along with a strong understanding of Site Reliability Engineering (SRE) concepts and solid knowledge of Azure Cloud.
    • Hands-on experience with Kubernetes and Docker containers, and proficiency in DevOps tools such as GitLab, Azure DevOps, and Nexus across all phases of modern DevOps pipelines.
    • Familiarity with key Azure technologies, including Virtual Machines, Key Vaults, Storage Accounts, Virtual Networks, and Hub Networks.
    • Proven experience in delivering high-quality operational services to customers.
    • Strong expertise in managing and restoring application services within a complex and secure banking environment.
    • Practical experience in defining and implementing automation routines to enhance service stability and quality in production.
    • Proficiency with Azure Repos, including branching, code reviews, and code analysis tools, and excellent knowledge of programming languages such as Python, Bash, and PowerShell.
    • A solid understanding of database platforms such as PostgreSQL, Oracle, or Azure SQL, and a strong grasp of the ITIL lifecycle and processes; experience with Backstage Open Platforms is a plus.
       

    Soft Skills

    • A dedicated and forward-thinking team player with a logical mindset, adept at problem-solving.
    • Well-organized with an intuitive ability to prioritize tasks in a complex environment.
    • Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification.
    • Curious, collaborative, and continuously eager to learn new skills.
    • Fluent in English, capable of effectively navigating a global, multicultural environment.