Job description
- Deploying, monitoring, and supporting applications within a multi-tenant Kubernetes environment.
- Enhancing the overall stability of our systems through effective monitoring, change control management, and the implementation of best practices to maintain production environments.
- Supporting GitLab Pipelines and Terraform to manage Infrastructure as Code (IaC) across multiple Microsoft Azure resources and services.
- Identifying opportunities for automation and troubleshooting user-reported issues.
- Implementing migrations, upgrades, and patches across all environments.
- Engaging effectively with business and development teams at all levels.
- Diagnosing the root causes of application errors and escalating significant concerns as needed.
- Collaborating with cross-functional teams to address complex challenges involving databases, UNIX, cloud technologies, etc.
- Providing occasional on-call or weekend support for critical activities outside of regular hours.
- Develop and define a comprehensive cloud strategy tailored to the client's enterprise, considering industry standards, business objectives, and IT requirements.
- Provide leadership in cloud infrastructure support, focusing on security best practices, privileged access management, role-based access control (RBAC), and policy implementation.
- Conduct Cloud Production Readiness Assessments and recommend solutions to help establish secure, scalable cloud infrastructure with deployed applications.
- Create future-state architecture blueprints for the client's enterprise.
- Design cloud infrastructure environments, primarily on AWS and Azure, while also incorporating multi-cloud solutions from platforms like Oracle and Google Cloud.
- Build secure cloud environments following the best practices of well-architected security frameworks.
- Perform risk analysis for cloud migrations, identifying potential impacts, risks, and corresponding mitigation strategies.
- Drive automation through DevOps tools and methodologies to streamline processes.
- Bachelor’s degree in Computer Science with 5+ years of experience in an enterprise environment.
- A Master’s degree in Computer Science or related fields is desirable.
- Relevant AWS or Azure certification is required.
- At least 4 years of experience as a cloud migration or support consultant.
- Practical experience with two or more automation tools, such as Terraform, Ansible, CloudFormation, or ARM Templates.
- Proficiency in Unix and PowerShell scripting.
- Prior experience as a Site Reliability Engineer (SRE) for large-scale AWS or Azure Cloud projects is essential.
- Experience working in agile development environments and with project management practices.
- Familiarity with cloud frameworks, containerization, and Kubernetes services.
- Strong understanding of cloud infrastructure, compute stacks (Linux/Windows), management tools, and network integration.
- Excellent written and verbal communication skills in English.
- Extensive experience in all aspects of cloud computing, including infrastructure, storage, platforms, and data services.
- Proven track record in architecting, designing, and implementing complex, highly available, and scalable cloud solutions.
- Deep knowledge of cloud best practices and hands-on experience in cloud migration, design, implementation, and ongoing Day 2 operations.
- A bachelor's degree or ideally 5+ years of experience in Information Technology, along with a strong understanding of Site Reliability Engineering (SRE) concepts and solid knowledge of Azure Cloud.
- Hands-on experience with Kubernetes and Docker containers, and proficiency in DevOps tools such as GitLab, Azure DevOps, and Nexus, throughout all phases of modern DevOps pipelines.
- Familiarity with key Azure technologies, including Virtual Machines, Key Vaults, Storage Accounts, Virtual Networks, and Hub Networks.
- Proven experience in delivering high-quality operational services to customers.
- Strong expertise in managing and restoring application services within a complex and secure banking environment.
- Practical experience in defining and implementing automation routines to enhance service stability and quality in production.
- Proficiency with Azure Repos, including branching, code reviews, and code analysis tools, and excellent knowledge of programming languages such as Python, Bash, and PowerShell.
- A solid understanding of database platforms like PostgreSQL, Oracle, or Azure SQL, and a strong grasp of ITIL lifecycle and processes; experience with Backstage Open Platforms is a plus.
- A dedicated and forward-thinking team player with a logical mindset, adept at problem-solving.
- Well-organized with an intuitive ability to prioritize tasks in a complex environment.
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification.
- Curious, collaborative, and continuously eager to learn new skills.
- Fluent in English, capable of effectively navigating a global, multicultural environment.
Contract Position
Job Title: Azure SRE
Work Location: Houston, TX
Job Description
Are you eager to thrive in a dynamic, global setting where your ideas are valued and your contributions recognized? We are seeking an Azure Site Reliability Engineer to build scalable, highly reliable systems. The Azure SRE also spends time on operational activities such as problem-solving, documentation, and system management.
Role and Responsibilities
Qualifications
Experience
Additional Skills
Soft Skills