
Site Reliability Lead

Worldwide, Remote

The Site Reliability Lead is an IT professional responsible for the maintenance, deployment, configuration, and resource optimization of the company's SaaS solution in a hybrid cloud environment across a global footprint.

You will act as a subject matter expert in multi-cloud technologies, take end-to-end responsibility for your engineered results, automate everything to scale our cloud services, and work on all layers (infrastructure as code, automation, monitoring) using modern solutions and software technologies.


Responsibilities:

  • Provide technical leadership, guidance, and support to a team of Site Reliability Engineers so that they meet their goals and objectives.

  • Deploy, monitor, and manage the company's cloud solutions.

  • Be a key member of a team of dedicated SREs responsible for software engineering and operations.

  • Maintain virtual IT infrastructure including VMs and containers.

  • Optimize resource utilization to manage costs, and carry out capacity planning and cost estimation.

  • Implement automated and scripted solutions to improve our systems and to meet company and business requirements.

  • Produce operational and technical documentation, such as work instructions, design documents, scripts, and operation manuals.

  • Ensure that the company's services run reliably, without downtime or interruptions, and meet SLA guidelines.

  • Identify and solve problems in distributed environments across configurations, operating systems, and networks.

  • Work with stakeholders in other departments to ensure that solutions are delivered on time.


Requirements:

  • Leadership and organizational skills.

  • Hands-on experience handling distributed environments with a keen interest in improving operational efficiency through automation.

  • Advanced experience in Linux administration in production environments, load balancing, and system monitoring.

  • Experience with configuration management and infrastructure-as-code (IaC) tools (Terraform, Consul, Vault, Packer, GitLab CI).

  • Proven capability with cloud providers such as Azure.

  • Experience with virtualization, containers, and service orchestration (KVM, Docker, Nomad, Kubernetes, etc.).

  • Knowledge of network protocols and experience administering networking infrastructure, including VPNs and firewalls.

  • Experience collaborating with multi-functional global and remote teams with a diverse set of backgrounds.

  • Excellent communication in written and spoken English.

  • 3-5 years of professional experience with relevant technologies in similar positions.

  • Experience leading a team.


Nice to have:

  • Experience with a programming language such as Golang.

  • Experience with standard software development practices such as code management, CI/CD, and testing.

  • Knowledge of cryptographic protocols, SSL/TLS, PKI, and network security.

  • University degree in Information Technology or similar.

  • Relevant Azure qualification, preferably at the architect level.

  • DevSecOps experience.

  • PostgreSQL experience.