Site Reliability Lead
The Site Reliability Lead is an IT professional responsible for the maintenance, deployment, configuration, and resource optimization of the company's SaaS solution in a hybrid cloud environment across a global footprint.
You will act as a subject matter expert in multi-cloud technologies, take end-to-end responsibility for your engineered results, automate everything to scale our cloud services, and work on all layers (infrastructure as code, automation, monitoring) using modern solutions and software technologies.
Provide technical leadership to a team of Site Reliability Engineers through guidance and support to ensure that they are meeting their goals and objectives.
Deploy, monitor, and manage the company's cloud solutions.
Be a key member of a team of dedicated SREs responsible for software engineering and operations.
Maintain virtual IT infrastructure including VMs and containers.
Optimize resource utilization to manage costs, and carry out capacity planning and cost estimation.
Implement automated and scripted solutions to improve our systems and to meet company and business requirements.
Produce operational and technical documentation, such as work instructions, design documents, scripts, and operation manuals.
Ensure that the company's services run reliably, without downtime or interruptions, in line with SLA guidelines.
Identify and solve problems in distributed environments across configurations, operating systems, and networks.
Work with stakeholders in other departments to ensure that solutions are delivered on time.
Leadership and organizational skills.
Hands-on experience handling distributed environments with a keen interest in improving operational efficiency through automation.
Advanced experience in Linux administration in production environments, load balancing, and system monitoring.
Experience with configuration management and infrastructure-as-code (IaC) tools (Terraform, Consul, Vault, Packer, GitLab CI).
Proven experience with cloud providers such as Azure.
Experience with virtualization, containers, and service orchestration (KVM, Docker, Nomad, Kubernetes, etc.).
Knowledge of network protocols and experience administering networking infrastructure, including VPNs and firewalls.
Experience collaborating with cross-functional, global, and remote teams from diverse backgrounds.
Excellent communication in written and spoken English.
3-5 years of professional experience with relevant technologies in similar positions.
Experience leading a team.
Nice to have:
Experience with a programming language such as Golang.
Experience with standard software development practices such as code management, CI/CD, and testing.
Knowledge of cryptographic protocols, SSL/TLS, PKI, and network security.
University degree in Information Technology or similar.
Relevant Azure certification, preferably at the architect level.