The Senior Reliability Engineer is an IT professional responsible for the maintenance, deployment, configuration, and resource optimization of the company's SaaS solution in a hybrid cloud environment with a global footprint. The Senior Reliability Engineer is also responsible for identifying and managing asset reliability risks that could stop or delay plant or business operations. This includes monitoring production losses and high-maintenance-cost assets and finding ways to reduce them.
You will act as a subject matter expert in multi-cloud technologies, take end-to-end ownership of the systems you engineer, automate everything needed to scale our cloud services, and work across all layers of the stack (infrastructure as code, automation, monitoring) using modern tools and technologies.
Responsibilities:
Deploy, monitor, and manage the company's cloud solutions.
Be a key member of a dedicated SRE team responsible for both software engineering and operations.
Maintain virtual IT infrastructure, including VMs and containers.
Optimize resource utilization to control costs, and carry out capacity planning and cost estimation.
Implement automated and scripted solutions to improve our systems and to meet company and business requirements.
Produce operational and technical documentation, such as work instructions, design documents, scripts, and operation manuals.
Ensure that company services run reliably, without downtime or interruptions, and meet SLA requirements.
Identify and solve problems in distributed environments across configurations, operating systems, and networks.
Work with stakeholders in other departments to ensure that solutions are delivered on time.
Mentor less experienced team members.
Requirements:
Hands-on experience handling distributed environments with a keen interest in improving operational efficiency through automation.
Advanced experience with Linux administration in production environments, load balancing, and system monitoring.
Experience with configuration management and infrastructure-as-code (IaC) tools (Ansible, Terraform, Consul, Vault, Packer, GitLab CI).
Proven experience with cloud providers (AWS, Azure, GCP).
Experience with virtualization, containers, and service orchestration (KVM, Docker, Nomad, Kubernetes, etc.).
Knowledge of network protocols and experience administering networking infrastructure, including VPNs, firewalls, etc.
Experience collaborating with cross-functional global and remote teams from diverse backgrounds.
Excellent written and spoken English communication skills.
At least 6 years of professional experience with relevant technologies in similar positions.
Nice to have:
Experience with a programming language such as Rust, Go, Python, or C++.
Experience with standard software development practices such as source code management, CI/CD, and testing.
Knowledge of cryptographic protocols, SSL/TLS, PKI, and network security.
Experience administering database management systems.
University degree in Information Technology or similar.
Azure, AWS, or Google Cloud certification, or another relevant qualification.