Senior Site Reliability Engineer

Worldwide, Remote

The Senior Site Reliability Engineer is an IT professional responsible for the maintenance, deployment, configuration, and resource optimization of the company's SaaS solution in a hybrid cloud environment across a global footprint. The role is responsible for identifying and managing asset reliability risks that can cause stoppages or delays in plant or business operations. This includes monitoring production losses and high-maintenance-cost assets and finding ways to reduce them.

You will act as a subject matter expert in multi-cloud technologies, take end-to-end responsibility for your engineered results, automate everything to scale our cloud services, and work on all layers (infrastructure as code, automation, monitoring) using modern solutions and software technologies. 

Responsibilities: 

  • Deploy, monitor, and manage the company's cloud solutions.

  • Be a key member of the team of dedicated SREs responsible for software engineering and operations.

  • Maintain virtual IT infrastructure including VMs and containers.

  • Optimize resource utilization to manage costs, and carry out capacity planning and cost estimation.

  • Implement automated and scripted solutions to improve our systems or to satisfy company and business requirements.

  • Produce operational and technical documentation, such as work instructions, design documents, scripts, and operation manuals.

  • Ensure that the company's services run reliably, without downtime or interruptions, and meet SLA guidelines.

  • Identify and solve problems in distributed environments across configurations, operating systems, and networks.

  • Work with stakeholders in other departments to ensure that solutions are delivered on time.

  • Mentor team members who are less experienced.

Requirements:

  • Hands-on experience handling distributed environments with a keen interest in improving operational efficiency through automation.

  • Advanced experience in Linux administration in production environments, load balancing, and system monitoring.

  • Experience with configuration management and IaC tools (Ansible, Terraform, Consul, Vault, Packer, GitLab CI).

  • Proven capability with Cloud providers (AWS, Azure, GCP).

  • Experience with virtualization, containers, and service orchestration (KVM, Docker, Nomad, K8s, etc.).

  • Knowledge of network protocols and experience administering networking infrastructure, including VPNs and firewalls.

  • Experience collaborating with multi-functional global and remote teams with a diverse set of backgrounds.

  • Excellent communication in written and spoken English.

  • At least 6 years of professional experience with relevant technologies in similar positions.

Nice to have:

  • Experience with a programming language such as Rust, Go, Python, or C++.

  • Experience with standard software development methodologies such as code management, CI/CD, and testing.

  • Knowledge of cryptography protocols, SSL/TLS, PKI, and network security.

  • Experience with administering database management systems.

  • University degree in Information Technology or similar.

  • Azure, AWS, or Google Cloud certification, or any other relevant qualification.