Senior SRE Azure
We're Hiring | Senior SRE - Azure
Colombia | Remote Position
Compensation: To Be Discussed
Permanent Contract
Collaboration with teams across Canada and the United States
We are looking for a Senior Site Reliability Engineer (SRE) who is passionate about reliability, observability, and operational excellence in cloud-native environments.
What You'll Do
Design and maintain end-to-end observability platforms, including metrics, logs, traces, alerts, and dashboards.
Define and manage SLIs, SLOs, and Error Budgets for critical services.
Lead incident management processes, escalations, and Root Cause Analysis (RCA).
Administer monitoring and incident management tools such as Grafana, Prometheus, Azure Monitor, Elasticsearch, and Opsgenie.
Drive automation initiatives using Python, Bash, and/or PowerShell.
Lead capacity planning, performance optimization, and continuous improvement initiatives.
Requirements
Strong experience in observability, monitoring, and reliability of distributed systems.
Hands-on experience with Azure (Commercial and Government) and Kubernetes (AKS).
Strong expertise with Grafana, Prometheus, Elasticsearch, Azure Monitor, and APM tools.
Experience leading critical incidents and RCA processes.
Experience with automation and scripting.
Advanced English proficiency (C1-C2) is required.
Availability to work during PST business hours.
Benefits
Private Health Insurance (Sura Clásico)
Life Insurance
Connectivity Allowance
Microsoft Azure Certifications
Access to Learning Platforms
Flexible Schedule and Remote Work
2 Additional Days Off for Your Birthday (or a Family Member's Birthday)
Employee Referral Bonus
Opportunity for Relocation to Canada
If you have experience in cloud-native environments, Kubernetes, Azure, and advanced SRE practices, we'd love to hear from you.