Job Description
Senior Site Reliability Engineer
- Serve as the subject matter expert (SME) for Dynatrace, responsible for configuring, optimizing, and managing Dynatrace monitoring solutions.
- Design and implement monitoring strategies using Dynatrace to ensure comprehensive visibility into system performance, availability, and reliability.
- Collaborate with our Engineering & Platform teams to ensure our services, platforms, and infrastructure emit the right metrics.
- Lead the rollout and adoption of Observability practices, tools, and frameworks across teams and projects.
- Collaborate with Incident Management teams to resolve critical incidents, conduct post-incident reviews, and implement preventive measures.
- Communicate complex business and technical information clearly and concisely.
- Proactively identify and mitigate potential issues, bottlenecks, and performance degradation to ensure system reliability and uptime.
- Drive automation initiatives using tools like Ansible, Terraform, or Kubernetes to streamline deployment, configuration, and management of infrastructure.
- Conduct capacity planning assessments, analyze resource utilization trends, and forecast capacity requirements to support business growth and scalability.
Mandatory Requirements:
- Undergraduate degree in Computer Science or STEM (Science, Technology, Engineering, Math) and a minimum of 6 years of equivalent work experience in IT.
- Of the 6 years, a minimum of 4 years of relevant IT work experience with agile methodologies, cloud and DevOps environments, analysis and/or technical proficiency, networking, and knowledge of a breadth of tools and approaches for solving production issues.
- Significant experience with observability, monitoring, logging, and alerting tools at scale, such as Dynatrace, Splunk, Datadog, AppDynamics, Grafana, Zabbix, Logstash, Kibana, and Prometheus.
- Working knowledge of infrastructure configuration management and automation tools such as Chef, Puppet, Salt, Ansible, and Terraform.
- Working knowledge of Microsoft Visual Studio Team System and/or similar continuous integration and continuous deployment technologies such as Team Foundation Server, Jenkins CI, GitHub, and Artifactory, specifically related to software build, unit testing, and deployment.
- Working knowledge of Microsoft Azure Resource Manager (ARM) templates and JSON scripting for automated deployments.
- Working knowledge of Microsoft ARM IaaS and PaaS architectures.
- Working knowledge of API architecture and hybrid cloud integration patterns.
- Working knowledge of networking protocols and technologies such as routing, DNS, and network peering.
- Working knowledge of developing and monitoring SLOs and SLAs.
- Experience with the Microsoft stack, including technologies and frameworks such as .NET, C#, JavaScript, SQL, jQuery, Angular 6 and up, HTML, CSS, xUnit, NUnit, Entity Framework, TDD, and Redis Cache. Exposure to microservices is preferred.
Senior Site Reliability Engineer Assignment Length: 12 months
Senior Site Reliability Engineer Assignment Location: Richmond, BC
Job Tags
Work experience placement