Site Reliability Engineer
You’re part of the Modzy engineering team that is dedicated to helping define, implement, and operate the technical infrastructure of a world-class artificial intelligence solution.
- You will provide an expert’s perspective on how to improve and maintain service level objectives (SLOs) for systems that will deploy to a wide variety of infrastructure and handle missions critical to the federal government.
- You will implement processes, assessments, custom tools, and vendor appliances to help enable peak reliability, stability, and operational engagement for microservice development teams across the western hemisphere.
- You will act as a subject matter expert (SME) for implementing DevSecOps best practices while establishing a continuous delivery pipeline.
- You will assess performance and stability issues in production, lead blameless postmortem retrospectives, develop frameworks that collect and evaluate system telemetry, and write operational runbooks.
- A minimum of 5 years of professional full-stack development experience
- Production experience working with Kubernetes and a major cloud provider, such as AWS, Azure, or GCP
- Experience with continuous integration and delivery pipelines, such as Jenkins, CircleCI, or Travis CI
- Experience with a scripting language, such as Bash, Python, or Ruby
- Experience with corporate networking fundamentals and network security best practices
- Experience with service meshes
- Knowledge of how to set up and administer Splunk, Elasticsearch, or other log aggregation products
- Knowledge of how to set up and administer RDBMS products, such as PostgreSQL
- A security clearance is a huge plus!
- BS degree in CS or Computer Engineering