Seldon is looking for a Site Reliability Engineer (SRE) to join our team. We are focused on making it easy for machine learning models to be deployed and managed at scale in production. We provide Cloud Native products that run on top of Kubernetes and are open-core with several successful open source projects including Seldon Core, Alibi:Explain and Alibi:Detect. We also contribute to open source projects under the Kubeflow umbrella including KFServing.
About the role
- Build and run services on cloud infrastructure.
- Observe and monitor services to ensure reliability.
- Set up and maintain CI/CD pipelines.
- Develop and improve the reliability process across the engineering team.
- A degree or higher level academic background in a scientific or engineering subject.
- Experience working with Linux and the Unix shell.
- At least 2 years of experience in industry or academia showing completed projects.
Core skills (The role will be focused on these skills so we would expect existing experience or a demonstrable desire to learn these)
- Familiarity with Cloud Infrastructure (GCP, AWS).
- Implementing "Infrastructure as Code" (Terraform or similar technologies).
- Managing Kubernetes and Docker environments.
- Knowledge of monitoring and alerting technologies.
- Knowledge of CI/CD systems.
- Strong programming skills (Golang, Python, Bash).
- Interest in using and contributing to Open Source tools.
- Experience with maintaining / deploying machine learning models in production.
Share options to align you with the long-term success of the company.
Exciting phase of fast-paced start-up challenges with an ambitious team and unlimited potential for professional growth.
Access to discounted lunches, gyms, shopping and cinema tickets.
Flexible work-from-home policy.
Cycle To Work Scheme.
Our interview process normally consists of a phone interview, a coding task, and a final interview of 2-3 hours (carried out virtually). We promise not to ask you any brain teasers or trick questions. We might design a system together on a whiteboard, the same way we often work together, but we won’t make you write code on one. Our recruitment process takes 3 weeks on average.