Job Description
Job Brief:
We are looking for a Senior DevOps Engineer to join a very small, high-impact DevOps team working closely with Product and Engineering (web developers, SREs, data scientists) and directly with the CTO. This is a hands-on role with real ownership. You will design, build, operate, and evolve production systems end-to-end. The environment is fast-paced, startup-driven, and intentionally lean. This role is for engineers who thrive in ambiguity, move fast, and are comfortable being the final line for both technical work and decision-making.
Responsibilities:
- Design, build, and operate cloud infrastructure in production environments with full lifecycle ownership.
- Run and scale Kubernetes clusters in real-world, high-availability scenarios.
- Implement and maintain Infrastructure as Code using Terraform and Helm.
- Operate GitOps workflows as a core operating model, not as a best-effort practice.
- Own observability across logs, metrics, and tracing, including alerting strategy and reliability improvements.
- Take full ownership of incidents: diagnosis, recovery, root cause analysis, and long-term system redesign.
- Participate in a weekly on-call rotation as the final escalation point.
- Work hands-on with engineering teams, directly supporting application needs (not coordination-only).
- Read and understand application code (Go, Python, shell) to debug, optimize, and improve systems.
- Continuously improve system reliability, scalability, and developer experience.
- Make and own technical trade-offs in an environment with incomplete information and changing priorities.
Requirements & Skills:
- 5+ years of hands-on DevOps / Infrastructure experience in fast-paced startup or scale-up environments.
- Strong, real-world experience with public cloud (AWS, GCP, or Azure).
- Proven production experience operating Kubernetes.
- Deep understanding and daily use of Infrastructure as Code (Terraform).
- Practical experience using GitOps and Helm as operational mechanisms.
- Strong ownership of observability: monitoring, logging, tracing, and alerting.
- Experience running live systems under real load and improving reliability over time.
- Comfortable being on-call as an owner, not just an escalation handler.
- Ability to read and reason about application code (Go, Python, shell).
- Excellent communication skills and fluent English.
- Goal-oriented, accountable, and execution-focused.
- Humble mindset: open to feedback, willing to learn, and focused on outcomes.
- Strong user-centric mindset — pride in building reliable systems that enable great user experiences.
- Ability to work within ±4 hours of CET.
Your Next Challenge Awaits!
Ready to take your career to the next level? Submit your application and explore the impact you can make with us!