Company Description
Building the bank of tomorrow takes more than skills.
It means combining our differences to imagine, discuss, code, develop, test, learn… and celebrate every step together. Share our vibes? Join Swissquote to unleash your potential.
We are the Swiss Leader in Online Banking and we provide trading, investing and banking services to more than 650’000 clients through our high-performance, secure digital platforms.
Our 1200+ employees work flexibly, with no dress code, in multicultural teams.
By having a huge impact on the industry, they grow their skills and boost their careers in a fast-paced environment. Have a look behind the scenes by checking Humans of Swissquote on Instagram.
We are all in at Swissquote. As an equal opportunity employer, we welcome candidates from all backgrounds, experiences and perspectives to join our team and contribute to our shared success.
Are you all in? Don’t be shy, apply!
Job Description
You will join the IT Department’s IT Platform Operations team, whose role is to operate the layer between raw infrastructure and the bank’s corporate-facing services: the application-tier middleware fabric, the Kubernetes control plane, and the user-facing surface of the bank’s Sovereign AI Platform.
The ideal candidate has deep expertise in operating Kubernetes-native platform engineering systems at scale, and will lead the integration of open-source AI tooling within a regulated corporate environment while ensuring that large language model (LLM) inference scales. Your expertise will help your team deliver the platform on which the bank provides governed access to internal and external AI capabilities (distributed inference, agentic workflows, notebooks and chatbots), built on top of the GPU and serving substrate provided by the Systems & Storage teams.
With your team, you will work closely with IT Architects, Observability & Performance Analysts, the Cybersecurity function and the Systems teams to plan and execute the department’s long-term objective: a sovereign AI capability that runs under the bank’s own governance (data sovereignty, content safety, prompt-injection defenses, agentic-workflow audit, and cost control on external API spend) and that is ready for the EU AI Act and DORA by design.
- Design, deploy and operate distributed LLM inference (LLM-d) on Kubernetes, sizing for throughput, tail latency and GPU utilisation against the serving substrate provided by IT Systems Services (ITSS).
- Operate and harden the user-facing AI surface: the Open WebUI cross-department chatbot, JupyterHub notebooks for data scientists, and the agent catalog (agentregistry).
- Build and operate Agentgateway as the governed routing layer to external providers (Anthropic Claude API, OpenAI GPT API), enforcing traffic policy, rate limiting, cost controls and audit logging.
- Implement content-safety, prompt-injection defense and agentic-workflow audit controls, plus the agent-identity model required for EU AI Act and DORA compliance.
- Operate the Kubernetes control plane (etcd, API server, scheduler and controller-manager) with HA sizing and surge-upgrade discipline; contribute to multi-cluster management for the meshed cross-cluster pattern.
- Define SLOs and instrument the platform for performance and availability; lead incident response across the AI platform and control-plane critical path.
- Automate platform provisioning and configuration through Infrastructure as Code and governed automation (AAP), keeping every deployment repeatable, reviewable and auditable.
- Develop and maintain architecture documentation and operational runbooks, and participate in the 24×7 on-call rotation.
Qualifications
Minimum Qualifications
- 7+ years of experience in infrastructure or platform engineering, with at least 3 years operating production Kubernetes and/or machine-learning serving workloads at scale.
- Proven experience managing complex, mission-critical IT environments and contributing to large-scale platform projects.
- Experience in regulated or high-assurance industries such as banking, telecommunications, aviation, pharmaceuticals or government.
- Strong understanding of Kubernetes internals, container runtimes, distributed systems, networking and cloud-native security.
- Excellent interpersonal skills: able to work with cross-functional technical and business teams, and with different levels of management, to influence decision-making.
Preferred Qualifications
- Hands-on experience with LLM-d or comparable distributed inference / model-serving frameworks (e.g. vLLM, TGI, NVIDIA Triton, Ray Serve, KServe).
- Experience operating JupyterHub, Open WebUI, or similar multi-tenant notebook and chatbot platforms.
- Familiarity with Kubernetes-native agentic frameworks (e.g. kagent), AI traffic-routing / gateway layers (e.g. Agentgateway), and agent-registry / catalog patterns.
- Experience integrating and governing external LLM providers (Anthropic Claude, OpenAI GPT): routing, rate limiting, cost control and audit.
- Proficiency in one or more of the following languages: Python, Go, Rust, Java, C++.
- Comfortable with Infrastructure as Code and governed automation tooling (Ansible / AAP, Terraform, etc.); familiarity with event streaming (Apache Kafka) and observability stacks.
Additional Information
SQ2