At Apple, we develop revolutionary technologies for the products that will define how we communicate in the future. The Zurich Vision Lab is an R&D team based in Zürich; we have shipped features like Persona, Animoji, Portrait Mode, and FaceTime Eye Contact, doing cutting-edge research while consistently shipping products. We collect and work with large datasets, and we build the infrastructure behind them. We are looking for a hands-on senior data engineer to own the data-management foundation for our machine-learning and feature-development work: the storage, pipelines, and quality controls that serve our internal customers so we can build amazing new products together.
Description
You will solve real, Apple-scale challenges, leading development of the internal-facing data infrastructure that enables the next generation of machine-learning and computer-vision projects: running data pipelines at scale in cloud environments. This is a hands-on, end-to-end role at the intersection of data engineering, DevOps, and machine learning, in that order.
Responsibilities
Design and operate the managed storage for large-scale datasets and metadata, with versioning that lets consumers pin, reproduce, and roll back with confidence.
Build and automate the ingestion, transformation, and publishing pipelines that move data through its full lifecycle reliably and at scale, and monitor them in production.
Establish managed data quality through validation, lineage, and clear governance, so teams can trust the data they build on.
Provide the tooling and interfaces that make datasets easy to discover, assemble, and consume across our machine-learning and feature-development processes.
Support data-collection and synthetic-data-generation pipelines that bring new data into the system and scale our training data.
Partner directly with the researchers and engineers who depend on this data, with a service mindset, automating toil rather than accepting it.
Preferred Qualifications
Experience running data pipelines and distributed compute at scale with tools such as Dagster, Airflow, Ray, Prefect, Temporal, DBOS, etc.
Proficiency with cloud deployments: AWS, GCP, Kubernetes, Pulumi, etc.
Exposure to MLOps: developing, deploying, and monitoring ML systems, with dataset and model versioning.
Familiarity with dataframe engines such as Pandas, Polars, Daft, or Spark.
Experience building tools, platforms, or SDKs that other engineers rely on; computer vision or computer graphics experience is a plus.
Minimum Qualifications
Experience with distributed system design and automation, and strong software engineering fundamentals.
A track record of architecting, implementing, and operating production data pipelines end to end.
Strong SQL across engines such as Postgres, Trino, or SparkSQL, and working knowledge of columnar and lakehouse storage formats such as Parquet, Iceberg, or Delta.
A demonstrated bias toward improving the process: automating toil and building tooling rather than settling for the status quo.
Great interpersonal skills, a self-driven and customer-oriented attitude, and strong communication skills in English.