We're a team at Apple building software that helps shape the next generation of Siri and AI-powered experiences. The work spans frameworks, tooling, and infrastructure — including a strong focus on how we evaluate and measure the quality of what we ship. We can't say much about specifics, but the problems are new, the surface area is large, and the reach is enormous. We're a collaborative, humble, and curious group that learns from each other and builds together.

Description

You'll work alongside engineers, designers, and researchers to design and build software end-to-end — from early prototypes to production systems running on real devices. You'll have meaningful autonomy in how you get there, and the opportunity to shape both what we build and how we know it's working. The work is hard enough to stretch you, and the team is generous enough to support you while you grow.

Minimum Qualifications

3+ years of software engineering experience with strong CS fundamentals
Proficiency in Swift, Objective-C, Python, or another modern language; strong engineers in adjacent stacks will pick up the rest
You've shipped software that people used, and you're ready to own bigger pieces end-to-end
Expert in using generative AI models for coding: you've integrated tools like Claude, Cursor, or Codex deeply into how you work, and have a point of view on where they help and where they don't
An interest in software evaluation and quality: you care about whether what you build actually works, and want to be on a team that takes measurement seriously
Comfortable with ambiguity; when you're stuck, you dig in
Strong communication and a track record of working well across teams
BS in Computer Science or equivalent experience

Preferred Qualifications

Experience in one or more iOS/macOS domains: system services, UI frameworks, concurrent application architecture, or performance
Background building developer tools, test infrastructure, evaluation systems, or data pipelines
Familiarity with how AI systems are evaluated: offline eval, human eval, A/B, or model-graded approaches
Proficiency with one or more scripting languages (Python, Ruby, Bash)
You seek out feedback and learn fast from those around you
Close to the frontier: curious about new models and techniques, and have a point of view on where human-AI interaction is headed

Software Engineer, Agentic Evaluation