Software Engineer, Agentic Evaluation
Apple
- Location: Onsite (Cupertino, California)
- Employment: Full-time
- Level: Mid Level
Posted 1 week ago
About the Role
Join Apple's team to build the next generation of Siri and AI experiences. You will design and develop end-to-end software, focusing on frameworks, tooling, and infrastructure for evaluating AI-powered features.
Skills
Swift
Objective-C
Python
Generative AI
Software Evaluation
iOS Development
macOS Development
Test Infrastructure
Data Pipelines
Concurrent Application Architecture
System Services
UI Frameworks
A/B Testing
Model-Graded Evaluation
Scripting
Software Engineering
Full job details
We're a team at Apple building software that helps shape the next generation of Siri and AI-powered experiences. The work spans frameworks, tooling, and infrastructure — including a strong focus on how we evaluate and measure the quality of what we ship. We can't say much about specifics, but the problems are new, the surface area is large, and the reach is enormous. We're a collaborative, humble, and curious group that learns from each other and builds together.
Description
You'll work alongside engineers, designers, and researchers to design and build software end-to-end — from early prototypes to production systems running on real devices. You'll have meaningful autonomy in how you get there, and the opportunity to shape both what we build and how we know it's working. The work is hard enough to stretch you, and the team is generous enough to support you while you grow.
Minimum Qualifications
- 3+ years of software engineering experience with strong CS fundamentals
- Proficiency in Swift, Objective-C, Python, or another modern language; strong engineers in adjacent stacks will pick up the rest
- You've shipped software that people used, and you're ready to own bigger pieces end-to-end
- Expert in using generative AI models for coding: you've integrated tools like Claude, Cursor, or Codex deeply into how you work, and have a point of view on where they help and where they don't
- An interest in software evaluation and quality: you care about whether what you build actually works, and want to be on a team that takes measurement seriously
- Comfortable with ambiguity; when you're stuck, you dig in
- Strong communication and a track record of working well across teams
- BS in Computer Science or equivalent experience
Preferred Qualifications
- Experience in one or more iOS/macOS domains: system services, UI frameworks, concurrent application architecture, or performance
- Background building developer tools, test infrastructure, evaluation systems, or data pipelines
- Familiarity with how AI systems are evaluated: offline eval, human eval, A/B, or model-graded approaches
- Proficiency with one or more scripting languages (Python, Ruby, Bash)
- You seek out feedback and learn fast from those around you
- Close to the frontier: curious about new models and techniques, and have a point of view on where human-AI interaction is headed