Full Stack AI Native Claude Engineer

Apple

Location: Onsite (San Diego, California)
Employment: Full-time
Level: Senior Level
Posted 1 week ago

About the Role

Apple's Stability Engineering team is seeking a senior engineer to maintain and evolve large, AI-augmented systems that process crash reports into actionable insights. You will work on critical platforms and services that drive operating system quality across Apple's hardware portfolio.

Skills

Distributed Systems, LLM Integration, Full-Stack Development, Production Engineering, AI-Assisted Development, Observability, Capacity Planning, Incident Response, Data Pipelines, Symbolication, Ruby on Rails, Node.js, TypeScript, Python, ETL, System Scaling

Full job details

Are you a senior engineer who can keep large, AI-augmented systems running reliably at Apple scale? Apple's Stability Engineering team is looking for a seasoned engineer to join our Core team in San Diego. We build and operate the platforms, services, and infrastructure that turn crash reports from Apple devices into actionable engineering insights. You'll work on systems where LLMs and agents are already part of the production fabric — evolving them, hardening them, and using AI tools to extend what a small team can deliver.

Description


Our team owns the end-to-end platform behind stability analysis at Apple: symbolication of crash logs across the company's hardware portfolio, the data pipelines that aggregate and cluster crash logs, and the applications and services that engineers across Apple use every day to drive operating-system quality. This role is about keeping that platform healthy, extending it deliberately, and making the engineering team itself more effective by using AI tools well.

Day to day, you'll spend most of your time on the engineering work of running real systems: tuning evaluation infrastructure, tightening operational controls, improving auditability and debug trails, and scaling the workflows our analysts rely on. When new capabilities are needed, you'll prototype and integrate them into the platform.

You'll partner closely with stability analysts who are domain experts in OS reliability, and with the broader team responsible for symbolication, ETL, and service infrastructure. You'll also be expected to use AI-assisted development tools fluently to investigate issues, refactor at scale, and ship more with a small team.

We're looking for someone with the rigor of a seasoned production engineer who is also comfortable operating systems that include LLMs and agents as first-class components. If you enjoy taking responsibility for a complex, already-running platform and making it steadily better, we want to talk.

Minimum Qualifications


5+ years of professional software engineering experience building and operating production systems
BS in Computer Science or a related field, or equivalent practical experience
Fluent use of AI-assisted development tools (coding agents, code review assistants, etc.) to work effectively at scale
Demonstrated experience designing and scaling distributed systems (load balancing, active-active topologies, capacity planning, throughput-bound services)
Track record of maintaining and evolving production services: observability, operational controls, incident response, and steady iteration on existing systems
Strong full-stack instincts; comfortable spanning data infrastructure, backend services, and the user-facing surfaces that consume them
Proven ability to operate independently on ambiguous, open-ended problems where the right answer is not obvious

Preferred Qualifications


Experience operating LLM- or agent-based features in production environments over time
Experience building or maintaining evaluation harnesses, audit trails, or replay infrastructure for AI systems
Background in developer tools, observability, crash/stability analysis, or other operating-system-quality domains
Familiarity with one or more of: Ruby on Rails, Node.js/TypeScript, Python for production services
Experience working in environments with significant deferred scalability work (capacity-constrained, long-lead-time infrastructure)