Senior Staff Software Engineer, Agentic Data Tooling, DeepMind
- Sunnyvale, California
- Full Time
Minimum Qualifications
- Bachelor's degree in Computer Science, IT, a related field, or equivalent practical experience.
- 8 years of experience with software development in one or more programming languages (e.g., Python, C, C++, Java, JavaScript).
Preferred Qualifications
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
- 5 years of experience in systems design, product management, or software engineering roles.
At Google DeepMind, our mission is to build the world's first general-purpose learning agent. Central to this mission is the complex task of measuring the intelligence of our prototypes. As a Software Engineer, you will work with the cutting-edge AI agents developed by our exceptional team of Machine Learning and Neuroscience research scientists. Your responsibilities will range from creating systems for agent testing using 2D and 3D games to developing test problems within physics simulators. You will create graphical visualizations of results, build competitive agent leaderboards, and test new algorithms on robots. To succeed in this role, you will need a strong foundation in software engineering and enjoy working on a wide range of challenging problems within a mission-driven team.
Shape the future of AI by building the core infrastructure and tooling that powers Gemini's agentic capabilities. Through advanced data curation and creation, you will drive the development of next-generation evaluation frameworks:
- SmithBench: Our gold-standard benchmark testing whether AI agents can autonomously execute complex, end-to-end, first-party Google engineering workflows (such as CL lifecycles, bug investigations, and pipeline orchestration).
- RE-Bench: Our benchmark designed to measure agent performance on highly complex, long-horizon research engineering tasks.
Whether you are designing environments to record programmatic agent-API interactions or building computer-control systems to capture real-time human and model trajectories, your tooling will deliver the foundational training and evaluation data that accelerates AI capabilities across models, harnesses, and skills.
Artificial intelligence will be one of humanity's most transformative inventions. At Google DeepMind, we are a pioneering AI lab with exceptional interdisciplinary teams focused on advancing AI development to solve complex global challenges and accelerate high-quality product innovation for billions of users. We use our technologies for widespread public benefit and scientific discovery, ensuring safety and ethics are always our highest priority.
We are pushing the boundaries across multiple domains. Our global teams offer diverse learning opportunities and varied career pathways for those driven to achieve exceptional results through collective effort.
The US base salary range for this full-time position is $262,000-$365,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Design and create novel data tooling to accelerate Gemini model evaluation, training, and hill climbing to improve agentic capabilities.
- Facilitate ingestion and creation of corpora representing complex worlds, and record human, agentic, and hybrid trajectories through Reinforcement Learning (RL) environments.
- Build scalable data collection pipelines that capture multi-turn, tool-using agent interactions and enable rapid iteration on environment complexity and reward design.
- Create human-in-the-loop annotation and trajectory review tooling, analytics dashboards, and agentic orchestration frameworks to continuously generate, curate, and validate high-signal training corpora at scale.