Senior software engineer with 15+ years of experience, now focused on AI safety evaluation. Currently maintaining the UK AISI's Inspect Evals repository. Background includes scaling a startup from 4 to 200 employees and 11 years as an Australian Army Engineering Officer.

Skills

AI Safety & Evaluation

  • Inspect AI framework
  • LLM evaluation design and review
  • Evaluation pipeline orchestration
  • Testing standards development
  • Vivaria

Web Development

  • Python
  • JavaScript
  • Django
  • React.js
  • FastAPI
  • SQL
  • HTML
  • CSS

ML & AI Tools

  • Agentic AI tools
  • LLM APIs
  • Jupyter
  • Pandas
  • NumPy
  • PyTorch
  • Hugging Face
  • Streamlit

Cloud & Infrastructure

  • AWS
  • Docker
  • CI/CD
  • PostgreSQL
  • Redis
  • Celery
  • Kubernetes

Monitoring & Testing

  • pytest
  • jest
  • TDD
  • New Relic
  • Sentry
  • CloudWatch
  • Papertrail

Engineering Practices

  • Code review
  • Technical leadership
  • Agile development
  • AI coding agents

Work Experience (3)

Maintaining the [Inspect Evals repository](https://github.com/UKGovernmentBEIS/inspect_evals/) on behalf of the UK AI Security Institute.
  • Reviewing and quality-assuring contributed AI evaluations for correctness, reliability, and reproducibility
  • Establishing testing standards and procedures for evaluation contributions
  • Collaborating with international AI safety researchers and engineers
  • Working with frontier LLMs including GPT, Claude and Gemini
  • Building the Inspect Evals Scoring system using AWS Batch to orchestrate bulk runs of AI evaluations
  • Contributing to the Inspect Evals Dashboard showcasing LLM performance across diverse evaluations
First employee at an ed-tech startup; scaled the company from founding to ~200 employees, with a platform used by the majority of Australian high schools.
  • Architected and deployed Edrolo.com.au from inception using Django, React.js, and AWS
  • Built a video captioning system using OpenAI's Whisper model, saving ~$100,000 per year
  • Grew engineering team from sole developer to 12-person tech team
  • Mentored junior developers and established team workflows and standards
  • Developed internal tools for user enrolments, payments, and shipping
Served as an Engineering Officer in the Australian Army, 5 years full time and 6 years active reserve.
  • Led soldiers in maintenance of weapons, vehicles, and equipment
  • Deployed to East Timor as Technical Regulation Officer
  • Developed prototype applications to improve logistics workflows
  • Published research on survivable military information systems (Best Paper, MilCIS 2010)
  • Held a Secret security clearance when required; currently hold a Baseline clearance

Projects (3)

Inspect Evals
Jan 2025 - Current
 https://github.com/UKGovernmentBEIS/inspect_evals
  • AI Evaluation
  • Python
  • Code Review
A repository of community-contributed LLM evaluations for Inspect AI. Created collaboratively by the UK AISI, Arcadia Impact, and the Vector Institute.
Inspect Evals Scoring
Jan 2025 - Apr 2025
 https://github.com/ArcadiaImpact/inspect_evals_scoring
  • AI Evaluation
  • AWS Batch
A system for orchestrating bulk runs of AI evaluations using AWS Batch, to provide the data for the Inspect Evals Dashboard.
Inspect Evals Dashboard
Jan 2025 - Apr 2025
 https://inspect-evals-dashboard.streamlit.app/
  • AI Evaluation
  • Dashboard
Showcases how well a diverse set of LLMs perform on the evaluations implemented in Inspect Evals.

Volunteer

Jan 2012 - Jan 2019
Spokesperson and Company Secretary
Stasis Systems Australia
Health advocacy

Education (2)

2000 - 2004
 Bachelor of Engineering (Mechatronic)
University of Adelaide
Grade: First Class Honours
2000 - 2003
 Bachelor of Mathematical and Computer Sciences
University of Adelaide

Certificates

2023-11-01
XCS224N: Natural Language Processing with Deep Learning
Stanford Online
2023-09-30
AI Safety - Governance
BlueDot Impact
2023-10-15
AI Safety - Alignment
AI Safety Quest
2023-12-15
AI Safety - Alignment 201
AI Safety Australia & NZ

Publications

1 Nov 2010
Plan for the Worst: Steps Towards Survivable Networks in MilCIS 2010
Discusses making military logistics and administrative information systems more survivable, and thus more usable, in deployed environments. Awarded Best Paper at MilCIS 2010.