I research evaluations of AI systems. Evaluations are how we discover and characterise the space of intelligent behaviour in AI. I am concerned with building out a metrology for this evaluation practice (a scientific study of the measurement tools). So, you can think of my research as lens crafting, rather than astronomy—I build the “telescopes for intelligence” rather than the intelligent systems themselves.
My academic work addresses the space of questions like these:
Previously, I studied Philosophy (BAH) and Computer Science (BS) at Stanford University. I wrote a thesis about trust: what it is, philosophically, and what is required for a philosophically defensible account of trust in the explainable AI literature. More recently, I completed my MSc at the Oxford Internet Institute. My MSc thesis addresses the more technical question of construct validity in large language model evaluations (and should be publicly available soon!).
I was also a Founder in Residence at Entrepreneurs First, worked at Monte Carlo Data as a Founding Data Scientist, did NLP research with Stanford OVAL, and wrote three chapters of the Data Quality Fundamentals textbook with O’Reilly Media.
You can email me by decrypting this Caesar Cipher: ipXe%b\Xiej7f``%fo%XZ%lb