I test, compare, and score AI systems with structured precision — turning messy model outputs into clear, actionable quality signals that make AI products smarter and safer.
I’m Olatunji Habeeblahi O., an AI Model Evaluation Engineer and Automation Specialist based in Lagos, Nigeria. My path into AI didn’t begin in a lab — it started with language.
I began my career as a Technical Writer, learning how to translate complex systems into clear, structured communication. That discipline of precision, knowing exactly what something does and why it matters, turned out to be the perfect foundation for everything that followed.
From writing, I moved into AI Automation, building intelligent, self-running workflows in n8n, Zapier, and Make.com. I was designing systems that didn't just execute tasks: they made decisions, routed content, and scaled without adding manual overhead. That work deepened my curiosity about the intelligence behind the tools themselves.
That curiosity led me into Prompt Engineering — studying how language shapes model behaviour, what makes a prompt fail, and how small wording changes can produce entirely different outputs. I began testing systematically, not just intuitively.
Today, I work as a dedicated AI Evaluation Specialist. I test AI systems against structured rubrics, run head-to-head agent comparisons, identify bias and failure modes, and produce the kind of actionable feedback that engineering teams can actually use. I am adept at independent remote work, rapid guideline adaptation, and delivering insights that improve AI system reliability and user experience.
Evaluated AI-driven automation workflows for logical accuracy, consistency, and reliability. Tested system outputs against defined requirements and edge cases. Analyzed AI-generated responses to identify errors, inconsistencies, and areas for improvement. Produced structured, actionable feedback to improve model behaviour and workflow quality. Designed and executed QA test cases and documented evaluation criteria and outcomes.
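For a flavour of that work, here is a minimal Python sketch of the kind of structured test case I write. The function, requirements, and sample response are all hypothetical, not taken from a client project.

```python
# A minimal sketch of an output-quality test case (all names hypothetical).
# It checks one AI-generated response against defined requirements and an
# edge case, then records a structured pass/fail outcome.

def evaluate_response(response: str, required_terms: list[str], max_words: int) -> dict:
    """Score a single model response against simple, explicit requirements."""
    words = response.split()
    checks = {
        "covers_required_terms": all(t.lower() in response.lower() for t in required_terms),
        "within_length_limit": len(words) <= max_words,
        "non_empty": bool(words),  # edge case: the model returned nothing
    }
    return {"passed": all(checks.values()), "checks": checks}

if __name__ == "__main__":
    sample = "The workflow routes invoices to finance and flags duplicates."
    result = evaluate_response(sample, required_terms=["invoices", "duplicates"], max_words=40)
    print(result)  # {'passed': True, 'checks': {...}}
```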
Evaluated operational processes and decision outcomes to drive efficiency improvements. Analyzed market and client data to guide strategic business decisions. Managed documentation, reporting, and stakeholder communication. Applied structured judgment in high-ambiguity scenarios — a discipline that translates directly into AI evaluation work.
Structured head-to-head evaluation of GPT-4, Claude, and Gemini across reasoning, accuracy, and consistency. View Case Study →
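To illustrate the method rather than the actual study data, here is a minimal sketch of how judged head-to-head comparisons roll up into per-dimension win rates; the records and model names are placeholders.

```python
# A minimal sketch of head-to-head tallying (data and names are illustrative).
# Each record is one judged comparison on one dimension; the output is a
# simple win rate per model per dimension.

from collections import Counter

judgments = [
    {"dimension": "reasoning", "winner": "model_a"},
    {"dimension": "reasoning", "winner": "model_b"},
    {"dimension": "accuracy", "winner": "model_a"},
    {"dimension": "consistency", "winner": "model_a"},
]

wins = Counter((j["dimension"], j["winner"]) for j in judgments)
totals = Counter(j["dimension"] for j in judgments)

for (dimension, model), count in sorted(wins.items()):
    print(f"{dimension}: {model} win rate = {count / totals[dimension]:.0%}")
```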
Designed a multi-dimensional rubric to evaluate LLM output quality, adapted from industry standards and original design. View Case Study →
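A minimal sketch of the idea, with illustrative dimensions and weights rather than the actual rubric:

```python
# A minimal sketch of a weighted, multi-dimensional rubric. Dimensions and
# weights here are illustrative, not the case-study rubric itself.

RUBRIC = {
    "factual_accuracy": 0.35,
    "instruction_following": 0.25,
    "coherence": 0.20,
    "tone_and_style": 0.20,
}

def rubric_score(ratings: dict[str, int]) -> float:
    """Combine 1-5 per-dimension ratings into one weighted score."""
    assert set(ratings) == set(RUBRIC), "every dimension must be rated"
    return sum(RUBRIC[d] * ratings[d] for d in RUBRIC)

print(rubric_score({"factual_accuracy": 5, "instruction_following": 4,
                    "coherence": 4, "tone_and_style": 3}))  # 4.15
```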
End-to-end n8n pipeline: RSS ingestion → AI scoring → quality filtering → rewriting → auto-publishing to X and LinkedIn. View Case Study →
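The real pipeline runs as n8n nodes; purely to show the flow of stages, here is a stubbed Python sketch with no network calls. All function names and the quality threshold are hypothetical.

```python
# A minimal sketch of the pipeline's stages as plain functions (stubbed;
# the production build runs as n8n nodes). Names are illustrative.

def fetch_items() -> list[dict]:   # RSS ingestion (stubbed)
    return [{"title": "AI eval news", "body": "sample body"}]

def score(item: dict) -> float:    # AI scoring (stubbed heuristic)
    return 0.9 if "AI" in item["title"] else 0.2

def rewrite(item: dict) -> str:    # rewriting (stubbed)
    return f"Today in AI: {item['title']}"

def publish(post: str) -> None:    # auto-publishing (stubbed)
    print("published:", post)

for item in fetch_items():
    if score(item) >= 0.7:         # quality filtering
        publish(rewrite(item))
```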
A degree built on empirical observation, data analysis, and systematic documentation: skills that transfer directly into structured AI evaluation work.
Available for evaluation contracts, AI QA consulting, and freelance automation projects. Remote-first, professional, and ready to deliver structured results from day one.
Message on WhatsApp