Ben Hull

A results-driven Software Developer and Security Consultant with expertise in machine learning, API development, and cybersecurity. Combines strong development experience in Python with deep expertise in transformer architectures and neural networks. A published researcher with experience implementing robust machine learning solutions, from financial fraud detection to large language model security. Brings a unique analytical perspective from a Mathematics and Physics background, with a proven ability to transform complex theoretical concepts into practical solutions.

Publications

Domain-specific prompt injection detection

Benjamin Hull and Donato Capitella

Unlike traditional injection attacks, such as SQL injection, where deterministic solutions exist, prompt injection in LLMs operates within the realm of natural language, where there is no clear separation between instructions and data. This makes it challenging to address the issue directly. Instead, the solutions we outlined and that are currently used in the industry rely on approaches that treat the LLM and its outputs as untrusted. These include implementing external authorization controls to limit the scope of actions LLM agents can perform using tools/plugins, sanitizing outputs to remove potentially harmful content, and employing human-in-the-loop oversight to ensure that all actions taken by the LLM are explicitly approved by a human operator. Additionally, we emphasized the importance of sanitizing input in two ways: (1) by reducing the accepted character sets to thwart attackers' attempts to embed malicious instructions, and (2) the detection of potential adversarial prompts by leveraging machine learning models trained to identify signs of injection attempts.

Domain-specific prompt injection detection, WithSecure Labs, April 2024

Using residual heat maps to visualise Benford's multi-digit law

Benjamin Hull, Alexander Long and Ifan G Hughes

Benford's law, established over a century ago, reveals that the occurrence of the first significant digit in large numerical datasets follows a nonuniform distribution. This counterintuitive nature is useful in forensic accounting and detecting financial fraud. A recent investigation on house price data in England and Wales pre and post-2014 shows two distinct patterns of human intervention: selling property just below tax thresholds and psychological pricing with a bias towards final digits being 0 or 5. The analysis indicates that the 2014 legislative change to soften tax thresholds significantly impacted house price data.

Benjamin Hull et al 2022 Eur. J. Phys. 43 015803

Research Projects

Beyond Evaluation: Learning Contextual Chess Position Representations

Personal Research

This paper presents ChessLM, a novel Transformer model inspired by self-supervised learning in NLP, designed to create rich vector representations (embeddings) of chess positions. Trained on a large game corpus using tasks like predicting masked pieces and move differences, the model learns to capture high-level thematic similarities such as pawn structures and king safety across different game stages. While analysis indicates limitations for direct position evaluation, the learned embeddings are effective for retrieving similar positions, suggesting applications like intelligent puzzle generation and opening new research directions for chess representation learning beyond traditional evaluation.

View Research Paper GitHub Repository

Language Translation using the Transformer Architecture

Personal Research

Implements a translation model using the Transformer architecture, based on the groundbreaking paper "Attention is All You Need" (Vaswani et al., 2017). The implementation focuses on English-to-French translation whilst offering a simple to understand implementation of the architecture in PyTorch.

Colab Notebook GitHub Repository

An Implementation of GPT using Pytorch

Personal Research

A from-scratch implementation of the GPT (Generative Pre-trained Transformer) architecture using PyTorch. The implementation focuses on understanding the core components of the transformer architecture and its application to language modeling.

Colab Notebook GitHub Repository

Investment Strategy using Machine Learning and Technical Indicators

Personal Research

The following paper aims to introduce some basic machine learning models to identify buying and selling conditions for financial assets. In particular, the S&P500 index will be considered, with technical indicator features being extracted from historical price data. We will also consider techniques to identify and address overfitting, a condition where the model fails to generalise well to new data. We will then optimise the resultant model by tuning its hyperparameters to better fit the data. Our final model has an accuracy of 68% and could be used as part of a investment strategy to identify buying and selling conditions in stock indexes.

View Research Paper

Can Benford's law be used to detect financial fraud?

Durham University Physics Level 4 Project

Benford's law (BL) describes the probability of a given digit occurring at a position (index) in a number. By analysing financial statements and other documents and measuring conformity with this law, it is could be possible to detect financial fraud. This project focuses on house price data, techniques used to measure conformity and SEC filling to determine how this law could be used to detect financial fraud.

View Research Paper