Ben Hull


Software engineer focusing on backend development. Previously worked as a Cyber Security consultant focusing on offensive security. I have a master's degree in Mathematics and Physics from the University of Durham and hold the OSCP and CRT Cyber Security certifications. My research interests span machine learning, particularly on financial datasets, deep learning, natural language processing and cyber security.

Publications

Domain-specific prompt injection detection

Benjamin Hull and Donato Capitella

Unlike traditional injection attacks, such as SQL injection, where deterministic solutions exist, prompt injection in LLMs operates within the realm of natural language, where there is no clear separation between instructions and data. This makes it challenging to address the issue directly. Instead, the solutions we outlined, which are currently used in industry, rely on approaches that treat the LLM and its outputs as untrusted. These include implementing external authorization controls to limit the scope of actions LLM agents can perform using tools/plugins, sanitizing outputs to remove potentially harmful content, and employing human-in-the-loop oversight to ensure that all actions taken by the LLM are explicitly approved by a human operator. Additionally, we emphasized the importance of sanitizing input in two ways: (1) by reducing the accepted character sets to thwart attackers' attempts to embed malicious instructions, and (2) by detecting potential adversarial prompts using machine learning models trained to identify signs of injection attempts.

Domain-specific prompt injection detection, WithSecure Labs, April 2024
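The first input sanitization step described above, reducing the accepted character set, can be sketched as follows. This is a minimal illustration and not the method from the article: the allow-list `ALLOWED_CHARS` and the function name `restrict_charset` are hypothetical, and a real application would choose its character set from its own domain.

```python
import string
import unicodedata

# Hypothetical allow-list (an assumption for illustration):
# ASCII letters, digits, spaces and basic punctuation.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + " .,?!'-")


def restrict_charset(text: str, allowed: frozenset = ALLOWED_CHARS) -> str:
    """Normalise the text, then drop every character outside the allow-list."""
    # NFKC folds compatibility forms (for example full-width letters)
    # into their plain equivalents before filtering.
    normalised = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in normalised if ch in allowed)


if __name__ == "__main__":
    # Zero-width characters and markup delimiters are removed.
    print(restrict_charset("Hello\u200b world <script>"))
```

Filtering like this narrows what an attacker can embed, but it does not detect injection on its own, which is why the abstract pairs it with a trained detection model.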

Using residual heat maps to visualise Benford's multi-digit law

Benjamin Hull, Alexander Long and Ifan G Hughes

Benford's law, established over a century ago, reveals that the occurrence of the first significant digit in large numerical datasets follows a nonuniform distribution. This counterintuitive property is useful in forensic accounting and detecting financial fraud. A recent investigation of house price data in England and Wales pre- and post-2014 shows two distinct patterns of human intervention: selling property just below tax thresholds and psychological pricing with a bias towards final digits being 0 or 5. The analysis indicates that the 2014 legislative change to soften tax thresholds significantly impacted house price data.

Benjamin Hull et al 2022 Eur. J. Phys. 43 015803

Research Projects

Language Translation using the Transformer Architecture

Personal Research

Implements a translation model using the Transformer architecture, based on the groundbreaking paper "Attention is All You Need" (Vaswani et al., 2017). The implementation focuses on English-to-French translation whilst offering a simple-to-understand implementation of the architecture in PyTorch.

Colab Notebook GitHub Repository

An Implementation of GPT using PyTorch

Personal Research

A from-scratch implementation of the GPT (Generative Pre-trained Transformer) architecture using PyTorch. The implementation focuses on understanding the core components of the transformer architecture and its application to language modeling.

Colab Notebook GitHub Repository

Investment Strategy using Machine Learning and Technical Indicators

Personal Research

The following paper aims to introduce some basic machine learning models to identify buying and selling conditions for financial assets. In particular, the S&P500 index will be considered, with technical indicator features being extracted from historical price data. We will also consider techniques to identify and address overfitting, a condition where the model fails to generalise well to new data. We will then optimise the resultant model by tuning its hyperparameters to better fit the data. Our final model has an accuracy of 68% and could be used as part of an investment strategy to identify buying and selling conditions in stock indexes.

benh_machine_learning_technical_indicators_2024.pdf

Detecting Network Based Intrusions using Neural Networks

Personal Research

The following paper aims to give an overview of some basic machine learning techniques that can be used to identify network-based intrusions. This will include preprocessing steps used to format the data correctly. Neural network based models will then be applied to perform binary classification of network intrusion data into either normal patterns or attack patterns. The effectiveness of these models will be evaluated and improvements, including tuning their hyperparameters, will be considered.

benh_machine-learning-intrusion-detection_2024.pdf

Can Benford's law be used to detect financial fraud?

Durham University Physics Level 4 Project

Benford's law (BL) describes the probability of a given digit occurring at a position (index) in a number. By analysing financial statements and other documents and measuring conformity with this law, it could be possible to detect financial fraud. This project focuses on house price data, techniques used to measure conformity and SEC filings to determine how this law could be used to detect financial fraud.

benh_benfords-law-financial-fraud_2021.pdf
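As a small illustration of the digit probabilities mentioned above, the sketch below evaluates the standard Benford formula for a digit at a given position. The function name `benford_digit_probability` is hypothetical and the code is not taken from the project report; it only restates the textbook form of the law.

```python
import math


def benford_digit_probability(digit: int, position: int = 1) -> float:
    """Probability that `digit` appears at the given significant-digit position."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position == 1:
        # The leading significant digit cannot be zero.
        if not 1 <= digit <= 9:
            raise ValueError("first digit must be between 1 and 9")
        return math.log10(1 + 1 / digit)
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    # Sum over every possible prefix k of (position - 1) leading digits.
    low = 10 ** (position - 2)
    high = 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * k + digit)) for k in range(low, high))


if __name__ == "__main__":
    for d in range(1, 10):
        print(d, round(benford_digit_probability(d), 4))
```

Conformity tests then compare observed digit frequencies in a dataset against these expected probabilities.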

Benford's Law as an Extension of Zipf's Law

Personal Research

Analytically, samples taken from a log-uniform distribution comply with Benford's law (BL). A statistical derivation of Benford's law, originally given by Hill, relies on this fact. Zipf's law describes the occurrence of words in a given language and follows a similar digit law to BL. When a language has an infinite number of words, Zipf's law reduces to the Riemann Zeta function. By considering an extension of Zipf's law as a summation of an uncountably infinite number of languages, each with an infinite number of words, we show a connection between Zipf's law, the Zeta function and Benford's law. This immediately extends BL beyond its classical definition and provides a rich mathematical structure to the theory which is related to the Zeta function.

benh_generating-benford-distributions_2023.pdf
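The opening claim above, that log-uniform samples comply with Benford's law, can be checked numerically. The sketch below is an illustration only and not code from the paper: the function names, the seed and the choice of three decades are assumptions, with a whole number of decades chosen so that the leading-digit distribution matches Benford's law in expectation.

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """Leading significant digit of a positive number."""
    # Scientific notation places the leading significant digit first.
    return int(f"{x:.15e}"[0])


def log_uniform_first_digit_frequencies(n_samples: int, decades: int = 3, seed: int = 0) -> dict:
    """Sample 10**U with U uniform on [0, decades] and tally leading digits."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    counts = Counter()
    for _ in range(n_samples):
        x = 10 ** rng.uniform(0, decades)
        counts[first_digit(x)] += 1
    return {d: counts[d] / n_samples for d in range(1, 10)}


if __name__ == "__main__":
    observed = log_uniform_first_digit_frequencies(100_000)
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        print(d, round(observed[d], 4), round(expected, 4))
```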

Solving Patience

Personal Research

Patience is a simple card game and is similar to Solitaire. The outcome of Patience is determined entirely by the arrangement of the pack of cards the game is played with. In theory, if a player can analyse the deck before playing, they should be able to determine the score they will achieve. This research aims to analyse the process of playing Patience programmatically and whether the final score can be predicted given any shuffled pack of cards. A time series interpretation of the data is presented and further research topics are suggested, such as using the k-means clustering algorithm to detect features and trends in the underlying time series distribution.

benh_patience-solution-research_2023.pdf