Key projects
Things I consider representative of my current skillset.
FlashAttention-2 Implementation — CUDA & Triton
2025 · CUDA

Implemented the FlashAttention-2 algorithm (Tri Dao) for efficient transformer training, inspired by OpenAI's fused attention work in Triton. Focused on optimizing attention computation and memory usage.
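
The central trick behind this family of kernels is the online softmax: keys and values are processed block by block while a running maximum and running normaliser are kept, so the full score matrix never has to be stored. The following is a minimal plain-Python sketch of that idea for a single query vector. It is an illustration only, not the project's CUDA or Triton code, and the function names and `block_size` parameter are hypothetical.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) . V with all scores held at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, but keys/values are consumed one block at a time."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")   # largest score seen so far
    running_sum = 0.0             # softmax denominator, relative to running_max
    acc = [0.0] * dim             # unnormalised output accumulator

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new maximum.
        correction = math.exp(running_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]

        running_sum = running_sum * correction + sum(probs)
        acc = [a * correction + sum(p * v[j] for p, v in zip(probs, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max

    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point error for any block size; the blockwise version only ever holds `block_size` scores at a time.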

CUDA MatMul Optimization
2025 · CUDA · code

Designed and implemented a full CUDA GEMM optimization pipeline, reaching roughly 92% of cuBLAS performance. Also analysed the vLLM / PagedAttention paper.
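
Most GEMM optimisation pipelines start from tiling: the output is computed in small blocks so that each block of the inputs is reused many times while it sits in fast memory. The sketch below shows only that loop structure in plain Python, as an assumption about a typical early step; it is not the project's CUDA code, and `matmul_tiled` and its `tile` parameter are hypothetical names.

```python
def matmul_naive(a, b):
    """Reference triple loop: C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, inner, m = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same product, computed tile by tile.

    On a GPU, each (i0, j0) output tile would map to a thread block, and each
    k0 step would load one tile of A and one tile of B into shared memory.
    """
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, inner, tile):
                # Accumulate the partial product of one A tile and one B tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, inner)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle matrix sizes that are not multiples of the tile size, which in a real kernel corresponds to boundary checks on the edge tiles.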

Transformer from scratch
2024 · PyTorch · code

Full implementation of the Transformer architecture in PyTorch: embeddings, positional encodings, multi-head attention, encoder–decoder blocks and training loop. Built mainly to understand the architecture deeply, including attention visualisation and ablation.

SAIDL CoreML & multi-modality assignments
2024 · ML systems / GNNs · code

Series of research-style assignments implementing the APL framework, experimenting with noise schedules, and using GNNs for multi-modal tasks. Good exposure to controlled experimentation and analysis.

RAG chatbot over research PDFs
2024 · LLMs · code

Retrieval-augmented question-answering system over PDFs. Handles chunking, embedding, vector search and a simple LLM wrapper. Designed to keep responses grounded in the retrieved context.
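
The chunk, embed, search and prompt steps can be sketched end to end with the standard library. This is a toy illustration under stated assumptions: a bag-of-words counter stands in for a real embedding model, a linear scan stands in for a vector index, and every name here (`chunk_text`, `embed`, `retrieve`, `build_prompt`) is hypothetical rather than taken from the project.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into word windows that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query (linear scan)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Keep the answer grounded by instructing the model to use only the context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return ("Answer using only the context below. "
            "If the context is insufficient, say so.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")
```

The resulting prompt string would then be passed to whichever LLM wrapper is in use; that call is left out because the source does not name a model or API.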

ResNet-18 on CIFAR-10
2024 · computer vision · code

Training pipeline for ResNet-18 on CIFAR-10 using practical training techniques: data augmentation, a cosine LR schedule, mixed precision, and gradient handling. Achieved ~90% accuracy with a clean, configuration-driven script.
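
For reference, a cosine LR schedule decays the learning rate from a peak to a floor along half a cosine wave. The sketch below writes the formula out in plain Python; the default `lr_max` and `lr_min` values are illustrative assumptions, not the settings used in this project.

```python
import math


def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Learning rate at `step`, annealed from lr_max to lr_min over total_steps."""
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The rate changes slowly near the start and end of training and fastest in the middle, which is the usual reason for preferring it over a step schedule.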

Real-time hand-tracking game
2024 · OpenCV / MediaPipe · code

Small interactive game controlled by hand gestures, using MediaPipe for landmark detection and OpenCV for rendering. Optimised to run in real time on modest hardware, which made latency and pipeline issues very visible.

Hardware & electronics projects
since grade 9 · Instructables

DIY electronics, including a self-watering plant system, sensor projects and 3D-printed hardware. These projects gave me intuition for signals, noise and real-world debugging that feeds into my AI hardware work.

In progress
Experiments & failed directions
Ideas that didn’t fully work out but were useful to attempt.
Mamba implementation
state-space models · PyTorch

Research-oriented implementation of the Mamba architecture, experimenting with selective state-space mechanisms and long-sequence modelling efficiency.

FD-LoRA
PEFT / fine-tuning

Fast Dynamic LoRA variants using PEFT, aimed at efficient fine-tuning and rapid switching across multiple tasks.
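
The reason LoRA suits rapid task switching is that the base weight stays frozen and each task only contributes a small low-rank update, y = Wx + (alpha / r) · B(Ax). The sketch below shows that arithmetic in plain Python with one adapter per task. It is a generic illustration of standard LoRA, not the FD-LoRA variants or the PEFT library API, and the adapter dictionary layout is a hypothetical choice.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(x, base_weight, adapter=None):
    """Compute W x, plus the scaled low-rank update if an adapter is given.

    adapter is a dict with keys "A" (rank x d_in), "B" (d_out x rank),
    "alpha" and "rank"; this layout is an assumption for illustration.
    """
    out = matvec(base_weight, x)
    if adapter is None:
        return out
    scale = adapter["alpha"] / adapter["rank"]
    low = matvec(adapter["A"], x)        # project down to the rank dimension
    delta = matvec(adapter["B"], low)    # project back up to the output dimension
    return [o + scale * d for o, d in zip(out, delta)]


# One frozen base weight, several small per-task adapters.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "task_a": {"A": [[1.0, 1.0]], "B": [[1.0], [0.0]], "alpha": 2.0, "rank": 1},
    # B initialised to zeros, so this adapter leaves the base output unchanged.
    "task_b": {"A": [[0.5, -0.5]], "B": [[0.0], [0.0]], "alpha": 2.0, "rank": 1},
}
```

Switching tasks is then just a dictionary lookup, for example `lora_forward(x, base, adapters["task_a"])`, with no change to `base`.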

GPUDirect / BaM experiments
GPU I/O · systems

Investigating GPUDirect data paths and their bandwidth limits, and how these affect end-to-end training throughput for ML workloads.

FinTech education platform (early)
product exploration

Duolingo-style financial literacy platform aimed at students, with gamified modules and simple content. Currently in exploration / concept stage.

AI Rubik’s cube solver
vision + control · paused

Attempted integrating perception and planning for an automated Rubik’s cube solver. The project stalled, but it forced me to think about closed-loop control and robustness.

Qualcomm hackathon – AI hardware benchmarking
hackathon · rejected

Prototype for benchmarking AI models across hardware platforms. It wasn't selected to advance in the hackathon, but it seeded later ideas for GPU benchmarking and optimisation work.

AI hardware optimisation SaaS
SaaS concept · on hold

Early attempts at a SaaS platform for AI model benchmarking and auto-tuning on different devices. Parked until I gather more practical evidence from current research projects.

StudyGPT, MemoryVault, “vibe coding” tools
startup prototypes

Several early experiments around study assistants, knowledge management and fast coding patterns. Most were intentionally short-lived, but provided signal on what’s worth pursuing.

Agentic AI prototypes
multi-agent · paused

Played with agentic frameworks and task routing between agents. Paused to deepen my foundations in core DL and GPU systems before revisiting.