Projects – Vinay R Jumani

Key projects

Things I consider representative of my current skillset.

FlashAttention-2 Implementation — CUDA & Triton

2025 · CUDA

Implemented the FlashAttention-2 algorithm (Tri Dao) for efficient transformer training, inspired by OpenAI's Fused Attention work. Focused on optimising the attention computation and its memory usage.
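The core trick behind this style of attention is the online softmax: keys and values are visited one block at a time while a running maximum and running sum keep the result exact. The sketch below illustrates that idea for a single query in plain Python; it is not the CUDA or Triton kernel from this project, and the function names and `block_size` are assumptions made for the example.

```python
import math

def naive_attention(q, keys, values):
    """Reference: softmax(q.k / sqrt(d)) weighted sum of values, for one query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def tiled_attention(q, keys, values, block_size=2):
    """Same output, but keys/values are processed block by block (online softmax)."""
    d = len(q)
    running_max = -math.inf          # largest score seen so far
    running_sum = 0.0                # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])     # unnormalised weighted sum of values
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Because the full score row is never materialised, memory per query stays proportional to the block size rather than the sequence length, which is the property a GPU kernel exploits with on-chip memory.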

CUDA MatMul Optimization

2025 · CUDA · code

Designed and implemented a full CUDA GEMM optimisation pipeline, reaching close to 92% of cuBLAS performance. Also analysed the vLLM paper on PagedAttention.
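A common first step in GEMM optimisation is tiling: the output is computed in small blocks so that each block of the inputs is reused many times while it sits in fast memory. The sketch below shows only the loop structure in plain Python, with a hypothetical `tile` parameter; on a GPU the inner tiles would typically live in shared memory and registers, and the actual kernels in this project are not reproduced here.

```python
def matmul_naive(a, b):
    """Reference triple loop: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matmul_tiled(a, b, tile=2):
    """Same product, computed tile by tile over the i, j and p dimensions."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # tile of output rows
        for j0 in range(0, m, tile):        # tile of output columns
            for p0 in range(0, k, tile):    # tile along the reduction dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle matrices whose sizes are not multiples of the tile size, which is the same edge case a real kernel has to guard against.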

Transformer from scratch

2024 · PyTorch · code

Full implementation of the Transformer architecture in PyTorch: embeddings, positional encodings, multi-head attention, encoder–decoder blocks and training loop. Built mainly to understand the architecture deeply, including attention visualisation and ablation.
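As one concrete piece of that architecture, here is a minimal sketch of positional encodings, assuming the fixed sinusoidal scheme from the original Transformer paper (the project may use a different variant). It is written in plain Python rather than PyTorch, and the function name is chosen for the example.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table: sin on even dims, cos on odd dims."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # i is the even dimension index, so i / d_model equals 2k / d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Each row is added to the token embedding at that position, so the model can tell positions apart without any learned position parameters.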

SAIDL CoreML & multi-modality assignments

2024 · ML systems / GNNs · code

Series of research-style assignments implementing the APL framework, experimenting with noise schedules, and using GNNs for multi-modal tasks. Good exposure to controlled experimentation and analysis.

RAG chatbot over research PDFs

2024 · LLMs · code

Retrieval-augmented question-answering system over PDFs. Handles chunking, embedding, vector search and a simple LLM wrapper. Designed to keep responses grounded in the retrieved context.
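The chunk, embed, search and prompt steps can be sketched end to end in a few lines. This is a toy illustration only: the bag-of-words `embed` stands in for a real embedding model, the linear scan stands in for a vector store, and every function name and parameter here is hypothetical rather than taken from the project.

```python
import math
import re
from collections import Counter

def chunk_text(text, chunk_words=40, overlap=10):
    """Split text into overlapping word windows (assumes overlap < chunk_words)."""
    words = text.split()
    if not words:
        return []
    step = chunk_words - overlap
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, context_chunks):
    """Wrap the retrieved chunks in an instruction that keeps answers grounded."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return ("Answer using only the context below. "
            "If the answer is not there, say so.\n"
            f"Context:\n{context}\nQuestion: {query}")
```

The prompt returned by `build_prompt` would then be passed to whichever LLM wrapper is in use; the explicit "only the context" instruction is one simple way to keep responses tied to the retrieved text.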

ResNet-18 on CIFAR-10

2024 · computer vision · code

Training pipeline for ResNet-18 on CIFAR-10 using realistic tricks: data augmentation, cosine LR schedule, mixed precision, and gradient handling. Achieved ~90% accuracy with a clean, configuration-driven script.
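For reference, a cosine learning-rate schedule can be written as a single function of the training step. This is a minimal sketch without warmup; `base_lr`, `min_lr` and the function name are assumptions for the example, not values from the project's configuration.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.0):
    """Decay from base_lr to min_lr along half a cosine wave over total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The rate starts at `base_lr`, passes the midpoint between `base_lr` and `min_lr` halfway through training, and reaches `min_lr` at the final step.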

Real-time hand-tracking game

2024 · OpenCV / MediaPipe · code

Small interactive game controlled by hand gestures, using MediaPipe for landmark detection and OpenCV for rendering. Optimised to run in real time on modest hardware, which made latency and pipeline issues very visible.

Hardware & electronics projects

since grade 9 · Instructables

DIY electronics, including a self-watering plant system, sensor projects and 3D-printed hardware. These projects gave me intuition for signals, noise and real-world debugging that feeds into my AI hardware work.

In progress

Experiments & failed directions

Ideas that didn’t fully work out but were useful to attempt.

Mamba implementation

state-space models · PyTorch

Research-oriented implementation of the Mamba architecture, experimenting with selective state-space mechanisms and long-sequence modelling efficiency.
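The selective state-space mechanism can be sketched as a sequential scan in which the step size and the input/output projections change at every time step. The version below is a simplified single-channel illustration in plain Python with a diagonal state matrix; in a real model `delta`, `B` and `C` would be produced from the input by learned projections, which are omitted here, and all names are chosen for the example.

```python
import math

def selective_scan(x, delta, A, B, C):
    """Run a diagonal state-space recurrence over one input channel.

    x:     T input values
    delta: T positive step sizes (input-dependent in a real model)
    A:     N diagonal state entries (typically negative, so the state decays)
    B, C:  T lists of N values each (input-dependent in a real model)
    """
    n = len(A)
    h = [0.0] * n
    ys = []
    for t in range(len(x)):
        # Discretise: a_bar = exp(delta * A), b_bar approximated as delta * B.
        a_bar = [math.exp(delta[t] * A[i]) for i in range(n)]
        b_bar = [delta[t] * B[t][i] for i in range(n)]
        h = [a_bar[i] * h[i] + b_bar[i] * x[t] for i in range(n)]
        ys.append(sum(C[t][i] * h[i] for i in range(n)))
    return ys
```

A large `delta[t]` makes the state forget more of the past and take in more of the current input, while a small one does the opposite; that per-step control is what makes the scan selective.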

FD-LoRA

PEFT / fine-tuning

Fast Dynamic LoRA variants using PEFT, aimed at efficient fine-tuning and rapid switching across multiple tasks.
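The building block underneath any LoRA variant is a frozen weight matrix plus a low-rank update, which makes switching tasks as cheap as swapping two small matrices. The sketch below shows plain LoRA only, in dependency-free Python; it does not reproduce the FD-LoRA variants or the PEFT library API, and the adapter dictionary layout is a hypothetical choice for the example.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def lora_forward(x, W, adapter=None):
    """Compute y = W x + (alpha / r) * B (A x); W stays frozen.

    adapter: None, or a dict with "A" (r x in), "B" (out x r) and "alpha".
    """
    y = matvec(W, x)
    if adapter is None:
        return y
    A, B, alpha = adapter["A"], adapter["B"], adapter["alpha"]
    r = len(A)                              # rank of the update
    update = matvec(B, matvec(A, x))        # never forms the full B @ A matrix
    return [yi + (alpha / r) * ui for yi, ui in zip(y, update)]

# Hypothetical per-task adapters sharing one frozen base weight.
adapters = {
    "task_a": {"A": [[1.0, 1.0]], "B": [[1.0], [0.0]], "alpha": 2.0},
    "task_b": {"A": [[1.0, -1.0]], "B": [[0.0], [1.0]], "alpha": 1.0},
}
```

Calling `lora_forward(x, W, adapters["task_a"])` and then `lora_forward(x, W, adapters["task_b"])` switches tasks without touching `W`, which is the property that makes rapid multi-task switching attractive.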

GPU Direct / BAM experiments

GPU I/O · systems

Investigating GPU-Direct paths, bandwidth limitations and how they affect end-to-end training throughput for ML workloads.

FinTech education platform (early)

product exploration

Duolingo-style financial literacy platform aimed at students, with gamified modules and simple content. Currently in exploration / concept stage.

AI Rubik’s cube solver

vision + control · paused

Attempted integrating perception and planning for an automated Rubik’s cube solver. The project stalled, but it forced me to think about closed-loop control and robustness.

Qualcomm hackathon – AI hardware benchmarking

hackathon · rejected

Prototype for benchmarking AI models across hardware platforms. The submission wasn’t selected, but it seeded later ideas for GPU benchmarking and optimisation work.

AI hardware optimisation SaaS

SaaS concept · on hold

Early attempts at a SaaS platform for AI model benchmarking and auto-tuning on different devices. Parked until I gather more practical evidence from current research projects.

StudyGPT, MemoryVault, “vibe coding” tools

startup prototypes

Several early experiments around study assistants, knowledge management and fast coding patterns. Most were intentionally short-lived, but they provided signal on what’s worth pursuing.

Agentic AI prototypes

multi-agent · paused

Played with agentic frameworks and task routing between agents. Paused to deepen my foundations in core DL and GPU systems before revisiting.