Hi, I’m Vinay.

I’m an undergraduate at BITS Pilani, Goa working at the intersection of deep learning, AI hardware, and high-performance computing. I enjoy understanding systems from first principles: how models are built, how they hit the GPU, and how to make that entire path faster and more reliable. Most of all, I love exploring and tinkering with all kinds of tech.

vinayrjumani@gmail.com · github.com/Vinay12345-neutron · linkedin

Vinay R Jumani
B.E. Electronics & Instrumentation
BITS Pilani, Goa Campus
Current work
A brief snapshot. See Research for more detail.
EEG-LLM cognitive decoding
NTU Singapore · Dec 2025 - present

Decoding EEG signals into language space using LLMs, focusing on cognitive representations and neural-linguistic alignment. Building deep learning pipelines for EEG preprocessing, time-frequency transforms, and multimodal embeddings.

Database Routing using LLMs on Enterprise Data
TCS Research · Nov 2025 - May 2026

Designing an LLM-assisted system to detect and resolve ambiguous natural language queries in enterprise search spanning databases, documents, and knowledge graphs.

GPU-accelerated HDF5 I/O
DaSH Lab, BITS Goa · Aug 2025 - Feb 2026

Exploring GPUDirect Storage, HDF5 extensions, and parallel I/O for large-scale ML training. Profiling bandwidth, PCIe usage, and CPU-GPU transfer bottlenecks.

Selected projects
More details in Projects.
FlashAttention-2 Implementation — CUDA & Triton
CUDA · 2025

Implemented the FlashAttention-2 algorithm (Tri Dao) for efficient transformer training, inspired by OpenAI's Fused Attention work. Focused on optimizing attention computation and memory usage.

CUDA MatMul Optimization
CUDA · 2025 · code

Designed & implemented a full CUDA GEMM optimization pipeline, reaching close to 92% of cuBLAS performance. Also analysed the vLLM (PagedAttention) paper.

Transformer from scratch
PyTorch · 2025 · code

Full implementation of the Transformer architecture in PyTorch: embeddings, positional encodings, multi-head attention, encoder-decoder blocks and training loop. Built mainly to understand the architecture deeply, including attention visualisation and ablation.

SAIDL CoreML & multi-modality assignments
GNNs · 2024 · PyTorch code

Series of research-style assignments implementing the APL framework, experimenting with noise schedules, and using GNNs for multi-modal tasks. Good exposure to controlled experimentation and analysis.