Hi, I’m Vinay.
I’m an undergraduate at BITS Pilani, Goa, working at the intersection of deep learning, AI hardware, and high-performance computing. I enjoy understanding systems from first principles: how models are built, how they hit the GPU, and how to make that entire path faster and more reliable. Most importantly, I love exploring and tinkering with all kinds of technology.
vinayrjumani@gmail.com · github.com/Vinay12345-neutron · LinkedIn
BITS Pilani, Goa Campus
Decoding EEG signals into language space using LLMs, focusing on cognitive representations and neural-linguistic alignment. Building deep learning pipelines for EEG preprocessing, time-frequency transforms, and multimodal embeddings.
Designing an LLM-assisted system to detect and resolve ambiguous natural language queries in enterprise search spanning databases, documents, and knowledge graphs.
Exploring GPUDirect Storage, HDF5 extensions, and parallel I/O for large-scale ML training. Profiling bandwidth, PCIe usage, and CPU-GPU transfer bottlenecks.
Implemented the FlashAttention-2 algorithm (Tri Dao) for efficient transformer training, inspired by OpenAI's Fused Attention work. Focused on optimizing attention computation and memory usage.
Designed and implemented a full CUDA GEMM optimization pipeline, reaching close to 92% of cuBLAS performance. Also analysed the vLLM paper on PagedAttention.
Full implementation of the Transformer architecture in PyTorch: embeddings, positional encodings, multi-head attention, encoder-decoder blocks, and a training loop. Built mainly to understand the architecture deeply, including attention visualisation and ablations.
A series of research-style assignments implementing the APL framework, experimenting with noise schedules, and using GNNs for multimodal tasks. Good exposure to controlled experimentation and analysis.