1st Place: Built an LLM-powered CUDA codegen system for PyTorch modules, enabling custom kernel generation and benchmarking on NVIDIA GPUs.