Building Flash Attention from Source

Notes from compiling Flash Attention on an A800 box. If you're hitting endless build times or watching the compiler get OOM-killed mid-build, the key environment variables and pitfalls here may save you time.

2025-07-26    610 words    3 min
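The full post has the details; as a quick taste, here is a minimal sketch using the flash-attn project's documented build knobs (the job count of 4 is an illustrative choice for a memory-constrained box, not a figure from the post):

```bash
# ninja parallelizes the CUDA compile; without a cap, each nvcc job
# can eat several GB of RAM and the build gets OOM-killed.
pip install ninja
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```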

Install CUDA and an NLP Stack with Conda (No Root)

How a non-root user can install a newer transformers stack with conda when they cannot change the version of the CUDA driver installed on the system.

2024-03-10    684 words    4 min
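This teaser doesn't include commands; as a hedged sketch of the general approach (the env name, Python and CUDA versions, and package list below are illustrative assumptions, not taken from the post):

```bash
# Everything lives in a user-writable conda env; the system CUDA
# driver is untouched and only needs to be new enough for the
# CUDA runtime chosen below.
conda create -n nlp python=3.10
conda activate nlp

# Illustrative versions: a CUDA 12.1 toolkit from NVIDIA's conda
# channel plus a matching PyTorch wheel, then the NLP stack.
conda install -c "nvidia/label/cuda-12.1.0" cuda-toolkit
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate
```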