Coming soon! A deep dive into a new NN architecture that performs better than transformers for language modeling (m log n compute vs. n^2).