Discussion about this post

Neural Foundry

Really insightful breakdown on decoupling memory from computation. The sparsity allocation curve is fascinating; it seems like optimizing that 20% tradeoff between MoE and Engram is where the magic happens. The prefetch strategy for offloading to CPU RAM is clever, and it makes me think about how we underutilize host memory in most distributed setups. One question, though: how does the gating mechanism handle adversarial collisions, for example if someone deliberately constructs inputs to maximize hash noise?
