Google researchers have revealed that memory and interconnect, not compute power, are the primary bottlenecks for LLM inference, with memory bandwidth lagging roughly 4.7x behind compute.
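
To see why inference tends to be memory-bound rather than compute-bound, a back-of-the-envelope roofline estimate helps. The sketch below uses illustrative, assumed hardware numbers (not figures from the researchers) to compare an accelerator's compute-to-bandwidth ratio against the arithmetic intensity of single-token decode:

```python
# Back-of-the-envelope roofline check: why LLM decode is memory-bound.
# Hardware numbers are illustrative placeholders, not from the article.

peak_flops = 900e12       # hypothetical accelerator peak, FLOP/s (bf16)
mem_bandwidth = 2.4e12    # hypothetical HBM bandwidth, bytes/s

# During single-token decode, each weight matrix is streamed from memory
# once per token for a matrix-vector product: ~2 FLOPs per parameter,
# ~2 bytes per parameter (bf16 weights).
decode_intensity = 2 / 2  # arithmetic intensity ~= 1 FLOP per byte

# FLOPs per byte the accelerator needs to keep its compute units busy.
machine_balance = peak_flops / mem_bandwidth

print(f"machine balance:  {machine_balance:.0f} FLOP/byte")
print(f"decode intensity: {decode_intensity:.0f} FLOP/byte")
# Decode intensity sits orders of magnitude below the machine balance,
# so token generation is throttled by memory bandwidth, not compute.
```

Under these assumptions the accelerator would need hundreds of FLOPs per byte of memory traffic to stay compute-bound, while decode delivers only about one, which is why faster memory and interconnect, rather than more FLOPS, move the needle on inference throughput.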