Signals
8/10 Industry 3 Jul 2026, 05:00 UTC

Together AI raises $800M in new funding and reaches $1.15B in annual recurring revenue.

Together AI's $1.15B ARR suggests that enterprise demand for optimized, open-weight model infrastructure now rivals demand for closed-API providers. With Tri Dao as Chief Scientist, the company's moat is not just compute scale but foundational algorithmic efficiency: techniques like FlashAttention raise GPU utilization. This funding will likely accelerate its work on distributed training orchestration and next-generation inference architectures.

Together AI has secured $800 million in a new funding round and has reached $1.15 billion in annual recurring revenue (ARR). The milestone positions the company as a major force in the generative AI infrastructure layer and suggests that the market for training and serving open-weight models is competing directly with the closed-API ecosystems built by companies like OpenAI and Anthropic.

From an engineering perspective, Together AI's success is rooted in algorithmic and systems efficiency rather than brute-force compute alone. A critical component of this technical moat is Chief Scientist Tri Dao, the primary author of FlashAttention. FlashAttention makes attention hardware-aware: it computes exact attention in tiles that fit in fast on-chip SRAM, which sharply reduces reads and writes to the GPU's slower high-bandwidth memory (HBM) and avoids storing the full attention matrix. As a result, Transformers run significantly faster, and attention memory grows linearly rather than quadratically with sequence length, which improves the unit economics of large language model (LLM) inference and training. Together AI leverages these low-level optimizations to offer competitive pricing and lower latency for enterprise workloads.
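The tiling idea behind FlashAttention can be illustrated with a minimal CPU sketch. This is not the FlashAttention kernel itself, which is a fused GPU kernel that also tiles over queries; it only shows the "online softmax" trick that lets attention be computed one key/value block at a time without ever holding the full score matrix. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are illustrative assumptions, not part of any library.

```python
import math


def naive_attention(Q, K, V):
    """Reference version: builds every score for a query before normalizing."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))
        ])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Illustrative sketch: visits K/V in blocks and keeps only running
    statistics (max, normalizer, weighted sum) for each query."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        running_max = float("-inf")
        running_sum = 0.0
        acc = [0.0] * len(V[0])
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated under the old max to the new max.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [
                a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
                for j, a in enumerate(acc)
            ]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


if __name__ == "__main__":
    Q = [[0.1, 0.2], [0.3, -0.4]]
    K = [[0.5, -0.1], [0.2, 0.7], [-0.3, 0.9]]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    ref = naive_attention(Q, K, V)
    got = tiled_attention(Q, K, V, block_size=2)
    diff = max(abs(a - b) for r, g in zip(ref, got) for a, b in zip(r, g))
    print("max abs difference:", diff)
```

Both functions compute the same exact attention output up to floating-point rounding. The difference is that the tiled version needs working memory proportional to the block size rather than to the full sequence length, which is what allows a GPU implementation to keep each tile in SRAM.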

This development matters because it signals a maturation in enterprise AI adoption. Companies are moving past prototype API calls and are now investing heavily in fine-tuning and hosting their own open-weight models (such as Llama 3 or Mixtral) on optimized infrastructure to maintain data sovereignty and control costs. Together AI's $1.15B ARR suggests that this transition is happening at scale, shifting value capture toward specialized cloud providers.

What to watch next: With $800M in new capital, expect Together AI to expand its compute clusters aggressively, likely securing large allocations of NVIDIA's Blackwell-generation GPUs. Also watch for work from Tri Dao's research team on sub-quadratic alternatives to attention (such as Mamba and other structured state space models) being integrated directly into Together's serving stack. The next frontier for the company will be abstracting away the complexity of multi-cloud, decentralized GPU orchestration for large-scale pre-training.

Tags: Together AI, FlashAttention, Model Infrastructure, GPU Optimization, Enterprise AI