xAI Announces Grok-1.5
By releasing the model weights and network architecture of Grok-1 two weeks ago, we offered a glimpse of the progress xAI had made up until last November. Since then, we have improved the reasoning and problem-solving capabilities of our latest model, Grok-1.5.
Capabilities and Reasoning
One of the most notable improvements in Grok-1.5 is its performance in coding and math-related tasks. In our tests, Grok-1.5 achieved a 50.6% score on the MATH benchmark and a 90% score on the GSM8K benchmark, two math benchmarks covering a wide range of grade school to high school competition problems. Additionally, it scored 74.1% on the HumanEval benchmark, which evaluates code generation and problem-solving abilities.
Long Context Understanding
A new feature in Grok-1.5 is its ability to process long contexts of up to 128K tokens, a 16-fold increase over Grok-1's 8,192-token context window, enabling it to utilize information from substantially longer documents.
Furthermore, the model can handle longer and more complex prompts while still maintaining its instruction-following capability as its context window expands. In the Needle In A Haystack (NIAH) evaluation, Grok-1.5 achieved perfect retrieval of embedded text within contexts of up to 128K tokens.
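xAI has not published its NIAH harness, but the setup is easy to sketch. The minimal Python version below is purely illustrative: `query_model`, the filler text, and the needle are hypothetical stand-ins (the stub simulates perfect recall so the script runs end to end), not xAI's actual evaluation code.

```python
# Hypothetical stand-in for a call to the model under test. Here it
# simulates perfect recall so the harness runs end to end.
def query_model(prompt: str) -> str:
    return "The magic number is 742197."

FILLER = "The sky was clear and the market was quiet that day. "
NEEDLE = "The magic number for the evaluation is 742197. "
QUESTION = "\n\nWhat is the magic number mentioned in the document?"

def niah_trial(context_tokens: int, depth: float,
               tokens_per_sentence: int = 12) -> bool:
    """Build a haystack of roughly `context_tokens` tokens, bury the
    needle at fractional `depth` (0.0 = start, 1.0 = end), and check
    whether the model's answer contains the needle's payload."""
    n_sentences = context_tokens // tokens_per_sentence
    haystack = [FILLER] * n_sentences
    haystack.insert(int(depth * (n_sentences - 1)), NEEDLE)
    answer = query_model("".join(haystack) + QUESTION)
    return "742197" in answer

# Sweep context lengths up to 128K tokens and needle depths; a model
# with perfect long-context retrieval scores 10/10 in every cell.
for length in (8_192, 32_768, 65_536, 131_072):
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        hits = sum(niah_trial(length, depth) for _ in range(10))
        print(f"{length:>7} tokens, depth {depth:.2f}: {hits}/10 retrieved")
```

Reporting a grid of (context length, needle depth) cells like this is the standard way NIAH results are visualized; "perfect retrieval" corresponds to every cell scoring 100%.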
Grok-1.5 Infra
Cutting-edge large language model (LLM) research that runs on massive GPU clusters demands robust and flexible infrastructure. Grok-1.5 is built on a custom distributed training framework based on JAX, Rust, and Kubernetes. This training stack enables our team to prototype ideas and train new architectures at scale with minimal effort.

A major challenge of training LLMs on large compute clusters is maximizing the reliability and uptime of the training job. Our custom training orchestrator automatically detects problematic nodes and ejects them from the training job. We also optimized checkpointing, data loading, and training job restarts to minimize downtime in the event of a failure. If working on our training stack sounds interesting to you, apply to join the team.
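The orchestrator itself is not public, but the detect-eject-restart loop it describes can be sketched. The Python below is a toy illustration under assumed interfaces: `heartbeat_ok`, `restart_from_checkpoint`, and the in-memory `nodes` table are hypothetical placeholders, not xAI's stack.

```python
import time

# Hypothetical cluster state: one entry per worker node in the job.
nodes = {f"node-{i}": {"healthy": True} for i in range(64)}

def heartbeat_ok(node: str) -> bool:
    """Placeholder health probe; a real orchestrator would check for
    ECC errors, NCCL timeouts, stalled heartbeats, and the like."""
    return nodes[node]["healthy"]

def restart_from_checkpoint(active: list[str]) -> None:
    """Placeholder: reload the latest checkpoint and resume training
    on the surviving set of nodes."""
    print(f"resuming from last checkpoint on {len(active)} nodes")

def supervise(poll_seconds: float = 30.0, max_polls: int = 3) -> None:
    """Detect failed nodes, eject them, and restart the job, so one
    bad host costs minutes of downtime rather than the whole run."""
    for _ in range(max_polls):   # bounded for the demo; a real loop runs forever
        failed = [n for n in nodes if not heartbeat_ok(n)]
        for n in failed:
            del nodes[n]         # eject the problematic node
        if failed:
            restart_from_checkpoint(list(nodes))
        time.sleep(poll_seconds)

# Simulate a hardware fault on one node, then let the supervisor react.
nodes["node-17"]["healthy"] = False
supervise(poll_seconds=0.1)
```

The design point the post highlights is that fast checkpoint reload and job restart make ejection cheap: losing a node triggers a short resume rather than a rerun of the job.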
Looking Ahead
Grok-1.5 will soon be available to early testers, and we look forward to receiving your feedback to help us improve Grok. As we gradually roll out Grok-1.5 to a wider audience, we are excited to introduce several new features over the coming days.
The MATH and GSM8K scores above are maj@1 results; the HumanEval score is pass@1.
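For readers unfamiliar with these metrics: maj@k takes a majority vote over k sampled answers (so maj@1 grades a single sample), and pass@1 counts a task as solved if one sampled program passes its unit tests. A minimal sketch, with simplified grading helpers rather than the official harnesses:

```python
def maj_at_k(samples: list[str], reference: str) -> bool:
    """Majority voting: the most frequent final answer among k samples
    is compared with the reference (maj@1 reduces to one sample)."""
    top = max(set(samples), key=samples.count)
    return top == reference

def pass_at_1(program: str, tests: str) -> bool:
    """pass@1: a single sampled program is correct iff it runs its
    unit tests without error. Real harnesses sandbox this exec;
    running untrusted model output directly is unsafe."""
    try:
        exec(program + "\n" + tests, {})
        return True
    except Exception:
        return False

# maj@1 on a toy GSM8K-style item: grade one sampled answer.
print(maj_at_k(["42"], reference="42"))            # True

# pass@1 on a toy HumanEval-style item.
print(pass_at_1("def add(a, b):\n    return a + b",
                "assert add(2, 3) == 5"))          # True
```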
More News
Mistral AI has closed its $640M Series B, led by General Catalyst and comprising a mix of equity and debt, at a $6B valuation.
Perplexity has raised $73.6 million in Series B funding to support its rapid consumer adoption and expansion plans. IVP led the round, with continued support from Seed and Series A investors NEA, Elad Gil, Nat Friedman, and Databricks, as well as new investors NVIDIA, Jeff Bezos (through Bezos Expeditions Fund), Tobi Lutke, Bessemer Venture Partners, Naval Ravikant, Balaji Srinivasan, Guillermo Rauch, Austen Allred, and Factorial Funds, among others. Building on its Series A from last year, the company has now raised $100 million to date.