Mistral AI launches Mixtral-Next
chat.lmsys.org
Mistral's process for releasing new models is extremely low-information. After getting very confused by this link, I tried looking for one with any better information, and there just isn't one.
I thought Mixtral's release was weird when they just pasted a magnet link [0] into Twitter with no information, but at least people could download and analyze it so we got some reasonable third-party commentary in between that and the official announcement. With this one there's nothing at all to go on besides the name and the black box.
Company creates blackbox technology, and the company's communications are themselves like a blackbox... fitting
(I know that Mistral does a lot more stuff in the open than other companies, just couldn't resist the parallel between this and the blackbox limitations of LLMs in general)
> for releasing new models is extremely low-information
To be fair, this is not a release. This was the previous release https://mistral.ai/news/mixtral-of-experts/
It looks more like not trying very hard to hide things until release, rather than being a black box.
If this were the first incident like this I would agree, but they very intentionally dropped the magnet link for Mixtral on Twitter with no further context. That leaves me wondering if this was also a weird on purpose thing rather than just them being casual.
Does it matter? You know that if you really want to play with things early, you may get an opportunity. And if you want to read more details, you'll get an announcement too. What's the problem with it being either on purpose or casual?
It's an observation, not a complaint. It does leave very little to go on for an HN discussion, though, besides a meta conversation like this one.
Well, what could they say? Given the lack of transparency on the data, it could well be:
“We’ve trained a LLaMA MoE on a lot of GPT4 data. And it is not as good as GPT4. And this is our blob, so we can release it under any license. If someone is silly enough to use what this blob generates, that is not our problem.”
For those unfamiliar with the LMSys interface:
Click/tap on "Direct Chat" in the top tab navigation and you can select "mistral-next" as model.
From limited experimentation, and within the confines of a single prompt (rather than a full chat), this model seems reasonably interesting. Does anyone have good examples of QA that showcase the capabilities compared to other advanced models?
I only get the message "Connection errored out."
AIExplained on YouTube has guessed that Gemini 1.5 Pro took Mistral's accurate long-context retrieval and that Google just scaled it as much as they could. The Gemini 1.5 Pro paper has a citation back to the last Mistral paper from 2024.
And how does Mistral do "accurate long content retrieval"?
see the long range performance piece here https://arxiv.org/pdf/2401.04088.pdf
I don't think they explain it in the paper, do they? They just mention the result, it seems. I'm really curious to know too. Maybe they're implying it's a result of their Mixture of Experts architecture? Or maybe they just don't want to say, idk
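For context, long-range retrieval is usually measured with a "passkey retrieval" test: a random number is buried in a long stretch of filler text and the model is asked to repeat it back. A minimal sketch of building such a prompt (the filler and instruction wording are my own, not from the paper):

```python
import random

def make_passkey_prompt(approx_words: int = 30000):
    """Build a passkey-retrieval prompt: bury a random number in filler text."""
    passkey = random.randint(10000, 99999)
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    haystack = filler * (approx_words // 12)  # repeat until the prompt is long enough
    midpoint = len(haystack) // 2
    needle = f" The pass key is {passkey}. Remember it. "
    prompt = (
        haystack[:midpoint] + needle + haystack[midpoint:]
        + "\nWhat is the pass key mentioned above? Answer with the number only."
    )
    return prompt, passkey

prompt, expected = make_passkey_prompt()
# Send `prompt` to the model under test and check the reply contains str(expected).
```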
Note that it's actually "Mistral Next" not "Mixtral Next" - so it isn't necessarily a MoE. For example, an early version of Mistral Medium (Miqu) was not a MoE but instead a Llama 70B model. I wonder how many parameters this one has
I know what they were going for with the Mixtral name, but every time I come across it I wonder if they considered just how easily the two might be confused. It seems like a poor branding decision: what if someone expects Mixtral-level performance but accidentally uses a Mistral model? What if someone wants the low resource usage of, e.g., Mistral 7B but tries out Mixtral 8x7B instead? It's especially hard when your colleagues aren't necessarily native English speakers.
There's got to be a better name for such a cool product. Maybe MistralX? MistMix?
I feel like this is not really an issue. I personally lost track of all the llamas, <not>gpts, etc - but if somebody is going to seriously use a certain model, they'll find out soon enough if they're using the wrong one.
It has definitely affected me and my colleagues; perhaps we didn't waste much time, but it's annoying. Even if it isn't a real problem, it certainly cannot hurt to make the naming easier to understand.
I agree. I also think the Llama naming was confusing - versioning by capitalization? (LLaMA vs Llama)
Slightly related question: what's a good coding LLM to run on a 4070 12GB card?
Also, do coding LLMs use treesitter to "understand" code?
I’m pretty new to running these locally, but here’s my understanding:
Best models currently: CodeLlama or DeepSeek Coder, at 6.7B or 1.3B depending on how much latency you can tolerate.
Tree-sitter: from looking at the logs of the chat completion requests for the Continue or Twinny extensions for VS Code, they both appear to just send a chunk of the document along with a special placeholder to indicate where the cursor currently is.
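For what it's worth, a rough sketch of what such a request can look like against a local OpenAI-compatible completions endpoint; the fill-in-the-middle marker strings, URL, and model name below are placeholders, since each model family defines its own infill tokens:

```python
import requests

def complete_at_cursor(document: str, cursor: int,
                       url: str = "http://localhost:8080/v1/completions") -> str:
    """Split the open file at the cursor and ask a local model to fill the gap.

    The <PREFIX>/<SUFFIX>/<MIDDLE> markers are illustrative placeholders;
    check the model card for the actual infill tokens before using this.
    """
    prefix, suffix = document[:cursor], document[cursor:]
    prompt = f"<PREFIX>{prefix}<SUFFIX>{suffix}<MIDDLE>"
    resp = requests.post(url, json={
        "model": "deepseek-coder",   # placeholder model name
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.2,
        "stop": ["\n\n"],            # keep completions to a single short block
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```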
Another one is https://github.com/WisdomShell/codeshell/blob/main/README_EN... and it has its own IntelliJ plugin.
I'm also interested in the answer to that.
Depends on what you want to use it for. I use deepseek-coder v1 (1.5 is too verbose). I use it like a customized web search to quickly build one-off scripts in python.
If you want something to act as your hands so you don't have to type, open-source LLMs and IDE integration aren't reliably there yet. Follow the Aider Discord to stay up on the latest in this area.
> do coding LLMs use treesitter...?
It's up to the app to put that into the context. Generally, coding LLMs do well if you provide them the source tree, graph, search results, notable files, etc. in the context. This is how Sourcegraph's Cody product works, for example.
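A minimal sketch of what assembling that kind of context can look like on the app side; the file-picking heuristic and prompt wording are made up for illustration, and real tools use embeddings, search, or a tree-sitter symbol graph rather than this naive size-based pick:

```python
from pathlib import Path

def build_repo_context(root: str, question: str, max_chars: int = 12000) -> str:
    """Assemble a prompt from a file tree plus a few source files.

    Purely illustrative: grab the smallest Python files until the character
    budget runs out, just to show the shape of the prompt sent to the model.
    """
    files = sorted(Path(root).rglob("*.py"), key=lambda p: p.stat().st_size)
    tree = "\n".join(str(p.relative_to(root)) for p in files)
    snippets, used = [], 0
    for p in files:
        text = p.read_text(errors="ignore")
        if used + len(text) > max_chars:
            break
        snippets.append(f"### {p.relative_to(root)}\n{text}")
        used += len(text)
    return (
        f"Repository layout:\n{tree}\n\n"
        "Selected files:\n\n" + "\n\n".join(snippets) +
        f"\n\nQuestion: {question}"
    )
```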
Try deepseek 6.7B
It's quite fun to use! It's better at French than ChatGPT 3.5, in my opinion.
It is a French company so maybe they have extra French datasets?
I was quite disappointed by the French LLMs on Hugging Face when I tried them a month ago.
Mistral models tend to be quite good at non-English languages. French of course, but also Spanish, German and Italian. From what I have read it’s something they consider important when training their models.
Wow, this might be the best LLM I've used in terms of phrasing and presenting the answers.
I have the same impression. It is less verbose and doesn't beat around the bush (unlike GPT-3.5/4.5-Turbo), provides almost the same quality of code for the test cases I have, and has similar (GPT-4) or much better (GPT-3.5) spatial comprehension. It is at the GPT-3.5 level of math (read it as not good enough but better than anything else).
No indication that this a MoE (Mistral not Mixtral).
Very exciting nevertheless; here's hoping they bless the open-source community once again!
It is indicated by the magnet link which they have posted on twitter (see comments above).
There's no magnet link for this new (Mistral Next) model as of right now.
Could it be Mistral Large? This beats GPT-4 on my personal test.
I tried a bunch of my recent prompts to GPT-4 from daily use - this was often just slightly worse, sometimes slightly better. Fast too (tokens per second) while also not being overly wordy - very much appreciated that.
Refusals are a bit "I am just a language model"-y which GPT-4 has gotten away from. Also it's more refuse-y if I broach something rudely (which again I've found GPT-4 to have become much better at.)
Way better at everything than whichever Gemini I've been trying recently (can't tell for sure what I'm using when I use it.) But that one isn't even in contention for any use at all IME.
Overall it felt like I need to try it in daily use to work out if it's a contender with GPT-4 as a daily driver.
The clincher for me will be whether the API for it is not as exorbitantly priced as GPT-4's, and whether Mistral can make using LoRAs economical.
Curious what your use case is where API tokens are actually a significant expense at this time. I get it if you're reselling a product, but for myself, abusively and naively messing around, I don't see major costs as an end user. I sometimes hit the rate limit in ChatGPT, maybe twice a week, but otherwise my monthly API bill over the last 6 months has usually been in the single dollars. Not tens or thousands of dollars, just dollars.
Back when GPT-4 came out in the middle of last year my bills were slightly more dramatic (I am what you would probably call a semi-heavy individual user), but they never surpassed $200 USD in a single month.
GPT4 is very cheap when you're hitting the api with common knowledge questions or reasoning about known things, and asking for short replies. It's very expensive when you're loading up the context with a ton of data and asking it to convert some data points into a narrative structure.
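Rough arithmetic on why context size dominates the bill; the per-token prices below are roughly the GPT-4 Turbo list prices from early 2024 and are assumptions, so check current pricing before relying on them:

```python
# Rough cost sketch -- prices are assumptions, not current list prices.
PRICE_IN, PRICE_OUT = 0.01 / 1000, 0.03 / 1000   # $ per input / output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

print(request_cost(500, 300))        # short Q&A: ~$0.014 per call
print(request_cost(100_000, 1_500))  # 100K tokens of context: ~$1.05 per call
```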
Oh, dude, you'll easily end up with $200 per hour if you are working on generating synthetic data...
It doesn't on mine; it's still really wrong. But compared to Mixtral, when you tell Next that it made a mistake, it corrects itself to the right answer, whereas Mixtral would make things up that were even more wrong.
This was linked randomly in Mistral's Discord chat, nothing "official" yet.
It's a preview of their newest prototype model.
To use it, click "Direct Chat" tab and choose "Mistral next"
I used this but, upon asking which model it is, it replied that it is a "fine-tuned version of GPT 3.5". Any clue why? In a second chat it replied, "You're chatting with one of the fine-tuned versions of the OpenAssistant model!"
Models don't "know" what they are. Usually the system prompt contains that information. If it doesn't, the model will hallucinate an answer for you.
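For illustration, a minimal sketch of how a chat frontend typically pins that identity via the system message; the endpoint, model id, and wording here are placeholders, not what LMSYS actually uses:

```python
import requests

# Hypothetical OpenAI-compatible endpoint; the arena's real backend may differ.
resp = requests.post("http://localhost:8000/v1/chat/completions", json={
    "model": "mistral-next",  # placeholder model id
    "messages": [
        # Without a line like this, the model will happily guess at its own name.
        {"role": "system", "content": "You are Mistral Next, a model by Mistral AI."},
        {"role": "user", "content": "Which model are you?"},
    ],
})
print(resp.json()["choices"][0]["message"]["content"])
```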
From my tests, it did better than Gemini Ultra on a few reasoning/logic questions.
The Together.AI logo at the bottom is very hard to read... (Dark gray on black)
You can literally type "woke shit" in and you get woke shit out. I am so impressed.
As someone who has only been using GPT-4 since its release, I am pleasantly surprised by how far open LLMs have come.