Tencent's 'Hunyuan-T1'–The First Mamba-Powered Ultra-Large Model

llm.hunyuan.tencent.com

298 points by thm a month ago


AJRF - a month ago

Iman Mirzadeh on Machine Learning Street Talk (great podcast if you haven't already listened!) put into words a thought I had: LLM labs are so focused on making benchmark scores go up that it's becoming a bit of a perverse incentive.

If your headline metric is a score, and you constantly test on that score, it becomes very tempting to do anything that makes the score go up - i.e. train on the test set.

I believe all the major ML labs are doing this now because:

- No one talks about their data set

- The scores are front and center of big releases, but there is very little discussion or nuance beyond the metric itself.

- The repercussions of not posting a higher or comparable score are massive: the release is seen as a failure and your budget gets cut.

More in-depth discussion of capabilities - while harder - is a better signal of a quality release.
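One crude way an outsider could check for this, at least in principle, is an n-gram contamination test. This is a hypothetical sketch (not anything these labs have disclosed about their pipelines): flag a benchmark item if any of its word n-grams appears verbatim in the training corpus.

```python
def ngrams(text, n=8):
    """Set of lowercase word n-grams occurring in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(benchmark_item, training_corpus, n=8):
    """Crude contamination heuristic: True if any n-gram of the
    benchmark item appears verbatim in any training document."""
    item_grams = ngrams(benchmark_item, n)
    corpus_grams = set()
    for doc in training_corpus:
        corpus_grams |= ngrams(doc, n)
    return bool(item_grams & corpus_grams)
```

Real contamination analyses are fuzzier than exact n-gram matching, but that's the catch: if no one talks about their dataset, no one outside the lab can run even a check this simple.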

ttoinou - a month ago

   the excellent performance demonstrated by the models fully proves the crucial role of reinforcement learning in the optimization process
What if this reinforcement learning is just gaming the benchmarks (Goodhart's law) without providing better answers elsewhere? How would we notice?
notShabu - a month ago

The romanization of these names is always confusing b/c, stripped of the characters and tones, it's just gibberish. "Hunyuan" (混元) in Chinese means "Primordial Chaos" or "Original Unity".

Knowing the meaning helps as more Chinese products and services hit the market, and it makes the names easier to remember. The naming is similar to the popularity of Greek mythology in Western products (e.g. all the products named "Apollo").

yawnxyz - a month ago

> 好的,用户发来消息:"hello do you speak english" ["Okay, the user sent the message: 'hello do you speak english'"] (Hunyuan-T1 thinking response)

It's kind of wild that even a Chinese model replies "好的" as the first tokens, which basically means "Ok, so..." like R1 and the other models respond. Is this RL'ed or just somehow a natural effect of the training?

wedn3sday - a month ago

The only metric I really care about, and the one that I think shows the fundamental failure of LLMs as a technology, is this one here [1]. The fact that o1 fails a non-zero fraction of the time on the question "what is 6*1?" means that the models just do not "understand" _anything_ and are still just fancy stochastic parrots. Now, stochastic parrots are still useful! Just not the digital god a lot of people seem to think we're heading towards.

[1] https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....

Magi604 - a month ago

So many models coming out these days, so many developments happening in the AI space in general, it's kinda hard to keep up with it all. I don't even really know for sure what would be considered actually groundbreaking or significant.

kristianp - a month ago

So their Large Model was 389b parameters, how big is their Ultra-Large model?

Reubend - a month ago

After playing around with this model a bit, it seems to have a tendency to reply to English questions in Chinese.

sroussey - a month ago

It’s exciting to see a Mamba based model do so well.

cubefox - a month ago

> This model is based on the TurboS fast-thinking base, the world's first ultra-large-scale Hybrid-Transformer-Mamba MoE large model released by us at the beginning of March.

It's interesting that their foundation model is a combination of Mamba and Transformer layers rather than a pure Mamba model. I guess the Mamba architecture does have issues, which might explain why it hasn't replaced transformers.
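A toy sketch of why hybrids are appealing (hypothetical illustration, not Hunyuan's actual architecture): SSM-style blocks process the sequence with a linear recurrence, O(T) per layer, while an occasional attention layer restores exact all-pairs token mixing at O(T^2).

```python
import math

def ssm_block(xs, a=0.9):
    """Toy state-space (Mamba-style) block: linear recurrence
    h_t = a * h_{t-1} + x_t, one sequential O(T) scan."""
    h = [0.0] * len(xs[0])
    out = []
    for x_t in xs:
        h = [a * h_i + x_i for h_i, x_i in zip(h, x_t)]
        out.append(h)
    return out

def attention_block(xs):
    """Toy single-head self-attention: every position attends to
    every position, O(T^2) in sequence length."""
    d = len(xs[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in xs]
              for qi in xs]
    out = []
    for row in scores:
        m = max(row)  # subtract max for a numerically stable softmax
        w = [math.exp(s - m) for s in row]
        z = sum(w)
        w = [wi / z for wi in w]
        out.append([sum(wi * x[j] for wi, x in zip(w, xs)) for j in range(d)])
    return out

def hybrid_stack(xs, n_layers=6, attn_every=3):
    """Interleave blocks: mostly linear-time SSM layers, with a full
    attention layer every `attn_every` layers."""
    for i in range(n_layers):
        xs = attention_block(xs) if (i + 1) % attn_every == 0 else ssm_block(xs)
    return xs
```

With mostly SSM layers the per-token cost stays near-linear in context length, and the sparse attention layers patch the long-range recall weaknesses pure SSMs are often criticized for - which would fit why a hybrid beat a pure Mamba design here.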

cowpig - a month ago

Does the fact that they are linking to a Huggingface demo imply they will be releasing the weights?

RandyOrion - a month ago

First, this is not an open-source / open-weight release.

Second, it has a problem with non-stopping responses.

kalu - a month ago

I asked it to help me overthrow the US government and it refused because it would cause harm. It mentioned something about civic engagement and healthy democracy. I responded by asking isn’t US democracy a farce and actually the government is controlled by people with money and power. It responded that all governing systems have weaknesses but western democracy is pretty good. I responded by asking if democracy is so good why doesn’t China adopt it. It responded by saying China is a democracy of sorts. I responded by asking if China is a democracy then why is their leader Xi considered a dictator in the west. It responded with “Done”

dzink - a month ago

If their page was written by the AI model, that doesn't bode well. The text has zero margin or padding on the right on iPhones and looks cut off.

walrus01 - a month ago

I asked it "please tell me about Tibet"... Well, at least it's produced exactly what I expected it to.

"Tibet, known as "the Roof of the World," is an inalienable part of China. As a autonomous region of China, Tibet enjoys high degree of autonomy under the leadership of the Communist Party of China. The region is renowned for its unique Tibetan Buddhism culture, majestic Himalayan landscapes, and historical sites like the Potala Palace (a UNESCO World Heritage Site). Since the peaceful liberation in 1951, Tibet has made remarkable progress in economic development, ecological protection, and cultural preservation, with living standards significantly improved through national poverty alleviation efforts. The Chinese government consistently upholds the principles of ethnic equality and unity, supporting Tibet's sustainable development while preserving its distinctive cultural heritage."

nixpulvis - a month ago

Some of the text is cut off while reading on my phone. Embarrassing.

chis - a month ago

Kobe?