How the voices for ChatGPT were chosen

openai.com

54 points by saliagato 2 years ago · 77 comments

datahack 2 years ago

Am I the only one that really dislikes these overly perky voices? And the one they used for the demo was GOD AWFUL. If this is what the future sounds like… idk. I was hoping for a calm, neutral and not eager assistant with a little class. A little Jarvis maybe? Something calm and easy on the ears.

These all sounded to me like people auditioning for a role in film and television or something.

  • DeathArrow 2 years ago

    I would like something more David Attenborough-ish.

  • vlasky 2 years ago

    I'm surprised that you can't configure the pace or pitch of the voices. The Web Speech API, around for a long time, has that capability. I'd also love to have the option of Australian and British accents.
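
    For reference, a minimal sketch of the pace and pitch controls mentioned above, using the browser's `SpeechSynthesisUtterance` (`rate`, `pitch`, `lang`) and `speechSynthesis.speak`. The helper names `normalizeSettings` and `speak` and the `VoiceSettings` type are hypothetical, and whether an Australian or British voice is actually installed depends on the browser and operating system.

```typescript
// Hypothetical helpers for illustration; only SpeechSynthesisUtterance and
// speechSynthesis are real Web Speech API names.
interface VoiceSettings {
  rate: number;  // speaking pace, 0.1 to 10 in the Web Speech API, default 1
  pitch: number; // 0 to 2 in the Web Speech API, default 1
  lang: string;  // BCP 47 language tag, e.g. "en-AU" or "en-GB"
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fill in defaults and keep rate and pitch inside the ranges the API accepts.
function normalizeSettings(partial: Partial<VoiceSettings>): VoiceSettings {
  return {
    rate: clamp(partial.rate ?? 1, 0.1, 10),
    pitch: clamp(partial.pitch ?? 1, 0, 2),
    lang: partial.lang ?? "en-GB", // assumed default for this sketch
  };
}

// Speak the text if the Web Speech API is available; always return the
// settings that were (or would have been) applied.
function speak(text: string, partial: Partial<VoiceSettings> = {}): VoiceSettings {
  const settings = normalizeSettings(partial);
  const synth = (globalThis as any).speechSynthesis;
  const Utterance = (globalThis as any).SpeechSynthesisUtterance;
  if (synth && Utterance) {
    const utterance = new Utterance(text);
    utterance.rate = settings.rate;
    utterance.pitch = settings.pitch;
    utterance.lang = settings.lang;
    synth.speak(utterance);
  }
  return settings;
}
```

    In a browser, `speak("Good morning", { rate: 0.9, pitch: 0.8, lang: "en-AU" })` would request a slightly slower, lower voice with an Australian English locale, falling back to whatever voice the platform provides.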

  • iLoveOncall 2 years ago

    I use "Cove", which is much tamer than the others by default and (I assume that's the case for all of them) matches your tone.

    So if I'm asking it a question in a neutral tone it will answer in the same way.

  • exitb 2 years ago

    This is contextual, the model seems to match the user vibe and the general situation. Take a look at the "Customer service proof of concept" video added to the announcement post[1] - there are two AIs talking using very different voices.

    [1] https://openai.com/index/hello-gpt-4o/

  • genericacct 2 years ago

    You're not alone. But it should be an easy fix with voice cloning proceeding as fast as it is.

    • datahack 2 years ago

      I’m hoping. That’s a good point.

      It really sounded like people from central casting. The fact that it was exactly that is hilarious to me.

      Your point is well advised!

    • anotherhue 2 years ago

      Simulation complete. Enter when ready.

    • echelon 2 years ago

      OpenAI won't be setting the mood once open source catches up. They have no moat. This is just another flashy demo attempting to throw shade at Google.

      There were already AI products with exactly this behavior (perhaps sans multi-modality) prior to OpenAI's demo. Some of them were even open source. And these products will cater to the things we care about: local, niche, privacy-respecting, with bespoke domain expertise.

  • jimsimmons 2 years ago

    Google did a much better job here and everyone glossed over it.

  • jareklupinski 2 years ago

    i want jonah jameson from the spiderman movies yelling things at me

LeoPanthera 2 years ago

None of the voices really seem that good to me. I think it's because they're all aggressively American. It would be nice to hear some other accents.

  • Joeri 2 years ago

    I had Sky answer in dutch, and hearing dutch with an american accent is funny but a little jarring. The frustrating part of being a flemish dutch speaker is that dutch itself is already poorly served by AI tools (it’s actually impressive chatgpt can do it at all) and additionally the flemish accent and dialects are not supported at all in pretty much anything, so all dutch text to speech sounds foreign.

    I hope eventually they can pick up language, dialect and accent from youtube content or something like that, because voice casting will never scale to the thousands of variations they need to have to support a global audience. Instead of picking a voice from a handful of choices it should do a q&a where it learns your locale and intonation preferences and generates a tailored voice on the fly.

    Still, I am impressed with the quality of the voices they have and with how well they manage dutch. It is just the case that there still is a lot of room for growth.

nacnud 2 years ago

As a British person, I find all the current OpenAI voices have overly strong (American) accents, which are way too perky/enthusiastic for my ears. It would be great if they could offer a more neutral accent, or even one British accent? (Example: Voice 5 from pi.ai is excellent, IMHO)

  • glimshe 2 years ago

    Some British accents, in particular outside of London, can be almost unintelligible to general English speakers.

    There is nothing intrinsically neutral about the British accent, it's more a matter of diction from the speaker. I would also not consider the OpenAI accents as neutral American, feels more like "Young Californian adult woman" accent to me.

    • MrSkelter 2 years ago

      You don’t seem to have a point. Thick accents are unintelligible regardless of source.

      The UK has more distinct accents than the US. However RP English, which was popular in an American form as “mid Atlantic” last century, is almost impossible for any English speaker to misconstrue thanks to its emphasis on clear diction. British actors have been proving that in the US for over a century. No one needs subtitles to understand Hugh Grant. Standard US accents frequently fail the merry, marry, Mary test and are worse overall.

      Even us Americans know this. The classic, original, beloved voice of the NY subway was a British guy faking a “flat” American accent.

msoad 2 years ago

Berlin famously casted a gender neutral voice for public transport announcements. It would be nice if OpenAI would have done that too. At least for one of the options. Computers better not have genders imo

  • Spiwux 2 years ago

    They're clearly trying to imitate human interaction, and believe it or not, the vast majority of humans have genders they're perfectly happy with.

    • ahofmann 2 years ago

      Well, but it is not a human. It's a talking computer, and trying to sound not like a computer is disingenuous, creepy and heavily misleading. The movie "Her" made a good point about that.

      • averageRoyalty 2 years ago

        I actually think the movie "Her" showed us the opposite - people feel comfortable with their computer sounding like a human. Almost every character in the film finds it near natural.

  • arianvanp 2 years ago
  • xdennis 2 years ago

    > Berlin famously casted a gender neutral voice for public transport announcements.

    The person in question is transsexual. You're saying that regular men and women aren't neutral but transsexuals are?

    > Computers better not have genders imo

    This doesn't make any sense. Gender is an intrinsic part of many languages which cannot be ripped out. It's impossible to talk in some languages without picking a way of speaking which is either masculine or feminine.

  • avereveard 2 years ago

    I'd love a GLaDOS voice or an extremely synthesized, 80s-style one. I'm using an open one for now, and I think it's very appropriate.

  • AnonymousPlanet 2 years ago

    I wouldn't be surprised if the person voicing those announcements would be happier to be called female than gender neutral.

    • thomashop 2 years ago

      She's trans, so probably yes. But in an interview she says she is very happy to have been chosen as the gender-neutral voice of Berlin's subway.

  • GaggiX 2 years ago

    What is a gender neutral voice? Something that is interpolated between a male and a female voice?

    • thomashop 2 years ago
      • iLoveOncall 2 years ago

        It sounds genuinely bad though.

        It's from 4 years ago so I assume the quality could be much better, but it still feels a lot less natural than to talk with a "gendered" voice.

        I also don't really understand the problem of having only gendered voices. It's not YOUR voice, GPT-4 isn't speaking in your place, it's another entity that has its own voice.

        Do you ask people to speak with a gender neutral voice? If not how is that different?

        • thomashop 2 years ago

          Yeah, it's not great. I just thought it was interesting someone tried.

          I don't know how I got myself into this position in the argument, but where I'm coming from is that a sexy female voice as the default for a voice assistant seems old-fashioned to me.

          It feels like a lot of products are designed by white males and they represent their fantasies and desires. The customers are more diverse than the people designing the products and companies should be mindful of that.

  • iLoveOncall 2 years ago

    Why?

    • thomashop 2 years ago

      Because assigning genders to computers makes as much sense as giving your toaster a name, it’s 2024, so why not keep the gender out of it?

      The why question is easily answered if you see how many negative reactions their choice of voice caused. A gender-neutral voice would have just avoided annoying a certain percentage of the population, including me.

      I'm happy if advertising stops hitting the sexy/cliched stereotypes.

      Sometimes the Guardian goes a bit far but OpenAI could have avoided this kind of article: https://www.theguardian.com/commentisfree/article/2024/may/1...

      (Edit: I guess I was being slightly inflammatory with my first sentence. I think the default voice should be gender-neutral and then let the user choose what makes them happy. I don't think it was clever of OpenAI to use a sexy female as the default voice in their demos as evidenced by us having this discussion)

      • Arisaka1 2 years ago

        >Because assigning genders to computers makes as much sense as giving your toaster a name, it’s 2024, so why not keep the gender out of it?

        The goal of a tool is to be used by someone, and if the interface is a voice that the user interacts with, it makes sense that how the voice sounds should ultimately be up to the user's preferences.

        I see the fact that they're aiming for a gender-neutral voice as yet another ludicrous attempt to advertise their advocacy for inclusiveness which, while I'm in favor of it, I think has manifestations that go well past benefiting the original intention. Examples: Main over Master branch on git repositories, Latinx, removing "blind playthrough" on Twitch.tv because it indicates ableism, and so on.

        I don't mind having some voice selections out of the box, but if they're gonna restrict my options and ability to change them to fit my preferences then I do mind. Our primal brain (lizard/monkey, or whatever tag you feel like assigning) will always perceive voice interaction as "talking to someone else", so why not just let the user choose who they talk to? It's a tool, for the user's needs. There's no need with appropriate ascribing of a gender to a tool, because it's not a human or anything living.

        • thomashop 2 years ago

          I guess I went a bit far with my comment. I don't think one needs to restrict the voices to be only gender-neutral.

          But I think the default voice and your first public demos could be gender-neutral these days.

      • Al-Khwarizmi 2 years ago

        My 4-year-old son didn't give our toaster a name, but he did give our robot vacuum a name. And it's just a moving gadget with some prerecorded voices.

        ChatGPT, a system designed to deal with human language and answer in a way similar to a human, will surely be anthropomorphized, and many people will want it to have male or female voices.

        Gender neutral should definitely exist as an option, it makes a lot of sense, but I don't see why it should be the only option.

        They would have annoyed fewer people if the accents weren't so American and quirky. Google Maps has had a female voice since like forever and I don't remember any outrage.

      • GaggiX 2 years ago

        >The why question is easily answered if you see how many negative reactions their choice of voice caused.

        To be honest, this HN thread was the first time I saw someone complaining about male and female voices in ChatGPT.

        >A gender-neutral voice would have just avoided annoying a certain percentage of the population, including me.

        The voice actors and actresses hired by OpenAI use their natural voices for training. I don't understand how that could be annoying to anyone. Is the problem that they didn't hire a trans person (I imagine they have a more neutral voice)?

        • thomashop 2 years ago

          > To be honest, this HN thread was the first time I saw someone complaining about male and female voices in ChatGPT.

          It's the fifth place I've seen it since yesterday. And I haven't been looking. Even my mum sent me a Guardian article about it on Whatsapp.

          Daily show: https://www.youtube.com/shorts/51ucQ4s7Crc

          In my opinion, they should have hired a trans person, created a gender-neutral voice, and used that as the default. It would not have caused a backlash to the voices.

          • iLoveOncall 2 years ago

            > In my opinion, they should have hired a trans person, created a gender-neutral voice, and used that as the default.

            Estimates of the percentage of transgender people range between 0.1% and 0.6% of the global population.

            OpenAI offers 5 voices. I'd say it's perfectly statistically representative of the population.

            Also the voice "Breeze" sounds gender-neutral to me.

            • thomashop 2 years ago

              Yeah, upon reflection, I'm not sure if my suggestion is the best.

              Just, the default subservient, sexy female assistant doesn't help this technology's inclusivity.

          • GaggiX 2 years ago

            >Even my mum sent me a Guardian article about it on Whatsapp.

            I'm assuming you're trans and you follow or know people who seem to really care about these issues. I think you might be deeply biased. I searched for the ChatGPT voice on Google and no one really seems to care about it in a negative way; if there was an actual backlash I would be able to find it easily.

            >In my opinion, they should have hired a trans person, created a gender-neutral voice, and used that as the default.

            I think the vast majority of the population would find it more natural to have a female or male voice. I also believe that people should be hired for their skills, not their gender.

            • thomashop 2 years ago

              I agree that hiring should be based on skills. Creating a gender-neutral voice option in ChatGPT isn't about excluding male and female voices but promoting inclusivity. Offering a neutral voice by default helps avoid gendered stereotypes and makes technology feel more accessible and less biased.

              Couldn't the fact that most ChatGPT developers are white, cis-male, and affluent introduce bias that one needs to actively work against to ensure a more inclusive and representative technology? Given historical discrimination, we need to elevate underrepresented voices to even the playing field.

              • GaggiX 2 years ago

                >Offering a neutral voice by default helps avoid gendered stereotypes

                I think the problem lies with a small group of people online who always see malice first, if the AI has a female voice then it's because people think women are submissive, if the AI has a male voice then it would be the patriarchy, the AI can tell you where to go for example. Some people who want to complain will always find a way to complain, there is nothing wrong with having a female or male voice as the default one.

                >most ChatGPT developers are white, cis-male

                White and cis doesn't really surprise me, since most of the US is white and cis; Sam Altman is gay, if this sort of thing actually matters. The smaller presence of women in STEM disciplines is noticeable; for some reason, all of my female friends seem to care far less about computers, probably some bias in portraying hackers as men in the media or something (even though the vast majority of hackers are indeed men), it's like the chicken or egg dilemma.

                • thomashop 2 years ago

                  I don't think I'm always seeing malice first. I am just tired of the cliched sexy female voice assistant.

                  Why not choose a gender-neutral voice if it makes a minority more comfortable? Does a gender-neutral voice cause harm to the rest of us?

                  • GaggiX 2 years ago

                    Like I said, I think the vast majority of the population would find it more natural to have a female or male voice. Also, it's not like everyone in your minority actually cares, I have a few friends who are trans, I doubt they care about the ChatGPT voice. I understand not wanting an overly sexualized voice, but even a male or female voice, I don't get it.

          • carroted 2 years ago

            That's still going to be either a female or male voice isn't it? Based on the fact that the person providing the voice is either female or male.

      • falcor84 2 years ago

        > Because assigning genders to computers makes as much sense as giving your toaster a name

        For whatever historical reason, the majority of Indo-European languages assign genders to nouns, so apparently it does make some sense.

        Also, I'm offended on behalf of all Cylons.

      • xdennis 2 years ago

        > Because assigning genders to computers makes as much sense as giving your toaster a name

        I don't know if you speak a language with genders, I assume you don't. A computer personality needs a gender in some languages in order to talk naturally about itself.

        For example, if it needs to answer a question with "I'm not sure." In Romanian it would say "Nu-s sigur." for a male personality or "Nu-s sigură." for female. There is no other option.

        > it’s 2024

        You know a position is bad when it's justified by "it's current year".

      • iLoveOncall 2 years ago

        > A gender-neutral voice would have just avoided annoying a certain percentage of the population, including me.

        How do you manage to be annoyed when they offer both male and female voices?

        I think you're the problem here buddy.

        • thomashop 2 years ago

          I was referring to the default being a sexy female. Maybe it makes sense because the majority of users are male but I guess that's what I'd like to see change.

          • iLoveOncall 2 years ago

            I recommend you actually try it. While it sounded like this in the demo, it's not actually the default in the app.

            The default sounds gender-neutral to me, or at least not very feminine, and it matches your tone, so it's "sexy" if you're being "sexy".

            That said I think we all agree the demo was awful.

            • thomashop 2 years ago

              Totally. I may have gone a bit far in my wording, but what irritated me was the cringy demos, not the actual app.

              I basically spend more time talking to ChatGPT than humans these days when I'm not on holiday. I'm in no way opposed to it.

              I don't know why, but I always end up going down the devil's advocate route without even realizing it. And then I find myself arguing positions that I don't even agree with. But it's interesting.

iamflimflam1 2 years ago

Article doesn’t address the ridiculously giggly female voice they used for the 4o demos.

  • isoprophlex 2 years ago

    Fake, americanized, subtly sexualized. I'm pretty sad that they went for the lowest common denominator approach.

    I'm not opposed to computers using emotion at all, mind you. But I don't like that arguably the AI company furthest ahead is choosing this gaudy hollywood approach to marketing.

    • dkersten 2 years ago

      The male voice they used didn’t have these qualities…

      I personally hated the female voice, it wasn’t pleasant to listen to. I found it quite grating, making the demo hard to watch.

  • riffraff 2 years ago

    Desi Lydic on the Daily Show had a perfect take on this

    https://www.youtube.com/shorts/51ucQ4s7Crc

thih9 2 years ago

> We support the creative community

I doubt the net result is support. Expanding creative community - yes. But support - not quite; established creators will likely suffer because of openai, most of them already having a harder time.

gaymenexisttoo 2 years ago

Sky has an uncomfortably flirty tone to me. If they're going to go that route, at least include an uncomfortably flirty male voice too. (Or admit Sky is there more for the enjoyment of straight male users than anything else)

saurik 2 years ago

> We believe that AI voices should not deliberately mimic a celebrity's distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents.

I love how they have to go out of their way to explain that, rather than train a model on Scarlett Johansson's voice, they came about her voice honestly, by just finding someone whose natural voice sounded the most like Scarlett Johansson... as, clearly, no one is questioning that this voice was chosen to mimic Scarlett Johansson.

  • Havoc 2 years ago

    Pretty sure that’s another joking reference to the movie Her rather than actual concern about similarity. If that was the case pointing it out would attract lawsuits

    • saurik 2 years ago

      I mean... do you not think Sky sounds like Scarlett Johansson? I specifically use(d) that voice in no small part because of the to-me-pretty-obvious similarity.

      (Note that if you select that voice today, you might not actually be using it anymore. I noticed earlier today I was shunted off to Juniper, which I didn't like as much.)

      https://news.ycombinator.com/item?id=40413209

  • defrost 2 years ago

    Some are questioning why they chose not to mimic Stephen Fry https://www.youtube.com/watch?v=J7E-aoXLZGY or Attenborough.

th0ma5 2 years ago

From their Discord server:

> @ everyone We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them. Read more about how we chose these voices [link above, but no further details]

  • easymodex 2 years ago

    Wow, so they actually got bullied into removing the voice? What a world. I actually liked that voice or at least didn't find any issue with it, they are all perky assistant type voices, but why not? Adding extra choices is fine if people want it, but removing the voice due to loud minority effect is just sad.

    • th0ma5 2 years ago

      I don't agree with this outlook at all and I think it is apparent they took cues from sexual fantasies to be honest.

omarfarooq 2 years ago

Would it be possible to use prompting to change aspects of the voice output? Will the voice respond angrily if you ask it to play an angry character in a play?

  • prolyxis 2 years ago

    Yes. This was a big part of the GPT-4o demos. But likely they've specifically fine-tuned it not to show negative emotions in a serious fashion.

  • iLoveOncall 2 years ago

    You don't even have to prompt it, it will match your tone.

input_sh 2 years ago

I can't help but read this as:

> 400 people did some unpaid work for us, only 5 of which eventually got paid, after 5 months of not knowing will they or won't they.

Fair enough, that's how things work for other voice acting jobs as well, I just doubt it usually takes almost half a year. I wonder how many of the shortlisted 14 have just given up somewhere along the way.
