Falsehoods programmers believe about video

haasn.xyz

250 points by pomfpomfpomf3 9 years ago · 138 comments

derefr 9 years ago

> rendering subtitles at the output resolution is better than rendering them at the video resolution

I would like to know what's wrong with this approach. I watch a lot of commentated speed-run videos: that's often something like ~244p video, plus soft subtitles. The subtitles get rendered at the source resolution (presumably, into the video framebuffer) and then upscaled along with the image, forcing them to be a tiny blurry mess instead of the crisp, readable text they could be.

  • CoolGuySteve 9 years ago

    It's also missing the most common error I see: conflating subtitles with closed captions.

    Closed captions are positioned on the screen to indicate who's talking, include descriptions of sound effects, and should be in a high-contrast, easy-to-read font (most people with hearing deficiencies also have problems seeing, i.e. out-of-date prescriptions for both hearing aids and eyeglasses).

    As far as I know, QuickTime does it right while the Apple TV, Netflix, and YouTube fuck it up; I can vouch for the QuickTime side because I helped write it way back.

    • deadmutex 9 years ago

      AFAIK, The YouTube implementation does all of those.

      Here is a demo: https://www.youtube.com/watch?v=BbqPe-IceP4

      Please do not spread falsehoods.

      Disclaimer: I work at YouTube.

      • DonHopkins 9 years ago

        The falsehood you're spreading is that youtube closed captioning is consistently usable for everyone by default.

        This is how my subtitles / closed captions have looked on YouTube for a year or so now [1] (on up-to-date Mac Chrome). The font is extremely small and blurry and practically transparent, and there is a horrible background color, which was usually yellow until a week or so ago, but has now changed to green for Christmas.

        All I want for Christmas is readable YouTube text. I'm so glad YouTube is trying to keep up with the season's festivities by changing the background color of their absolutely unreadable text from yellow to green, but shouldn't they try to make the text readable by default somehow instead? Maybe a point size larger than 10 points, and a transparency higher than 10 percent, and a neutral or at least less nauseating background color?

        Do all users have different randomly selected fonts and point sizes and colors? Why does it change randomly without any user intervention? Is this some sort of a/b/.../z testing? Get it together, YouTube!

        I most certainly didn't do anything to configure the closed captions like this. Are there keyboard commands so power users can quickly switch fonts to strange colors and point sizes, that my cats may have pressed when walking on my keyboard?

        [1] http://imgur.com/gallery/GOh1t

        • DonHopkins 9 years ago

          Oh my god, apparently it WAS my cat's [1] fault for walking across the keyboard!

          Some genius at YouTube decided to implement persistent keyboard shortcuts that enable cats to easily and stealthily change the closed captioned text into unreadable colors!

          My cat can press "o" to make the text lighter and fuzzier, and press "b" to cycle through a garish series of primary background colors plus black and white, including the same color as the text, rendering it invisible. There may be others, but I can't tell and I'm afraid to try.

          Hoping that my opposable thumbs would enable me to get some help, I pressed "?" expecting to get a list of keyboard shortcuts, but that didn't do anything but violate the Principle of Least Astonishment [2].

          It's not all my cat's fault, though -- some of the blame lies with YouTube: purposefully designing, implementing and not documenting such annoyingly cat-friendly but unhelpfully user-hostile keyboard shortcuts.

          Googling for "youtube keyboard shortcuts" doesn't show any links to official YouTube documentation on the first page of results -- the top featured hit is an outdated page from an "SEO Consultant" full of social networking widgets and ads and self promotion, that doesn't even mention the closed captioning related keyboard shortcuts, which my cat discovered all by himself.

          Does YouTube itself even document its own keyboard shortcuts online anywhere, let alone providing pop-up "?" help?

          And does anybody really think that changing the transparency and background color of closed captioned text is so important that it deserved several dedicated undocumented keyboard shortcuts, no matter what the usability consequences were? Or that the user's inadvertent color and transparency preferences should be persisted across all videos instead of applied per-video? Who would even want partially transparent text anyway, let alone a key to change between several transparencies?

          [1] http://imgur.com/a/33Mrt

          [2] https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...

      • coldtea 9 years ago

        >Please do not spread falsehoods.

        Please assume the other side possibly doesn't know something you know (if that's really the case here), instead of being rude and accusing them of spreading falsehoods.

      • deoxxa 9 years ago

        http://take.ms/fwwhx

        That is frustratingly poor contrast.

        • fgandiya 9 years ago

          I hate it when it does that, but you can change it with the gear thingy.

        • deadmutex 9 years ago

          This might be because you changed your settings for CC in the past.

          Here is what mine look like http://imgur.com/HLIVXQ6

          You can click settings again to change font sizes, font family, colors, etc.

          • smnscu 9 years ago

            I have to change my CC settings every damn couple of days because they revert to the same white-on-white 400% bullshit. So thanks for that.

            • deadmutex 9 years ago

              I am not sure, but the screenshot I submitted seems to be the default settings.

              However, software that is 100% perfect is pretty much impossible to write, and if you think there's a systematic issue, please file a bug so it can help others in the same situation.

              • DonHopkins 9 years ago

                I am absolutely sure I never changed my settings, and that they have changed over time without me doing anything. Why would anyone want 50% font size at 25% opacity on a yellow then green background? I didn't ask for that. Yes, this is most definitely a systematic issue.

                The bug reports I've submitted to google have been ignored, and that's a frustrating distraction from what I'm paid to do. Maybe if you submit one yourself, somebody will pay attention, because google is paying you to work on youtube, and hopefully they will take you more seriously than their users.

                • saurik 9 years ago

                  > Maybe if you submit one yourself, somebody will pay attention, because google is paying you to work on youtube, and hopefully they will take you more seriously than their users.

                  FWIW, when "file a bug report" is used not to mean "I need more detail" but to mean "talk to the hand", and is spoken by someone working close to a project, I always read it as "fuck off", particularly if the person never even bothered to determine whether you've used their bug tracker in the past (or even filed a bug already for this specific issue).

                  When I find someone on a forum with a bug that I haven't heard about, I sit around and talk to them until they either get tired of talking to me or I get the information I need to fix the problem. The alternative would essentially translate to "I don't actually care about this bug", as that's the only way you are going to get certain classes of bug report. I have shown people at Apple bugs that they were absolutely fascinated by momentarily and then told "File a Radar". I clearly wasn't in a position to do that at the moment, and of course I forgot to do it when I got home... they should know this happens, because this assuredly happens to almost every single person they tell that to (and no, "well, we do see a large number of bugs filed" is not evidence against "people you tell to file a bug using your arcane system, particularly if they have to do it days later, probably won't"), and yet even when a potentially rare and real and critical bug is shown to them in person (this was even at an event where the whole point was to work with customers on their issues), their response is essentially "engh, I don't care if this doesn't work unless it affects a ton of people". As someone who works in security, I'm going to assert "do you want vulnerabilities? because this attitude is how you get vulnerabilities": every bug is precious, as it is a mistake in your mental model of the software, and who knows how far down the rabbit hole that mistake will take you.

                  Sure: I realize that the engineer isn't always the best person to do this, and even in my tiny company I had to solve that, but the solution isn't to tell people to "go use the bug tracker", a comment which shunts onto them the annoying work of learning a new system and is all too likely to demoralize them (Apple's Radar is a great example of this). Instead, have someone whose job is to talk to people follow up with credible bugs: I'd go "hey Xyz, there's a guy on this forum who's complaining about something I hadn't heard of before; can you try to get more details from them?" (where Xyz has changed over the years, but has always been one of the few key positions). I couldn't begin to count the number of times I have debugged an issue with someone on reddit.

                  • DonHopkins 9 years ago

                    The rule of thumb I used was: if Google is paying someone enough to go online and post defenses of youtube like "Please do not spread falsehoods", and tell me things I already know like "software that is 100% perfect is pretty much impossible to write," then part of his paid job should also be filing bug reports using the bug tracking system he uses every day, and probably already has an account logged in and a tab opened on, when the falsehoods turn out to be true.

                    If YouTube were open source, and I could look at the source code of the keyboard handler to find the cause of the problem myself, prove my bad experience was not just a falsehood to be brushed off, and possibly even suggest a fix, then maybe I would have been more motivated to put my own time into filing a bug report.

                    But Google is a huge well funded advertising company that paid billions of dollars for YouTube and makes billions of dollars off of it, has a huge complex system set up for digital rights management, promoting and paying for advertisements, enabling copyright holders to report violations, paying many employees for actively pursuing and resolving those copyright violations, removing inappropriate content, hiring conservative lobbyists and sending executives to kiss Donald Trump's ring [1], etc.

                    So I would expect YouTube employees to put at least as much time and effort into reporting bugs about their own product to their employer, as they put into monetizing YouTube while defending its reputation from people they perceive as spreading falsehoods about it.

                    [1] http://www.reuters.com/article/us-usa-trump-google-idUSKBN14...

                    • deadmutex 9 years ago

                      I am not getting paid to do this, and actually am taking some time to be with the family for holidays. I just saw the original comment saying that YT "fuck(s) it up" when it comes to CC, and it looked OK to me. So, I just wanted to share my results so that people do not assume "it's screwed up everywhere". I do know that the engineers I have met at work really want to try to do the right thing for users.

                      I don't have access to file a bug through a work account for the next few days, and if you come across any issues (like CC broken by default), please file a bug with a lot of details. People do look at that stuff. I am glad that you found the source of the issue, and I hope you can agree that it would've been impossible to find it if I had just filed a bug. I do not work anywhere close to the team that implemented the CC, and when people have said "file a bug" to me in a work context, they have meant it as a way to "let's keep track of this so it's not forgotten". Luckily, the people I have met at work have been good about this. I do not speak for Google or anyone else there, just sharing my own personal experience.

                      • DonHopkins 9 years ago

                        Thanks for responding. I'll submit a bug if I can, now that I know the cause of the problem. But I need to know where best to submit it.

                        It's a design and documentation bug, that needs to be addressed at a higher level by re-evaluating the decisions and justifications behind all the keyboard accelerators, removing the ones that nobody actually uses and that cause more problems than they solve (like making closed captioned text transparent and changing its colors), implementing full and immediate "?" keyboard help, and writing some online documentation.

                        So should I simply click "send feedback" on any random youtube video and write up my suggestions, as this page tells me to? [1] I've done that now, so let's see what happens.

                        Do you really sincerely think my suggestion will actually make it back to the designers through that channel and that changes will happen as a result? Is there a way for me to track it?

                        Or is there a better accountable bug tracking system that I can actually submit a real trackable bug into and watch the progress and see if it gets marked "will not fix", like https://bugs.chromium.org but for youtube? Do you have access to a better bug tracking system for youtube that's not public?

                        [1] https://support.google.com/youtube/answer/4347644?hl=en

                  • deadmutex 9 years ago

                    Unfortunately, I don't work close to the team that implemented the CC system, nor do I know the language it is implemented in very well. And, I do not have access to my work account for a few days (I am taking some time off to be with family), so I can't file a bug with my work account either. It seems like the problem has been resolved, and it was tracked down to the owner's cat.

                    Luckily, at YT, I have not met anyone that said "file a bug report" and meant "fuck off". I have worked with some people in the past at a different company who meant that, but not here. Usually it has meant "let's file a bug so we don't forget about it". This is just my experience, and I just wanted to share it. I am only speaking for myself, and others might have different experiences.

            • DonHopkins 9 years ago

              How many cats do you have? Any indoor pet chickens or chimpanzees? Or even mischievous children?

    • tantalor 9 years ago

      Okay, so how are subtitles different?

      • Shendare 9 years ago

        Subtitles are simply translations of speech or text into another language, and are generally (though not always) center aligned near the bottom of the frame.

        • grkjusdgf 9 years ago

          This is wrong. Subtitles and closed captions serve the same purpose and the terms are largely interchangeable. It's an Americanism to associate subtitles specifically with translations or closed captions specifically with EIA-608.

          The essential function of subtitles and closed captions is to enable a viewer to read dialogue (or contextual audio elements) without needing to either hear or understand the audio. It may be in the same language or not.

          As one example, in some Chinese markets TV and movies are all subtitled in Chinese, not (primarily) for the deaf, but because the standard Chinese subtitles are intelligible to readers whose only spoken language is a mutually unintelligible dialect.

          • mentat 9 years ago

            Sorry, but this is wrong. Closed captions may include non-speech audio rendered as text for the hearing impaired, such as "[wind noise]", whereas subtitles assume the viewer can hear and carry just the speech (or its translation).

  • akiselev 9 years ago

    I think that point should be amended to say "rendering subtitles at the output resolution is always better than rendering them at the video resolution." You don't want to upscale 244p soft subtitles to 1080p, but you do want to default to giving video authors creative control over how the subtitles are displayed. The ASS subtitle format allows for some very complex styling that can be used as an artistic element in video (or just to ensure proper contrast, legibility for color-blind people, character differentiation, etc.), so you generally don't want to assume anything. There's also the issue of coordinates for where the subtitles are supposed to be that all go to shit if you render them on a transformed (up/downscaled) frame.

    • haasn 9 years ago

      This comment is pretty much what I was going for. I've reworded it to make it clearer.

      The issue you can run into in practice is stuff like softsubbed signs, which can clash and look out of place with the native video if you render them at full res. There's also a related issue, which is that if you're using something like motion interpolation (e.g. “smoothmotion”, “fluidmotion” etc. or even stuff like MVTools/SVP), softsubbed signs will not match the video during pans etc., making them stutter and look very out-of-place - the only way to fix that is to render them on top of the video before applying the relevant motion interpolation algorithms.

      Personally I've always wished for a world in which subtitles are split into two files, one for dialogue and one for signs, with an ability to distinguish between the two. (Heck, I think softsubbed signs should just be separate transparent video streams that are overlaid on top of the native picture, allowing you to essentially hardsub signs while still being capable of disabling them)

      Also, sometimes, rendering at full resolution is prohibitively expensive, e.g. watching heavily softsubbed 720p content on a 4K screen.

    • JoshTriplett 9 years ago

      > There's also the issue of coordinates for where the subtitles are supposed to be that all go to shit if you render them on a transformed (up/downscaled) frame.

      Sure, you have to transform the coordinates to the output. But still, better to render fonts at the final resolution; they'll always look better than if scaled after rendering.

      • akiselev 9 years ago

        > Sure, you have to transform the coordinates to the output. But still, better to render fonts at the final resolution; they'll always look better than if scaled after rendering.

        The font will look better but you have zero guarantee that the subtitles will be better too. Furthermore, you will lose any artistic value that the creator intended.

        For example, go get the Russian movie Night Watch and watch it with the original subtitles hardcoded and as a separate file. The director insisted on doing the subtitles himself and he used them for great artistic effect throughout the movie [1]. Watch it with scaling and aspect ratio stretching to see how nicely rendered, crisp high resolution fonts can be inferior to a pixelated, stretched version created with intent by an artist.

        [1] http://readingsounds.net/wp-content/uploads/2015/12/NightWat...

  • kazinator 9 years ago

    Maybe nothing is wrong; just that maybe it's not always strictly better. Suppose you are asked to form a plan for adding subtitle support to some unfamiliar video platform. It's probably best to start with an open mind about where in the pipeline subtitles will be composed with the video.

  • the8472 9 years ago

    In fact, rendering subtitles at the display resolution is one of the big selling points of the xy-subfilter + madvr renderer combination.

    The only practical downside I have noticed is that accurate rendering of subs containing complex vector graphics or effects (ASS supports that) at > HD resolutions takes a lot of CPU time, sometimes more than a single core can handle in realtime.

    There probably is a lot of potential for optimization, but those are hobby projects for their maintainers.

  • jheriko 9 years ago

    the point is precisely that it is more complicated than this obvious interpretation.

    whilst i don't necessarily agree... i do agree that if you want to conform to specs then you can't go thinking this way.

franciscop 9 years ago

The original one ( http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b... ) left me baffled. Then I realized you have to strike a balance; otherwise you cannot deal with names at all. Where you draw the line depends on your industry/customers, but I'd safely say most code is too restrictive nowadays, so these lists are somewhat useful, and of course they are interesting.

  • kibwen 9 years ago

    It's true that you have to draw a line somewhere based on technical and business constraints, but an important takeaway of the names article is that you almost certainly don't need to do anything with a name other than treat it as an opaque string that can be displayed back to the user. For example, I'm struggling to think of a good reason why user registration would require separate first name and last name fields, and yet this practice is overwhelmingly common. For that matter, why do you want my real name at all, considering that it can't be used as a unique ID anyway?

    • smallnamespace 9 years ago

      But sometimes your government expects names to be broken into given and family names (the US, Japan, and China all seem to make this assumption -- every government form I've seen from those countries wants your name fully broken out).

      I've no direct experience with, say, Russian or Latin American governments, but cultures that use explicit patronymic or matronymic names might expect that broken out as well.

      If you ever need to submit user data to the government (e.g. for tax reasons), and you don't ask your user to break the name apart, then you will necessarily be guessing, which seems strictly worse than just asking them how their name might split.

      At the end of the day, if you operate in a given culture, then you need to address those cultural norms. Bending over backwards to support every possible edge case seems unwise if they also happen to disagree with those norms.

      • Muromec 9 years ago

        >I've no direct experience with, say, Russian or Latin American governments, but cultures that use explicit patronymic or matronymic names might expect that broken out as well.

        In Ukraine we either have three fields in government forms (Family name, Given name, Patronymic) or two (Family name, Given name), for example on ticket booking forms. Or just one, but you should write names in FGP order.

        Funny thing is that the patronymic part is left out in transliteration, so in travel documents you see the FGP form in Cyrillic and the FG form in Latin letters. The transliteration algorithm is a bit funny, so people tend to have different Latin spellings of the same name. And even different Cyrillic spellings of the same name, depending on whether it's written in Ukrainian or Russian (Russian doesn't have the dotted and double-dotted "i", so it uses "и" instead).

        In the past the formal way to address people used the given name and patronymic, but that's less true nowadays.

        In documents, sometimes the full FGP form is only used at the start of the document, and subsequent uses just include the family name and the first two letters of the given and patronymic names. That is the only thing you can safely do automatically. Signatures also use this short form.

        Another thing is that male and female versions of patronymic and family names can differ, so you can't even compare names, let alone process them automatically.

        And the good thing about patronymic names: the combination of three names and birth date uniquely identifies 99.7% of voters, so this really matters.

      • kibwen 9 years ago

        I'm explicitly disregarding government websites here, since the government has a legitimate reason to care about my full name (and gets to define what a "legal" name means), and also 99% of the websites that I sign up for are unrelated to the government. There's no reason for a social network to report the names of all its users to a government agency automatically, and I'm skeptical that even my bank would have such a requirement.

        • smallnamespace 9 years ago

          I don't mean reporting all the names, but for example if you ever transmit payments there is a KYC process, and if you are a bank you must report any suspicious money laundering activity.

          At the end of the day there are cultural conventions around names, and various agencies use them. I don't see why software should be explicitly culturally neutral, unless your audience is explicitly a global one (and even then, I think localization is preferable to just sticking names into a single field).

          • Thrillington 9 years ago

            There is significant engineering cost to implement localization correctly. If you're a bank or in another market where you interact with a government, then you must bear those costs. If not, why not just use an opaque string and spend your engineering dollars on your actual product?

    • Frondo 9 years ago

      That's easy, it's to make better marketing materials.

      "Dear [first name]," flows better than "Dear [opaque string],".

      • kibwen 9 years ago

        That's what the marketing department may believe, but in reality that's super creepy. Let's not pretend that I'm on a friendly first-name basis with corporations, or that in reality I'm anything more than an autogenerated numeric ID in a database as far as the corporation is concerned. Furthermore, nobody but my mother calls me by my real first name, and this holds for half of my cohort. And why does any company think that their marketing correspondence is welcome in the first place?

        • mysterypie 9 years ago

          > nobody but my mother calls me by my real first name, and this holds for half of my cohort

          Really? I've been assuming that addressing people by first name--even people you've just met--is now the default, at least for the United States and Canada. Are you in the USA/Canada, by the way?

          I know that it used to be rude to address someone by their first name unless you knew them well. You had to say, Mr. last-name or Mrs/Ms./Miss last-name. I know this from old movies.

          But I thought that the etiquette has changed completely: First name is fine and last name sounds rather formal. Do others have a different experience?

          • kibwen 9 years ago

            For whatever reason, a large number of the people I know go by names that are not their legal first name. Some go by their middle; some drop letters from their first name in idiosyncratic ways; some keep the same pronunciation but have a different spelling; some change the spelling while leaving the pronunciation and then drop letters; some go by irc nicks even in person; a rare few just invent words that have nothing to do with their names and go by that. Maybe it's a unique symptom of growing up in an era where we communicate via online systems that demand unique usernames?

            • nercht12 9 years ago

              Could be. If you've heard of someone primarily referred to by their online pseudonym, it may be less awkward calling them that than trying to guess at how they wish to be addressed using their real name. It's still awkward though.

          • dragonwriter 9 years ago

            > Really? I've been assuming that addressing people by first name--even people you've just met--is now the default, at least for the United States and Canada.

            This is incorrect. Calling people by something less formal than Mr./Miss/Mrs. <family-name> is certainly the current norm, but the alternative used is a personal choice of the person being addressed and often different from (sometimes, though far from always, a shortened form of) the legal personal name.

            > I know that it used to be rude to address someone by their first name unless you knew them well. You had to say, Mr. last-name or Mrs/Ms./Miss last-name.

            It remains rude to address someone by less formal terms until and unless you have sufficient contact with them to know the less formal appellation that they prefer you to use. It is more expected now than in the past that people will very quickly accept the use of less formal address and inform you of their preferred form.

            It is also more common for businesses wishing to feign familiarity to presume that first name information from a customer registry, credit card, or other source is equivalent to stating a preferred form of address and consenting to have the business's agents use that form; it is not, and quite a lot of people react badly to it. You would be well advised not to imitate those businesses.

          • tehwalrus 9 years ago

            I really really don't like being called "Joseph" in long form. I know there are some Matthews, Benjamins and a Sybille who agree...

          • nercht12 9 years ago

            I personally don't like the shift, even being in the younger crowd. Calling someone by their first name sounds presumptuous and leaves you in the awkward spot of guessing whether to use a nickname (you may have heard they've been given) or their full first name. With some coworkers it's fine 'cause we're on a friendly basis anyway, but if someone doesn't know me from Adam, I prefer they don't use my first name. It only lowers my respect for them.

            • Thrillington 9 years ago

              Give me a genderless honorific and I'm right there with you. Otherwise, a quick correction to ask for use of a nickname seems easier and less embarrassing all around than presuming gender incorrectly.

          • dom0 9 years ago

            > Really? I've been assuming that addressing people by first name--even people you've just met--is now the default, at least for the United States and Canada.

            Nope.

            > I know that it used to be rude to address someone by their first name unless you knew them well.

            Correct.

            > But I thought that the etiquette has changed completely: First name is fine and last name sounds rather formal. Do others have a different experience?

            Outside of school/university it's simply presumptuous. Don't do it.

            • Frondo 9 years ago

              Perhaps it's regional. These countries are both huge, with a wide variety of social norms represented. Out here on the left coast, first names seem to be the norm, as the GP poster surmises.

              I can't actually remember ever being in a business meeting/academic setting, where Mr. or Ms. was used, at any point. In movies, sure, but it really does seem quaint.

              • dragonwriter 9 years ago

                > Out here on the left coast, first names seem to be the norm, as the GP poster surmises.

                No, it's not. Preferred personal informal names are the norm for anyone you've been introduced to (and are normally part of that introduction); those may be legal personal ("first" in the usual English order) names, but often are legal middle names, derivative forms of either first or middle names, or names distinct from any legal name.

                • Frondo 9 years ago

                  I'm really quite reluctant to deny my life experiences up til now on your word, telling me they've been wrong experiences. I'd guess we've just seen different things out in the world.

                  • dragonwriter 9 years ago

                    I don't think we are, since the region you claim the behavior generalizes to is the one I am also most familiar with. I think you are simply making the mistake of confusing "a person's preferred appellation, which is sometimes their legal first name but very often something either subtly (as in a shortened form) or radically different from the first name" with simply "first name".

        • Frondo 9 years ago

          I understand what you're saying, but I find the first-name greeting stuff to be probably the least problematic of marketing behaviors big businesses can do. Send me your regular catalog of electronics parts, but it's got my name on it? Sure, whatev, Mouser Electronics.

          Far creepier is the thing where they use your purchase history to make predictions about your health condition, like where Target would send out baby-related stuff when its algorithms discerned that customers were likely pregnant: http://www.businessinsider.com/the-incredible-story-of-how-t...

          Compared to that stuff, "Dear Frondo" seems absolutely benign.

      • JoshTriplett 9 years ago

        I've run into a couple of sites that ask for "nickname" or "preferred form of address", which you could store as a separate opaque string, if you really want that.

        • Frondo 9 years ago

          That's great, that's an even better solution. I like that. If I'm ever in a position to make a recommendation about this sort of thing again, I'll use that idea.

        • kibwen 9 years ago

          Yes, I'd so much rather have an explicit nickname field if a company wants to insist on calling me anything other than my username.

  • innocentoldguy 9 years ago

    People's names are one area I think developers constantly bugger up. For example, my last name has a space in it, but half the websites I register on either:

    1. Throw an error and won't let me enter my last name as it is supposed to be spelled.

    2. Truncate the last part of my last name.

    3. Try to be clever and end up shoving the first half of my last name into a middle-name field.

    My preference for names, addresses, and other personal data is to stop trying to constrain people to preconceived "standards" and just let them enter their information the way they want it to be.

    • sdenton4 9 years ago

      I'm a fourth, and occasionally the IV suffix gets horribly mishandled. I recently booked a flight on Travelocity and had to change arrangements by looking up the reservation on the actual airline's site, but it turned out they had dropped the space between my last name and the IV, making it impossible for me to log in until I figured out what had gone wrong...

    • franciscop 9 years ago

      As a Spaniard I have two last names (no middle name, and not one last name with a space), but 0% of the non-Spanish websites I have ever seen accommodate this.

      Normally I just enter them with a space in the last-name field, but then I get exactly the same problems you mention.

  • artpepper 9 years ago

    What I take from the names article is to make sure you understand your requirements. A blogging platform will have different requirements than, say, electronic medical records.

    • nradov 9 years ago

      For EMRs, the HL7 FHIR HumanName datatype covers most of what's needed. Patients can have 0 or more names, each of which can have a full string representation plus 0 or more separate family / given / prefix / suffix components. And each name can be tagged with a usage type (official / maiden / nickname / etc) and validity date range.

      http://www.hl7.org/FHIR/datatypes.html#HumanName
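
      To make the shape concrete, here is a hedged sketch of a HumanName as a Python literal (the field names follow the FHIR spec; the values are invented, and note that "family" was a list rather than a single string in older FHIR versions):

        name = [
            {
                "use": "official",                     # usage type
                "text": "Dr. Maria de la Cruz Lopez",  # full string form
                "prefix": ["Dr."],
                "given": ["Maria", "de la Cruz"],
                "family": "Lopez",
                "period": {"start": "2001-05-06"},     # validity date range
            },
            {
                "use": "maiden",
                "family": "Garcia",
                "period": {"end": "2001-05-06"},
            },
        ]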

      • artpepper 9 years ago

        Interesting! Especially the inclusion of a time period field. That's important in time zone information, too.

  • agumonkey 9 years ago

    Wasn't there one about time and timezones?

    So many mundane things get the "how hard can this be?" treatment...

  • JoshTriplett 9 years ago

    Yeah, the same applies to many of the falsehoods in this list. Some of them you have to get right; some of them you can probably ignore without causing practical problems.

nhaehnle 9 years ago

This is a good list, but it would be so much better with some (brief) pointers to counter-examples to the beliefs.

  • unscaled 9 years ago

    This unfortunately follows the conventions of the genre called "Falsehood programmers believe about X": http://spaceninja.com/2015/12/08/falsehoods-programmers-beli...

    I honestly think this genre is horrible and counterproductive, even though the writer's intentions are good. It gives no examples, no explanations, no guidelines for proper implementations - just a list of condescending gotchas, showing off the superior intellect and perception of the author.

    • simias 9 years ago

      I agree, I think this format works when the subject matter is trivial enough that it's easy to construct counterexamples yourself once the contradiction is pointed out.

      The "Name" version is a good example of that, I can easily see how most of the examples on this list can be falsehoods.

      On the other hand, in TFA some of the assertions leave me more perplexed. For instance, regarding color conversion: "converting from A to B is just the inverse of converting from B to A". I wonder what's meant here. Is it just a matter of rounding or is there more to it than that?

      The catch 22 here is that if you understand this list then chances are you already knew about most of these gotchas.

      So yeah, a pretty bad format. Now we just have to write "`Falsehood programmers believe about X` considered harmful".

      • dom0 9 years ago

        > I wonder what's meant here. Is it just a matter of rounding or is there more to it than that?

        Many colour spaces are non-overlapping, i.e. one colour space has colours a different colour space simply doesn't have, so converting between them is often lossy and thus non-invertible.
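
        And even when the gamuts nominally line up, 8-bit quantization alone already breaks invertibility. A toy round-trip using the BT.601 full-range coefficients (a sketch; real pipelines add limited range, chroma subsampling and gamma handling on top of this):

          def rgb_to_ycbcr(r, g, b):
              y = 0.299 * r + 0.587 * g + 0.114 * b
              cb = 128 + 0.564 * (b - y)
              cr = 128 + 0.713 * (r - y)
              return [round(v) for v in (y, cb, cr)]  # 8-bit storage rounds here

          def ycbcr_to_rgb(y, cb, cr):
              r = y + 1.403 * (cr - 128)
              g = y - 0.344 * (cb - 128) - 0.714 * (cr - 128)
              b = y + 1.773 * (cb - 128)
              return [max(0, min(255, round(v))) for v in (r, g, b)]

          print(ycbcr_to_rgb(*rgb_to_ycbcr(12, 200, 13)))  # often not (12, 200, 13)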

        • dragonwriter 9 years ago

          > Many colour spaces are non-overlapping, ie. one colour space has colours a different colour space simply doesn't have

          Wouldn't that be overlapping but non-coextensive? Non-overlapping would be no colors in common between color spaces, which would be odd.

          • dom0 9 years ago

            Yes. I didn't realize that difference between my intermediate language and the output language of these comments :)

    • inopinatus 9 years ago

      Perhaps there is scope for a list of Falsehoods Programmers Believe About Falsehoods Programmers Believe.

      • mojuba 9 years ago

        Let's start then:

        1. Everything said in every "Falsehoods Programmers Believe..." list is true.

        The Falsehoods sound like ultimate truths only because of the literary genre. They sound like they were written by an expert who not only knows what's true, but also knows what we think we know, which kind of automatically takes him/her to the next level of expertise.

        • unscaled 9 years ago

          3. Every falsehood that is true should be accounted for.

          4. Every falsehood that is true CAN be accounted for.

          5. Making your code compatible with a falsehood doesn't come with a price.

          6. There are no falsehoods which are mutually exclusive.

        • kirillkh 9 years ago

          2. There exists a "Falsehoods Programmers Believe" list that is entirely true.

      • kdeldycke 9 years ago

        Love that idea! Please help me compile a list there: https://github.com/kdeldycke/kevin-deldycke-blog/blob/master... :)

    • TazeTSchnitzel 9 years ago

      Aren't the falsehoods inherently guidelines? They give you an idea of which assumptions aren't safe to make.

    • icebraining 9 years ago

      It's true that examples and explanations would be nice and make for a more helpful guide, but those can usually be found with a bit of legwork, whereas the gotchas themselves are often only discovered by trial and error. In essence, don't look a gift horse in the mouth.

      A better approach would be to pick the list up and turn it into a collaborative work. A wiki, maybe?

  • tedunangst 9 years ago

    Just reimagine it as "implicit assumptions to check for". From my limited experience in the field, a lot of these are things I know (or knew) but could easily forget in the midst of trying to get code to work.

  • imsofuture 9 years ago

    I hate the smug attitude of things like this. I get they're trying to raise awareness of a thing, but maybe take a moment to educate, instead of just smugly dunking on people about how much more you know about a thing that they don't.

donatj 9 years ago

- "all subtitle files are UTF-8 encoded"

Hah, this strikes really close to home. I've had to work with so, so many subtitle files in Eastern European and Turkish Windows codepages, mostly but not entirely compatible with Win-1252. There's no way to tell them apart programmatically, so you check that the extended characters make sense. It's a bit of a nightmare.
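
A hedged sketch of the kind of heuristic this turns into (the codepage list and the "plausible characters" table below are illustrative, not exhaustive):

  # Guess a subtitle file's codepage by decoding with each candidate and
  # scoring how plausible the resulting extended characters look.
  CANDIDATES = ["cp1252", "cp1250", "cp1254"]  # Western, Central European, Turkish

  # Letters we'd plausibly see in the target languages (illustrative subset)
  PLAUSIBLE = set("çğışöüÇĞİŞÖÜáéíóúàèìòùäëïñßąćęłńśźżěřšůž")

  def guess_codepage(raw: bytes) -> str:
      def score(codepage: str) -> float:
          text = raw.decode(codepage, errors="replace")
          extended = [c for c in text if ord(c) > 127]
          if not extended:
              return 1.0  # pure ASCII: any candidate works
          return sum(c in PLAUSIBLE for c in extended) / len(extended)
      return max(CANDIDATES, key=score)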

smallnamespace 9 years ago

This article would be infinitely better if it provided any counterexamples.

iopq 9 years ago

> my hardware contexts will survive the user’s coffee break

hell, they don't survive alt-tabbing into a game that has a different resolution than the monitor

  • pvdebbe 9 years ago

    Heh... for some reason YouTube can't survive when I start a video on my monitor and then switch outputs to the TV using an xrandr script that closes one output and opens the other. I thought it was possible to continue the video that way, but once I noticed it doesn't work, it made sense immediately.

    MPlayer and co., on the other hand, can cope with it, but my window manager can mess it up, so I don't bother.

tuxidomasx 9 years ago

This list makes me not want to program any [video stuff]

scottlamb 9 years ago

From the article:

> I can exclusively use the video clock for timing

Heh. I just finished writing up a design doc to address problems I had with this, and I referenced "Falsehoods programmers believe about time". Then I opened Hacker News and saw this article. So this is very timely for me.

(My doc: https://github.com/scottlamb/moonfire-nvr/blob/new-schema/de...)

jheriko 9 years ago

it is true, video is a nightmare mess littered with weird functionality nobody needs. (limited range only just disappeared in rec 2100, optionally??? really??? i'm not worried about the electron gun in my CRT from 1975 these days... nor do i want to know what a Y or a Cb or a Cr means because everything is RGB and B&W TV is long dead... and 4:2:2 is not exactly compression so much as computational overhead etc. etc.)

it's a nightmare, but the reason for these observations is precisely that it shouldn't be a nightmare. this area of programming is a wasteland... nobody that good wants to solve these trivial problems :/

  • mrob 9 years ago

    Chroma subsampling isn't going anywhere. You'll usually get subjectively better quality with 4:2:0 chroma compared to 4:4:4 at the same bitrate. And this means you can't have everything in RGB, so all the colorspace conversion complexity can't be ignored.

    Try experimenting with chroma subsampling in JPGs, but note that not all image viewers have good chroma upscaling. MPV can display still images as well as video and you can choose the chroma scaling algorithm.
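
    If you want to experiment with Pillow, something like this should do it (the subsampling keyword applies to JPEG saving; note that comparing at equal quality settings is not the same as comparing at equal file size):

      from PIL import Image

      img = Image.open("test.png").convert("RGB")
      # Same quality setting, different chroma subsampling:
      img.save("out_444.jpg", quality=85, subsampling="4:4:4")  # full chroma
      img.save("out_420.jpg", quality=85, subsampling="4:2:0")  # quarter-res chroma

    To compare fairly, raise the quality of the 4:2:0 version until the file sizes roughly match, then look at which one looks better.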

    • haasn 9 years ago

      > Chroma subsampling isn't going anywhere. You'll usually get subjectively better quality with 4:2:0 chroma compared to 4:4:4 at the same bitrate. And this means you can't have everything in RGB, so all the colorspace conversion complexity can't be ignored.

      What's more, YCbCr is more efficiently compressed than RGB even if you don't subsample, for the same reason that a DCT saves bits even if you don't quantize: linearly dependent or redundant information is moved into fewer components. In this case most of the information moves into the Y channel, with the Cb and Cr both being very flat in comparison. (Just look at a typical YCbCr image reinterpreted as grayscale to see what I mean.)
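
      You can see the energy compaction with a few lines of numpy (a sketch using the BT.601 full-range coefficients; any natural photo will do):

        import numpy as np
        from PIL import Image

        rgb = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=float)
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

        y = 0.299 * r + 0.587 * g + 0.114 * b  # luma carries most of the signal
        cb = 0.564 * (b - y)                   # blue-difference chroma
        cr = 0.713 * (r - y)                   # red-difference chroma

        # On typical photos the chroma planes have far lower variance than luma:
        print([round(float(np.var(p))) for p in (y, cb, cr)])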

      • jheriko 9 years ago

        isn't it the case that the amount of data required to store the result of a lossless DCT is bounded below by the size of the input data, and this is why lossless JPG compression does not use such a scheme?

        • haasn 9 years ago

          I'm not actually sure. In retrospect, I'm not sure what 'DCT without quantizing' really means, since the outputs of the cosines are probably real numbers? I guess the interpretation would be quantized to however many steps are needed to reproduce the original result when inverted (and rounded).

          In lossless JPEG it seems they omitted the DCT primarily for this reason: It not being a lossless operation to begin with, if you actually want to store the result. What other lossless codecs often do is store a lossy version such as that produced by a DCT, alongside a compressed residual stream coding the difference (error).

          In either case, it's important to note the distinction between reordering and compressing; reordering tricks like DCT can reorder entropy without affecting the number of bits required to store them, but the simple fact of having reordered data can make the resultant stream much easier to predict.

          For example, compare an input signal like this one:

          FF 00 FF 01 FF 02 FF 03 FF 04 ...

          By applying a reordering transformation to move all of the low and high bytes together, you turn it into

          FF FF FF FF FF .. 00 01 02 03 04 ..

          which is much more easily compressed. As for whether that's the case for (some suitable definition of) lossless DCT, I'm not sure.
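
          A quick way to play with this, with the caveat that exact sizes depend entirely on the compressor and the input:

            import zlib

            # Interleaved "high byte, low byte" stream vs. the same bytes
            # regrouped into two planes, as in the example above.
            interleaved = bytes(b for i in range(256) for b in (0xFF, i))
            reordered = interleaved[0::2] + interleaved[1::2]

            print(len(zlib.compress(interleaved)), len(zlib.compress(reordered)))

          On this toy input the difference is modest; on real interleaved data (e.g. 16-bit samples of a smooth signal) the gap tends to be larger.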

    • jheriko 9 years ago

      you are right, there is a weird grainy sharpness kind of 'feeling' when chroma is not subsampled...

      but you get the exact same effect from higher resolutions, e.g. going from SD->HD->2K->4K we see the same thing... and we are still doing it, so i would question highly that it is subjectively better in a long-term sense given this continuing trend.

      i remember hearing people discuss this sort of thing when HD was new, and they stopped after while - i suspect because they got used to it, and they now realise how low the quality of the SD image was. i noticed this in myself as well...

      edit: incidentally there is a discussion about this here (first google thing i found): http://www.neogaf.com/forum/showthread.php?t=1308591

      it seems either nobody or very few are taking the perspective that 4:2:0/4:2:2 looks better, and there are even a few descriptions of precisely what they notice as being worse.

      • haasn 9 years ago

        I believe you are misunderstanding the issue.

        Nobody is trying to argue that 4:2:0 video looks objectively superior to 4:4:4 video if given a free choice. Obviously, full chroma information will always be better, such as is the case for something like a PC monitor vs a TV with subsampling.

        The problem is that 4:4:4 chroma requires more bits to compress, so when you're designing a video/image codec, you have to ask yourself whether the difference in bitrate between 4:2:0 and 4:4:4 is worth the difference in quality, and the answer seems to be “no”.

        This means that when you're serving, say, a 5 Mbps youtube video where the bitrate is already fixed, 4:2:0 is going to give you more bits to put into useful stuff (e.g. luma plane) instead of having to waste them on mostly-redundant chroma information.

  • kierank 9 years ago

    Please let me know what you do when you have filter overshoots or undershoots in full range. Limited range is there for good reason.

    • jheriko 9 years ago

      er... what? can you explain that a bit i think i must misunderstand?

      what i think of as undershooting or overshooting is relative to the range... and besides that, what is wrong with clamping? its how computer graphics has always had to deal with these things... limited range simply doesn't exist in that context, and it doesn't harm anything.

      when computer games are forced into limited range for consoles you don't get these unless your tv is applying one of those god awful filters that ruins everything anyway... (i'm still not sure why so many tvs have these - reference monitors never do anything this insane) ... but i can tell you what you do get, a subjectively /and/ measurably worse quality of image than from a monitor.

      (i don't think i'm alone in this based on the contents of the ITU-R BT.2100 either... which defines a full range as well as a 'narrow' one)

      • haasn 9 years ago

        > what i think of as undershooting or overshooting is relative to the range... and besides that, what is wrong with clamping? its how computer graphics has always had to deal with these things... limited range simply doesn't exist in that context, and it doesn't harm anything.

        As far as I understand it, limited range was historically used so you could use efficient fixed-function integer math for your processing filters without needing to worry about overflow or underflow inside the processing chain. You can't just “clamp back” a signal after an overflow happens.

        Of course, it's pretty much irrelevant in 2016 when floating point processing is the norm and TVs come with their own operating systems, so these days it just exists for backwards compatibility with the existing stuff - which is a property that video standards have tried to preserve as much as possible since the early beginnings of television.
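
        For reference, expanding 8-bit limited-range luma to full range is just an affine map plus a clamp (a sketch using the BT.601/709 8-bit levels; chroma uses the slightly different range [16, 240]):

          def limited_to_full_luma(y: int) -> int:
              # Limited range puts black at 16 and white at 235; codes outside
              # that window are where filter overshoots live.
              full = round((y - 16) * 255 / 219)
              return max(0, min(255, full))  # clamp the overshoot/undershoot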

lolc 9 years ago

And this is why I don't do video. (And have lots of respect for the people who write the libraries I use.)

FranOntanaya 9 years ago

Could write an entire page just on subtitles.

antirez 9 years ago

There is a lot of potential information in such a list. But in this form it's quite a "trust me" thing that does not really add to the reader's knowledge.

milansuk 9 years ago

Nice one! Now I would like to see an article like this about ciphers, hashes, digital signatures, etc.

the_duke 9 years ago

An explanation for each 'falsehood' would have been nice

ryanmarsh 9 years ago

Well video programming just sounds delightful.

/sarcasm

justinlaster 9 years ago

> a H.264 hardware decoder can decode all H.264 files

and

> video decoding is easily parallelizable

At a previous job, I don't know if it was just the field I was in or just bad luck, but having to explain this over and over again was kind of a personal nightmare.

That being said, this is an excellent list!

  • wstrange 9 years ago

    Curious - Why is this? Does this assume streaming video, and you can't look ahead in the stream?

    If you can jump ahead, it would seem to be easy to have multiple threads, starting at key frames to decode the content. You'd have to splice them together, but this seems possible.

    • justinlaster 9 years ago

      > it would seem to be easy to have multiple threads, starting at key frames to decode the content.

      It's a resource issue (memory, CPU, etc., and meeting latency requirements within those constraints), plus the subtly different flavors of the "H.264" standard that hardware and software actually follow, as well as a few other intricacies in how the whole standard works anyway. Again, it's not that it can't be done, but as the article says, it can't be done easily, or at least not consistently in certain situations.

      Key frames are a good anchor for anything you're doing with H.264 (and other formats), but they're not the be-all and end-all -- and they may even cause you trouble if you "trust" them too much. It is perhaps a bit like date/time programming. You can create something fairly easily that works for a decent amount of time, and even if it ends up being incorrect your clients may not even notice... or it may break down in a catastrophic manner in the future. But doing the latter is certainly not correct and it's certainly not professional. Quite honestly, I'd say date/time programming looks like a dream compared to the inconsistent nightmare that is video programming. Date/time logic needs to be sound because many programs rely on consistent and sane output from a program perspective, whereas video programming gets to slide as long as the output is generally correct from a human visual perspective.

      It's been a few years since I've dived into this stuff, so some things may have changed/gotten cleaned up. But the article seems to indicate that the ecosystem hasn't really changed.

    • saurik 9 years ago

      1) You are now assuming that "seeking to a position will produce the same output as decoding to a position"; even if the video is well-formed (and you don't end up with massive issues where the key frames just don't work correctly), you are likely going to end up with subtle discontinuities between every segment.

      2) You are now going to have to buffer a couple of seconds' worth of uncompressed video somewhere, probably not on the GPU, leading to a much higher I/O bandwidth requirement somewhere that isn't good at that, so this is probably only going to be sort of parallel. (FWIW, I believe most people who try to do parallel video decoding assume they can have different parts of the decoder concentrate on different sections of the screen, which sounds good until you see how non-local video decoding can be.)

      • the8472 9 years ago

        > 1) You are now assuming that "seeking to a position will produce the same output as decoding to a position"; even if the video is well-formed (and you don't end up with massive issues where the key frames just don't work correctly) you are likely going to end up with subtle discontinuities between every segment.

        Wouldn't "the keyframes just don't work correctly" result in corrupted output anyway?

        If we're worrying about already-broken situations then it is quite obvious that additional breakage may occur in related features.

        • msandford 9 years ago

          I think the point is that video definitely is that broken and the only reason video does work is because everyone has work-arounds for everyone else's bugs. At least that's my experience with video. It's all a disaster.

          • gkop 9 years ago

            Yes, this. Working with video is as though there were no such thing as a documented API or standards document, but instead, you find the longest-lived bugs in the popular toolchains and in the clients of your customers, and those bugs are the foundation of the interfaces you implement.

          • pdkl95 9 years ago

            I believe[1] this isn't necessarily about broken files. There is a lot of variation allowed by the spec. One example that I've seen in the wild is extra-long (> 60 second) periods between I-frames. Seeking to an arbitrary point then requires either searching backwards from the seek-point for an I-frame and decoding forward from there (potentially hundreds of frames), or keeping a massive amount of decoded video in RAM. As neither is usually practical, decoders may cheat and make do with as many P and B frames as they can handle.

            [1] I haven't actually read most of the h.265 spec. It's possible these are technically invalid files.

            • the8472 9 years ago

              a 1-minute span between I-frames would not be prohibitive for the parallel processing that the quoted part was referring to; with a 60-minute video it would still give you 60 segments to process in parallel.

              • pedrocr 9 years ago

                A single uncompressed frame of 1080p video occupies 28MB in RAM, so 1 minute of 24fps video will take up 40GB. If you want to be able to run 4 cores at once it's 3 times that. You won't be doing that any time soon on your laptop or smartphone.

                • gkop 9 years ago

                  Curious as to your math? My naive thinking is 1920 * 1080 * 8 (generous) bytes is around 16MB.

                  • pedrocr 9 years ago

                    I forgot where I got 28 from but it's indeed a mistake. For normal display you could get away with 1920 * 1080 * 3 bytes (8 bits per channel) ≈ 6MB. For a 10-bit display it would be around 8MB. You do indeed often use 32-bit float for high-quality processing, but since what we're storing here is the output frame, you would finish all that processing and then go down to 8 or 10 bits per channel. So recalculating the math, that's roughly 8GB for 1 minute of video, still way too impractical.
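
                    The arithmetic, for anyone following along (assuming 8 bits per channel, 3 channels, 24 fps):

                      frame = 1920 * 1080 * 3        # ~6.2 MB per decoded frame
                      minute = frame * 24 * 60       # ~9 GB per decoded minute
                      print(frame / 2**20, minute / 2**30)  # ~5.9 MiB, ~8.3 GiB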

                  • vilya 9 years ago

                    I think the grandparent post is talking about decoding to RGB with a full 32-bit float per channel, which is 12 bytes per pixel rather than 8. The high precision is needed for HDR and for the extra processing you have to do to the pixels after they're decoded: motion compensation, gamma correction, etc.

                • the8472 9 years ago

                  The maximum number of references frames, i.e. how much the Decoded Picture Buffer has to hold, is 16. So even if a GOP is 1 minute long you would have to hold at most 16 pictures in memory to have enough information to stream over that 1-minute segment.

                  So I still do not see how this would prohibit parallel processing.

                  • pedrocr 9 years ago

                    Not sure how that would work. You have a thread that's decoding frames 1 minute ahead of where playback is, so if it's not decoding full frames and storing them until you need to display them, what is that thread doing?

                    • the8472 9 years ago

                      Transcoding or video editing in slices is a common application.

                      You cut the video into a handful of parts at keyframes, process the parts individually in a streaming manner, and then splice the partial results together.

                      If we're talking about playback then creating seeking-thumbnails could similarly benefit from parallel processing.

          • the8472 9 years ago

            Many of the listed points in TFA are not about broken-ness. A good chunk cover rarely-used features or less commonly used codecs for advanced applications.

        • brigade 9 years ago

          As an example, there exist bitstreams where there aren't actually any keyframes, but instead the encoder guarantees that the decoder output converges to correct after decoding some number of frames. It's actually kinda how MDCT audio codecs work; it's just very rare in video.

    • jheriko 9 years ago

      seems to be easy, but each frame depends on previous frames... so now you need to share lots of data between threads. it's not as embarrassingly parallel as it looks from a naive perspective.

      although i contend that most decoders are very threadable - just that the people trying to do it usually lack the time or the skill, more usually the former.

      the state of video in programming is a total mess from my experiences.

    • frozenport 9 years ago

      Decoding frames ahead of time gives no benefit to a user watching the video. The problem is how to decode a single frame in parallel. Contrary to the view expressed elsewhere, hardware decoders run a lot in parallel. As MultiCoreWare pointed out, one of the biggest challenges is latency.

microcolonel 9 years ago

I don't think programmers believe any of the video decoding falsehoods; not because they know any better, but because they know they don't know.

Also, none of these unfounded preconceptions make intuitive sense, so I don't see why people would believe them.

imaginenore 9 years ago

> interlaced video files no longer exist

Interlaced video files should no longer exist.

Seriously, fk interlaced video.

> upscaling algorithms can invent information that doesn’t exist in the image

That's not a falsehood. Upscaling does invent information that doesn't exist in the image.

  • mrob 9 years ago

    "Information" in the information theory sense. The output of a deterministic upscaling algorithm can be exactly described by the input and the algorithm. There's no added information, only a different way of presenting the original information.

  • jeff_tyrrill 9 years ago

    > Interlaced video files should no longer exist.

    Yes, they should, as should silent movies, black and white movies, old game consoles with exotic output formats like vector graphics, and the like.

    It is a worthy endeavor to create and maintain video playback software that lets people consume beloved content that was made to the technology of its day, including home videos, sports games, TV shows with special effects edited in 60i, and video games.

  • emcq 9 years ago

    Perhaps the author was being pedantic, but from an information-theoretic perspective it is correct that you cannot invent information with upscaling.

    The upscaled image does not contain more information than the original image; you can reconstruct the upscaled image given only the information available in the original image, the output resolution dimensions, and the upscaling algorithm.

    • imaginenore 9 years ago

      That's like saying fractal images are not information. Just because something is generated by a formula, doesn't mean it's not new information.

AznHisoka 9 years ago

can we have falsehoods programmers believe besides video that are more common? this list probably is relevant for 1% of programmers here.
