Windows Is Bloated, Thanks to Adobe’s Extensible Metadata Platform

thurrott.com

321 points by Integer 9 years ago · 186 comments

tomcam 9 years ago

You would be surprised at how much cruft in Windows over the years has been directly due to Adobe. I had many bug triage sessions where Windows developers at Microsoft had to work around Adobe problems to keep Windows users happy. I always thought it was unfair and was quite impressed by Microsoft's willingness to handle this so quietly.

  • comex 9 years ago

    Not just Windows. Some choice strings from macOS system frameworks:

    AppKit:

        NSSavePanelOverRetainToCompensateForAdobe
        NSShouldWorkAroundBadAdobeReleaseBug
        Adobe Invener Fusion opted out due to odd event processing
    
    Foundation:

        com.adobe.Reader
        com.adobe.Acrobat
        com.adobe.Acrobat.Pro
        _allocWithZone:.do_adobe_hack
    
    (Though they're hardly alone. AppKit contains a huge number of bundle IDs scattered through the strings list, presumably for various special cases…)
    • brongondwana 9 years ago

      Everyone has hacks for the biggest products out there, it's a fact of life.

      My DAV server has hacks for Microsoft bugs and misfeatures, hacks for older Apple clients, hacks for Konqueror of all things, because I tested against it.

      And our current CalDAV code has just inherited two new hacks this week to work around weird bugs in shit that Google Calendar has been serving up:

      * years with only two digits or two leading zeros rather than 20xx.

      * unquoted TZNAMEs with colons in them.

      At least events from year "0012" are allegedly legitimate parsable ISO 8601 times, and the events from year "12" are at least legitimate VCALENDAR. The broken TZNAMEs parse legitimately, but

      DTSTART;TZNAME=GMT+10:00:20120101T01010101 needs to be fixed up pre-parse.
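
      For illustration, a pre-parse fixup for that last case could look something like this (a rough Python sketch, not the actual server code; the regex and sample value are just illustrative):

          import re

          # Quote a colon-containing TZNAME parameter value so the usual
          # "first unquoted colon starts the property value" rule still works.
          BROKEN_TZNAME = re.compile(r'(;TZNAME=)(GMT[+-]\d{1,2}:\d{2})(?=[:;])')

          def fixup_line(line):
              return BROKEN_TZNAME.sub(r'\1"\2"', line)

          fixup_line('DTSTART;TZNAME=GMT+10:00:20120101T010101')
          # -> 'DTSTART;TZNAME="GMT+10:00":20120101T010101'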

      Welcome to interoperability, where liberal in what you accept is the only choice when your communication partner is much bigger.

      • l-p 9 years ago

        Excerpt from the Radicale documentation:

        > The Radicale Server does not and will not support the CalDAV and CardDAV standards. It supports the CalDAV and CardDAV implementations of different clients (Lightning, Evolution, Android, iPhone, iCal, and more).

        • brongondwana 9 years ago

          Yeah, I bet. The standards are getting better, but there are so many vendor specific things that you need to support to have a good experience that just reading the standards in isolation won't help you much.

          Contributing to devguide.calconnect.org on the other hand, helps everybody.

        • gumby 9 years ago

          I wonder if someone has the energy to establish connectathons for "above layer 4". Most of our interoperability problems these days are virtual (thanks to brick-and-mortar connectathons squeezing out any of the old layer 1-4 bugs). These days we could do them online, and automate the detection of regressions.

          I was actually astonished that connectathons are still running; I haven't been to one in decades: http://www.connectathon.info

  • pc2g4d 9 years ago

    Microsoft's willingness to maintain backward compatibility has been key to their success over the years. With the importance of Adobe's products on their platform, it makes sense that they would go out of their way to ensure their continued functioning.

    • ced 9 years ago

      Raymond Chen from Microsoft on maintaining backward compatibility from Windows 95 to Windows XP:

      Look at the scenario from the customer's standpoint. You bought programs X, Y and Z. You then upgraded to Windows XP. Your computer now crashes randomly, and program Z doesn't work at all. You're going to tell your friends, "Don't upgrade to Windows XP. It crashes randomly, and it's not compatible with program Z." Are you going to debug your system to determine that program X is causing the crashes, and that program Z doesn't work because it is using undocumented window messages? Of course not. You're going to return the Windows XP box for a refund. (You bought programs X, Y, and Z some months ago. The 30-day return policy no longer applies to them. The only thing you can return is Windows XP.)

      ...

      This is just the tip of the iceberg with respect to application compatibility. I could probably write for months solely about bad things apps do and what we had to do to get them to work again (often in spite of themselves). Which is why I get particularly furious when people accuse Microsoft of maliciously breaking applications during OS upgrades. If any application failed to run on Windows 95, I took it as a personal failure. I spent many sleepless nights fixing bugs in third-party programs just so they could keep running on Windows 95. (Games were the worst. Often the game vendor didn't even care that their program didn't run on Windows 95!)

    • Pxtl 9 years ago

      God damn. If I were in that situation as a dev, I'd be tempted to break the latest releases of Adobe products on the day of launch, while still keeping the old ones supported. If I have to keep supporting your old broken garbage, you can't release new garbage.

      • wand3r 9 years ago

        That's the nature of the beast; everyone wants to release new garbage, which itself was clean until it had to take in a few bits of old rubbish and became trash itself.

        Can't find it now, but someone posted a Linux mailing list thread where Torvalds ripped a dev for breaking something that impacted a user. The dev responded something like the parent: it's a third-party app, not Linux's responsibility.

        The OS has to be stable and the user's experience is paramount. Even a technical crowd (based on this thread) is split 50/50 on whether this is a Microsoft issue. A non-technical user's response: "this software is shit, these idiots broke X in [this version], Microsoft sucks"

        • z1mm32m4n 9 years ago

          I believe the thread you're looking for is this one, about never breaking user space:

          https://lkml.org/lkml/2012/12/23/75

          • digi_owl 9 years ago

            And this is why the kernel is "everywhere" while "year of the desktop" never happens. Because Linux userspace is all too happy to break itself again and again and again...

            • dTal 9 years ago

              Isn't this more of a social problem than a technical one? It seems to me like the main issue is a convention that a system only have a single version of a library installed, the anointed "system" version of that library. And often, binary releases of software are built against a particular cocktail of library versions from a particular distro at the time of release, each of which depends on its own cocktail of libraries, etc... In reality it's not complicated to get virtually any binary running on any system (in the worst case, just put the distro it expects into a chroot) - and there's no reason why applications couldn't just bundle all the libraries they need, right down to glibc - but it's just Not Done.
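
              The chroot fallback really is that small; a toy sketch of the idea (Python, hypothetical paths, needs root):

                  import os

                  # Run a legacy binary against the distro tree it was built for.
                  os.chroot("/srv/old-distro-root")
                  os.chdir("/")
                  os.execv("/usr/bin/legacy-app", ["legacy-app"])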

          • wand3r 9 years ago

            exactly that. thank you.

            • cpeterso 9 years ago

              And Linus taking glibc devs to task for breaking Adobe Flash Player, due to a bug in Flash's use of memcpy():

              https://lwn.net/Articles/414467/

              • danielbarla 9 years ago

                That was an interesting read; he wasn't even very confrontational there. The point was also solid: they made an internal performance optimization that was unproven, yet it broke things in the real world. That, combined with the lack of guidelines or tests for what kind of usage triggers the behaviour, makes it a de facto "feature" of the interface that they should not break.

                • Pxtl 9 years ago

                  I do wonder what the correct workflow is for preventing this - ship debug binaries with assertions in place during development, so that any input constraints that aren't guaranteed to fail fast in the release version are at least enforced in the debug version?
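
                  A toy sketch of that idea (in Python, purely illustrative, not glibc's or anyone's actual code): a debug build asserts the undocumented constraint, here the memcpy-style "regions must not overlap" rule, before doing the work.

                      DEBUG = True

                      def copy_bytes(dst, dst_off, src, src_off, n):
                          if DEBUG and dst is src:
                              # The contract callers silently rely on: no overlap.
                              assert dst_off + n <= src_off or src_off + n <= dst_off, \
                                  "overlapping copy relies on unspecified behaviour"
                          dst[dst_off:dst_off + n] = bytes(src[src_off:src_off + n])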

                • digi_owl 9 years ago

                  And it is a point that so much of Linux userspace continues to ignore to this day...

    • agumonkey 9 years ago

      Yeah, it doesn't make me any less sad, but I can't blame MS for trying to keep their position and backward compatibility. There were also many other partners that didn't want to change their code, so MS had to keep many deprecated or even non-public APIs in early Windows 3/95 releases.

    • guitarbill 9 years ago

      For enterprise customers, sure. For many consumers, now that they use more modern OSes and browsers daily (iOS/Android), it's painfully obvious how terrible this strategy has been - even if they can only articulate that on a subconscious level. Honestly, I'd love to see what Microsoft could do if they ditched backwards compatibility completely, except in the server OSes. While hardly revolutionary, Surface Pad Pro is a good indicator/example that there's at least potential there.

      • SCHiM 9 years ago

        Those 'more modern' operating systems have roots and parts in their codebases that are just as old as, if not older than, Windows. All platforms are continuously adding new pieces and sometimes rewriting old ones. But I know for a fact that there are parts under Android's hood that a computer scientist from the 1970s would have found familiar.

        There is no reason to ditch the backwards compatibility in my opinion. All components can be kept functional alongside each other (like how UWA can function next to Win32) with some extra effort. I love the fact that there's a host of applications targeting Win32 that I've been using since Windows 2000 and that still work on Windows 10.

        • mirsadm 9 years ago

          It's amazing how much technical debt Android has accrued given how much newer it is than Windows.

          • wtallis 9 years ago

            I'm not too surprised. During Android's early years it was chasing a moving target as the standard mobile computing paradigm shifted from Blackberry/Treo type devices to iPhone clones. Then tablets got added to the mix and mobile SoCs outpaced Moore's Law for a while. The higher-level elements of the platform were inevitably going to be subjected to a degree of churn that Windows hasn't needed to deal with in about 20 years. (Of course, Windows has opted in to some churn with the revolving door of UI frameworks, but that's entirely self-inflicted.)

            • digi_owl 9 years ago

              Not helped by Google thinking they could apply Chrome/web methodology to device firmware. Sorry, but there is a reason why MS does year-long public betas etc. Apple can get away with being different because they control the whole stack, from the chips to the online services. Right now Google is where Palm was with PalmOS.

      • m1sta_ 9 years ago

        They could create a new version of Windows to mark this retry. Maybe they could call it Windows RT.

        • derefr 9 years ago

          Rather than dropping compatibility entirely, I've always wondered why no modern OS has gone the route of creating a "compatibility layer" that's actually a (very-tightly-sandboxed) VM containing a [stripped-down] older copy of the OS.

          This would be perfect to re-add Win16 support to W10, for example. Try to double-click a 16-bit app? It opens with its handler, "Windows 3.1 Hyper-V Association Background Launcher". As long as no W3.1 apps are running, it unloads, and the rest of the OS can be completely free of 3.1-era cruft.

          You could deprecate features from your latest OS at a breakneck pace if you really took this strategy to heart. WinRT really could have been just Metro/UWP + a "Desktop" that's actually a Windows 7 VM with a high-level paravirtualized shared filesystem.

          (And it's not like Microsoft didn't do this themselves before; this is essentially how DOS applications ran under Windows 3.1, in a [hypervised!] DOS VM.)

          ---

          EDIT: let me clarify the specific point I was getting at.

          If you ship $OldOS as a VM within $NewOS, very tightly sandboxed, then you never have to update $OldOS again, nor do you have to keep the code from $OldOS around in $NewOS. The $OldOS VM is now a "static artifact"—something that will never change, but will continue to run old software perfectly.

          So, if $OldOS gains more security flaws, you don't update $OldOS; you update the virtualization software to make the sandbox stricter.

          And when you find that your $NewOS is now, itself, old—well, you just do the whole thing again, writing an emulator for $NewOS within $EvenNewerOS, and then your $OldOS programs run by launching a VM-in-a-VM. If you like, you can port the $OldOS emulator to run on $EvenNewerOS—but you could also just consider it legacy software of exactly the type your $NewOS emulator is built to run.

          The point of this is to decrease maintenance burden, by freezing the previous OS, so that you never need to think of it again. This wasn't what was done with "XP Mode"; Windows XP continued to receive updates as a component of Windows 7. The point here would be to EOL the older OS, and then virtualize it.

          • ndespres 9 years ago

            Windows 7 included what you describe as a feature called "Windows XP Mode," which would launch applications inside a Windows XP virtual machine which was closely tied to the host OS. It worked really well for a lot of applications which weren't ready for Win 7 yet.

            https://technet.microsoft.com/en-us/library/ee681616(v=ws.10...

            • ygra 9 years ago

              It wasn't really that closely tied, if I remember. It simply used Terminal Services to run a single application on the server. You've long been able to open local documents with remote applications, and have the general experience very close to a local application.

              • soundwave106 9 years ago

                Windows Virtual PC, the core of the technology, is more akin to VMWare, VirtualBox. The "Windows XP Mode" part of the equation was that Microsoft offered a pre-installed image of Windows XP.

                While the virtual environment integrated somewhat with the base (eg audio, printers, some networking shares and hard drives), it still is a separate window running a separate virtualized OS. I'm not sure I would call that too tightly integrated.

                Nonetheless, I personally found it handy to run the occasional very old 16-bit application.

          • wvenable 9 years ago

            > Rather than dropping compatibility entirely, I've always wondered why no modern OS has gone the route of creating a "compatibility layer" that's actually a (very-tightly-sandboxed) VM containing a [stripped-down] older copy of the OS.

            I think Raymond Chen discussed this on his blog at one point. The answer is that users want a single integrated environment for all their apps and not a bunch of isolated sandboxes. They want drag and drop, copy and paste, shared files, inter-application communication, etc. They also want a single look and feel.

            The other reason is that creating a new platform from scratch is a good way to lose features and alienate developers. The core of Windows NT doesn't look anything like Windows 3.1 but for developers the API they present is similar enough to get them on board.

          • roblabla 9 years ago

            OSX did this when they went from 68k to PPC [0], from PPC to x86 [1], and also with Classic to support Mac OS 9 [2]. They maintained compatibility by emulating the previous architecture on the new one.

            [0] : https://en.wikipedia.org/wiki/Mac_68k_emulator

            [1] : https://en.wikipedia.org/wiki/Rosetta_(software)

            [2] : https://en.wikipedia.org/wiki/List_of_macOS_components#Class...

            • douche 9 years ago

              Mac Classic was great. For some reason the G4s they issued us in high school didn't have the Classic compatibility mode locked down as tight as everything else useful (to a teenager), so I could play older Mac video games in study hall...

            • derefr 9 years ago

              Ah, this is virtualization, though: the OS was trying to integrate the software from the old OS, letting it interact with all the older APIs of the modern OS as if it was running directly on it. Less like a VM, more like Wine.

              My point was less about compatibility, and more about security. If you took this approach, old software would still expose the new OS it was virtualized atop of, to its old security flaws. So you'd still have to move away from old software, eventually, for security reasons.

              But with proper VM isolation (and checkpointing, and a few other things), you could still have programs like IE6 around today, serving in its last purpose as the '90s-era Intranet web-services equivalent to the '80s 3270 mainframe terminal clients.

              Most any modern VM software can offer this level of isolation, but most have "host integrations" that turn them back into security holes. To be able to ship a static, EOLed copy of XP that could run IE6 indefinitely, without that causing problems, your compatibility layer would need to be a lot less like "Windows XP Mode" or Rosetta, and a lot more like DOSBox or qemu.

              There is one case where this already happens: many pieces of IBM z/OS software have been wrapped in layers of hard-isolated emulators for decades now, such that programs from the 70s are able to continue running without recompilation, atop a stack of VMs, where many of the intermediary pieces of VM software have themselves long been EOLed.

              • therein 9 years ago

                In fact beyond virtualization, it was emulation. I remember trying to install Windows 95 on my iMac G5 under QEMU and having it barely usable. Trying to debug issues with my setup using a very buggy PPC build of Firefox.

          • relevate 9 years ago

            This is partially the motivation behind MSR's Drawbridge project (which is the predecessor of the Windows Subsystem for Linux), the idea being that the personality of the OS runs in user space and each application runs in a sandboxed picoprocess. All ~14,000 functions exposed by Windows DLLs could be implemented on top of a narrow set of ~45 ABI calls provided by the host.

            https://www.microsoft.com/en-us/research/project/drawbridge/

          • Kipters 9 years ago

            Technically they're already doing it via their subsystem mechanism. They're taking it "to the next level" with the Linux subsystem, where there's a kernel driver emulating a totally different kernel

          • markrages 9 years ago

            > I've always wondered why no modern OS has gone the route of creating a "compatibility layer"

            You could even hold down the Commodore key while booting to get a mode compatible with the previous system.

            • digi_owl 9 years ago

              Yep. Many of their products stood or fell based on their C64 compatibility.

              Once an install base has been built up, compatibility (or lack thereof) can make or break you.

          • flomo 9 years ago

            Perhaps I'm missing something, but that sounds pretty much exactly how Win16 apps worked on 32-bit NT.

          • kgc 9 years ago

            Early releases of OSX did this using "Rosetta"

        • sorenjan 9 years ago

          I bet it would be a huge success.

      • digi_owl 9 years ago

        MS would crash and burn virtually overnight. With no backwards compatibility there is little to no reason for customers (business or consumer) to stick with the brand.

  • frik 9 years ago

    File metadata is not cruft. It's a very good thing. Vista's/Win7 Explorer.exe with its metadata columns is GREAT.

    Even better was Windows (Live) Photo Gallery. Sadly it's dead since Feb 2017, you can't even install it anymore, as only a now broken WebInstaller exists. Photo Gallery was hands down 1000 times better than Picasa/Lightroom/Photos/iPhoto for just browsing photos and videos. And it also supported tags with hierarchy (eg "City/New York/Manhattan").

    https://en.wikipedia.org/wiki/Windows_Photo_Gallery

    Sadly WinFS failed, and metadata is nowadays often misunderstood and perceived by companies as counterproductive in the age of cloud service strategies. Flickr is probably the only well-known photo service that keeps metadata. Facebook made it popular to strip metadata and keep annotations internally (as vendor lock-in) - now common with other services too.

    I hope there will be a kind of come back of metadata. People need more education to understand the concept, that's all.

    • mathw 9 years ago

      Sure it's not cruft when they're images in your image library and you need to find them and understand them using said metadata.

      It's cruft when those assets are now embedded into your compiled, bundled, distributed software and such metadata now has no purpose whatsoever, as is the case for the examples cited in the article.

    • daemin 9 years ago

      The article was talking about Adobe image metadata embedded in the explorer.exe executable itself, as metadata inside some PNG images that Explorer uses in its interface.

      It isn't talking about the metadata columns for files.

      • frik 9 years ago

        Read the Wikipedia article about XMP. It's a newer metadata format than Exif and IPTC. XMP comes from Adobe, and it contains a lot of Adobe strings in the XML schema.

        I wrote about the metadata handling in Explorer/shell32. It's a light version of what was tried out with WinFS.

        • daemin 9 years ago

          Read the article again, take a look at the code. The author is not complaining about Windows Explorer being able to parse and display metadata from PNG files. The author is complaining about how much embedded metadata there is inside PNG files inside the explorer executable file itself, as well as other DLL and EXE files in Windows.

          The metadata that the author finds is actually inside resources inside the executable itself. It does not include any strings found in the executable which are used to parse the metadata from other files that you have on disk.

          I'm not saying metadata is cruft. Metadata is very useful during production or for managing art assets. It is however superfluous once it is embedded in an executable or other code library or packed archive. This is simply because its purpose has already been served and it is no longer needed at that point, thus wasting space.

    • VMG 9 years ago

      > Even better was Windows (Live) Photo Gallery. Sadly it's dead since Feb 2017, you can't even install it anymore, as only a now broken WebInstaller exists.

      Slightly OT: is there any good photo organizer for Windows? I would be happy with something at the level of Shotwell even: https://wiki.gnome.org/Apps/Shotwell

      • frik 9 years ago

        digiKam runs on Windows too, it's good, it's open source. But only photos.

        Photo Gallery supports video too, and is a lot easier to use, has a polished UI, and has stable path-like metadata support. digiKam can read such metadata tags, and even edit them, but adding new ones is buggy (paths like City/New York/Manhattan are then stored as three separate tags: City, New York, Manhattan).

    • i336_ 9 years ago

      > Even better was Windows (Live) Photo Gallery. Sadly it's dead since Feb 2017, you can't even install it anymore, as only a now broken WebInstaller exists.

      This sounds like a challenge. You're on.

      - Do some scratching around; discover sites hosting "wlsetup-all.exe"

      - Point the Web Archive at download.live.com

      - After some trial and error with broken pages find http://web.archive.org/web/20161130174327/https://support.mi...

      - Follow the "Download options" link

      - Eventually land on http://web.archive.org/web/20161226002912/https://support.mi..., and disable JavaScript so the page doesn't kill itself (remember on Chrome you just click the (i) to the left of the URL)

      - Ah, a "Download Now" link!

        $ curl -vv http://go.microsoft.com/fwlink/?LinkID=255475
        Location: http://g.live.com/1rewlive5-web/en/wlsetup-web.exe
      
        $ curl -vv http://g.live.com/1rewlive5-web/en/wlsetup-web.exe
        Location: http://wl.dlservice.microsoft.com/download/C/1/B/C1BA42D6-6A50-4A4A-90E5-FA9347E9360C/en/wlsetup-web.exe
      
      Hmm...

        (note s/web/all/g)
        $ curl -vv http://g.live.com/1rewlive5-all/en/wlsetup-all.exe
        Location: http://wl.dlservice.microsoft.com/download/C/1/B/C1BA42D6-6A50-4A4A-90E5-FA9347E9360C/en/wlsetup-all.exe
      
      Can I...?...

        $ curl -vv http://wl.dlservice.microsoft.com/download/C/1/B/C1BA42D6-6A50-4A4A-90E5-FA9347E9360C/en/wlsetup-all.exe
        < HTTP/1.1 404 Not Found
        (...)
        <div id="header"><h1>Server Error</h1></div>
        (...)
        <h2>404 - File or directory not found.</h2>
        <h3>The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.</h3>
      
      Hmmm.

      - Try and load wl.dlservice.microsoft.com/robots.txt in the Web Archive

      - Get redirected to Microsoft homepage!!

      - Lookup wl.dlservice.microsoft.com/* in Web Archive

      - "805 URLs have been captured for this domain."

      - Search for "c1ba..." - get hits!

      http://web.archive.org/web/20170416220642/http://wl.dlservic...

      137329840 bytes.

      There are other sites that have copies of the file, but a) this one is from the Web Archive and b) I've verified using a mixture of WA and still-live Microsoft redirects that this is the latest-ever release.

      • acemarke 9 years ago

        Just want to say I love it when people do stuff like this. :) I have no need for that app myself, but I appreciate that you took the time to grovel through the guts of several different pages, work around barriers, solve the problem, and document all the steps you took.

      • frik 9 years ago

        Thanks!

        I previously copy&pasted the folder to another PC, and manually patched the registry and copied some dlls to get it working.

    • krylon 9 years ago

      > And it also supported tags with hierarchy (eg "City/New York/Manhattan").

      This is kind of OT, but F-Spot on Linux supported hierarchical tags, as well, and I loved it. Was really sad when it was discontinued and distros replaced it with Shotwell.

  • wvenable 9 years ago

    Adobe effectively forced Apple to build the Carbon API for OS X, delaying the release by a year or more.

    • jomohke 9 years ago

      Apple used Carbon for a lot of its own built-in apps, such as Finder, didn't it? (edit: and iTunes)

      (Almost?) all other major vendors at the time used it heavily too: Microsoft Office until 2011, FileMaker until 2010.

      If Apple released the OS a year earlier, but without any available third-party software (waiting for almost-rewrites to happen), would they really have been better off? At this time they were on life support and would have had difficulty convincing third-party vendors to invest heavily in their platform, or to convince users to adopt it without any major applications available.

      In either case, Adobe's sway clearly declined over the years, as Apple cancelled the 64-bit version of Carbon while Adobe was still heavily built around it, forcing Adobe to switch (and causing an awkward year or two when the Mac versions, but not the Windows ones, were stuck with 32-bit memory limitations).

      • wtallis 9 years ago

        > Apple used Carbon on a lot of its own built-in apps (such as Finder), didn't it?

        Apple wrote the Finder in Carbon as a dogfooding exercise, to prove to third party developers that Carbon was a first-class fully supported framework. The Finder has since been re-written in Cocoa.

      • DerekL 9 years ago

        > Apple used Carbon for a lot of its own built-in apps, such as Finder, didn't it? (edit: and iTunes)

        iTunes needed to support both Mac OS 9 and X, so Carbon made much more sense than writing all of the UI code twice. Also, it started as SoundJam MP, which was written for Mac OS 8, so Carbonization was much less work than a Cocoa rewrite.

        • simonh 9 years ago

          Right, but the same logic applied to any popular app with users still on Mac OS.

  • zubat 9 years ago

    Now imagine the world where Microsoft became Adobe's direct competitor and cut their oxygen supply.

    • eon1 9 years ago

      I feel like you're implying that would be a bad thing when it could actually be preferable. I don't think MS is faultless or anything, but I'd be surprised if they could make themselves as much of a pain as Adobe does (especially given all the OS integration opportunity).

    • goldfire 9 years ago

      This is exactly what Microsoft were accused of doing to Lotus 1-2-3 back in the day. See [0]. TL;DR: wasn't true.

      [0] https://news.ycombinator.com/item?id=10432608

    • santaclaus 9 years ago

      Didn't they try that with Expression back in the aughts and fail? Doesn't mean it couldn't work now, though!

      • at-fates-hands 9 years ago

        I actually found out about Expression way after the fact and thought it was a viable alternative - but by then, MS was already in the process of sending it out to pasture.

        It's like a co-worker said, "Adobe's software is bloated and cumbersome, but they have a top notch marketing team that keeps it afloat." But yes, if there is a company that is ripe to have some serious competitors it would be Adobe.

  • aswanson 9 years ago

    I've heard that Adobe's poor software engineering has provided a huge malware/virus vulnerability surface for windows. Maybe MS should force them to break/rewrite.

  • batter 9 years ago

    Doesn't it make you wonder why MS created that situation in the first place?

    MS has no one to blame except themselves and their 'legacy code'.

    • commandar 9 years ago

      It's hard enough to get enterprise customers to stay current when they do provide legacy support. Fighting with IE version compatibility, for example, is still a daily, very real issue for enterprise size organizations.

      Blame the customers. Microsoft never would have captured the marketshare they have if they didn't cater to them.

      • DaiPlusPlus 9 years ago

        Microsoft does deserve some blame for deciding to support backwards compatibility in-situ instead of introducing proper API versioning, packaging, isolation, and other proven techniques. Win32 wasn't versioned for processes until Windows 7 (using app.manifest), even though the need for such a system was blatantly obvious during the Windows 2000 days.

        • commandar 9 years ago

          Sure, that's valid.

          But most of the real nightmare scenarios I've heard related to backwards compatibility have more to do with third-parties doing things they were never supposed to do.

          Things like hitting private, undocumented APIs. Or checking the Windows version with a "9x" wildcard, giving us the jump from W8 to W10 over a decade later.

          Microsoft has made their own mistakes, but supporting the mistakes of third parties has been absolutely vital to them keeping their core customer base.

          • DaiPlusPlus 9 years ago

            All of those scenarios you describe can be solved with appropriate application sandboxing and shimming.

            I don't personally believe the "Windows 9" story - if a program is old enough to feel the need to check for Windows 95/98 then it should already be fine to run under Windows' own app-compat layer which spoofs the Windows version string anyway. I believe it's marketing-based out of fear consumers would see "MacOS 10 vs Windows 9" (like how it was PlayStation 3 vs Xbox 2 - hence "Xbox 360").

            • nemothekid 9 years ago

              MacOS 10 wasn't rebranded from OS X until 2016, a full year after Windows 10 GA - and I doubt anyone outside a select few at Apple knew that the OS X line was going to be renamed.

              In any case, your reasoning doesn't really make sense. I can run a program that was written for Windows XP on Windows 10 without the need for an app-compat layer. Given that a developer can hide/show all sorts of random functionality with an if-branch on the version, the user will see a broken or strange app, and it won't be clear (and MS probably didn't have the means to detect) that the app should run in compatibility mode.

              I still believe that MS wanted to ship an OS that "just worked", and did so with 10, rather than trying to compete in version numbers with an OS that has had 10 in its name for the last 18 years.

              • frik 9 years ago

                > MacOS 10 wasn't rebranded from OS X until 2016

                X means 10 in Latin.

                Look at old MacOS X 10.3 books, X always stood for 10.

                • talmand 9 years ago

                  Even your Mac will say 10 if you get it to speak out "Mac OS X". But I have had many people insist it was not pronounced as "Mac OS 10", but as "Mac OS X".

                  I could understand the confusion.

            • carey 9 years ago

              Some of those programs would have been Java programs, where java.exe is modern, but the program is not. The definition of os.name in Java makes checking the string prefix particularly likely.

              • WorldMaker 9 years ago

                Google Code Search, when last I was curious on the topic, turned up a lot of open-source Java with "starts with Windows 9" checks, including some deep in the Java framework itself. It's hard to imagine there isn't as much littered through closed-source and proprietary code. (Probably even code that never actually ran on Windows 9x in the first place.)
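
                For a picture of the idiom (a Python-flavoured sketch mirroring the Java os.name checks; the string is hypothetical example data, not a real API call):

                    os_name = "Windows 95"

                    # Meant to catch Windows 95/98, but it would also have matched
                    # a hypothetical "Windows 9".
                    is_legacy_windows = os_name.startswith("Windows 9")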

          • ClassyJacket 9 years ago

            " Or checking the Windows version with a "9x" wildcard, giving us the jump from W8 to W10 over a decade later."

            This is just a rumor made up on reddit. It's completely false. Windows returns its version number as a series of integers. And even if it was a string, they could've just called it "Windows Nine".

            • WorldMaker 9 years ago

              Windows version APIs have always had a "name" string, and bad developers do have a wild tendency to use string manipulation instead of more obvious methods. Since around Vista the Windows version APIs even "lie" by default, because they presume a developer isn't querying the version API for the right reasons: you now essentially have to tell Windows which version you were built for/tested on to get the real version back. But the right way to do things is capability checks rather than version checks, so it shouldn't matter that getting the actual current version out of the version API takes extra work.

              Also: https://news.ycombinator.com/item?id=14205899
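
              As an analogy outside Win32 (a Python sketch, nothing to do with the Windows APIs themselves), the capability-check principle is simply to probe for the feature you need instead of branching on a version:

                  import os, sys

                  # Fragile: version sniffing.
                  fast_path = sys.version_info >= (3, 8)

                  # Robust: check for the capability you actually need.
                  fast_path = hasattr(os, "posix_spawn")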

          • tremon 9 years ago

            > Things like hitting private, undocumented APIs

            That's entirely Microsoft's fault too: they made internal APIs accessible to applications, and did not provide comprehensive documentation on the full features of their public API.

            The latter means that application developers were forced to just guess what a function could do. And since no API performs proper input validation, undocumented usage became the norm.

            • ygra 9 years ago

              Of course internal APIs are accessible to applications. You can always call them somehow. Whether it's fishing "function #183" from a DLL, or reflecting on a managed type and calling stuff that way ... I wouldn't exactly blame MS here.

              Public APIs are documented, guaranteed to work as documented (otherwise it's a bug and will be fixed), and intended for others to call. Internal APIs are none of these things. They are not documented because they're not supposed to be called. Anything not documented in an API is (to me at least) something that's not guaranteed to remain the way it is, whether it's a complete function or just a side effect of something that is public.

              • talmand 9 years ago

                Never mind the fact that calling an undocumented API within your app could lead to really bad things for the user.

        • UnoriginalGuy 9 years ago

          > Microsoft does deserve some blame for deciding to support backwards-compatibility in-situ instead of introducing proper API versioning, packaging, isolation, and other proven techniques.

          They did, in Windows 8. It is called Metro/Modern/UWP. Everyone hates it.

      • tjalfi 9 years ago

        My employer has an application that requires IE5 compatibility mode. We are a version behind and will be upgrading it this year. The sad thing is that the new version uses Silverlight.

mih 9 years ago

In one of the comments on the page, a reader ran the analysis [1] on a Windows installation and reported the bloat size.

Total bytes wasted: 5341278

[1] https://gist.githubusercontent.com/riverar/f4a56b91580af1bd3...

II2II 9 years ago

By the sounds of it, this bloat is minor. (Keep in mind, the author is pointing out the two most extreme examples.)

Bloat arises from a lot of different places, a lot of which cannot realistically be controlled without drastically affecting user expectations, system performance, and how software is developed.

Consider graphics. If you are quadrupling the color depth, you are quadrupling the amount of memory required for graphics resources. Even more fun, if you are doubling the resolution you are quadrupling the amount of memory required for graphics resources. Going back to the olden days would only be an option if they are willing to compromise on the quality of the graphics.

At the other end of the spectrum are developers. Should they really be choosing things like the type of an integer to reduce the size of code and data? Old software was often limited due to such considerations. In some cases software used bizarre tricks to reduce bloat, such as cramming data into the unused bits of an address. (Apparently that was common on 68000 based personal computers.)

Don't get me wrong, there is a lot of unnecessarily bloated software. Yet I suspect that the vast majority of that bloat exists for very good reasons.

  • userbinator 9 years ago

    > Consider graphics. If you are quadrupling the color depth, you are quadrupling the amount of memory required for graphics resources. Even more fun, if you are doubling the resolution you are quadrupling the amount of memory required for graphics resources. Going back to the olden days would only be an option if they are willing to compromise on the quality of the graphics.

    I suspect a lot of this bloat is due to the current and IMHO horrible trend of "every UI element is a separate bitmap image", even if the image is trivial to generate procedurally; consider gradients, for example --- they're easily described by an equation that may take no more than a few dozen bytes of code, yet developers seem to think it requires storing a multi-MB PNG image, maybe even in multiple resolutions (smartphone apps are a particularly notable example of this).

    The irony is that this wasteful technique is often combined with the "flat" UI trend, which would be even easier to generate procedurally, so we've arrived at UIs which are less featureful and yet more bloated.
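
    To make the "few dozen bytes of code" point concrete, here is a minimal sketch (Python, with arbitrary colours and size) of generating a vertical linear gradient procedurally instead of shipping it as a bitmap:

        def linear_gradient(width, height, top, bottom):
            """Rows of (r, g, b) pixels interpolated from top to bottom."""
            rows = []
            for y in range(height):
                t = y / (height - 1) if height > 1 else 0.0
                colour = tuple(round(a + (b - a) * t) for a, b in zip(top, bottom))
                rows.append([colour] * width)
            return rows

        pixels = linear_gradient(256, 256, (250, 250, 250), (200, 200, 210))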

  • lefty2 9 years ago

    It doesn't sound too minor to me. It was 20% of Windows Explorer. But the worrying thing is that it's indicative of sloppy software development. If this basic stuff slips through, what else slips through as well?

    • II2II 9 years ago

      The analysis was done on \Windows\explorer.exe, and it was noted as being an extreme case. The distinction is important for a couple of reasons. First, it is an extremely small part of Windows as a whole, weighing in at about 4.5 MB on disk (Windows 10 AE). Second, chances are that it is also a very small part of what people think of as Windows Explorer since it also depends upon external libraries.

      While Windows may suffer from unnecessary bloat, this article is not very good evidence in that direction. Windows itself is much (much) more than a 4.5 MB binary, and the growth of Windows over the years likely has more to do with changes in technology and the market than anything else.

      I also doubt that this is an indicator of sloppy software development, nor is this basic stuff. It simply indicates that the resources were added to the executable file more-or-less as is. Developers are unlikely to be concerned with the structure of the resources as long as those resources are in a format that is well understood by the software. Graphics designers are unlikely to be concerned with how the embedded data bloats the size of the executable. While stripping the excess data may seem like a basic good practice in retrospect, it is not basic nor is it sloppy in the sense that it doesn't strictly fall in the domain of the two groups responsible for handling the data.

  • jessermeyer 9 years ago

    >the vast majority of that bloat exists for very good reasons.

    "We didn't have time to do it right."

    And yet they always have time to do it over.

  • vidarh 9 years ago

    > (Apparently that was common on 68000 based personal computers.)

    I wouldn't say it was very common, but there are some notable examples, such as Amiga BASIC which was incidentally written by Microsoft. As a result it needed a patch to run on machines with the bigger M68k CPUs (which had 32 address lines vs. 24 address lines on the 68000), but it was so awful it died a swift death in any case.

    • AstralStorm 9 years ago

      Not quite because it was awful, but because Amiga REXX was so much better.

      • vidarh 9 years ago

        ARexx ports were fantastic, but the language is awful. But Amiga BASIC was no longer shipped with the OS as of 2.x anyway, so unless you got a copy elsewhere there was no choice. And Amiga BASIC had fallen out of favour long before that.

      • LocalH 9 years ago

        AmigaBASIC was horrible. It didn't really make use of the OS facilities that well, and it was slow and cumbersome. Almost identical to Microsoft's BASIC for the Mac from what I've seen, and that was universally hated compared to MacBasic

Esau 9 years ago

It's not just Windows that is bloated; so are macOS, Android, and iOS. The wastefulness annoys me and I don't want to hear it's okay because we have tons of disk space and RAM - it is still wasteful.

I understand why they kitchen-sink operating systems - it's mainly so they can crow about new features when releasing new versions of the OS. But I wish they would offer alternate installs for those of us who are proficient.

  • flukus 9 years ago

    On Android I'm still severely limited by disk space and RAM, so the waste is very noticeable. It still is on PC for most users too; cheap SSDs are still quite small, and there are a lot of new laptops sold with 64GB SSDs and 2GB of RAM. Good luck running anything in Node.

  • saagarjha 9 years ago

    Looks like Linux is what you're looking for…

    • realusername 9 years ago

      I would really like to have a Linux-style OS on my smartphone and not the bloated Android Java which looks like the Win32 API re-engineered for smartphones.

  • zacmps 9 years ago

    I agree, it's a lot of the reason I like distros like Arch or Alpine.

  • Fifer82 9 years ago

    409 grains of sugar please or I am going elsewhere.... Actually....

0x0 9 years ago

It's not the first time Windows has shipped with shameful metadata. For example, a .wav file shipped with Windows XP appears to be authored with a pirated version of SoundForge: https://web.archive.org/web/20060721090521/http://techrepubl...

  • userbinator 9 years ago

    While it may be more journalistically appealing to promote the "Microsoft uses pirated software" angle, keep in mind that cracked does not necessarily mean pirated. They could have paid for it, then cracked it to avoid things like lost hardware dongles or unavailable Internet licensing/activation servers; I've seen this a lot in things like industrial control software, where even brief outages can mean very high lost profits.

  • douche 9 years ago

    Eh, that sort of thing happens all the time. We have customers that are national banks of foreign countries that are running on unactivated or cracked Windows servers.

    I would bet that case was a contractor who was using their own equipment. Similar things have bit us in the ass with freelancers that have ripped off stock images and presented them as their own.

asveikau 9 years ago

It's great investigative work into Windows binaries, and I hope it gets addressed for the sake of people's disk space, but I think the tone is too harsh and overstated.

Example: He cites effects on startup time - but has he considered the existence of virtual memory? When explorer.exe loads and maps the bloat into address space, it doesn't need it in RAM until the first page fault accessing it which likely will not even happen.

  • codys 9 years ago

    So that is true as long as all the bloat is contiguous. If it is spread out throughout the file (in such a way that bloat doesn't fill an entire page) it will still end up loaded. Or even if it is "unused", that doesn't mean something isn't scanning over it byte-by-byte.

    In the happy case, yes, virtual memory will save us. But there are a lot of ways we could still end up loading the junk into ram.

    Also, there are potential runtime costs to it being larger just on disk (need to seek over it, etc).

  • withinrafael 9 years ago

    Most binaries in Windows are signed. This requires the loader to load each PE section, hash, do some number crunching, and compare the result -- which results in the entire file being loaded into memory before execution. This involves (potentially random?) disk I/O, which can be surprisingly slow on certain platforms (e.g. Xbox One, HoloLens, anything IoT, anything with eMMC).

msimpson 9 years ago

Given this bloat resides in the metadata of PNG assets exported from Photoshop, couldn't this affect any operating system?

How many applications on Mac OS utilize PNG assets which were exported from Photoshop without any further optimization?

  • 0x0 9 years ago

    When you add a .png file to an xcode iOS project, it will add a build step to pngcrush the asset automatically. https://developer.apple.com/library/content/qa/qa1681/_index...

    • msimpson 9 years ago

      Interesting. I did not know that. Would the same hold true outside iOS targets?

      • saagarjha 9 years ago

        macOS supports true vector assets, so I'm guessing it reduces the need for such measures. But coming back to your question: I'm not sure, but both iOS and macOS use the same asset filetype, so I'd assume so.

  • ghostly_s 9 years ago

    I would put good money on 'none'. Apple is way too much of a stickler about MacOS' graphics APIs to not be using custom authoring or optimization routines on assets. Edit: I presumed you're talking about first-party applications only, as you said 'operating system'.

blibble 9 years ago

I wonder how many hundreds of kilobytes that adds up to in a 20gb windows install

  • ashark 9 years ago

    What's the story on that footprint, anyway? We went from (from memory) ~250-350MB for Win98 to ~700-800MB for XP to ~10-15GB(!?!?!) for Win7, and just up from there. Plus the default settings seemed to start going really crazy with swap space/caching around the time of Win7. Another 10+GB if you didn't tell it to knock that crap off. Why the sudden, giant shift? They didn't add 10-15x the features, that's for sure.

    • jug 9 years ago

      WinSxS (Windows Side by Side) assemblies were introduced to avoid DLL hell by allowing Windows to store multiple versions of installed DLLs. So even a minor security patch may leave the former version around, because other apps may use/expect it. I think that might add some bloat over time. There are also the Windows Update installer caches: a ton of Windows updates leave their installers around in case you want to uninstall them. That can add up! I've seen it easily get to 1-2 GB.

    • bastawhiz 9 years ago

      They did, to some extent. Plug in almost any piece of standard consumer hardware and it'll probably mostly just work without a network connection. All those drivers don't take up zero space, but the benefit when my mom plugs in a printer and it just works makes it worth it.

    • rpeden 9 years ago

      I think at least some of it is the Windows-on-Windows stuff that allows 64-bit machines to run both 32- and 64-bit software. Weren't the 32-bit versions of Win7 about half the size of the 64-bit ones?

      There's still a lot of size growth over time, of course.

  • moolcool 9 years ago

    You don't keep a 20gb Windows install in RAM all the time though. Bloat in explorer.exe is the issue here

    • jszymborski 9 years ago

      To this point, the files in the install are compressed, and I'm sure XML metadata is the sort of thing that compresses well with the DEFLATE algorithm they likely use.

      • PaulHoule 9 years ago

        I discovered this a few months ago, when I went looking for XMP metadata in the filesystem and used the magic number trick to extract it from files of all kinds.

        I found it is common to find XMP inside media files embedded inside Windows EXE, as well as Linux binaries, JAR, Microsoft Word and other composite formats.
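
        The "magic number trick" is roughly the following (a Python sketch; it just scans for the standard xpacket wrapper markers, see the spec linked below for the exact rules):

            def extract_xmp_packets(path):
                data = open(path, "rb").read()
                packets, pos = [], 0
                while True:
                    start = data.find(b"<?xpacket begin=", pos)
                    if start == -1:
                        break
                    end = data.find(b"<?xpacket end=", start)
                    if end == -1:
                        break
                    close = data.find(b"?>", end)
                    if close == -1:
                        break
                    packets.append(data[start:close + 2])
                    pos = close + 2
                return packets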

        Complex media objects frequently use an encapsulation system such as ZIP. When a PNG file is incorporated into a JAR or a Word Document, the XMP content in the file may not be compressed because the archiver may not attempt to compress the png file since it assumes the data is already compressed.

        XMP is very good from the viewpoint of content creators in terms of having comprehensive metadata incorporated into files so that it does not get out of sync. XMP data is RDF data using an improved version of Dublin Core, IPTC and other industry RDF vocabularies. You can write SPARQL queries right away, plus XMP specifies a way to make an XMP packet based on pre-existing metadata in common industry schemes.

        The XMP packets can get big, and you sometimes see people make a tiny GIF image (say a transparent pixel GIF) that is bulked up 100x because of bulky metadata. Once you package data for delivery to consumers you want to strip all that stuff out.

        The XMP spec is here:

        http://www.adobe.com/devnet/xmp.html

        There is some brilliant thinking in there, but also things that will make your head explode such as the method for embedding an XMP packet into a GIF

        • jszymborski 9 years ago

          Hmm... would be interesting if we started taking XMP into account when designing compression programs then...

          • abdias 9 years ago

            You could actually take any ancillary chunks into consideration, i.e. chunks whose type starts with a lower-case first letter. These are non-critical/not mandatory.

        • TazeTSchnitzel 9 years ago

          > When a PNG file is incorporated into a JAR or a Word Document, the XMP content in the file may not be compressed because the archiver may not attempt to compress the png file since it assumes the data is already compressed.

          PNG can apply DEFLATE to blocks though, right? Does XMP not use it?

          • abdias 9 years ago

            Deflating can be applied to some chunks, but not at will. The zTXt chunk can be compressed while for example the tEXt chunk cannot. The newer iTXt chunk can vary.

            The former two are limited in scope and language-encoding support, so iTXt is typically used for extended textual data such as XML/XMP etc. But whether it is saved compressed or not depends on the PNG encoder/host used (there can also be multiple instances of these chunks in the same file).

            Photoshop for instance saves uncompressed, I guess to give fast access for performance reasons (ie. file viewers using galleries for numerous images while displaying their meta-data).
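
            For reference, dropping those textual chunks is only a few lines (a Python sketch; tEXt/zTXt/iTXt are the chunk types discussed above, and the extra 12 bytes per chunk are the length, type and CRC fields):

                import struct

                STRIP = {b"tEXt", b"zTXt", b"iTXt"}

                def strip_text_chunks(data):
                    out, pos = [data[:8]], 8          # keep the 8-byte PNG signature
                    while pos < len(data):
                        length = struct.unpack(">I", data[pos:pos + 4])[0]
                        ctype = data[pos + 4:pos + 8]
                        if ctype not in STRIP:
                            out.append(data[pos:pos + 12 + length])
                        pos += 12 + length
                    return b"".join(out)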

    • galacticpony 9 years ago

      Data contained in an exe (or dll) is not necessarily in RAM at all times.

  • astrobe_ 9 years ago

    One could fit about 10 Linux live distros in those 20GB. Those "few hundreds of kilobytes" are indeed trivial.

    • 21 9 years ago

      Windows comes with a ton of builtin drivers. So it will work with a lot of devices out of the box without needing an Internet connection to update.

      • icebraining 9 years ago

        The same is true of most Linux distros. My stock Debian kernel comes with 2300+ drivers, which only take ~130MB.

        • pikzen 9 years ago

          https://blogs.msdn.microsoft.com/e7/2008/11/19/disk-space/

          Bear in mind, that was the Vista days, and Windows 10 now supports even more devices. 800MB of drivers at the time. I would not be surprised if Windows supported by default upwards of 10000 drivers. It works pretty much flawlessly on even somewhat obscure and old hardware. And when your OS is installed on that many consumer devices, and not informally standardized servers, you are going to meet those weird devices one way or the other.

          Windows drivers may also take up a bit more space individually because of the overhead caused by either the Windows Driver Model or Windows Driver Framework, but that's the price to pay to not have a driver crashing and bringing down your entire system. Yes, Linux, I'm looking at you.

        • simooooo 9 years ago

          None of them are ever for any of my hardware

      • astrobe_ 9 years ago

        The second thing you want to make work is probably the network anyway, so I don't really see where the benefit is here.

squarefoot 9 years ago

I can't comment on this issue, but if you want to get an idea of how a company can take over an excellent piece of software and ruin it, making it beefier and slower, just take a look at the wonderful snappy gem that was Cool Edit Pro and what it became after being morphed into Adobe Audition.

  • Intermernet 9 years ago

    I still have an installer and valid license for the last version of Cool Edit Pro. So glad I hung on to it!

JayXon 9 years ago

Shameless plug: you can throw a Windows PE file (exe, dll, etc.) at Leanify and it will remove all the garbage in the PNGs inside that PE file (even the PNGs embedded in a high-res ICO inside the PE file), and it will also optimize PNG compression with zopfli. But don't use it on Windows system files, because modifying those PE files will definitely break the digital signature.

sp332 9 years ago

Is it feasible to remove this junk yourself, or will the system freak out about hacked binaries? Would it also complain if I just applied it to the PNG files?

  • INTPenis 9 years ago

    I'm a Linux specialist and a stranger to Windows but logically any modification of binaries should result in security issues.

    And in a perfect world the external PNG content would also be verified.

    • 21 9 years ago

      All Windows binaries are signed. Changing the embedded PNG will void the signature. Not sure what Windows will do if explorer.exe has a bad signature.

      There is also a Windows system integrity checker service which disallows changes to protected Windows files, and repairs them automatically (using a cached copy).

      • kuschku 9 years ago

        Well, then you modify that, too.

        It’s your computer, you installed the software, you have a license, therefore you own that copy, and can modify it however you wish. (EU Copyright Directive, especially Article 6 and following).

        Now, the question is, why does Windows not allow me to add signatures that should be considered acceptable by default, why can I not modify my own OS installation?

        • 21 9 years ago

          Can you modify Android or iOS system files or add your key? Rooting is not really an answer, because in that case you can also root Windows.

          I was under the impression that you can only legally modify files for fair use or compatibility purposes, not just because you want so.

          • kuschku 9 years ago

            Yes, and no.

            On Android, I can change which keys the bootloader accepts for signing, and add my own.

            From then on, the system will allow me to normally push updates, etc.

            Apparently, on Windows, even as Root/Admin, I can not do so.

            Additionally, that is correct in the US, but in the EU, having a license is equivalent to owning the copy, and having all relevant ownership rights, such as the right to modify, right to rent out, right to resell your copy, etc.

            If you can buy a car, add a different FM radio, and resell it, then you can also buy a Windows copy, modify the start button to show a penguin eating an apple, and resell it.

            • 21 9 years ago

              You can do whatever you want on Windows if you know your way around. You can disable the system integrity checker if you are admin.

              You are wrong regarding the right to modify software. It seems to say pretty clearly that you are only allowed to modify software to make it work as intended:

              Exclusive rights of the rights-holder: the translation, adaptation, arrangement and any other alteration of the program;

              Limitations of those exclusive rights: A lawful acquirer of a program may reproduce, translate, adapt, arrange or alter the program, when it is necessary in order to use the program in accordance with its intended purpose.

              http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV%3...

              • kuschku 9 years ago

                > You are wrong regarding the right to modify software. It seems to say pretty clearly that you are only allowed to modify software only to make it work as intended:

                That’s your interpretation of the law, the ECJ has ruled otherwise.

                • 21 9 years ago

                  Please link to said ruling. The one I could find was related to the right to sell an unmodified backup copy.

        • eridius 9 years ago

          The general answer to "why can't I modify my own OS installation?" is "if you could, so could malware authors".

          • kuschku 9 years ago

            No, it is not.

            That's why I asked if there was a way to add your own key to the keyring.

            On Android, I can add my own keys to the bootloader's allowed keys, and lock the bootloader again. That is the recommended installation type for CopperheadOS.

        • userbinator 9 years ago

          You can theoretically add/modify your own certificates, but it will take much more than clicking in the GUI --- they're hardcoded in the binaries.

          ...and then there's "Secure" Boot, which may get in the way of this all; I don't know, I don't use such locked-down hardware myself.

    • sp332 9 years ago

      I think it's possible to boot into a different shell program instead of being stuck with Explorer. So even if you leave the original copy where it is, you might be able to run a modified version as your desktop.

wtbob 9 years ago

Egad that XML is horrible! Whoever thought that could possibly be a sane format?

And to repeat it over and over — it's like a boot stomping on disk space, forever.

  • flukus 9 years ago

    Adobe doesn't really get XML at all. A couple of times I've had to make software produce InDesign templates (https://spin.atomicobject.com/2017/04/25/dynamic-indesign-te...); you have to do string templating, because real XML serializers aren't compatible with the "sort of XML" that Adobe uses.

    • davidwtbuxton 9 years ago

      I did a big project involving InDesign and the XML import a few years ago (CS4 times). Once I learnt to be very careful when editing the templates, it was pretty satisfying.

      I remember that certain XML tags had to use the exact namespace defined in the Adobe spec, but other than that it all seemed pretty XML compliant.

      I was using Python / ElementTree, and had to override the namespaces to make sure the exact name was being used. Or something.

      https://docs.python.org/2/library/xml.etree.elementtree.html...
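
      For what it's worth, the "override the namespaces" step was probably about ElementTree inventing ns0:/ns1: prefixes on output; register_namespace() pins the serializer to the exact prefixes a picky consumer expects. A rough sketch using the XMP/RDF namespaces quoted elsewhere in this thread (the element names are only illustrative, not the InDesign schema):

          import xml.etree.ElementTree as ET

          # Without this, ElementTree serializes unknown namespaces as ns0:, ns1:, ...
          ET.register_namespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
          ET.register_namespace("x", "adobe:ns:meta/")

          RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
          root = ET.Element("{adobe:ns:meta/}xmpmeta")
          rdf = ET.SubElement(root, RDF + "RDF")
          ET.SubElement(rdf, RDF + "Description", {RDF + "about": ""})

          # The output keeps the x: and rdf: prefixes instead of generated ones.
          print(ET.tostring(root, encoding="unicode"))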

      What other problems did you encounter regarding XML compatibility?

      • flukus 9 years ago

        It's honestly too many years ago now to remember; I just remember trying a couple of perfectly reasonable things (maybe attributes, maybe multiple sub-lists, I'm not sure) and having it break. But I think it only broke when you created a new template, not when you were using an existing one with new data, or something along those lines. You could make a change and not realize it had broken things for a week.

        It was incredibly temperamental too. I got the initial bare-bones demo working and showed the powers that be, but after that I spent a week trying to do it again and it wouldn't work. At one stage a colleague and I went through a tutorial on it in sync, and the same steps would work on one computer but not the other.

        Had we got it working well, we might even have saved this particular company (mostly a graphic design one); they were in the advanced stages of circling the drain. It's frustrating when you can see the potential for huge productivity improvements, but they're just out of reach.

21 9 years ago

I remember an article saying that making a trivial change to Windows requires 5 minutes to change the code and 2 weeks to deal with the aftermath (testing/...)

I wonder how easy it actually is to remove this XMP metadata, considering that it could potentially break some application which loads a PNG directly from explorer.exe with a broken PNG parser or something.
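
For a standalone PNG (as opposed to one baked into a signed binary), stripping the metadata is easy, because XMP lives in an ordinary iTXt chunk. A hedged sketch that copies a PNG while dropping its text chunks and leaving everything else byte-for-byte intact:

    import struct
    import sys
    from pathlib import Path

    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
    # XMP sits in an iTXt chunk (keyword "XML:com.adobe.xmp");
    # tEXt/zTXt hold other textual metadata and are equally safe to drop.
    TEXT_CHUNKS = {b"iTXt", b"tEXt", b"zTXt"}

    def strip_text_chunks(src, dst):
        data = Path(src).read_bytes()
        assert data[:8] == PNG_SIGNATURE, "not a PNG file"
        out, pos = [PNG_SIGNATURE], 8
        while pos < len(data):
            length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
            end = pos + 12 + length          # length + type + data + CRC
            if ctype not in TEXT_CHUNKS:
                out.append(data[pos:end])
            pos = end
        Path(dst).write_bytes(b"".join(out))

    if __name__ == "__main__":
        strip_text_chunks(sys.argv[1], sys.argv[2])

That covers the standalone-file case; whether some application with a fragile PNG parser would choke on the slimmer file is exactly the open question above.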

fiatjaf 9 years ago

Ok, how can a Windows user defend himself from that?

  • Analemma_ 9 years ago

    You can't. This metadata is in the system binaries, which are cryptographically signed and can't be changed. Hopefully the article spurs Microsoft to fix it though.

zeveb 9 years ago

Wow, that's a bloated format. Here it is as XML:

    <?xpacket begin="?" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.4-c002 1.000000, 0000/00/00-00:00:00        ">
       <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description rdf:about=""
                xmlns:xmp="http://ns.adobe.com/xap/1.0/">
             <xmp:CreatorTool>Picasa</xmp:CreatorTool>
          </rdf:Description>
          <rdf:Description rdf:about=""
                xmlns:mwg-rs="http://www.metadataworkinggroup.com/schemas/regions/"
                xmlns:stDim="http://ns.adobe.com/xap/1.0/sType/Dimensions#"
                xmlns:stArea="http://ns.adobe.com/xmp/sType/Area#">
             <mwg-rs:Regions rdf:parseType="Resource">
                <mwg-rs:AppliedToDimensions rdf:parseType="Resource">
                   <stDim:w>912</stDim:w>
                   <stDim:h>687</stDim:h>
                   <stDim:unit>pixel</stDim:unit>
                </mwg-rs:AppliedToDimensions>
                <mwg-rs:RegionList>
                   <rdf:Bag>
                      <rdf:li rdf:parseType="Resource">
                         <mwg-rs:Type></mwg-rs:Type>
                         <mwg-rs:Area rdf:parseType="Resource">
                            <stArea:x>0.680921052631579</stArea:x>
                            <stArea:y>0.3537117903930131</stArea:y>
                            <stArea:h>0.4264919941775837</stArea:h>
                            <stArea:w>0.32127192982456143</stArea:w>
                            <stArea:unit>normalized</stArea:unit>
                         </mwg-rs:Area>
                      </rdf:li>
                   </rdf:Bag>
                </mwg-rs:RegionList>
             </mwg-rs:Regions>
          </rdf:Description>
          <rdf:Description rdf:about=""
                xmlns:exif="http://ns.adobe.com/exif/1.0/">
             <exif:PixelXDimension>912</exif:PixelXDimension>
             <exif:PixelYDimension>687</exif:PixelYDimension>
             <exif:ExifVersion>0220</exif:ExifVersion>
          </rdf:Description>
       </rdf:RDF>
    </x:xmpmeta>
    
    <!-- whitespace padding -->
    
    <?xpacket end="w"?>
And here it is as SXML (https://en.wikipedia.org/wiki/SXML):

    (*TOP* (*PI* |xpacket| "begin=\"?\" id=\"W5M0MpCehiHzreSzNTczkc9d\"")
     (|adobe:ns:meta/:xmpmeta|
      (@ (@ (*NAMESPACES* (|adobe:ns:meta/| "adobe:ns:meta/" . |x|)))
       (|adobe:ns:meta/:xmptk|
        "Adobe XMP Core 5.4-c002 1.000000, 0000/00/00-00:00:00        "))
      "
       "
      (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:RDF|
       (@
        (@
         (*NAMESPACES*
          (|http://www.w3.org/1999/02/22-rdf-syntax-ns#|
           "http://www.w3.org/1999/02/22-rdf-syntax-ns#" . |rdf|))))
       "
          "
       (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:Description|
        (@
         (@
          (*NAMESPACES*
           (|http://ns.adobe.com/xap/1.0/| "http://ns.adobe.com/xap/1.0/"
            . |xmp|)))
         (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:about| ""))
        "
             "
        (|http://ns.adobe.com/xap/1.0/:CreatorTool| "Picasa") "
          ")
       "
          "
       (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:Description|
        (@
         (@
          (*NAMESPACES*
           (|http://ns.adobe.com/xmp/sType/Area#|
            "http://ns.adobe.com/xmp/sType/Area#" . |stArea|)
           (|http://ns.adobe.com/xap/1.0/sType/Dimensions#|
            "http://ns.adobe.com/xap/1.0/sType/Dimensions#" . |stDim|)
           (|http://www.metadataworkinggroup.com/schemas/regions/|
            "http://www.metadataworkinggroup.com/schemas/regions/" . |mwg-rs|)))
         (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:about| ""))
        "
             "
        (|http://www.metadataworkinggroup.com/schemas/regions/:Regions|
         (@ (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:parseType| "Resource")) "
                "
         (|http://www.metadataworkinggroup.com/schemas/regions/:AppliedToDimensions|
          (@ (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:parseType| "Resource")) "
                   "
          (|http://ns.adobe.com/xap/1.0/sType/Dimensions#:w| "912") "
                   "
          (|http://ns.adobe.com/xap/1.0/sType/Dimensions#:h| "687") "
                   "
          (|http://ns.adobe.com/xap/1.0/sType/Dimensions#:unit| "pixel") "
                ")
         "
                "
         (|http://www.metadataworkinggroup.com/schemas/regions/:RegionList| "
                   "
          (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:Bag| "
                      "
           (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:li|
            (@
             (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:parseType| "Resource"))
            "
                         "
            (|http://www.metadataworkinggroup.com/schemas/regions/:Type|) "
                         "
            (|http://www.metadataworkinggroup.com/schemas/regions/:Area|
             (@
              (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:parseType| "Resource"))
             "
                            "
             (|http://ns.adobe.com/xmp/sType/Area#:x| "0.680921052631579") "
                            "
             (|http://ns.adobe.com/xmp/sType/Area#:y| "0.3537117903930131") "
                            "
             (|http://ns.adobe.com/xmp/sType/Area#:h| "0.4264919941775837") "
                            "
             (|http://ns.adobe.com/xmp/sType/Area#:w| "0.32127192982456143") "
                            "
             (|http://ns.adobe.com/xmp/sType/Area#:unit| "normalized") "
                         ")
            "
                      ")
           "
                   ")
          "
                ")
         "
             ")
        "
          ")
       "
          "
       (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:Description|
        (@
         (@
          (*NAMESPACES*
           (|http://ns.adobe.com/exif/1.0/| "http://ns.adobe.com/exif/1.0/"
            . |exif|)))
         (|http://www.w3.org/1999/02/22-rdf-syntax-ns#:about| ""))
        "
             "
        (|http://ns.adobe.com/exif/1.0/:PixelXDimension| "912") "
             "
        (|http://ns.adobe.com/exif/1.0/:PixelYDimension| "687") "
             "
        (|http://ns.adobe.com/exif/1.0/:ExifVersion| "0220") "
          ")
       "
       ")
      "
    ")
     (*COMMENT* " whitespace padding ") (*PI* |xpacket| "end=\"w\""))
The only terrible thing about the SXML is the preserved-whitespace from the XML (which of course wouldn't exist in pure SXML); otherwise it's much nicer and contains exactly as much information.
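
To underline how little information the packet actually carries: once parsed, the whole kilobyte-plus of RDF boils down to a handful of values. A quick sketch, assuming the packet above has been saved to a file (the filename is just a placeholder):

    import xml.etree.ElementTree as ET

    NS = {
        "xmp": "http://ns.adobe.com/xap/1.0/",
        "stArea": "http://ns.adobe.com/xmp/sType/Area#",
        "exif": "http://ns.adobe.com/exif/1.0/",
    }

    tree = ET.parse("xmp_packet.xml")  # the XML quoted above
    print({
        "creator": tree.findtext(".//xmp:CreatorTool", namespaces=NS),
        "width": tree.findtext(".//exif:PixelXDimension", namespaces=NS),
        "height": tree.findtext(".//exif:PixelYDimension", namespaces=NS),
        "region": [tree.findtext(f".//stArea:{k}", namespaces=NS) for k in "xywh"],
    })
    # -> {'creator': 'Picasa', 'width': '912', 'height': '687', 'region': [...]}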

  • plaguuuuuu 9 years ago

    Not the only one. Whoever decided that namespaces should have "http://" needs to be fired out of a bloody cannon.

    • DaiPlusPlus 9 years ago

      The URI<->URL equivalence that justifies "http:// namespaces" was a neat trick insofar as it means you can use the URL of the XML schema definition file (.xsd) as the URI of the namespace in a document that contains it - thus allowing XML readers to automatically perform schema validation when encountering a new schema.

      ...but given how well DNS-based package names in Java have worked out (i.e. poorly) I'm surprised they went in that direction.

      On the bright side - URIs (and so, XML namespaces) don't need to use the http:// scheme - they could easily switch to urn: http://stackoverflow.com/questions/4116282/when-to-use-a-urn...

  • zerocrates 9 years ago

    RDF-XML anything is never going to look particularly nice.

  • sundvor 9 years ago

    That whitespace is making my head explode, though.

    E.g. that shrinking tunnel under "normalized".

    • zeveb 9 years ago

      Yeah, it's pretty terrible. I kinda wish I'd preserved the namespace shortnames too, as it'd have made my point even better. Still, I regret nothing: XML is the JavaScript of data-interchange formats.

thinknot 9 years ago

Next time, run `optipng -o9 -strip all` on all your png files!

pavement 9 years ago

Which is funny, because all of the things I hate about Microsoft and Windows have absolutely nothing to do with whether or not they provide bloated binaries, containing PNG images that are bundled with extra XML tags and descriptors.

Gee whiz! What a world!
