Apple releasing segmentation/pose for humans and animals, embedding for 27 lang

developer.apple.com

220 points by sumodm 3 years ago · 74 comments

lgrebe 3 years ago

I remember when pose detection was announced, showing an app that corrected your workout movements. i have yet to see an app that actually does that. i'd love to have the equivalent of a personal trainer showing me where i need to adjust my pose in say pushups or other simple exercises.

thus i'm equally sceptical of seeing these apis used. it seems developers are mostly porting web apps to all platforms ignoring neat but platform specific apis like this.

please prove me wrong and link some awesome apps that use pose detection.

  • scottymac 3 years ago

    There are multiple apps in the App Store that do this. I spent last year implementing pose detection in an exercise app and we used both Apple’s pose detection and a 3rd party’s. The pose (each point of the human form) itself was sent to a machine learning backend at around 30 fps, analyzed, and data returned at about the same speed using gRPC. Each exercise had a set of specific feedback for both positioning (“Stand facing the camera with your arms at your side/Stand sideways to the camera/etc”) and form correction (“Raise your right arm higher above your head etc”). Feedback was spoken out loud to the user and there was a relatively complex set of rules governing which feedback got priority and how often feedback was spoken. I also implemented an on-screen “skeleton” of the user’s human form points that rendered on top of the camera view. Pretty fun project from a tech point of view.

    • supermatt 3 years ago

      The signal to noise in fitness apps is high. The mainstream ones don’t do this, or if they do the implementation is so bad it’s not worth using, and discovery of anything else is fraught with shitware that wants a subscription to “unlock” its unknown potential.

      • mynegation 3 years ago

        Did you mean to say signal-to-noise [ratio] is _low_? Meaning that you get way too much noise for the amount of signal. Or did you mean to say it needs to be high (I.e. low noise) to be useful?

        • SuperShibe 3 years ago

          For every good fitness app that does what it promises (-> signal) there are at least 50 bad fitness apps that promise too much and let you pre-pay for the (broken) features you wanted, money you'll never get back (-> noise).

          The amount of noise in Fitness apps is so high, nobody really dares to try out small apps. Therefore cool implementations from small devs like the workout-correction might stay unnoticed for years.

        • supermatt 3 years ago

          i did mean low, yes :)

      • anonymoose4 3 years ago

        Yeah you mean low

    • paisawalla 3 years ago

      Can you name the app?

    • smugma 3 years ago

      What’s the best app you’ve tried?

  • whywhywhywhy 3 years ago

    The “there’s an app for that” world where small to medium sized teams can build a very iOS native app that takes advantage of the latest and greatest of the device is long gone.

    It’s kinda weird how Apple doesn’t realize this and continues to build for that world. Maybe if they were willing to shift on their % for devs that do build that way but unless they did there just isn’t the audience buying apps outright and the only ways to profit are tricking people into abusive subscriptions or building on ads and their personal data.

    Until then no idea why any dev would build just for the Apple ecosystem and not something agnostic.

    It’s telling to me that the biggest tech apps of the last 2 years all ran web/desktop first.

    • jessekv 3 years ago

      There are still a few, e.g. https://halide.cam/

      But if you are successful, there is a chance of getting sherlocked, so it's a risky business model.

      • dingledork69 3 years ago

        > But if you are successful, there is a chance of getting sherlocked, so it's a risky business model.

        What does this even mean? Watson shows up to help out?

        • wlesieutre 3 years ago

          Watson was an independent search application on Mac OS, until Apple basically photocopied it and named theirs Sherlock. Since then it's become a verb for when Apple takes your app and builds it into the OS.

          Another blatant example was Dashboard, which copied Konfabulator, Night Shift is a copy of F.lux, etc.

          • countvonbalzac 3 years ago

            I still use F.lux, it's much better than night shift IMO, just wish they had it for iOS.

          • qubex 3 years ago

            When they demonstrated on-desktop widgets the other day, I thought to myself “there goes Konfabulator/Dashboard 2.0”, and then I thought to myself “you’re old enough to remember that and to be caustically cynical”.

          • ladberg 3 years ago

            Technically Sherlock predates Watson, it's just a lot of the useful additional features added by Watson were copied to Sherlock.

            • wlesieutre 3 years ago

              Ah you're right, it was Sherlock 3 that copied the Watson features. Mostly related to searching the web for things like ebay listings, recipes, stocks, software.

              So the Watson name was probably inspired by Apple's from previous versions, but the Sherlock 3 feature set definitely got cloned from Karelia's.

        • scyzoryk_xyz 3 years ago

          "Getting sherlocked means that Apple just announced the software, or feature that a developer built their business on."

          https://www.howtogeek.com/297651/what-does-it-mean-when-a-co...

    • kemayo 3 years ago

      > Until then no idea why any dev would build just for the Apple ecosystem and not something agnostic.

      The standard reason given is that iPhone users are much more valuable than Android users, in that they're a lot more likely to pay for things. If I'm creating a workout app with a fancy form-correction feature then I might well want to use Apple-platform things that make it quicker to develop, at the cost to me of only slightly restricting my actual market.

    • codeflo 3 years ago

      It's not just the machine learning stuff, they have a non-portable approach for everything, including the platform's primary programming language. They still seem to live in a world where a significant niche of developers targets Apple platforms and their bespoke APIs only.

      The problem with that world view is that (a) everything with a network effect can't target a single platform anymore, and (b) the business model for old-school professional single-user apps was killed by the App Store.

      • actualwitch 3 years ago

        They are relying on people looking into spending statistics by platform and realizing that if they want those sweet sweet $$$, they are forced to deal with apple and their walled garden.

      • nerdbert 3 years ago

        Unlike Microsoft? Windows APIs are just as "bespoke".

    • fnordpiglet 3 years ago

      I think they seem to be doing fairly well as a company, and part of that is not letting themselves be tethered by a standard to allow competitors equal access to their walled garden. Whether you like that or not, it’s the strategy they’ve taken. They would rather not have your app than distort their platform to accommodate its ability to run on another platform.

      For developers the reason to adopt the apple ecosystem is fairly simple. People willing to pay for an apple device are likely willing to pay for a subscription. The apple model is essentially you buy a subscription to their hardware - they release at a regular clip, they anticipate most customers will refresh, there’s no meaningful upgrade path, etc. As a developer I prefer subscriptions over one time purchases because it incentivizes my maintenance and growth of features for existing customers rather than a never ending grab for new customers. As a consumer, while my pocketbook certainly prefers one time pay, I actually do see the benefit in incentivizing continuous improvements for existing customers. (I do however wish that apple didn’t hide the subscriptions management so deeply and made it very prominent, and until they do it falls into the abusive category IMO)

    • Someone 3 years ago

      > It’s kinda weird how Apple doesn’t realize this and continues to build for that world.

      If you’re a hardware manufacturer, I don’t think building for the common denominator of the web browser is a viable strategy. Looking at various of their competitors, it certainly brings in less money.

      How many people would buy an iPhone that’s basically a “browser device” if, for 50% of its price, they could get something that’s 80% as good (percentages for illustration purposes)?

    • lwkl 3 years ago

      > It’s telling to me that the biggest tech apps of the last 2 years all ran web/desktop first.

      What are the two apps that you are referring to? No snark just curious. Because the only thing I can think of are video calls or social media (which are arguably older than two years).

      • whywhywhywhy 3 years ago

        ChatGPT, Stable Diffusion. Both web/terminal first.

        Think 5-8 years ago at least one of them would have been app first.

        • pzo 3 years ago

          There is already a native ChatGPT app by OpenAI. Microsoft also has integrations in the native Edge browser, the SwiftKey keyboard and the Bing app.

          There are also many mobile Stable Diffusion apps, and even the native mobile Discord app, which is the UI for Midjourney.

          And since all those apps require a lot of typing or prompt tweaking, they were better suited for desktop first.

  • dmix 3 years ago

    The big question is whether it's even capable of making recommendations like that. You'd have to combine it with your own model.

    Having read books on strength training and tried to learn stuff like squatting perfectly myself I'm skeptical it could be able to grasp the nuance.

    But for dancing and other stuff where it doesn't matter as much it could be useful (health/safety wise when carrying load).

  • nbaugh1 3 years ago

    There's a home gym setup called Tempo that claims to do this. I have the scaled down mobile app version so idk if the full setup is more informative, but it doesn't really give a lot of feedback at all. It basically just tracks the movement of the dumbbell plates that they send you to count your reps, and if you aren't moving the weight to the correct position your rep won't count. It definitely doesn't do anything like correct your form based on your body position like "hey straighten your back" or anything.

    • agentdrtran 3 years ago

      I switched from the Move to the Studio (freestanding version) and the form feedback triggers __slightly__ more often but it's still not worth bothering about imo. the rep counting not working bothers me way more.

  • rickguru 3 years ago

    Hey, I'm the founder of Guru (https://getguru.ai/), a video AI dev platform. Developers are using our movement APIs to build some cool form feedback apps, including NFL coaches.

    - Demo: https://www.formguru.fitness/video/c96fa975-fd9e-4912-8f60-1...

    - Blog: https://blog.getguru.ai/guru-sports-powering-the-top-prospec...

    - Customers: https://www.cadoo.io/, https://www.breakawaydata.com/, https://pharosfit.com/, https://www.producthunt.com/posts/fitx.

    We've trained our own models (and customers can finetune them), but it exports cleanly to iOS (and Android!).

  • ChrisMarshallNY 3 years ago

    One of the reasons that I write native, is so that I can access stuff like this. It usually takes quite a while for hybrid platforms to catch up.

    That said, I don’t have an immediate need for this particular SDK, in the project I’m developing. I just like to have the option to integrate stuff like this.

    Also, I’m not a “bleeding edge” developer. I’m still using UIKit/AppKit/WatchKit (as opposed to SwiftUI), and my software supports one OS version back, upon release.

  • deeesstoronto 3 years ago

    I'm part owner of a company (Halterix) that used pose detection (alternately via smartwatch accelerometer) and machine learning to quantify how well exercises are performed.

    We built a demo app for use in physiotherapy to improve outcomes and ran a few clinical studies. The detection accuracy was excellent and patient reception was warm.

    There are a number of competitors, some with multi-sensor systems targeted to pros, some with vision systems, etc.

    We met with all the big fitness app makers and found that, while they were generally somewhat interested in pose detection/accuracy assessment and feedback, it's not at the top of their list of priorities to implement (even to incorporate our 3rd party service).

  • YouWhy 3 years ago

    Proper disclosure: I had been formerly involved with them.

    Try Kemtai.com.

    There's a demo section at https://app.kemtai.com/sample-workouts

    We took workout experience super seriously, and in my biased view, got it to be a usability joy.

  • dbtc 3 years ago

    For a more immediate and quite practical solution, I've had good results from simply taking videos of various movements and watching them right afterwards.

  • unstatusthequo 3 years ago
  • SoftTalker 3 years ago

    So hire a personal trainer for a session or two?

  • madsbuch 3 years ago

    > it seems developers are mostly porting web apps to all platforms ignoring neat but platform specific apis like this.

    It would be weird to have your social app trying to correct your pose ;)

    • mbork_pl 3 years ago

      > It would be weird to have your social app trying to correct your pose ;)

      It would be less weird for it to call home so that Z*ck knows to serve you ads for painkillers for your spine...

      • madsbuch 3 years ago

        This comment seems to be unrelated to the subject. Is there a specific reason you mention it (along with the down vote I take is also from you, given your passive aggressive tone)?

        The latent point is that most applications don't need specific APIs for their value propositions, which is why it would not make sense to write them in a native framework that enables these APIs.

        • TeMPOraL 3 years ago

          > This comment seems to be unrelated to the subject.

          Isn't it obvious? It's tech companies and mobile ecosystem. Everything that can be used for ads and surveillance, will be used for ads and surveillance.

          Pose detection can provide a lot of insights about the user's overall health. All the unscrupulous players now need is some bullshit reason to convince the user to a) install their app, and b) use the feature. Like, idk, high-fidelity dancing AR avatars to use as stickers on social media (a real thing, btw.).

          Apple may or may not make this hard, but it's a real thing to be concerned about. Arguably, it's the most obvious use case, given the state of this industry.

        • mbork_pl 3 years ago

          > Is there a specific reason you mention it (along with the down vote I take is also from you, given your passive aggressive tone)?

          Yes, the reason is I considered it funny. Sorry if that didn't work.

          Also, I didn't downvote your comment, but I just upvoted it to compensate for someone who downvoted you, seemingly without a good reason – at least I agree with it.

iFire 3 years ago

To be clear does it mean access on Apple devices and not like an Apache 2 licensed Github repository?

https://twitter.com/yeemachine/status/1656391928223768576?s=...

https://mediapipe-studio.webapps.google.com/demo/face_landma...

https://github.com/google/mediapipe

yyyk 3 years ago

Where are the actual bindings? Linking to a page with lots of long videos that are mostly not available (it says "Available on June 6", or 7, 8, 9), with an editorialized title that is not even on the page, is below HN standards.

lukko 3 years ago

Just in case you were wondering, animals seem to be just cats and dogs: https://developer.apple.com/documentation/vision/vnanimalide...

  • enlyth 3 years ago

    > static let cat: VNAnimalIdentifier

    This feels funny to read, like from one of those inheritance tutorials on object oriented programming

  • CrampusDestrus 3 years ago

    Seems logical, most people interact with their pets which are mostly cats and dogs.

    Horses will probably be next

  • yreg 3 years ago

    According to the keynote the first party app is going to recognize the family cat/dog as a "person".

ajayxtra 3 years ago

We had tried their vision framework for pose, the accuracy was not great compared to other open source models. Hope they solve the issues with the new release.

@lgrebe: Check XTRAVISION and let me know if that is what you were looking for. Demo: https://demo.xtravision.ai/

egonschiele 3 years ago

Does this mean Apple is making it easier to run models on Macs? I have a fairly powerful Mac studio, but I've found it very hard to run any model on it.

itake 3 years ago

(feel free to correct me if I am wrong), but my main gripe against mobile ML frameworks (Android too) is they require the app to embed the ML model with the app (as opposed to the OS storing the model like a shared library).

People with limited storage on low-end devices don't have enough space to store the apps.

  • dimatura 3 years ago

    CoreML has various models already built-in, although as black boxes that accomplish some task like OCR or rectangle detection. There's also a "feature print" model which I believe is intended to be used as hard-coded features for simple ML tasks. In either case I strongly suspect that when you use them, they're not being embedded in the app.

    Another thing to consider is that you don't have to embed the model in your app; at least in CoreML you can download (and update) the model weights over the network.

    • itake 3 years ago

      People have lower HD size (think 64GB iPhone X/iPhone SE) on unreliable internet networks. Downloading five 200MB models to perform five layers of processing (OCR, rm background, object detection, etc.) would take hours and consume too much cellular data.

      Sending a 10kb image to the cloud for processing is much faster and user friendly.

  • carstenhag 3 years ago

    On Android, you can choose to include the ML model dependency "bundled into your app in compile time" or shipped through Google Play Services. Saves many megabytes. Drawback there: if the device doesn't have Play Services, nothing works. Also, on first download it takes some seconds to work.

tracerbulletx 3 years ago

Is there an app using this for coaching running form, or doing a custom bike fitting? That would be awesome.

  • cj 3 years ago

    There was nothing technically stopping us from doing that even 10 years ago.

    MTailor [0] is/was a company that, using your phone camera, could measure you for pants/shorts/shirts/etc... it was a YC company 2014 and also on Shark Tank.

    I hope the AI hype brings back some of these sort of use cases that might have been ahead of their time.

    [0] https://apps.apple.com/in/app/mtailor-custom-clothing/id8160...

heliophobicdude 3 years ago

I would be interested to know of a consistent, on-board embeddings model. Trying to reduce latency and dependence on API calls for simple vector database search will go a long way.
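
For anyone wondering what "simple vector database search" over on-device embeddings might look like, here is a minimal sketch in plain Python. It assumes the embedding vectors already exist (however they were produced); the three-dimensional vectors, the labels and the `top_k` helper are all made up for illustration and are not part of any Apple API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Brute-force nearest-neighbour search: score every entry, keep the best k."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]


# Hypothetical toy index: label -> embedding vector (real ones have hundreds of dims).
index = {
    "push-up form": [0.9, 0.1, 0.0],
    "squat depth": [0.8, 0.3, 0.1],
    "bike fitting": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

A linear scan like this is usually fine for a few thousand entries and needs no network round trip; an approximate index only starts to matter at much larger sizes.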

YouWhy 3 years ago

TL;DR: Good step for the entire market, productization is the harder problem.

I had been formerly involved with Kemtai, which built a fantastic physical therapy/fitness experience (in my biased view) using motion tracking.

If anyone's interested, it is running well and quickly over WebGL on a pretty impressive share of regular phones and laptops across all platforms with WebGL (not just Apple)

My takeaway is that the hard part is the productization on top of motion tracking: what constitutes an exercise? What is a "good" performance? How to build the authoring workflow for the many hundreds to low thousands of exercises necessary to reach a typical user base?

In any case, that's awesome news. There are literally billions of people whose condition is going to be better via motion tracking based health and fitness. May it grow there, and quickly!
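
To make the "what constitutes an exercise?" question above concrete, here is a rough sketch of the simplest possible definition: one joint angle plus two thresholds. It assumes you already get 2D keypoints per frame from some pose tracker; the function names, the 90/160 degree thresholds and the sample data are illustrative guesses, not how Kemtai or Apple's Vision framework does it.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate input: two keypoints coincide
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos = max(-1.0, min(1.0, cos))  # guard against floating-point drift
    return math.degrees(math.acos(cos))


def count_reps(angles, down_below=90.0, up_above=160.0):
    """Count reps with hysteresis: a rep is 'go below down_below, then above up_above'."""
    reps = 0
    is_down = False
    for angle in angles:
        if not is_down and angle < down_below:
            is_down = True
        elif is_down and angle > up_above:
            reps += 1
            is_down = False
    return reps


# Hypothetical elbow angles (shoulder-elbow-wrist) over a few frames of push-ups.
elbow_angles = [170, 120, 80, 120, 170, 100, 85, 165]
print(count_reps(elbow_angles))
```

The two separate thresholds keep jitter around a single cut-off from being counted as extra reps. Judging whether a rep was "good" needs far more than this (several joints, timing, camera angle), which is the authoring problem described above.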
