Stretch iPhone to its limit: 2GiB Stable Diffusion model runs locally on device

liuliu.me

749 points by GrantS 3 years ago · 183 comments

ollin 3 years ago

here's a direct app store link, if anyone wants to try the iPhone app immediately: https://apps.apple.com/us/app/draw-things-ai-generation/id64...

congratulations to liuliu on the launch!

  • ttyyzz 3 years ago

    Crashes on my iPhone SE 2nd gen, which was to be expected :-(

  • antihero 3 years ago

    Crashes on my 13 Mini, which is unexpected! Anything we need to do to prep the phones?

    • rsynnott 3 years ago

      I think that probably is to be expected, unfortunately. The model is 2GB; per the article iOS will kill apps that use 2GB on 4GB RAM devices (and the 13 Mini is 4GB).

      • harrisi 3 years ago

        It works just fine on my iPhone 12 Mini, which has (basically) the same RAM, so it seems like it's something else. They're basically the same phone in general though, so I would be surprised if it was a hardware issue.

    • pulvinar 3 years ago

      It's running on my 13 Mini (so far).

      I did have trouble closing the "adjustments" dialog (upper-right button) due to its close button being underneath the status bar, but found that I could just drag the dialog down to the bottom and it closed.

Retr0id 3 years ago

This is absolutely incredible. It takes about 45 seconds to generate a whole image on my iPhone SE3 - which is about as fast as my M1 Pro MacBook was doing it with the original version!

  • liuliu 3 years ago

    The SE 3rd Gen has 4GiB RAM, so the app defaults to a 384x384 size. That is about 1/2 the computation of your normal run (512x512), and the original version uses the PLMS sampler, which defaults to 50 steps, while this one uses the newer DPM++ 2M Karras sampler, which defaults to 30 steps. All in all, your M1 Pro MBP is still about 4x your SE 3rd Gen in raw performance (although my implementation should be faster than PyTorch, at about 2x on M1 chips).
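
    A back-of-the-envelope check of those numbers (a sketch; it assumes compute scales linearly with pixel count and step count, while attention actually scales worse than linearly with pixels):

    ```python
    # 384x384 @ 30 steps vs. 512x512 @ 50 steps
    pixels = (384 * 384) / (512 * 512)  # ~0.56x the pixels per step
    steps = 30 / 50                     # 0.6x the steps
    work = pixels * steps               # ~0.34x the total work

    # Similar ~45s wall-clock on both devices implies roughly 1/work ~= 3x
    # raw throughput for the M1 Pro; attention's quadratic cost in latent
    # size pushes that toward the ~4x quoted above.
    print(f"work ratio: {work:.2f}, implied speedup: {1 / work:.1f}x")
    ```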

    • codetrotter 3 years ago

      > although my implementation should be faster than PyTorch at about 2x on M1 chips

      Do you plan to make a macOS version of your app also? Hope you will :)

      • ShamelessC 3 years ago

        For what it's worth, you can decrease the resolution and use the sampler mentioned in the PyTorch versions. The AUTOMATIC1111 web UI supports this, for instance.

        I would also welcome the additional optimizations, however.

    • 2Gkashmiri 3 years ago

      Hey, 9th gen iPad 10.9, I think it has 3GB RAM, so it will never work on it?

sneak 3 years ago

Extreme respect to the developer for not including the "industry standard" clientside tracking/analytics/phone-home in this app. The fact that this runs locally on-device and doesn't send any information to anyone anywhere about what you're doing on your local device is wonderful.

All apps used to be like this, and now the ones that actually respect user privacy are a rare and glorious exception. Thank you!

WantonQuantum 3 years ago

This is cool and looks like a great way to drain my phone battery :)

I just used the prompt "A person looking at their phone in amazement" and got a good picture.

Beware that on startup the app downloads almost 2 gig of data.

  • dylan604 3 years ago

    It would be awesome if, for that particular request, it quickly took a picture from the front camera and then just filtered it to finish it "in the style of Andy Warhol"

  • donlinglok 3 years ago

    I have generated some images; I think it takes less than 1% of the battery per image. That is already much better than most games (for having fun).

    • isoprophlex 3 years ago

      It took my battery from 80% to 77% for one generation on the default settings (384^2, 30 iterations). Less than a minute of compute time to complete a generation.

      iPhone battery health reports 100% health. This is an iPhone SE3.

      Amazing how huge the difference in energy consumption is for the system in standby vs going full throttle.

      EDIT: I generated 3 more images; every subsequent generation reduced the battery charge by another 2%. My phone doesn't seem to heat up at all, interestingly.

      • 4ggr0 3 years ago

        My iPhone 12 mini dropped from 76% to 71% and got noticeably warmer. Battery health is way worse than yours, 85%.

        I wonder if generating a picture uses so much juice that it even drops the percentage while charging...

        EDIT: Generating a picture while charging did not drop the percentage.

        • skykooler 3 years ago

          I generated a bunch of pictures while charging and the percentage still went up (albeit slowly).

        • wut42 3 years ago

          Late to the party, but depending on your charger's output it may. E.g. I can charge my MBP on a cheap USB-C charger, but as soon as I use it _too_ much, the charge will stall or, worse, drop.

  • tibbon 3 years ago

    It does warn you on startup about the download if you're not on wifi.

    • odysseus 3 years ago

      Good time to try 5G Ultra Capacity if you have it on an unlimited plan - it will be faster than most people's wifi.

      • jcims 3 years ago

        Pivot alert

        I reached 3Gbps over Verizon 5G in San Antonio last year, and this year I get about 4Mbps over Verizon 5G in Ohio. It's so bad I disabled it. I did read an article saying the iPhone 12 (which is what I have) has some kind of radio issue with 5G. Can anyone in here confirm?

        • badwolf 3 years ago

          Verizon has been real wonky all over Austin. If there's more than a handful of people in the area, bandwidth just goes to the crapper. I'll get a couple hundred Mbps on a good day with no clouds/wind/holding my phone just right in the right space, but usually I get less than 1Mbps on their 5G UW.

  • ncr100 3 years ago

    (I have not downloaded it.)

    Q: How long does a run typically take? 60 seconds?

    • ASalazarMX 3 years ago

      From the link:

      > It took a minute to summon the picture on the latest and greatest iPhone 14 Pro, uses about 2GiB in-app memory, and requires you to download about 2GiB data to get started. Even though the app itself is rock solid, given these requirements, I would probably call it barely usable.

      also

      > Even if it took a minute to paint one image, now my Camera Roll is filled with drawings from this app. It is an addictive endeavor. More than that, I am getting better at it. If the face is cropped, now I know how to use the inpainting model to fill it in. If the inpainting model doesn’t do its job, you can always use a paint brush to paint it over and do an image-to-image generation again focused in that area.

      Seems very worth a try. I'm downloading the model right now, it's going a bit slow, ~2MB/s.

ZeroCool2u 3 years ago

This is super cool. I just tried the default prompt on my iPhone 13 with the image size set to 768x512 and the 3D Model (Redshift v1), and it just crashed the whole phone, which restarted. Just like when I get BSODs at work on my Windows GPU desktop :)

  • liuliu 3 years ago

    You should see a warning when selecting that size? A 4GiB device cannot run at that resolution until someone implements FlashAttention on Metal :)

    • GistNoesis 3 years ago

      Nice work :)

      Porting FlashAttention to Metal will be quite hard, because for performance reasons they did a lot of shenanigans to respect the memory hierarchy.

      Thankfully, you can probably do something slower but more adapted to your memory constraints.

      If you relax this need for performance and allow some re-computations, you can write a qkvatt function which takes q,k,v and a buffer to store the resulting attention, and compute without needing any extra memory.

      The algorithm is still quadratic in time with respect to the attention horizon (although with a bigger constant (2x or 3x) due to the recomputation). But it doesn't need any extra memory allocation, which makes it easy to parallelize.

      Alternatively, you can use an O(attention horizon * number of threads in parallel) extra memory buffer (like FlashAttention) to avoid the recomputation.

      Concerning the backward pass, it's the same thing: you don't need extra memory if you are willing to do some recomputation, or you use memory linear in the attention horizon to avoid recomputation.

      One interesting thing to notice about the backward pass is that it doesn't use the attention matrix from the forward pass, so that doesn't need to be preserved (you only need to preserve Q, K, V).

      One little caveat of the backward pass (which you only need for training) is that it needs atomic_add to be easy to parallelize. This means it will be hard on Metal (AFAIK it doesn't have atomics for floats, though it does have atomics for integers, so you can probably use fixed-point numbers).
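
      A rough NumPy sketch of the memory-frugal idea (query chunking only, so the full n x n attention matrix is never materialized; this is not FlashAttention itself, and there is no recomputation or online softmax here):

      ```python
      import numpy as np

      def chunked_attention(q, k, v, chunk=64):
          """O(chunk * n) extra memory instead of O(n * n); still quadratic time."""
          n, d = q.shape
          out = np.empty_like(v)
          scale = 1.0 / np.sqrt(d)
          for i in range(0, n, chunk):
              s = (q[i:i + chunk] @ k.T) * scale  # (chunk, n) block of scores
              s -= s.max(axis=-1, keepdims=True)  # softmax stability
              p = np.exp(s)
              p /= p.sum(axis=-1, keepdims=True)
              out[i:i + chunk] = p @ v            # (chunk, d) block of the output
          return out
      ```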

    • ZeroCool2u 3 years ago

      I did, it's fully my fault, I just wanted to see what would happen.

      Love the work, really great job!

aarkay 3 years ago

This is amazing! Finally a use case for all that compute power on the phone.

  • cwmoore 3 years ago

    This illustrates the beginning of use cases for computing/ML on the edge. The total power of all the phones and their sensors is mindblowing.

    • criddell 3 years ago

      Some javascript crypto miners have been taking advantage of this for years now.

desro 3 years ago

This is incredible work and a tremendous achievement. Bravo and thanks for sharing.

nl 3 years ago

This is some impressive work.

You might like to look at the work HuggingFace has been doing (on non-iOS versions). They can run it in under 1GB RAM:

> It is also possible to chain it with attention slicing for minimal memory consumption, running it in as little as < 800MB of GPU VRAM

https://huggingface.co/docs/diffusers/optimization/fp16#offl...
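
For reference, a minimal sketch of enabling those optimizations in diffusers; the model ID and the exact savings are assumptions that depend on your diffusers version:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_sequential_cpu_offload()  # stream weights to the GPU as needed
pipe.enable_attention_slicing(1)      # compute attention one slice at a time
image = pipe("an astronaut riding a horse").images[0]
```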

RBerenguel 3 years ago

It works extremely fast on an iPad Pro M1 (kind of expected, but it's _impressive_), although the app is built as "iPhone only", and strangely enough the iPad crops the upscaled iPhone app so the lower bar of image controls doesn't show at all, which is a pity.

  • spideymans 3 years ago

    This is about to become the single most important app on my iPad Pro. It totally accelerates my workflow.

  • jamil7 3 years ago

    Reach out to the author, they can likely fix this easily.

    • RBerenguel 3 years ago

      Yup, done. I thought the author would see it better here (it also makes it visible for other people stumbling on the issue), but I have contacted them separately explaining the issue.

tacotime 3 years ago

Hahah now I can use my phone as a hand warmer this winter. It's incredible that this is an app!

miohtama 3 years ago

Could in-memory compression be used to bring down the RAM requirements?

There are some high-performance compressors like Blosc tuned for this:

https://www.blosc.org/pages/blosc-in-depth/

“Faster than memcpy” is the slogan.

  • addaon 3 years ago

    macOS has transparent memory compression. It's unclear to me if that's made its way to iPhone, but if it hasn't yet, it will sooner or later.

    • astrange 3 years ago

      Memory compression is a generalization of swap, which is only for dynamic memory; files on disk don't need it because you can just read them off the disk.

      The problem is that GPUs don't support virtual memory paging, so they can't read files, decompress, or swap anything unless you write it yourself, which is a lot slower.

      Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!

      • earthscienceman 3 years ago

        Wait. This comment just blew my mind. Does that imply that you might be able to measure the efficiency of a model by its compressibility? Note, I'm trying to recognize that efficient and accurate are not the same. One could imagine evaluating a model on a 2D performance-and-compression map somehow.

      • ColonelPhantom 3 years ago

        > Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!

        I feel like they're kind of two sides of the same coin: learning is about putting more information in the same data, while compression is about putting the same information in less data.

        I'm wondering if some lossy floating-point compressor (such as zfp) would work.

        • astrange 3 years ago

          > I'm wondering if some lossy floating-point compressor (such as zfp) would work.

          Well apparently this can work; StableDiffusion comes as 32-bit and 16-bit float versions. I'm kind of surprised they both work, but that's lossy compression.

          • ColonelPhantom 3 years ago

            Sure, but 16-bit float is pretty primitive compression, as it does not exploit any redundancy in the input. zfp groups numbers together in chunks, which means that correlated numbers can be represented more precisely. Its algorithm is described here: https://zfp.readthedocs.io/en/release1.0.0/algorithm.html#lo...

            I would like to see if zfp can be applied to something like Stable Diffusion (or other ML models) and gives better results than regular floats at the same size.
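
            A hypothetical way to test that, assuming the zfpy bindings; the tolerance knob is zfp's lossy accuracy-for-size tradeoff (and note that weights which look like noise won't compress well):

            ```python
            import numpy as np
            import zfpy

            weights = np.random.randn(4096, 320).astype(np.float32)  # stand-in layer
            buf = zfpy.compress_numpy(weights, tolerance=1e-3)  # fixed-accuracy mode
            restored = zfpy.decompress_numpy(buf)

            print(f"ratio: {weights.nbytes / len(buf):.2f}x, "
                  f"max error: {np.abs(weights - restored).max():.2e}")
            ```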

    • comboy 3 years ago

      Memory compression? I can't find any good resources to read about it, any hints? I'm having trouble imagining how it could possibly work without totally destroying performance.

      • kccqzy 3 years ago

        It doesn't destroy performance for the simple reason that nowadays memory access is slower than pure compute. If you need to use compute to produce some data to be stored in memory, your overall throughput could very well be faster than without compression.

        There has been a large amount of innovation in fast compression and decompression in recent years. Traditional compression tools like gzip or xz are geared towards higher compression ratios, but memory compression tends to favor speed. Check out these algorithms:

        * lz4: https://lz4.github.io/lz4/

        * Google's snappy: https://github.com/google/snappy

        * Facebook's zstd in fast mode: http://facebook.github.io/zstd/#benchmarks
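
        A tiny illustration of the speed-over-ratio tradeoff, assuming the `lz4` Python package:

        ```python
        import time
        import lz4.frame

        data = b"mostly-redundant page contents " * 200_000  # ~6 MB, compressible

        t0 = time.perf_counter()
        packed = lz4.frame.compress(data)
        t1 = time.perf_counter()
        lz4.frame.decompress(packed)
        t2 = time.perf_counter()

        print(f"ratio {len(data) / len(packed):.1f}x, "
              f"compress {t1 - t0:.3f}s, decompress {t2 - t1:.3f}s")
        ```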

      • miohtama 3 years ago

        On a Mac, you can find compressed memory in Activity Monitor.

        It's something similar to swap - apps do not need built-in support for it.

        • flatiron 3 years ago

          It segments a certain amount of RAM to "swap" to, which means compress and store. Normal, blue-sky RAM operations are not compressed on macOS.

        • smcleod 3 years ago

          Many operations are actually a lot faster with compressed memory than without. It's all about where the bottleneck is.

        • comboy 3 years ago

          Oh, yes compressed swap makes much more sense, thanks.

          • kergonath 3 years ago

            It is not compressed swap, the compressed data is still in RAM. The OS just compresses inactive memory, with a couple of criteria to define “inactive”.

    • miohtama 3 years ago

      My guess is that iPhone uses a pure "kill the app" OOM model instead of "compress memory / swap". This makes more sense for mobile.

      • Sirened 3 years ago

        iOS uses memory compression but not swap. iOS devices actually have special CPU instructions to speed up compression of up to page size increments specifically to aid in this model [1]

        [1] https://github.com/apple-oss-distributions/xnu/blob/bb611c8f...

      • musicale 3 years ago

        IIRC from WWDC they said that inactive/suspended apps get their memory compressed to free up memory for the current active/foreground app.

        Seems to mesh well with the iOS idea of using a single app at a time and minimizing background processing in apps that you aren't actively using.

        In an out of memory situation I think apps just get killed as you suggest.

  • liuliu 3 years ago

    Should be doable for parameters, but at that point you don't need compression; LLM.int8() tricks would be sufficient. For activations, I wrote about it a while back: https://liuliu.me/eyes/reduce-another-70-memory-usage-for-de...

    It is not as useful for this case (inference) because the activations held for long (the UNet holds the downsampling passes' activations and uses them for upsampling) are not that much memory (in the range of a few megabytes). If it is for training, it is probably more useful.
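
    The core of the absmax 8-bit idea, as a toy sketch (the real LLM.int8() adds outlier handling on top of this):

    ```python
    import numpy as np

    def quantize_int8(w):
        scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # per-row scale
        return np.round(w / scale).astype(np.int8), scale     # 4x smaller than fp32

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(320, 320).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, s)).max())
    ```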

  • conradev 3 years ago

    In-memory compression means the memory is inherently dirty memory

    On Apple platforms, if you mmap a read-only file into the process address space, it is "clean" memory: the kernel can drop it at any time because it already exists on disk. You can essentially offload the memory management to the kernel page cache.

    The downside is that if you run up to the limit and the "working set" can't fit entirely in memory, then you run into page faults which incur an I/O cost.

    The advantage is that the kernel will drop the page cache before it considers killing your process to reclaim memory.

    That said, I don't know the typical access patterns for neural network inference, so I don't know how the page faults would affect performance.
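
    The pattern, in Python for illustration (on iOS the equivalent is Data(contentsOf:options:) with `.mappedIfSafe`, or mmap(2) directly):

    ```python
    import mmap

    with open("sd_v1.4_f16.ckpt", "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # These pages are clean (read-only, file-backed): the kernel can evict
        # them under pressure and re-fault them from disk later, instead of
        # counting them against the process's dirty memory.
        header = mm[:16]
    ```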

  • Scaevolus 3 years ago

    No. The memory usage is due to huge arrays of floating-point numbers without much redundancy that you could squeeze out with a compressor.

  • tehjoker 3 years ago

    zfp compression might be an interesting thing to try. In fixed rate mode, it supports random access too.

smcleod 3 years ago

Has anyone found anything similar (self-hosted or local on device) - but for text generation?

vardump 3 years ago

Now I see why that coffee cup icon is on the "generate images" button... fingers burning after a few images.

Haha, awesome app!

vletal 3 years ago

Awesome! Although I wish I could see the intermediate denoised image instead of the progress bar. Just a suggestion.

  • machina_ex_deus 3 years ago

    This isn't recommended; the decoding takes as much time as processing the next step. I learned that the hard way when I tried displaying the intermediate steps for debugging.

    • ollin 3 years ago

      yeah, running the full decoder takes a while. though, since the "latent" is just 4 channels and pretty close to representing RGB, you can use a linear combination of latent channels and get a basic (grainy, low-res) preview image like this [0] without much trouble. I expect you could go further, and train a shallow conv-only decoder to get nicer preview results, but I'm not sure if anyone's bothered yet.

      [0] https://github.com/madebyollin/maple-diffusion
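
      A sketch of that trick; the 4x3 matrix below uses approximate coefficients that circulate for SD v1 latents, so treat the exact values as an assumption:

      ```python
      import torch

      LATENT_TO_RGB = torch.tensor([
          [ 0.298,  0.207,  0.208],  # one row per latent channel
          [ 0.187,  0.286,  0.173],
          [-0.158,  0.189,  0.264],
          [-0.184, -0.271, -0.473],
      ])

      def latent_preview(latent):  # latent: (4, H/8, W/8) from the sampler
          rgb = torch.einsum("chw,cr->rhw", latent, LATENT_TO_RGB)
          rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-8)  # to [0, 1]
          return rgb  # grainy and 1/8 resolution, but nearly free per step
      ```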

globalvisualmem 3 years ago

I just tried it, and it works beautifully! I suggest defaulting to the lower 384x384 resolution, since it will speed things up.

sisama 3 years ago

Unstable when trying 10 iterations at 348x348 on an iPad 9th Gen (https://support.apple.com/kb/SP849?locale=en_US). Looks cool tho!

  • kossTKR 3 years ago

    Yeah same for me. App just closes after 5 seconds. Would be fun to try!

  • anaganisk 3 years ago

    The iPad has 3GB RAM and the model needs 2GB, so I think your device is too underpowered.

dt3ft 3 years ago

Congrats on the release!

I gave this and other available applications a try, and I don't understand what people see in AI image generation.

A simple prompt generated a person with 3 nostrils, 7 squished fingers, and deformities everywhere I look; it just mashes a bunch of photographs together and generates abominations.

Pay close attention to generated models and you will find details which are simply wrong.

What is the use case that I’m missing?

  • fragmede 3 years ago

    Early cars were terrible too, but here we are. The promise is that future versions of the technology will be able to draw anatomically correct people and images. A computer program that can do in mere minutes what takes a person hours. If you've never wanted a picture of something you can describe but aren't able to draw in your life, then there is no use case for you. For anyone else that's interacted with the world of art and graphic designers or used stock photos; this goes an order of magnitude faster, and is basically free, compared to hiring a skilled professional for hours. It's a game changer for an industry that it sounds like you've just never interacted with.

    • fock 3 years ago

      > Early cars were terrible too, but here we are.

      Were they? https://en.wikipedia.org/wiki/Benz_Patent-Motorwagen - as fast as a carriage, about the same stink. Carriages clearly had a use case.

      Generated images now: they take enormous energy to generate. The main current use case is to gobble up more energy (mass media/entertainment).

      • fragmede 3 years ago

        They were. They were loud and stinky and were unsuitable for dirt roads, spooking horses, causing the UK to basically ban them. Some were powered by steam or coal but those that were powered by gas had a different problem - there were no gas stations. You had to hand crank them to start. Moving goods and people around was already a solved problem with horses and trains and boats.

        Cars then: take enormous energy to move very little, and slowly. Main use case then was as a rich person's toy (entertainment). They'll never replace work horses with them.

        It's easy, in hindsight, to see cars as inevitable. But you had to see past the shortcomings of the earliest cars to "get it", much like you have to see past the 3 armed monstrosities that current image generation techniques produce and see the promise of the technology. There were undoubtedly those who saw cars as hype, much like image generation is seen today; I'm sure buggy whip manufacturers saw cars as hype and refused to get on what looked like a hype train to them.

      • toqy 3 years ago

        And images have a clear use case. Stable Diffusion is effectively moving the horse under the hood.

  • cdrini 3 years ago

    I can't speak for others, but I've personally been quite impressed by the dalle output. It creates things that would take me hours (if not days) to create, which no other tech I've tried has been able to generate. It feels like it can absolutely replace at least the stock photo industry. It's also terrific for things like blog photos if you don't have the time or talent to create something yourself, but want some creative control.

    Extensions like DreamBooth, which let you fine-tune the system with your own submitted images, are also quite amazing. Being able to give it just a few photos and say things like "show me surfing in the ocean" and get a reasonable image back.

    _Much_ more broadly, this space in AI/ML with GPT3/Dalle is exciting because it feels kind of like what the internet was made for. There's too much data on the internet for any one person to ever meaningfully process. But a machine can. And you can ask that machine questions. And instead of getting just a list of references back, you get an "answer". Image generation is the "image answer" part of this system. It's an exciting space because it feels like these systems will affect large chunks of how we use computers.

    Here's a cool GPT3 "programming" example: https://twitter.com/goodside/status/1581805503897735168

    And here are some of my dalle uses I've been impressed by, that I feel are publish-ready:

    - https://labs.openai.com/s/nkOTLRWzjgQTe4QsgoWChP7n

    - https://labs.openai.com/s/uSP55qRf1SqCbYTa2UDXXEfA

    - https://labs.openai.com/s/kO2purvEodK5UUxPIpL78bQh

    - https://labs.openai.com/s/2P1Mb75JbS1mmpyi86xwyyUg

    • dt3ft 3 years ago

      Thanks, this was helpful to better understand the use cases.

  • grumbel 3 years ago

    The 3 nostrils and 7 squished fingers are not that big of a problem; you can run other image-enhancing AIs on top of the generated images to fix that, or just use inpainting and give it a few more tries to get it right. The models are also slowly getting better at it.

    > What is the use case that I’m missing?

    It's generating images from nothing more than a text description; a year ago that was something you'd only see on Star Trek. Now it's real, and we have barely scratched the surface of what is possible.

    The images still need some manual work, but try to generate images of that quality and complexity by hand and you might have more appreciation for how mind-blowing it is that AI can not only do it, but do it in seconds.

  • saberience 3 years ago

    On some of the homegrown models (https://rentry.co/sdmodels) these things are already fixed. For the Stable Diffusion "enthusiasts", the tools and models have improved at least 100% since the original release.

  • dave_sullivan 3 years ago

    It's more of a cool technology that is rapidly advancing. A couple years ago, it couldn't do this much. A couple years from now, it will be much better. It does much more than mash images together, which you would know if you dug into it a bit. That's it, that's the whole thing.

  • pdntspa 3 years ago

    Try some of the prompts listed on lexica.art. Stable Diffusion needs good prompt engineering to turn out well.

  • pmarreck 3 years ago

    There needs to be some sort of piece or filter that understands body geometries and inverse kinematics to prevent things like generating people with 3 limbs or joints in positions that would not normally be feasible without injury =). It'll come.

  • frankzander 3 years ago

    Nothing. It's IMHO just a hype of the younger nerdy generation. The real-world applications of NN-based (there is no "I" in this "AI") image generation are limited. One hype comes, the other hype goes. IMHO it did not come to stay ;-)

    • gpderetta 3 years ago

      As a 40ish year old, the future shock from all these AI image generators is extreme.

      This stuff was literally science fiction just a couple of years ago. Now you run it on your phone.

  • stefandesu 3 years ago

    I haven't been able to get any good results with Stable Diffusion (via DiffusionBee on my M1 MacBook Air), but I've seen really good images from other AI generators like Midjourney.

  • gpderetta 3 years ago

    It does require a lot of cherry picking and fine tuning to get anything good, and yes, SD is terrible, terrible at hands.

    It is still extremely impressive and is improving every day.

  • nelsondev 3 years ago

    Play around with other prompts. For example, “golden gate bridge in the style of Van Gogh futuristic”.

    Architecture is cool. It doesn’t do people well

  • scrumlord 3 years ago

    You're missing imagination and the ability to write a good prompt. Come back when you're smarter.

  • juliendorra 3 years ago

    The use cases that are being explored now are:

    Movie preparation, storytelling https://twitter.com/juliendorra/status/1590058518174134272 https://twitter.com/mrjonfinger/status/1590021753979670528

    Fan art! https://twitter.com/rainisto/status/1581169461167816704 https://twitter.com/rainisto/status/1579474636202708993

    Product shots and generative marketing https://twitter.com/dtcforeverr/status/1589916644939161600 https://twitter.com/kylebrussell/status/1590563734317338624

    2D game assets, character design https://twitter.com/emmanuel_2m/status/1588249026272448512 https://twitter.com/elsableda/status/1562465392563351552

    Imaginary selfies (self-portrait is a huge human use case!) https://twitter.com/stevenpargett/status/1590047241183821824 https://twitter.com/dh7net/status/1581298913637646336 https://twitter.com/fabianstelzer/status/1579818105672302592

    Styling by example https://twitter.com/norod78/status/1590056501544386560

    Raw sketch to final image https://twitter.com/nousr_/status/1564797121412210688

    Editing in the most generic sense (replacing part of an image) https://twitter.com/bigblueboo/status/1585761916718383110

  • tomcam 3 years ago

    > and generates abominations

    You make it sound like that’s a bad thing

    • dt3ft 3 years ago

      Horror movie material for sure :) Who knows, maybe this inspires better horror movie creatures :)

sabalaba 3 years ago

liuliu has always been a fucking god

donkeyd 3 years ago

This is awesome. It has killed my battery over the last couple of hours, because I couldn't stop generating new images.

skykooler 3 years ago

I wonder if any of these tricks would be applicable to make a version of Stable Diffusion which could run on the Steam Deck.

  • ShamelessC 3 years ago

    Apple's mobile processors (especially the M1) are waaay faster than a Steam Deck's. Even with the optimizations, I bet it would take like half an hour to run.

  • IceWreck 3 years ago

    I don't think the Steam Deck is powerful enough. It might be possible, but it will take hours.

  • asadlionpk 3 years ago

    Can't you just run it the regular way? It's a PC.

    • skykooler 3 years ago

      It doesn't have enough VRAM to load the regular model; it ends up triggering the OOM killer.

lll-o-lll 3 years ago

Any tips on how to do the following using the app? "If the face is cropped, now I know how to use the inpainting model to fill it in. If the inpainting model doesn't do its job, you can always use a paint brush to paint it over and do an image-to-image generation again focused in that area."

  • liuliu 3 years ago

    I have a Twitter thread on this: https://twitter.com/liuliu/status/1587978815208407041?s=46&t...

    (Note that at the time, there was an implementation bug in the inpainting model that caused weirdness I needed to fix manually.)

    • ragazzina 3 years ago

      liuliu, this is simply incredible.

      Were you focused on just making it work on the iPhone, or do you think you will keep adding functionalities to the app? Do you think it will ever be possible to train one's own model on an iPhone?

      • liuliu 3 years ago

        I think that fine-tuning the whole model (a.k.a. DreamBooth) on iPhone would require more RAM / processing power than it currently has. A more viable path is to implement Hypernetwork + Textual Inversion, which is within the possibilities of today's hardware.
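
        A condensed sketch of why Textual Inversion fits on a phone: only one new embedding vector receives gradients while the whole model stays frozen. Everything below is a stand-in, not the app's code:

        ```python
        import torch

        frozen = torch.nn.Linear(768, 768)  # stand-in for the frozen SD stack
        for p in frozen.parameters():
            p.requires_grad_(False)

        new_token = torch.randn(768, requires_grad=True)  # only trainable tensor
        target = torch.randn(768)                         # stand-in training signal

        opt = torch.optim.AdamW([new_token], lr=5e-3)     # optimizer state is tiny
        for _ in range(100):
            loss = (frozen(new_token) - target).pow(2).mean()
            opt.zero_grad()
            loss.backward()  # gradients flow only into new_token
            opt.step()
        ```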

senthilnayagam 3 years ago

On my iPhone 13, it generates 384x384 images in under a minute.

I discovered they have Stable Diffusion 1.4/1.5, Waifu Diffusion (anime), Redshift (3D model), and other models.

The iPhone becomes warm after a couple of runs and starts draining the battery, so do it while connected to a charger.

haunter 3 years ago

The default prompt gives me an arm, or what feels like a crop of a full photo (iPhone 11, iOS 16.1): https://files.catbox.moe/ivy15m.PNG

  • nier 3 years ago

    Same here. Going to the maximum of 512 × 320 pixels on my device gives me the feeling that only more capable devices can see the whole picture.

    This is not a resolution setting but a crop setting.

mrtranscendence 3 years ago

Even though I understood very little of that it was still wild fun reading it. I'm glad such wizards exist, because I and most people I know certainly don't qualify.

TuringNYC 3 years ago

On a related note, has anyone been able to utilize Apple silicon GPUs? Running CPU-only is incredibly slow (and sad, since I've got these Apple accelerators idle!)

drawingthesun 3 years ago

Is there an option so that every image is automatically saved, not to the camera roll but to the local app folder (the same folder that contains this app's data)?

burk96 3 years ago

This looks so cool! Unfortunately, my 11 Pro on 16.1 is crashing before an image can be generated; let me know if I can do anything to debug/test this.

simonh 3 years ago

Thanks, I've been looking forward to something like this coming out. Runs great on my iPad, but a UI optimised for the form factor would be nice.

holoduke 3 years ago

I can't wait till this is possible at 24fps for live camera view modifications. That would be insane during meetings :) I guess 5 more years.

secretsatan 3 years ago

Really cool, but for some reason it fails to share the images using AirDrop for me. I have to save the images to Photos, then AirDrop from there.

donlinglok 3 years ago

I have tested the app on my old iPhone XS Max; it takes less than 3 minutes for an image.

You can also choose a model, steps, scale, and sampler!

Thank you for your great work!

stephenitis 3 years ago

Small suggestion: display multiple examples while it's downloading the model. It's a long time to stare at an astronaut on a horse in space.

xingped 3 years ago

Fantastic! Can it be made available for iPad?

gok 3 years ago

Did you try using Core ML for inference?

ShamelessC 3 years ago

The quality of comments here is absolutely abysmal given the deeply technical nature of the article.

ActionHank 3 years ago

I wonder how long before the App Store policies are updated to require that models be embedded in apps.

  • ajconway 3 years ago

    The App Store has a CDN-like feature that allows uploading large resources separately and downloading them after the app runs for the first time.

dirtyid 3 years ago

Is this something that might make its way to Tensor/Pixel, or is it unique to Apple silicon?

propogandist 3 years ago

The developer is about to have a MASSIVE hosting bill.

The download restarts from 0% if the app is sent to the background, as there does not seem to be a download manager. This is especially problematic for the large 1.5GB file.

  • liuliu 3 years ago

    I am using Cloudflare R2, which doesn't have an egress fee, and I am seeing about 5k Class B operations right now. Unless Cloudflare changes their end of the deal, I think it is OK.

    • propogandist 3 years ago

      Great to hear! Please consider introducing a mechanism to suspend the download vs. restarting it; this may be especially valuable for those with a slower connection. With the traffic growth you'll be seeing, chances are Cloudflare's enterprise team will soon be in touch ;)

  • Gigachad 3 years ago

    I wonder why the file can’t be distributed inside the app on the App Store. That way downloads would be much more convenient.

    • dylan604 3 years ago

      Is P2P torrent-type sharing of the load possible under App Store guidelines (be they iOS or Android)? I've honestly never even thought about this being a thing, but with large shared data that doesn't change, why not?

      • nicd 3 years ago

        P2P, as in hosted from other people's phones? I think the issue is that people generally wouldn't be happy with P2P data uploads from their phones (compared to P2P on desktop, where internet is cheaper/faster, and battery isn't an issue).

      • Miraste 3 years ago

        They are banned on iOS but not on Android.

    • asadlionpk 3 years ago

      The App Store has app size limits.

pellias 3 years ago

Hmmm, my iPad keeps crashing and redownloading the sd_v1.4_f16.ckpt file.

  • liuliu 3 years ago

    Yeah, it seems the iPad has a bunch of issues (curiously, most of them related to how I translate the tensor back to CGImage ...). Stay tuned.

  • MarcusE1W 3 years ago

    On my iPad mini 5th generation with A12, the download is fast and fine. But with standard settings it first shows a “Device capability warning” and then indeed crashes every time. Is there a way to solve this? The A12 chip should work, no?

iseanstevens 3 years ago

This is really impressive!

boppo1 3 years ago

Aww man, no iPad version? Tsk, what did I get this 16GB of RAM for?

  • liuliu 3 years ago

    Give it a couple of weeks. Still playing to see what the optimal UI looks like for such a large screen. 16GiB should be able to generate several images to select from at once.

2Gkashmiri 3 years ago

I have an iPad, the regular non-Air/Pro/M1 one. The app installed, and when I run it, it says "could be device incompatibility" and subsequently crashes.

patentatt 3 years ago

Crashes every time... am I doing something wrong? iPhone 11 Pro.

  • liuliu 3 years ago

    Does it crash upon downloading models, or when generating? I haven't tested on all the devices, but the 11 Pro seems to have 4GiB RAM and should run at 384x384 resolution (check whether that is the selected resolution at the top right).

    There are reports that iOS is not happy with how I compute SHA256 for the downloaded model file by loading it all into memory, on the XR (3GiB RAM). If this is happening on other devices, I may need to do streaming hash computation and put up a bugfix.
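
    The streaming fix described above, sketched in Python for clarity (the app itself is Swift, where CryptoKit's SHA256 supports the same incremental-update pattern):

    ```python
    import hashlib

    def sha256_streaming(path, chunk_size=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)  # constant memory: one 1MiB chunk at a time
        return h.hexdigest()

    print(sha256_streaming("sd_v1.4_f16.ckpt"))
    ```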

    • olliej 3 years ago

      It _might_ be easier just to use memory-mapped I/O (mmap) and save any futzing around rewriting the actual code.

      • liuliu 3 years ago

        Yeah, I thought Data(contentsOf:) already did that, but it appears not (tested; it indeed allocated all the memory to load the data). Adding `.mappedIfSafe` to the reading options solved this.

  • lll-o-lll 3 years ago

    I have an iPhone 11 and it works for me. Had to keep the app in focus until all the downloading completed.

  • gorbypark 3 years ago

    I can also confirm it works (for me) on iPhone 11 Pro.

alex_suzuki 3 years ago

Kudos. I hope liuliu has a good deal on egress traffic... :-)

2Gkashmiri 3 years ago

Uh..... is there an Android version?

Kye 3 years ago

I'm downloading it now. This should be fun.

annoyingnoob 3 years ago

Makes a nice hand warmer on a cool evening.

cph123 3 years ago

Works great on my iPhone 14 Pro Max

hamilyon2 3 years ago

This will sell an iPhone to me.

wwarner 3 years ago

Works really well, thank you!

tluyben2 3 years ago

Brilliant!

secretsatan 3 years ago

Preserve your privacy, send us your phone number!

happyopossum 3 years ago

Kinda ironic that this links to a site that is almost unreadable on an iPhone (14 pro max if you’re wondering).

  • 4mitkumar 3 years ago

    I frequently see such comments about sites not loading fine on iPhone 13/14, while they continue to load just fine for me on a 4-year-old Android device (not Pixel/Samsung).

    I wonder if it's the hardware or just the blockers that I use. It might be worth trying blockers to see if they make the general browsing experience better on Apple devices.

  • Tagbert 3 years ago

    The site loads fine but the font size is a little small for mobile. It reads fine if you rotate to landscape, though.

sosodev 3 years ago

This is amazing. I'm kind of surprised that it doesn't have an NSFW image blocker. I want to be able to generate NSFW images but it probably should have one enabled by default.

mistersquid 3 years ago

Update: Draw Things uses "one-time photo selection", which according to Settings > Privacy & Security > Photos means: "Even if your photos were recently shown to you to select from, the app did not have access to your photo library." Still, I didn't realize apps could save to Photos without explicitly asking permission.

I don't recall giving "Draw Things" permission to access my photo library, yet the app is able to save to my photo library without prompting and to read existing images.

I may have misunderstood what permissions apps should ask for when saving to the photo library.

  • liuliu 3 years ago

    I use PHPickerViewController: https://developer.apple.com/documentation/photokit/phpickerv... which runs out of process, such that when you select a photo into the app, I have no access to any information about your other photos, and the location info is erased from what PHPickerViewController passes to me.

    When saving the photo, I only use UIImageWriteToSavedPhotosAlbum (https://developer.apple.com/documentation/uikit/1619125-uiim...), which asks your permission to write to the album, not read permission (they are separate). There are more things I could do if I had read permissions (like create a "Draw Things" collection and save to that, rather than saving to the generic Camera Roll). I ultimately decided not to do that because I don't want more permissions than I minimally, absolutely need.

ir77 3 years ago

Meh… a 2GB download only to try "image of unicorn pooping" and get the lamest results ever. If I can't amuse my 5-year-old with this AI, it's nonsense.

It transcribed as “image of unicorn poo ping” in tags :(
