DALL-E Mini – Generate images from a text prompt

app.baseten.co

52 points by tuhins 4 years ago · 24 comments (23 loaded)

Wow, this author is very dishonest, as they do not mention any of the people who created this project in the first place. I was one of the people who worked on this project.

This was spearheaded by Boris Dayma, now at Weights and Biases.

This is an Open Source project with all code and methods in public.

See either GitHub (https://github.com/borisdayma/dalle-mini) or the hosted space in Hugging Face Hub (https://huggingface.co/spaces/dalle-mini/dalle-mini) or the project report (https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini-G...).

This project was also covered in the NYT article on Dalle2 by Cade Metz.

The author gives no credit at all. That is appalling.

(Also, the one hosted in the HF Hub gives you better results)

I just realized that this person is either using our model (some point in the past) and not giving us due credit, or they trained a new model and the name just happens to match.

In the latter case, please ignore my rant and treat my links as a reference to a different project rather than as a claim that this project is ours.

smcleod 4 years ago

This one seems really poor compared to the other minis I've tried. Mostly unrecognisable, blurred shapes

roseway4 4 years ago

It’s likely the site is using the smaller “toy” model configured as default with the DALL-E mini code base. The larger “mega” model, used by the official demo, is far superior but requires significant GPU memory.
Unfortunately, despite the model authors adding a significant number of GPUs to the official demo, it has been hugged to death following recent Guardian, NYT, and other coverage.

nsonha 4 years ago

which ones have you tried?
- blarneystone 4 years ago
  
  I've seen this link passed around a lot (usually it reaches traffic limits) https://huggingface.co/spaces/dalle-mini/dalle-mini
  - smcleod 4 years ago
    
    Yeah this one seemed to do a better job

wbraun 4 years ago

Are there different variants of DALL-E Mini? Running prompts through both this version and the one hosted on huggingface gives noticeably different results. The one on huggingface seems to give more accurate responses.

longtimelistnr 4 years ago

Yes, there are a few different model variations, from what I’ve heard.

masswerk 4 years ago

Interesting results: I tried "a train entering a station" and "a train in the countryside". Both images showed a track with rails and some kind of distortion (somewhat reminiscent of speed, more so the first one), but no train, omitting the subject in favour of circumstances.

So, a touch of Rain, Steam and Speed?

So I tried "a train speeding in rain" and got a somewhat car-like out of the window view on a rainy landscape, with a hint of rails somewhat mangled into what looked more like a road for automobiles to me. — However, no Turner… ;-)

scottlawson 4 years ago

I tried:

a green bowl
a green bowl with an apple
a green bowl with an apple inside
a banana in a bowl

The only one that seemed correct was "a green bowl"; all of the others were very different.

jerpint 4 years ago

How is this different from dall-e mini on huggingface?

clowd 4 years ago

That one errors out with "Too much traffic" and this one doesn't.
- acherion 4 years ago
  
  It may error out heaps of times, but requests that do make it through actually seem to come back with images that have considered more than just one or two words from the request.
  This one tends to come back with blobby images that don't seem to take in at least half the words in the query (and yes, I'm only using 3-4 words, just like the example).
  Out of the handful of DALL-E clones I've come across so far, this is by far the worst performing in terms of the results returned.

userbinator 4 years ago

The results are amusing but not particularly accurate; "cat" resulted in a recognisable but distorted cat, "dog" produced a barely recognisable nightmarish blob of fur and eyes, and "pig" output something with nothing more than the general texture of a pig.

ncr100 4 years ago

Check out the horror show that is "carrot top comedian".

Four out of four queries resulted in synthetic portraits that are terrifically scary.

athorax 4 years ago

Mostly just getting unrecognizable blobs

filoleg 4 years ago

Which prompts have you tried? I have no idea which prompts to even input to get unrecognizable blobs intentionally.
Spent all evening yesterday having fun as my friends and I tried all sorts of inputs, including pretty specific/obscure ones (but we also did plenty of rather vague and generic inputs as well). Not even once did we get unrecognizable blobs. Sometimes we got results that were more on the Van Gogh side than the realism side, but not even once did we get something that was unrecognizable.
Even without knowing the prompt, you could tell "oh this looks like a human dressed in a suit with a weird body shape, as he is standing bent over, with a giant person in a dark dress from behind a desk across, everyone is from the waist up, all looking like a very stylized drawing", despite the prompt itself being something like "courtroom sketch of a man getting sued".
To anyone who wants to see something I personally found extremely interesting, try a prompt like "Google streetview of XYZ", with XYZ being something absolutely difficult to even imagine visually or vague, like "Godzilla" or "statue".
- inetsee 4 years ago
  
  I tried "a boy standing in front of a house", then "a girl standing in front of a house". Both results would make good surrealist paintings, but they weren't even close to what I would expect from the regular DALL-E images I've seen.
  - filoleg 4 years ago
    
    Oh, if you were expecting something like the images produced by the original DALL-E, then I totally agree with you, DALL-E Mini doesn't get even close to that level.
    My guess is that it is due to the combination of it being a "DALL-E Lite" model overall and (most significantly) only running for ~150 iterations max (using the website linked in the OP). ~150 iterations is way too few to get something approaching the type of images you've seen from the original DALL-E.
- athorax 4 years ago
  
  One I tried was "raccoon riding bear" and it was just a horrific amalgamation of faces and fur
- konart 4 years ago
  
  "Tom and Jerry playing Contra"

emmelaich 4 years ago

I tried "samuel pepys on acid" which gave me a blurry Samuel Pepys portrait.
Then "samuel pepys on a horse" which gave me a disturbing ghost on a deformed horse.
