RedditStorage

371 points by newtonapple 11 years ago · 174 comments

Reader

If you had a language model (say, trained on existing comments from Reddit), you could encode the data in the comments in English, and make the abuse a little more subtle.

guidopallemans 11 years ago

.. Much like how gfycat encodes their links
eg.:
- https://gfycat.com/JaggedIdealFrillneckedlizard
- https://gfycat.com/ThirstyAmbitiousBuzzard
- https://gfycat.com/AlertSpicyBlueandgoldmackaw
- gabemart 11 years ago
  
  This is also how hipku stores ip addresses as haiku
  demo - http://hipku.gabrielmartin.net
  explanation - http://gabrielmartin.net/projects/hipku/
  - malkia 11 years ago
    
    No wonder why this is everywhere:
    The hungry white ape aches in the ancient canyon. Autumn colors crunch.
    https://www.google.com/search?q=%22%22The+hungry+white+ape+a...
    
    catmanjan 11 years ago
    
    http://hipku.gabrielmartin.net/#127.0.0.1
    Confirmed for localhost
    
    titaniumdecoy 11 years ago
    
    According to the overview on the of hipku website, it uses only monosyllabic words, unlike this haiku.
    
    gabemart 11 years ago
    
    That's not entirely true - it only uses monosyllabic words for IPv6 addresses because there's no other way to fit enough bits into the right number of syllables.
    For IPv4 addresses, there's loads of space, so I can afford to use some longer words.
- archagon 11 years ago
  
  I wonder if there are any libraries that can do this? I was thinking of writing a password generator web-app that creates full diceware sentences (TheBlubberyPythonFloatedDownThePurpleFunicular == lots of entropy and easy to remember), but I'd need a decent language model for that. (And I don't feel motivated enough to write my own.)
  - dragontamer 11 years ago
    
    Why not "Article Adjective Adjective Noun Adverb Verb Article Adjective Adjective Noun"?
    If you're making full sentences anyway, the grammar of the sentence doesn't need to change much. The vast majority of the entropy is already in the words themselves.
    Example sentence (generated by me, not a RNG): "the tiny hairy fish quickly paints a big scary monster".
    EDIT: With 10 words, each from the 252 most common words... sentences of this type would have an entropy of more than 10^24 or 2^80. I guess "articles" are pretty much "The" vs "A / An" however, so there really are only 8 words of note...
    
    gabemart 11 years ago
    
    Sorry to double comment, but this is exactly how my hobby project hipku works
    http://hipku.gabrielmartin.net
    
    archagon 11 years ago
    
    Well, one, I'd love a more general solution where I could just say "generate a sentence with n bits of entropy" and my algorithm would spin out a sentence of the correct (arbitrary) length. (Hmm... Markov chains?) Or maybe add other mnemonic modifications, like rhymes. And two, I still need an algorithm to conjugate verbs and whatnot, though I suppose that part could just be left to the user. (You get n diceware words — make your own sentence out of them.) But that's boring!
    In regards to word commonality, I'm pretty sure you could in fact use something like the 5000 most common words. The people who care about this kind of stuff tend to have large vocabularies!
    
    daveid 11 years ago
    
    I think Markov chains would be a bad idea for the use case of passwords because some words always follow certain words.
    
    archagon 11 years ago
    
    Ah, true — I was thinking more along the lines of "part of speech" Markov chains, if that's even possible. (As in, just an endless stream of "article noun verb adjective noun adverb conjunction adjective noun verb adjective noun conjunction adjective etc." that could then be mad-libbed by diceware.)
    
    delluminatus 11 years ago
    
    It is possible (and, I think, a rather clever idea).
    You could, for example, use a part-of-speech tagged corpus (a large collection of text where each word was tagged with its PoS by a grad student). Just train a Markov model on the parts of speech instead of the words themselves, and you would be able to generate English-like mad-libs.
    
    TazeTSchnitzel 11 years ago
    
    To make it less obvious, have a few other kinds of sentence. But make those kinds clearly differentiated by length.
- TTPrograms 11 years ago
  
  It would be cool if they biased the name generation with descriptions using some of the recent work in high quality machine image tagging.
  Throw in some ambiguous adjectives and you should have a large enough namespace that matches up with common image contents.
hhm 11 years ago

See my other comment on this post for an example of this (encoding data using Markov chains)
goatslacker 11 years ago

Reminds me of http://www.spammimic.com/encode.shtml
bikeshack 11 years ago

Wikipedia is full of hidden messages. A common pattern I have observed is the first letter of a sentence being used to string together a message. You can read more about this tactic here https://uncyclopedia.wikia.com/wiki/Subliminal_Messages
- bikeshack 11 years ago
  
  Also in 2010 there was a file storage system based on Bitly: https://nealpoole.com/blog/2010/12/bit-ly-file-storage-cleve... Super clever idea, except for the part where Bitly now sprinkles links with affiliate codes to make $$$

rndn 11 years ago

There should be a contest: Who can find the most implausible data storage medium? (Rated according to various criteria such as ingenuity, reliability, max. data read/write rates, latency, storage size, costs…)

elwell 11 years ago

Convert data to binary. Use Amazon Mechanical Turk API to create tasks for people to remember the index of each bit (the value of the task would be $0.01 for binary 0 and $0.02 for binary 1). And, for reading memory, a new task to input the index they remembered and the value they were paid.
- pimlottc 11 years ago
  
  You'd have to factor in a ton of redundancy to account for the human bits who just got bored and wandered off.
  Anyway, people would probably just start saving the bits on this computers after first job or two. Which would be an amusing result for being just a convoluted interface to a remote hard drive, but it's conceptually less interesting then actually using distributed human memory as a digital storage medium...
  - xanderjanz 11 years ago
    
    you could structure it such that the longer they sit there remembering the data, the more they get paid. When they want to leave, they enter what they remember and get paid.
- oconnore 11 years ago
  
  http://bash.org/?98
fatratchet 11 years ago

To get reliable and free storage, photo hosting is usually the easiest way. Flickr offers 1TB, picasa/g+ offers unlimited storage with some hidden quoats. Everything that allows lossless photos lets you store arbitrary data. Depending on how careful you wanna be you can store hundreds of GBs per account.
Email attachments used to be a great way a while ago but nowadays using multiple gdrive/dropbox/onedrive accounts is much easier.
They are easy to create in large numbers (especially if your ISP has dynamic IPS) and as long as you're even a little bit careful, nearly impossible to ban. Add some redundancy across different services to that and a $2 VPS that gives you tons of upload bandwidth and you've got yourself as many TBs of free,fast and reliable online storage as you want.
I spent so much time as a teenager with no money and some python skills coding storage solutions like that. I'd say it was to store movies and tv shows for myself but in retrospect I mostly did it because it was so much fun to develop.
- userbinator 11 years ago
  
  Video hosting (i.e. YouTube) is another potential repository for massive amounts of data.
  Combine that with the fact that data which is encrypted looks practically like static, and you could potentially overlay it on top of an existing video of something mundane.
  You'd need to use strong ECC to get past the lossy encoding, but as things like QR codes show, that is not so hard.
  The audio channel is also usable...
- r-w 11 years ago
  
  Trying to get the greatest entropy possible through arbitrary-strength JPEG compression would be an interesting problem to solve.
- darkstar999 11 years ago
  
  The new Google Photos storage is lossy unless you pay for it. That doesn't rule out using the images in a different way though.
  - huckyaus 11 years ago
    
    I thought it was only lossy if the originals you uploaded were >16MP. I tested uploading some <16MP images and redownloading them, and they didn't seem to have undergone any lossy conversion.
    
    elinchrome 11 years ago
    
    Did you compare a hash of the file, or did they just look the same?
- conductr 11 years ago
  
  I've done similar with images for the fun of it. The simplest solution that I recall finding was to base64 the file/data, then turn to hex, then use those hex data to create pixels in RGB. I would line them up top-left to bottom-right.
  Probably not the most efficient but easy and fast and the resulting images would look... interesting. For large files, the decoding would be difficult mostly just due to reading the image of so many pixels into memory. So, that's when I began fixing the image size to a smaller size and having multiple images that I would later convert to 60fps video. I could then use ffmpeg to convert images to frames and frames back to images.
  I had no practical use for this but, was a fun project on a rainy afternoon.
- namwen 11 years ago
  
  Yeah, I wrote something that stores data to Flickr last summer: https://github.com/namwen/hoardr . I kind of had a reason but it was more for the enjoyment of getting it to work.
  - adrian_blx 11 years ago
    
    There is also hyperglobalmegastore https://github.com/adrian-bl/hyperglobalmegastore All data is encrypted and you can even mount your flickr 'drive' using fuse.
    
    vivab0rg 11 years ago
    
    This project needs more stars!
  - rogeryu 11 years ago
    
    Until this gets so popular that Google or Flickr start to analyse photos, and come to the conclusion to either delete those photos and videos, or to convert them and destroying the data for you. Then, years later, you need your backup and ....
- mayli 11 years ago
  
  I did the same thing for google photos, but just for test purpose. https://photos.google.com/album/AF1QipOjZrywipm-SSH9jVNsKVF5...
- rakoo 11 years ago
  
  The front-end is already there : https://tahoe-lafs.org/trac/tahoe-lafs. Backends are currently being developed (https://github.com/mk-fg/tahoe-lafs-public-clouds), and there will even be a public offering from the very same guys (https://leastauthority.com/)
empyrical 11 years ago

My personal favourite:
https://github.com/philipl/pifs
- Hortinstein 11 years ago
  
  its like the dust theory in Permutation City...
  ", he became convinced of something he came to call the Dust Theory, which holds that there is no difference, even in principle, between physics and mathematics, and that all mathematically possible structures exist, among them our physics and therefore our spacetime. These structures are being computed, in the manner of a program on a universal Turing machine, using something Durham refers to as "dust" which is a generic, vague term describing anything which can be interpreted to represent information; and therefore, that the only thing that matters is that a mathematical structure be self-consistent and, as such, computable. As long as a mathematical structure is possibly computable, then it is being computed on some dust, though it does not matter what dust actually is, only that there be a possible interpretation where such a computation is taking place somehow. The dust theory implies, as such, that all possible universes exist and are equally real, emerging spontaneously from their own mathematical self-consistency."
  Great book!
  - ngoldbaum 11 years ago
    
    Also massive spoilers for the end of Contact. The novel, not the movie.
- tacone 11 years ago
  
  That just segfaulted my brain. Everything we may ever write in the future is already there, you just need the address.
  - zedadex 11 years ago
    
    > Copyright infringement? It's just a few digits of π! They were always there!
    You really have to admire that creativity
  - Lawtonfogle 11 years ago
    
    While I'm not sure every number is in pi (see my other comment to grandparent), there is a similar really weird feeling I get when I consider all digital data is really just numbers. That means there is a number, that when turned into a .avi (or format of your choice), shows anything you can imagine. Imagine yourself talking with Plato. There is a number that produces a 1080p video of you doing just that. Actually, there are a lot of numbers that do that, as every little difference in the setting would be a different number.
    There is a number that produces a high def photo of when you married your high school sweetheart, even if you never actually married her. There is one of you being awarded the Nobel prize. If there is a proof that P = NP, or that it doesn't, or even a proof that it can't be proven either way, then there is a number that would be the PDF version of that document.
  - jordigh 11 years ago
    
    The problem is that the address is typically larger than the actual data you want to store.
    
    mafuyu 11 years ago
    
    Luckily, I know of a scheme to compress the address 100%! ;)
    
    tacone 11 years ago
    
    Oh no! I sketched up a script to gzip the chunks, hashsum them, and then find out how many collisions there are before the real occurrence starting from an approximate address in the PI digits chain, so that I could have: ($address*1e12)$hash$collisioncount
    The resulting string is 10% of size of the gzipped string, at the expense of CPU. But when I read you achieved 100% compression I just deleted the script and got out to get a beer. :-(((
- bryogenic 11 years ago
  
  Up next, PiCoin: proof of work is finding the index of the goal data in pi.
- Lawtonfogle 11 years ago
  
  Is this actually proven? Pi is irrational, but is it proven to be random (or normal)?
  http://www.askamathematician.com/2009/11/since-pi-is-infinit...
  Also, assuming that it is, if 'start as position X and read Y bits from pi' produced an illegal image (top secret document, abuse images, etc), what would be the legality of trading such information?
dbarlett 11 years ago

Packet juggling http://lcamtuf.coredump.cx/juggling_with_packets.txt
- andrewstuart2 11 years ago
  
  Packet juggling over RFC1149-compliant networks.
  https://www.ietf.org/rfc/rfc1149.txt
  - daveloyall 11 years ago
    
    When you say that, I picture this.
    https://duckduckgo.com/?q=starlings+swarm&iax=1&ia=images
hhm 11 years ago

Steganography is always interesting for data storage. It is pretty easy to hide data into pretty much any medium.
See http://jthuraisamy.github.io/markovTextStego.js/ and https://github.com/hmoraldo/markovTextStego
- chrissnell 11 years ago
  
  Combining steganography with Reddit could be interesting. Random (mildly interesting) photos pushed to imgur and posted to /r/pics by the same user every time.
Zikes 11 years ago

A stenographed image embedded in a Word document, printed and faxed to a document archive that scans and digitizes it, embeds the scan in a PDF, and emails it back to you.
- iblaine 11 years ago
  
  Pretty sure this happens in Washington DC when bills need to be reviewed by various departments.
  - Lawtonfogle 11 years ago
    
    The original image is a picture of a worker's monitor displaying some error message that IT asked for.
    I'm not joking either.
- Cacti 11 years ago
  
  During this process you will lose data. A lot of data.
- baddox 11 years ago
  
  Can you make that fully automated from the end user's perspective?
- prawn 11 years ago
  
  Hey, client of mine, you need to pay my invoice. Also, that photo you sent me won't open.
notacoward 11 years ago

Erasure-coded comments distributed across the huge number of abandoned Wordpress blogs and phpBB forums that are out there. Plenty of storage, pretty readily accessible, low probability that even one fragment will get deleted, and even if one does that's what the erasure coding is for.
EDIT: also, Wikipedia never deletes anything. Even if your "edits" get reverted, you can still find them via the history page. Hmmm.
- Hello71 11 years ago
  
  no, deleted media is gone forever IIRC.
  deleted pages are not visible to people with less than sysop rights (on enwp), and multiple methods are always available to deal with troublesome people, ranging from revision deletion to blocks and eventually ISP contact.
- joliv 11 years ago
  
  Wikipedia is a bit more vigilant with banning than abandoned blogs are :)
  - cmdrfred 11 years ago
    
    A single user storing a reasonable amount of data though might get away with it... I know what I'm doing this weekend.
    
    PostOnce 11 years ago
    
    Abusing one of the most important, non-profit resources on the internet?
    Just because we can doesn't mean we ought to.
vidarh 11 years ago

Usenet messages and mail systems are both good old ideas (I don't know of any actual implementation, but it's certainly been discussed at least back to the early 90's).
For Usenet you could depend on widespread resilien distribution + reasonably long retention periods for a lot of groups (but risked having messages killed by admins if too obvious spam).
For e-mail, anything reflecting your e-mail back can be used to juggle data: Send messages with attachment, refuse to accept the inbound reflected messages for a couple of days to let the other party store the data for you while they retry, then accept the message and instantly send it back out again.
Then there's the old Linus Torvalds quote:
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it."
- fatratchet 11 years ago
  
  Usenet is perfect for that since binary newsgroups for piracy have gotten really popular over the last few years. You can basically use it as a reasonably reliable key-value store that lets you store 300kb to 1mb blobs. Add some encryption and parity and you've got yourself nearly unlimited storage, even for free if you use trial accounts from certain providers.
- 0x0 11 years ago
  
  Yeah, I remember reading about the e-mail reflection idea in the book "Silence on the wire", authored by "lcamtuf", the guy who's more recently known for writing afl-fuzz.
alfg 11 years ago

Something not too far off that I made a couple of years ago for fun. Stores small snippets of data in the URL.
https://github.com/alfg/jot with demo.
- rcthompson 11 years ago
  
  So, with this plus a URL-shortener as a frontend, you're essentially using the URL shortening service as the data storage.
  - alfg 11 years ago
    
    Ha, right! Especially since the URLs can get very lengthy depending on the message.
- _lce0 11 years ago
  
  genius!! ready for t.co and bit.ly
  perfectly for small pieces of immutable data!
SilasX 11 years ago

How about just a project that implements an S3-style directory system, with a "fill in the blank" for you to implement the storage backend?
That is, for a given storage medium, all you have to do is implement methods for "write key-value pair" and "read value at key", and you get to piggyback off that medium for your storage.
ryan-c 11 years ago

Here's my entry.
https://github.com/ryancdotorg/dnsstore
- mmahemoff 11 years ago
  
  A similar service is here: http://www.cambus.net/interesting-dns-hacks/
  Interesting about DNS stores is they save a round trip, so it's not just a weird abuse of the protocol to store content, it's also potentially a performance optimisation.
  - silverwind 11 years ago
    
    And almost every provider around the world provides a free 'CDN'. It's win-win!
    
    mmahemoff 11 years ago
    
    Half-CDN
    Half-DNS
    I call it the CDNS :)
    
    jackgavigan 11 years ago
    
    nslookup -type=TXT jackgavigan.com
    ;-)
- an_account_name 11 years ago
  
  Licensed under the WTFPL! One of my all-time favorites.
antihero 11 years ago

Connect ethernet cables in a loop, keep sending data back and forth "around" the loop. Data is stored in cables.
I think this was from an old BOFH.
- voltagex_ 11 years ago
  
  http://webcache.googleusercontent.com/search?q=cache:786NsZY... and https://github.com/yarrick/pingfs
  OT: Let archive.org save your pages, people!
- nosuchthing 11 years ago
  
  Apollo 11 used Rope Memory on its voyage to the moon. [1] [2]
  [1] http://news.bbc.co.uk/2/hi/technology/8148730.stm
  [2] http://en.wikipedia.org/wiki/Core_rope_memory
- joezydeco 11 years ago
  
  Mercury delay lines always fascinated me:
  http://en.wikipedia.org/wiki/Delay_line_memory#Mercury_delay...
rrrrob 11 years ago

I'd love to exploit ad networks user profiles for this. I.e., store some bits as "interests", by running a few appropriate google searches or hitting a few web sites, read the bits by seeing what ads you're served. This would probably require a bit of learning and a redundant encoding to make it work, but...
haylem 11 years ago

I had a few in mind when I was back in uni and hosting and cloud storage prices were still up.
I hadn't thought of reddit, as the abuse would be clearly visible, but I had used back then that Gmail Drive some guy had implemented using emails for storage, and it led me to think a lot of the Google Systems had non-obvious "unlimited" storage options.
For instance, I don't know if that's still the case, but Google Calendar surely seemed pretty fit for abuse: while calendar entries were limited in size, you could have as many as you wanted. And calendars can be private, so it's even better.
The problem with such systems will be the integrity of your data, when you start being forced to chunk things up. If they change one thing under your feet, you're a bit screwed. Also you have to detect all the undocumented pitfalls (e.g. forbidden characters in an edit field).
mmahemoff 11 years ago

Furl - Storing data in URL shorteners and aptly refers to itself as "parasitic storage". Some precursors referenced on its homepage.
https://code.google.com/p/furl/
leni536 11 years ago

I like this one: https://github.com/yarrick/pingfs
baddox 11 years ago

The Bitcoin blockchain works fine, but is fairly implausible for interesting amounts of data.
Mithaldu 11 years ago

> most implausible data storage medium
That still works well!
It's easy to make something bizarre and unusable. Have it bizarre and surprisingly usable. :D
- Zikes 11 years ago
  
  Darn, that rules mine out, then.
  Fax is about as unusable as you can get...

Vexs 11 years ago

Well there's some pretty amusing abuse. I recall there was a botnet a while back that got it's commands from a subreddit as well. Quite brilliant actually- who would suspect reddit as a command server?

dragontamer 11 years ago

> who would suspect reddit as a command server
Everyone who used IRC as a command server from years past. It turns out that things useful for human communication tend to be useful for computer communication.
Usenet, Email... hell... I'm sure BBS would have been used if modems were popular enough back in the day.
- cmdrfred 11 years ago
  
  Every time I see an api for sending and receiving any type of file or text, I think botnet/building a secret chat system on top of it.
  - SilasX 11 years ago
    
    Every time I see a service offering some resource as "unlimited", I think of using it as a free backend.
    
    cmdrfred 11 years ago
    
    me too, I have about 3tb of stuff from the Napster days and I've always wanted to upload it all somewhere so I can stream it on my phone.
    
    _lce0 11 years ago
    
    google photos!!
    encode your info as bits in the image ;)
  - stephengillie 11 years ago
    
    Can we encode C&C commands into a blockchain?
    
    sam_bwut 11 years ago
    
    yes - its been done several times.
  - pavel_lishin 11 years ago
    
    I'm toying with the idea of building a client for Hipchat that would allow people to use encryption. Sorry, boss, the "offtopic-no-suits" room means what it says.
- hippich 11 years ago
  
  actually, i personally witnessed C&C based on BBS :)
  - dragontamer 11 years ago
    
    Sounds like a blogpost you should share with everyone :-)
  - nissehulth 11 years ago
    
    Fidonet, in some obscure distributed echomail area? :)
    
    hippich 11 years ago
    
    No, actual BBS with software running, processing uploaded files, and software running on machines calling in at night. It was proof of concept and not malicious, but rather fun exercise :) It was too long time ago, the only thing I remember now is that BBS software was ProBoard, and bot was spread via demo .exe file using fidonet echo :)
mtw 11 years ago

"its"
- r-w 11 years ago
  
  ye's
  - yellowapple 11 years ago
    
    Ye's what?

gkop 11 years ago

If this idea appeals to you, you may also be interested in the 2009 paper Graffiti Networks: A Subversive, Internet-Scale File Sharing Model [0] by Andrew Pavlo.

tl;dr: the researchers discovered that MediaWiki instances were good soft targets.

[0] https://www.cs.cmu.edu/~pavlo/static/slides/graffiti-dc401-o...

zedadex 11 years ago

The mini-saga embedded in the presentation was pretty funny
> Concluding Remarks >  Off probation at the end of this semester!
- zatkin 11 years ago
  
  I got put on probation for redirecting my ~/.bash_history to /dev/null and removing my `finger` information with `chfn`. Universities can be pretty ridiculous with their disciplinary actions.
  - pavel_lishin 11 years ago
    
    Why... why would redirecting your .bash_history to /dev/null be a punishable offense? I assume it's so they could check for evildoing on your part, but that seems like a ridiculously idiotic way of doing it.
chillingeffect 11 years ago

Ah so that's who those weird, mostly spammers are.... Wow.... They used me for data storage, the after school special.

Goronmon 11 years ago

An expected reaction from the reddit admins...

http://www.reddit.com/r/programming/comments/38kn2g/redditst...

jamesjwang 11 years ago

One of the co-creators here; as a disclaimer, we didn't mean to threaten to break reddit at all. We're amazed that someone even found this repo since we abandoned it back in January, and that it's even gotten any amount of attention. Honestly we just built this in a week over winter break cause we were bored

jakejake 11 years ago

This is pretty much exactly how binary newsgroups got started. Not to be all "I thought of it first" but I had thought it would be funny to do something similar on Twitter.

joshstrange 11 years ago

>This is pretty much exactly how binary newsgroups got started.
Yeah minus the encryption (well that's not 100% true as you could post encrypted files and people do but it's less of a part of the "protocol" than it is in this example). The beauty of newsgroups is they are replicated to other NNTP servers. Distributed file stores fascinate me (I know this reddit protocol is not distributed or rather it's wholly owned by 1 entity even if the data is distributed across datacenters) and I'm very excited to see where things like IPFS [0], freenet [1], internet2 [2], etc turn out.
[0] http://ipfs.io/
[1] https://freenetproject.org/
[2] http://p2p.internet2.edu/
yaeger 11 years ago

Woah, that'd be a lot of tweets. Even reddit with its 10000 char limit per comment has loads of comment trains if you want to store a sizable amount of info that way. With twitters 140 char limit, that would be a huge amount of tweets. You'd probably run the risk of being identified as a spammer if you send that many tweets at once...
- Grue3 11 years ago
  
  140 unicode characters. Which actually gives you quite a lot of space to work with.

exacube 11 years ago

I like the proof of concept, but I hate that anyone would abuse Reddit this way.

supercoder 11 years ago

More insightful than most of the comments on there.

diminish 11 years ago

Can anyone do a rough cryptoanalysis of the code? It uses AES block cipher in CBC mode with a random iv. Which attacks is this open to?

First, I suspect it's lacking a secure integrity check (MAC), so is weak against chosen ciphertext attacks.

    def encrypt(self, plaintext):
        plaintext = self.pad(plaintext)
        iv = Random.new().read(AES.block_size)
        cipher = AES.new(self.key, AES.MODE_CBC, iv)
        return iv + cipher.encrypt(plaintext)

I'm also not sure about his padding of zeros to attain the AES block size - was there a more secure padding?

    def pad(self, s):
        return s + b"\0" * (AES.block_size - len(s) % AES.block_size)

jedberg 11 years ago

Wouldn't it be funny if reddit just randomly edited the comments to break the encoding...

aquilaFiera 11 years ago

This sounds like a /u/jedberg type of thing to do.
- jedberg 11 years ago
  
  I'd only do it to people I know after backing up the original. I wouldn't want someone to actually lose their files.
  - aquilaFiera 11 years ago
    
    One could argue that that's their fault for giving /u/rram "root" access to their "database."

Someone1234 11 years ago

Shame an encryption key is REQUIRED, could be a useful way to transfer files between Reddit users. Of course the file has to be encoded, but the encryption should be an optional extra.

tschuy 11 years ago

You could always share the password, or even redistribute a modified version of the program with a hardcoded password.
- jamesjwang 11 years ago
  
  That was the idea; one of our original goals was to make a system to quickly share small files over reddit. The issue is you have to store the password for each file somewhere
LeoPanthera 11 years ago

Binary files over a 7-bit medium is a very old, long-solved problem.
For example, here's a base64'd tiny jpeg of me: http://pastebin.com/VTLBG3Ji

Freaky 11 years ago

Don't use this for anything important, and certainly not with a non-unique password.

Key is derived from a single SHA256 (can be brute-forced very rapidly), cyphertext isn't authenticated (can be tampered with or corrupted without anything noticing), and the padding function is broken (strips trailing NULLs, so no good for binary files).

stephengillie 11 years ago

Interesting idea... Since image formats already store a huge BLOB, how much more would it take to make ImgurStorage?

(Ideally, it would be slightly more elegant than just renaming a zip file.)

mdadm 11 years ago

This isn't an area that I'm particularly strong in, but I think that the way that imgur compresses images[0] might have a noticeable effect on this.
[0] https://help.imgur.com/hc/en-us/articles/201424706-How-does-...
- dexterdog 11 years ago
  
  I've run a few photo sites and one of the things I do on all wild incoming JPGs is do a minor compression on them and if that saves more than about 30% on the file size I just use the compressed version. Then anything that's been camouflaged in there gets dropped.

empyrical 11 years ago

Pretty clever. If it was stored in reddit's wiki system instead of comments, it could have a revision history!

s_dev 11 years ago

I think this will break your ToS with reddit and result in a ban on the account. That said, I don't know. It's kinda cool though.

pstuart 11 years ago

Only in the "hacking the system part". Otherwise it's an abuse of a service. There's plenty of cheap data hosting elsewhere on the net.

kej 11 years ago

Presumably something like this is what's happening in /r/A858DE45F56D9BC9/

mdadm 11 years ago

Possibly. Searching up some of the content on there revealed this[0], so at least some of it is (most likely) data.
[0] http://a858.soulsphere.org/
Edit:[1] shows that this is most likely a false-positive.
[1] https://www.reddit.com/r/Solving_A858/comments/24vml1/mime_t...

deelowe 11 years ago

Welp. This won't last very long. :-)

math0ne 11 years ago

I've been preaching the similarities of reddit to newsgroups and IRC forever so this seems like a natural evolution to me. Probably fairly easy for reddit to shut down though unfortunately.

Now if ISP's would start offering their own cached usable versions of reddit we would be getting somewhere :)

yellowapple 11 years ago

And that somewhere would be Usenet 2.0.

SyncOnGreen 11 years ago

I had the same idea few months ago, I've even coded simple POC in Java which mapped submissions in subreddit to files. You could use FUSE to create virtual device and map files in mounted folder to comments. For Java I was using fuse-jna - there should be binding for Python.

empyrical 11 years ago

Someone made a reddit FUSE filesystem (I don't know if it still works though)
https://github.com/ianpreston/redditfs

lucb1e 11 years ago

Lol, I've thought of doing this so many times on Facebook, Google+, Twitter and reddit. Seeing the amount of points this gets, I guess I should have done it. I didn't because it seemed so pointless: they'll just block accounts using this.

meesterdude 11 years ago

Somewhat related project i had going... https://github.com/meesterdude/reddit-rust-servers (http://ruru.name/reddit-rust-servers/ show/hide columns to see more options)

I used to run the rust servers sub. I would have people post JSON posts, which i would then spider and generate a JSON DB from, and created a UI (see the gh-pages branch) to grab the JSON and present a searchable/filterable way of finding servers that are relevant to you.

vbezhenar 11 years ago

I thought about creating an anonymous peer-to-peer network like BitMessage but over Twitter instead of over TCP/IP. The main benefit is that for the watching government hardware your traffic will flow to twitter, not to some suspicious computers. Of course if government can talk to Twitter, it might find out that activity, but not all governments can talk to Twitter.

Another improvement might be not to send base64 abracadabra, but instead send some readable texts (autogenerated or fragments from wikipedia) and encode message as a slight deviations (typos, etc) using steganography. But it would require a lot of messages to transmit enough data.

jamesjwang 11 years ago

yeah that'd probably speed things up significantly; we already ran into speed issues with PRAW in terms of how fast it can upload comments

KeytarHero 11 years ago

Perhaps something like this could explain http://www.reddit.com/r/A858DE45F56D9BC9

mtanski 11 years ago

You could randomly spread this over various subs, that and add erasure coding. This way if a chunk or two goes missing you can reconstruct the original blob.

nickpsecurity 11 years ago

A nice new example of what's called "parasitic storage." This kind should be easy enough to detect on Reddit's end: encrypted and binary data look very different from text. Further, if a site allows binary, it's different from crypto. The only type that's hard to filter is custom stego whose patterns look similar to normally accepted traffic. Extra true if it's a high volume site.

gprasanth 11 years ago

From 2010: https://nealpoole.com/blog/2010/12/bit-ly-file-storage-cleve...

yuhong 11 years ago

Reminds me of: https://twitter.com/manzoor_e/status/604072602114605056/phot...

biturd 11 years ago

how do you get a Mac OS X GUI around this if it is written in python? Can you do the same with perl, php, and other languages? Interface Builder has always been a stumbling block for me to even begin to learn Obj-C or Swift.

ssalenik 11 years ago

It says in the readme he uses wxPython (wxWidgets). You could also use Qt as I believe both use native Cocoa underneath. You can't do everything that is possible if you were coding on Obj-C or Swift, but the stuff you can do looks native. Both have bindings in many languages.
jamesjwang 11 years ago

We used wxpython, which uses native GUI elements based on the OS. There are bindings for other a bunch of other languages too: https://www.wxwidgets.org/

justintbassett 11 years ago

Please don't do this . . .

zedadex 11 years ago

I remember once briefly thinking how fun it'd be to do something like this, before realizing with the spam filters the way they are it'd probably be the last thing I ever did on the site.

Neat proof-of-concept though

digitalsushi 11 years ago

RedditStorage reminds me of a couple business models we tried out that tanked..

The first was a new business where we would go to trade shows, conventions, hell even fast food places, and just collect as many free beverages, condiments, napkins et cetera as possible. Then we'd sell them online.

The other one didn't do much better. We'd go to a Lowes Tool Rental, and just rent a bunch of tools and then re-rent them out of our truck in the parking lot. They had to have them back an hour before Lowes closed for the night.

Our current business model is, we go to bars and hit on people, and if we get their phone numbers, we add it to a subscription service where other people can have access to it.

Honestly, I feel we're no more in the wrong than RedditStorage is.. /s

tomphoolery 11 years ago

> RedditStorage uses an AES encryption algorithm which requires you to choose a password (e.g. "bunny")

Some people still don't know what a password is? =D

harel 11 years ago

What happens when someone uses this to pollute popular subreddits? People will get pissed off...

scrrr 11 years ago

I guess they will simply block the username and delete the comments.

vladtaltos 11 years ago

that is awesome in its complete disregard of reddit :) and a death sentence to itself if it gains popularity as reddit-admins will have to ban the accounts/discard the content :) so it's not that secure a storage idea...

nice little engineering work though. kudos.

mihau 11 years ago

Yeah, it can be done, but who the fuck needs that ?

spydum 11 years ago

does reddit not have some sort of posting throughput limit?

broodbucket 11 years ago

If they don't, they will soon.
thanatropism 11 years ago

For very low karma accounts, yes. After a while, not noticeably for human operators (I delete my accounts frequently; it gets annoying.)

Settings

RedditStorage

Keyboard Shortcuts