Morphis: Encrypted distributed datastore
morph.is
So Morphis is both a distributed data store and the ultimate solution to human suffering?
"MORPHiS is a global encrypted distributed datastore"
...
"It's about all of us, working together to create the world we want to live in, free from corruption, slavery, evil and manipulation; a vision many have thought impossible, until now."
I'd love to try out Morphis' anti-corruption and anti-manipulation features. Is there an API for these?
Seriously though, the project sounds interesting. It's cool to have a grandiose vision like this, but as the reader, I just don't see how it relates back to the project.
The end goal of my life's work, of which MORPHiS is only the beginning, is The World Brain. That is, humanity linked together as a unified consciousness. No Matrix-style interface needed; a keyboard is enough. An uncensorable, high-performance internet with an inherent web of trust acting as the synapses, and our brains acting as the individual super neurons of that neural net.
Read H.G. Wells's essay by that name: https://sherlock.ischool.berkeley.edu/wells/world_brain.html
Wells is not talking about a centralized Wikipedia. He is talking about a decentralized (he even mentions Iceland as a safe harbour) and unified human consciousness.
Humanity can then determine its own destiny instead of having to use the stone-age system of paper ballots and inherently corrupt elected representatives whose representation has absolutely zero correlation with what the people they claim to represent actually want.
See the Princeton study that proved that: http://scholar.princeton.edu/sites/default/files/mgilens/fil...
For the World Brain to function, it needs an ultra-high-performance, kill-switch-proof distributed datastore to operate upon. MORPHiS is the foundational layer for that end. Of course, such a layer is also inherently a better file-sharing app than BitTorrent, a better communication platform than email (see Dmail, already working great!), a better comment system than Disqus, a better forum than HN, etc. My estimate is that the Disqus-deprecating implementation will be out by the end of this month.
Come join me on https://reddit.com/r/morphis
When the Disqus/HN-deprecating layer is out (hopefully later this month), the community can move onto MORPHiS itself as well :)
For those looking for a more practical decentralized storage solution that you can use to build apps with today, take a look at remoteStorage:
The client-side library acts as an abstraction layer for multiple storage backends, including DropBox, Google Drive, and remoteStorage servers, and handles client-side sync, caching and persistence for you.
The remoteStorage server has multiple implementations that anyone can host and is based on an open spec.
I don't think it handles encryption though, but it really should. Its mission is to allow users to "own their data", and that won't ever be true as long as storage providers can freely inspect and copy data belonging to their users. Hopefully that's something on the roadmap.
You can't host the World Brain on such a thing.
MORPHiS is already live. Very practical.
Check out Dmail to see a brand new invention, the likes of which you have been waiting for and have never seen before.
> multiple storage backends, including DropBox, Google Drive
Seems like Dropbox and Google Drive support are still works in progress; otherwise a very interesting project...
Have you considered hosting this in a collaborative git platform such as GitHub? Are there any plans for an http "gateway" like ipfs has?
I am deprecating GitHub. It would be sort of a hypocrisy to host there.
You can join us on IRC: #morphis @ freenode! (this is first announcement of that :)
Come the Disqus feature, GitHub becomes deprecated. I also envision adding Git-over-SSH protocol support to MORPHiS. (MORPHiS is already SSH-protocol based.)
Dmail will be exposed through POP3 and later IMAP. It will also support IRC protocol. It will also support SFTP and RSYNC.
All that stuff takes is time: adding the higher-level features that the underlying layer was architected from the beginning to support (SEE DMAIL!)
If anyone wants to join me, I WOULD LOVE HELP TO MAKE THIS GO FASTER! Join US!
This is going to be awesome.
I appreciate your enthusiasm and agree that your stated goals are important. But if you are serious about this project having an impact, you may want to get some feedback and help with the voice of your presentation. You talk of saving the world, but you are mainly talking about "you" saving the world. It comes across as grandiose ... Please speak clearly about your project and its specifics and less about you if you want people to listen.
Very good point. I'm not sure I ever said 'I' in that context, I certainly don't mean so.
I see myself as just the campaign manager for the World Brain, and one of the first coders of this component of it. The World Brain is you.
Others working on open hardware, hardware mesh nets (wireless, etc.), and more: all of these people have already done just as much as I have, or more, in building necessary components. So it certainly isn't about me. In this post-Snowden world, so many people know exactly what to do now; it is amazing really, so many people united, working together, and knowing exactly what is needed!
Thanks for that input still! I will consider that in how I word stuff in the future. I am just /very/ excited about what I've already calculated as a certain path and inevitable future for humanity.
I found out late in my coding (only 2-3 months ago) that H.G. Wells had certainly beaten me, back in 1937, to the idea (no, the realization) of a unified human consciousness :) (See his essay entitled World Brain.)
https://sherlock.ischool.berkeley.edu/wells/world_brain.html
Inspiring web page, but where does a user start? I can download, install, run, and then? You are missing some instructions about an interface.
Edit: Ok I looked harder. The RUNNING file has the critical protocol/port combination (http://localhost:4251) but this could also (or rather) be in the README, and on the web page.
Thank you for that feedback. You make a good point. I will reorganize the instructions to be more up front instead of buried as you discovered :)
I went and made the Windows install a one-click affair, but neglected to realize that the Linux one, while likely easy to install, does not launch the browser like the Windows batch file does; so unless you read the RUNNING file, you wouldn't even know it had the web UI! Which is the whole point of running it :)
I will do as you suggest and move that into the README, and as well document the fact of the web UI and its port on the website. Thank you so much!
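Since the RUNNING file gives the web UI address as http://localhost:4251, a Linux launcher could mimic the Windows batch file by waiting for that port and then opening the browser. A minimal sketch, assuming the node has already been started separately; `wait_for_port` and `open_web_ui` are hypothetical helper names, not part of MORPHiS, and only the Python standard library is used:

```python
import socket
import time
import webbrowser


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server socket is listening there.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet (e.g. connection refused); retry shortly
    return False


def open_web_ui(host: str = "localhost", port: int = 4251, timeout: float = 30.0) -> bool:
    """Open the web UI in the default browser once the node is listening."""
    if wait_for_port(host, port, timeout):
        webbrowser.open(f"http://{host}:{port}")
        return True
    return False
```

Polling the port rather than sleeping a fixed time avoids opening the browser onto a "connection refused" page while the node is still starting.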
If this is a distributed datastore, what controls do you have about what it stores on your computer for others?
Data is broken up into 32k blocks. (That might change to 512k blocks as it gives it a 10x throughput improvement with less CPU / overhead.)
A random subset of those blocks, selected based on your node's ID and the hash of each block's data, is stored on your computer. Before they are stored, your node encrypts them and throws away the key. All that is ever stored on your hard drive is random encrypted data and no key. The key is the original key that the uploader is provided with: the encryption key is the hash of the data, and the ID of the data is a hash of that key. Your node knows only the ID; it cannot derive the key to decrypt the data. Your node cannot be tricked into storing unencrypted data.
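The key/ID relationship described above (key = hash of the data, ID = hash of the key) is a form of convergent encryption. A minimal sketch, assuming 32 KiB blocks and SHA-512 as stated in this thread; the SHA-512 keystream below is a TOY stand-in for a real cipher such as AES, not what MORPHiS uses, and all function names are hypothetical. It covers only the uploader-side key/ID derivation, not the storing node's own encryption step:

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # 32k blocks, as described above


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Break data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def _keystream(key: bytes, length: int) -> bytes:
    # TOY stream cipher: SHA-512(key || counter) blocks. Illustration only,
    # a stand-in for a real cipher; NOT the cipher MORPHiS uses.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha512(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(plaintext: bytes):
    """Convergent encryption: key = H(data), id = H(key)."""
    key = hashlib.sha512(plaintext).digest()   # encryption key = hash of the data
    block_id = hashlib.sha512(key).digest()    # ID = hash of that key
    ciphertext = _xor(plaintext, _keystream(key, len(plaintext)))
    return key, block_id, ciphertext           # storers get only (block_id, ciphertext)


def decrypt_block(key: bytes, ciphertext: bytes) -> bytes:
    plaintext = _xor(ciphertext, _keystream(key, len(ciphertext)))
    # Integrity check: the key must be the hash of what it decrypts to.
    if hashlib.sha512(plaintext).digest() != key:
        raise ValueError("wrong key or corrupted block")
    return plaintext
```

A node holding `block_id` and `ciphertext` cannot recover `key`, because that would mean inverting SHA-512. Because the key is derived from the content, identical blocks encrypt identically, which lets the network deduplicate them.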
A future version, coming soon, will support seeding v1. This is the ability to essentially 'like' a file/website/etc., which will ensure that your node never throws out the data blocks it does store for that file, that site, or the files that site links to directly. What is missing here is mostly UI work. This feature will also automatically seed, for example, your Dmails, comments, trust publications, own site, etc. (although a UI will let you control it).
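One way to picture seeding v1 is a local block cache in which 'liked' blocks are pinned and exempt from eviction. A minimal sketch, assuming a least-recently-used eviction policy and a capacity counted in blocks; the thread does not say which policy MORPHiS actually uses, and `BlockStore` is a hypothetical name:

```python
from collections import OrderedDict


class BlockStore:
    """Block cache that evicts least-recently-used blocks, but never pinned ('liked') ones."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # max number of blocks (not bytes), for simplicity
        self.blocks = OrderedDict()   # block_id -> data, least recently used first
        self.pinned = set()           # block IDs the user chose to seed

    def put(self, block_id, data: bytes) -> None:
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)  # mark as most recently used
        self._evict()

    def get(self, block_id):
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)
        return data

    def pin(self, block_id) -> None:
        self.pinned.add(block_id)

    def unpin(self, block_id) -> None:
        self.pinned.discard(block_id)
        self._evict()

    def _evict(self) -> None:
        while len(self.blocks) > self.capacity:
            # Least recently used block that is not pinned; pinned blocks are skipped.
            victim = next((bid for bid in self.blocks if bid not in self.pinned), None)
            if victim is None:
                break  # everything left is pinned; allow going over capacity
            del self.blocks[victim]
```

Letting pinned blocks push the store over capacity keeps the user's explicit choice authoritative; a real implementation would probably cap pinned storage separately.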
After that, more distant, will be seeding v2. This will be file sharing similar in concept to how https://peeriodproject.github.io/dl/peeriod_an-anonymous-app... envisions it. Their paper came out as I was coding, and of all the papers I've seen, theirs was the closest to my ideas in many respects. Their project seems stalled, as most coming out of the academic world are. That is why I started with code and not a paper :) I will write the paper before the 0.9 release.
FYI, your node's ID is a SHA-512 hash (everything is a SHA-512 hash) of your node's RSA-4096 public key. (All crypto is SHA-512, RSA-4096, and DH-group14-sha1 (for compatibility with OpenSSH).)
Deriving your node's ID from its public key this way is one of the Sybil-proofing measures designed in early on.
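Because the node ID is a hash of the public key, a node cannot simply choose an ID next to a block it wants to control; it would have to brute-force keys. Combined with Kademlia's XOR distance (Kademlia is mentioned later in this thread), that determines which nodes are "closest" to a block ID. A minimal sketch, assuming placeholder bytes in place of a real RSA-4096 public key encoding (generating RSA keys needs a third-party library) and an assumed "k closest by XOR" placement rule; the function names are hypothetical:

```python
import hashlib


def node_id(public_key_bytes: bytes) -> int:
    """Node ID = SHA-512 of the node's public key, read as a 512-bit integer."""
    return int.from_bytes(hashlib.sha512(public_key_bytes).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric between two IDs."""
    return a ^ b


def closest_nodes(block_id: int, node_ids: list, k: int = 3) -> list:
    """The k nodes whose IDs are XOR-closest to the block ID."""
    return sorted(node_ids, key=lambda nid: xor_distance(nid, block_id))[:k]


# Placeholder "public keys"; real nodes would hash their encoded RSA-4096 key.
keys = [f"placeholder-public-key-{i}".encode() for i in range(20)]
ids = [node_id(key) for key in keys]
```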
I designed everything to be transparent and modular with crypto so I can switch to better algos as they become available (quantum-proof, I am watching you!). The first addition will likely be ECC, once I have decided upon a safe (as in not NSA-backdoored) ECC curve, which I believe the community has not yet proven to be as safe as RSA-4096.
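One common way to keep crypto switchable is to tag every digest with the algorithm that produced it, so newer algorithms can be added without invalidating older data. A minimal sketch of that idea for hashes, assuming a `name:hexdigest` format; the registry, format, and function names are hypothetical and not taken from MORPHiS:

```python
import hashlib

# Registry of supported digest algorithms; new ones can be added without
# invalidating digests already tagged with an older algorithm.
HASH_ALGORITHMS = {
    "sha512": hashlib.sha512,
    "sha3_512": hashlib.sha3_512,
}


def tagged_digest(data: bytes, algorithm: str = "sha512") -> str:
    """Hash data and prefix the result with the algorithm name, e.g. 'sha512:ab12...'."""
    return algorithm + ":" + HASH_ALGORITHMS[algorithm](data).hexdigest()


def verify_digest(data: bytes, tagged: str) -> bool:
    """Recompute with whichever algorithm the tag names, so old and new digests coexist."""
    algorithm, _, expected = tagged.partition(":")
    if algorithm not in HASH_ALGORITHMS:
        raise ValueError("unsupported hash algorithm: " + algorithm)
    return HASH_ALGORITHMS[algorithm](data).hexdigest() == expected
```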
If you're the author, can you give specifics on performance?
Sub-second latency, and it will already download as fast as your pipe will allow for most people, and it will only get orders of magnitude faster! This is a first draft, a tech preview, yet due to how good its design is, it is fully functional and beyond what is out there.
Thanks for your reply. Pretty good project you have there, and your overall mission, well, it looks like a moonshot but it's noble and pure. I like that.
Have you measured the maximum number of requests per second (reads/writes) you can get? (For some given hardware configuration.)
Thank you very much!
Yes. On my 2nd-generation low-power i3, while running my normal desktop at the time, I get ~200 ms individual response times, and that is with 115 requests per second of 32 KB each.
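To put those figures in context, the two numbers give both a bandwidth and an implied concurrency. A worked example, assuming "32kbyte" means 32 KiB (32,768 bytes) and that both figures describe the same steady-state load; the concurrency estimate uses Little's law (average requests in flight = arrival rate × time in system). The helper names are illustrative:

```python
def throughput_bytes_per_s(requests_per_s: float, block_bytes: int) -> float:
    """Payload bandwidth implied by a request rate and a fixed block size."""
    return requests_per_s * block_bytes


def in_flight(requests_per_s: float, latency_s: float) -> float:
    """Little's law: average concurrency L = arrival rate x time in system."""
    return requests_per_s * latency_s


bps = throughput_bytes_per_s(115, 32 * 1024)
print(f"{bps / 2**20:.2f} MiB/s ({bps * 8 / 1e6:.1f} Mbit/s)")
print(f"~{in_flight(115, 0.200):.0f} requests in flight")
```

This prints roughly 3.59 MiB/s (30.1 Mbit/s) and about 23 concurrent requests. That is enough to saturate many home connections, which is consistent with the "as fast as your pipe" claim above, but it is measured per core rather than as a ceiling for the network.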
I am rewriting the high-level protocol code, which is a bit rickety because it snowballed from the earliest code in the project (other than the asyncio SSH library I implemented from scratch).
When that rewrite is done (a week or two), it will decrease latency greatly and improve efficiency greatly, and thus should even improve throughput (although it already maxes out your pipe with actual data, as it is low overhead).
If I switch from 32k blocks to 512k blocks, which I did some testing on (Freenet uses 1 MB blocks), that gives a 10x(!!) throughput improvement with the same CPU usage and no increase in per-request latency.
The only reason I used 32k blocks originally is that the SSH protocol has a 35k max packet size, and I don't want to break spec, so that the traffic can hide as normal SSH traffic :) I did the 512k block test as multiple packets and it was 10x faster, because that means 512k per FindNode operation instead of just 32k :) I was sitting on the fence about switching to it because I want to do the rewrite of the high-level code first, since multiple data packets per request complicate that snowballed code even more :)
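The "35k" figure matches RFC 4253 section 6.1, which requires SSH implementations to accept payloads up to 32,768 bytes and total packets up to 35,000 bytes. A worked example of what the block-size switch means per FindNode request, assuming 32 KiB and 512 KiB blocks and ignoring per-message framing overhead (so the packet count is a lower bound); `min_packets` is a hypothetical helper:

```python
import math

SSH_MAX_PAYLOAD = 32768  # RFC 4253 6.1: every implementation must accept payloads this large
SSH_MAX_PACKET = 35000   # ... and total packets (length, padding, MAC included) up to this size


def min_packets(block_bytes: int) -> int:
    """Lower bound on SSH packets needed to carry one block (ignores framing overhead)."""
    return math.ceil(block_bytes / SSH_MAX_PAYLOAD)


small, large = 32 * 1024, 512 * 1024
print(large // small)      # data carried per FindNode request grows 16x
print(min_packets(large))  # at least 16 spec-sized packets per 512k block
```

The 16x increase in data per round trip is in the same range as the roughly 10x throughput gain reported above, which is what you would expect if fixed per-request costs dominate.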
Also, I am using pycrypto, which initial tests show is actually much slower than the other library I will likely switch to (it is simply called cryptography; it wraps the platform's OpenSSL instead of implementing everything itself as pycrypto does). I went with pycrypto to minimize dependencies. I will have it detect whether you have cryptography installed and optionally use it. I've already abstracted the pycrypto API so it can easily be switched at runtime. This should decrease latency a good amount as well.
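Detecting an optional dependency at runtime is usually done by probing for the module before importing it. A minimal sketch, assuming the import names `cryptography` (the cryptography package) and `Crypto` (pycrypto's import name, which the drop-in pycryptodome also uses); the function names are hypothetical and this is not MORPHiS's actual abstraction layer:

```python
import importlib
import importlib.util

# Import names, in order of preference: "cryptography", then pycrypto (installed as "Crypto").
PREFERRED_BACKENDS = ("cryptography", "Crypto")


def detect_crypto_backend(candidates=PREFERRED_BACKENDS):
    """Return the first importable backend name, or None if none is installed."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_backend(candidates=PREFERRED_BACKENDS):
    """Import and return the preferred available backend module."""
    name = detect_crypto_backend(candidates)
    if name is None:
        raise ImportError("no supported crypto backend installed")
    return importlib.import_module(name)
```

`find_spec` checks availability without executing the module, so the choice can be made once at startup and the rest of the code can call through a single wrapper.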
Ok, try testing with packets of around 128-1024 bytes.
A 'good' result for an i5-i7 core is to get at least 10k requests per second on that situation.
You are gonna live or die by this measure, dude, so work on improving it. You are around 100x away from it, but if you're lucky you can get there with 'just' two 10x improvements. I suggest you look into Flame Graphs [1]; they are awesome. I have used them to pinpoint exactly where the 10-100x bottlenecks in my code were and unclog them.
Also, about your website, just make it sound less like an infomercial and you'll be fine.
And last but not least, best of luck!
[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
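The small-packet test suggested above can be approximated with a loopback round-trip benchmark. A minimal sketch using modern asyncio streams (Python 3.7+, not the 3.4 API MORPHiS targets), where a toy echo handler stands in for a real node; it measures sequential request/response pairs, so it shows per-request overhead rather than pipelined throughput, and `measure_rps` is a hypothetical name:

```python
import asyncio
import time


async def measure_rps(payload_size: int = 512, n: int = 2000) -> float:
    """Sequential request/response round trips per second over loopback TCP."""

    async def echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        # Toy "server": echoes each fixed-size request back as the response.
        try:
            while True:
                request = await reader.readexactly(payload_size)
                writer.write(request)
                await writer.drain()
        except asyncio.IncompleteReadError:
            pass  # client closed the connection
        finally:
            writer.close()

    server = await asyncio.start_server(echo, "127.0.0.1", 0)  # port 0: any free port
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    payload = b"x" * payload_size

    start = time.perf_counter()
    for _ in range(n):
        writer.write(payload)
        await writer.drain()
        await reader.readexactly(payload_size)  # wait for the full response
    elapsed = time.perf_counter() - start

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return n / elapsed


if __name__ == "__main__":
    for size in (128, 512, 1024):
        print(f"{size:5d} B: {asyncio.run(measure_rps(size)):,.0f} req/s")
```

Running the real request handler in place of `echo` under a sampling profiler would then give the data for a flame graph.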
I should have mentioned one very important point about the figure I provided: that is 115 requests per second on /one/ i3 /core/. The network itself isn't sweating. Every user of a distributed Reddit, for example, would be getting those 115 requests per second themselves if they weren't a sizable portion of the network. So with 100 users, your figure is matched. The BitTorrent mainline DHT has multiple millions of simultaneous nodes at any one time. MORPHiS, as is, already deprecates BitTorrent, so it is destined to absorb all those nodes. Also, the network scales logarithmically, and not just log base 2: Kademlia has an 'accelerated lookup' where you can control the base at a memory cost to achieve O(log base 2^b) lookups. b=3 is a reasonable value for memory usage.
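The scaling argument above can be checked with a little arithmetic. A worked example, assuming an illustrative network of 10 million nodes (standing in for "multiple millions") and the usual approximation that a Kademlia lookup with b bits per routing step takes about ⌈log₂(n)/b⌉ hops; `expected_hops` is a hypothetical helper:

```python
import math


def expected_hops(n_nodes: int, b: int = 1) -> int:
    """Approximate Kademlia lookup hops: ceil(log base 2**b of n). b=1 is plain log2."""
    return math.ceil(math.log2(n_nodes) / b)


N = 10_000_000  # illustrative size only
print(expected_hops(N, b=1), expected_hops(N, b=3))  # 24 8

# Aggregate capacity if every node serves 115 requests/s:
print(100 * 115)  # 11500, above the 10k req/s target
```

Going from b=1 to b=3 cuts lookups from about 24 hops to about 8 at this size, in exchange for larger routing tables.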
I should have mentioned that it is currently limited to one core, due to Python. I will probably make it multicore before 1.0. It will be relatively easy to do with the block-based design of MORPHiS and Python's multiprocessing pool API. I already use multiprocessing to great effect in the proof of work and prefix generation.
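A proof-of-work search parallelizes naturally because each worker process can scan its own nonce range. A minimal sketch with `multiprocessing.Pool`, assuming a hypothetical "SHA-512 digest with at least N leading zero bits" target; the thread does not describe MORPHiS's actual proof-of-work format, and all names are hypothetical:

```python
import hashlib
from multiprocessing import Pool


def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def search_range(args):
    """Scan one nonce range; return the first nonce meeting the difficulty, else None."""
    data, difficulty, start, stop = args
    for nonce in range(start, stop):
        digest = hashlib.sha512(data + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
    return None


def find_nonce(data: bytes, difficulty: int, workers: int = 4, chunk: int = 50_000) -> int:
    """Hand consecutive nonce ranges to a process pool until one contains a solution."""
    with Pool(workers) as pool:
        start = 0
        while True:
            ranges = [(data, difficulty, start + i * chunk, start + (i + 1) * chunk)
                      for i in range(workers)]
            # map() returns results in range order, so the first hit is the lowest nonce.
            for result in pool.map(search_range, ranges):
                if result is not None:
                    return result
            start += workers * chunk
```

On platforms that start workers with spawn (Windows, macOS), call `find_nonce` from under an `if __name__ == "__main__":` guard so worker processes do not re-run the caller's top-level code.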
Also, this is written in a scripting language, Python. Also, it is a first draft and unoptimized.
I don't see the 115 requests per second being a problem, because that is per node, not for the whole network. If it is a problem and can't be improved enough in Python, the original idea was to port it to Rust. Rust wasn't even 1.0 when I started coding, never mind that its async I/O library didn't exist until a couple of months ago. Come the Rust port, using their newly released async I/O library, which performs nearly as well as the well-known libev C library, we will be talking about the kind of performance you are talking about, although 10k requests per second just isn't needed on one node. It is certainly doable, though, given the time! Remember, a distributed app doesn't run on one node, and thus doesn't need all 10k requests per second on one node.
Thanks for that link, that is a good idea to use that instead of just Python profilers. I will give it a try when I have the time to do optimizations.
Thanks also for the wish of luck!
Sorry, I am not, and I haven't used it yet, but it looks like an interesting project.
Cool idea and thanks for implementing morph.is.
How does it compare with ipfs?
Biggest thing that nothing else has:
MORPHiS has TargetedBlock technology which enables Dpush technology which enabled Dmail:
Dead-simple, fool-proof, decentralized, network-hosted, spam-proof, high-performance, uncensorable, encrypted, authenticated unsolicited messaging.
Email is deprecated already by MORPHiS as is.
Tied with security (Sybil-proof and uncensorable) as the #1 goal of its design and architecture: even a child can use it, fool-proof.
It is also here already and higher performance.
It is also not stopping at a datastore.
Will this work with Python 3.3?
To get high performance in Python, I went with asyncio, which is amazing; its performance blew me away, and I am very glad I went with it. Unfortunately, that means it needs Python 3.4.0 or greater.
Just download Python 3.4 and compile it yourself. It is dead simple (./configure; make; make altinstall). It can then be referred to as python3.4, pip3.4, etc., without conflicting at all with your existing system.
Another possibility is to run Morphis on top of Anaconda3:
asyncio is available on 3.3 too https://pypi.python.org/pypi/asyncio
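A startup check can report which situation applies: asyncio from the standard library (3.4+), or the PyPI backport on 3.3. A minimal sketch; `check_asyncio` is a hypothetical name, and whether MORPHiS itself runs on the 3.3 backport is not established in this thread (the author states 3.4.0 or greater). The code avoids f-strings so that it still parses on 3.3:

```python
import sys


def check_asyncio() -> str:
    """Report how asyncio is available on this interpreter, or exit with advice."""
    if sys.version_info >= (3, 4):
        return "stdlib"  # asyncio ships with the standard library from 3.4 on
    try:
        import asyncio  # noqa: F401  (on 3.3, only present if the backport is installed)
    except ImportError:
        sys.exit("asyncio not found: use Python 3.4+, or on 3.3 run 'pip install asyncio'")
    return "backport"
```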