Modeling Molecules with Recurrent Neural Networks
csvoss.github.io
That is a very strange way to generate molecules. It would be interesting to see whether it is any better than other approaches to generating molecules, such as those used in drug design. It would be particularly interesting to see how it compares with the approach used to generate GDB-17[1], a database of randomly generated molecules, or at least to see whether the generated molecules pass the filters used to build GDB-17. Grammatically correct does not necessarily mean physically reasonable; recall Chomsky's "Colorless green ideas sleep furiously."
While it would be interesting to model reactions with an RNN, I'm not so sure this would offer any advantage over simply searching a database of reactions like the CrossFire Beilstein database[2][3]. I am also curious whether you investigated reaction MQL in your work with the carbonate project.
This work is interesting, though. Essentially you are using a neural network to generate graphs. There are a lot of things that can be represented as graphs, e.g. electric circuits and such. Maybe you could make an RNN for generating neural networks!
[1] http://www.gdb.unibe.ch/gdb/home.html
[2] https://en.wikipedia.org/wiki/Beilstein_database
[3] http://www.ncbi.nlm.nih.gov/pubmed/21378798
Haha, interesting, but I thought this was going in a different direction. There's a lot of work going on lately to try to model molecular potentials (for molecular dynamics / quantum chemistry) using neural networks. Accurate potentials are extremely expensive to calculate.
Not the direction I was expecting either, but interesting nonetheless. Regarding modelling molecular potentials with neural nets, do you have any recommendations in terms of recent papers on the topic?
The speakers in the "Machine Learning Methods in Materials Modeling" stream at this recent conference [1] include a few people in the area, though some of them work more on predicting properties than on making atomic potentials.
I chatted to Gabor Csanyi (Cambridge [2]) during the conference, who I think is probably one of the furthest ahead in the area, and his group has recently moved away from their Gaussian-process-based methods to kernel methods. With regard to NNs, he seemed to be of the opinion that CNNs (the more obvious NN model) were too expensive and ultimately unnecessary compared to carefully chosen, physically motivated kernels. I have to admit I didn't quite understand everything he presented, and I can't seem to find a recent publication, but I'm sure there's one out there.
Despite enthusiastic presentations with lovely results, I suspect from the slow progress in this area that transferability is the main problem plaguing these methods. You want a local atomic potential which doesn't depend on its environment beyond a certain radius, sort of like a convolutional kernel, but a lot of this sort of materials modelling/quantum chemistry is pretty inherently delocalised. Machine learning isn't magic and ultimately has to reflect and represent the underlying physics.
[1] http://nano-bio.ehu.es/psik2015/programme.html
[2] https://camtools.cam.ac.uk/wiki/site/5b59f819-0806-4a4d-0046...
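To make the locality point above concrete, here is a minimal sketch of a "local" per-atom energy with a hard radius. It is not taken from any of the talks: the cosine cutoff is a commonly used smoothing function, and `local_energy`, `toy_pair_term`, and the 5.0 radius are hypothetical names and values picked purely for illustration.

```python
import math


def cosine_cutoff(r, r_cut):
    """Smoothly decay from 1 at r = 0 to 0 at r = r_cut; zero beyond it."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def local_energy(distances, r_cut, pair_term):
    """Sum a pair term over neighbours, weighted by the cutoff.

    Neighbours at or beyond r_cut contribute exactly zero, which is the
    locality assumption discussed above.
    """
    return sum(pair_term(r) * cosine_cutoff(r, r_cut) for r in distances)


def toy_pair_term(r):
    # Hypothetical stand-in for a learned or fitted pair contribution.
    return 1.0 / r


near = [1.0, 2.0]
near_and_far = [1.0, 2.0, 7.0]  # the extra neighbour sits outside the radius
print(local_energy(near, 5.0, toy_pair_term))
print(local_energy(near_and_far, 5.0, toy_pair_term))
```

Both calls print the same number, because the neighbour at 7.0 is invisible to the model. If the real physics is delocalised, that discarded contribution is exactly what such a potential cannot capture.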
One of Gabor's grad students, Alan Nichols, gave a half-hour talk (Learning Quantum Mechanics: Machines versus Humans) that was previously posted to HN: https://news.ycombinator.com/item?id=8912703
Where did your data set come from? It contains at least one error:
WOF2, tungsten(VI) oxytetrafluoride
The correct formula is WOF4.
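A quick charge-balance check shows why the entry is suspicious. This is only a sketch: the oxidation states below are the usual textbook assumptions (W as +6 per the "(VI)" in the name, O as -2, F as -1), and `net_charge` is a hypothetical helper, not part of the original data set or any library.

```python
# Assumed oxidation states, for illustration only.
OXIDATION_STATE = {"W": +6, "O": -2, "F": -1}


def net_charge(composition):
    """Sum oxidation states over an {element: count} composition."""
    return sum(OXIDATION_STATE[el] * n for el, n in composition.items())


wof2 = {"W": 1, "O": 1, "F": 2}  # formula as listed in the data set
wof4 = {"W": 1, "O": 1, "F": 4}  # formula matching "oxytetrafluoride"

print(net_charge(wof2))  # 2, not neutral for tungsten(VI)
print(net_charge(wof4))  # 0, neutral
```

A check like this will not catch every bad entry, but it would flag any neutral compound whose listed formula disagrees with the oxidation state in its name.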
Nice catch! That one is from Wikipedia: http://en.wikipedia.org/wiki/Dictionary_of_chemical_formulas
I've updated the offending page.