The Holy Grail of Crackpot Filtering: How the arXiv decides what’s science
backreaction.blogspot.com

This is dismaying. I can sort of understand that physicists want lots of filtering, but in the area I follow on arXiv (machine learning), it seems to me as if there's a) little filtering and b) this works out great.
I have seen crank papers, but they are just ignored and quickly crowded out by papers people are actually interested in reading.
A while back they ran surveys on whether they should filter more. Everyone I talked to (myself included) said: don't worry about filtering, don't worry about prestige (yours or the authors'), just provide a place to publish and good tools for searching and browsing (a la external sites like arxiv-sanity), and it's perfect.
But it didn't occur to me that attitudes might be very different in other fields.
[I am conjecturing]
1. In a recent field like machine learning, there has not been enough time for rank amateurs to assimilate the relevant language without understanding the core material; in physics, people have had centuries with the Newtonian model and a hundred years with quantum mechanics.
2. Because machine learning is so recent, there is less difference between a rank amateur's quackery and the set of unexplored ideas that might work. That is to say, there isn't a strong established model, with supporting theories, about how and why techniques work. The state of the art in machine learning is still so empirical that it hasn't even developed its own "phlogiston".
3. As a new field, it is "easier to make progress" in machine learning: in part because the low-hanging fruit has yet to be plucked, in part because people are less able to point to what has worked in the past, and in part because the pace of advance creates more tolerance toward mistakes... it's currently in a continuous-delivery mode.
But I could easily be wrong.
Mostly reasonable, I think. But there's also something more: experiments are readily reproducible, and it's becoming a norm to include code for reproduction.
About assimilating the language, I'll add that old ML papers use a lingo that is almost incomprehensible to me, whereas newer ones obviously place a high value on being as clear and readable as possible, to get the most attention from practitioners as well as researchers.