To Know, but Not Understand: David Weinberger on Science and Big Data (2012)

theatlantic.com

60 points by reedwolf 5 years ago · 19 comments

redelbee 5 years ago

At what point do we shift our investment in time and energy from building models like those mentioned in the article to the bigger picture? Maybe it’s just my perception but it doesn’t seem like we have very many people thinking deeply about what models we should build and to what ends. Instead we are just building the models and hoping we can put them to good use afterwards.

For example, what’s the end game for the cellular signaling modeling outlined in the article? It seems like the result isn’t valuable in and of itself, and it can’t be much more than that because the scientist “doesn’t understand it, and doesn’t think any person could.” So we now have an equation that expresses constants within a cell and that’s it. We don’t understand it and we can’t put it to good use. So was that time and effort well spent? Do we just put this work in a drawer so we can pull it out if it could be useful at some point in the future? Is that what we’re doing with all the similar advances in modeling?

There’s nothing wrong with knowledge for knowledge’s sake, but I think we’ve way over-indexed on the tools and predictions side of the system. If we continue to constantly create new tools/models/predictions, we might find a use for them by chance. It just seems more efficient to focus on what outcomes we really want and then put the models to work in pursuit of those outcomes. Perhaps we focused more on the outcomes in the past because we didn’t have the technological horsepower to constantly churn out new models.

Maybe I’m wrong and there are people working on the big picture. Are there modern day philosophers doing this work? Do they make up a significant portion of the work being done? If not, why?

  • throwawaygh 5 years ago

    > what’s the end game for the cellular signaling modeling outlined in the article?

    Pharma.

    Most of the modeling work that people do is fairly well motivated. Going from models to working technology is indeed a huge leap, but everything starts with the basic scientific understanding.

    > Maybe I’m wrong and there are people working on the big picture.

    You can usually find the "big picture" behind a paper by reading the recent grant applications from the PI who funded the research (or the funding lines explicitly mentioned in the paper, if any).

    • randcraw 5 years ago

      Pharma may be the intended target for the signaling work, but as a data scientist who works in pharma, I can say with certainty that no biologist or chemist here would entertain for a minute any model that can't explain its mechanisms of action. Nor would the FDA, which wants any model not only to accurately predict the intended outcome but also to reflect awareness of the contextual circumstances that surround and lead to it.

      No competent physician would be satisfied with a disembodied diagnosis. The constituent symptoms and assay metrics that support that diagnosis are essential to know, especially as disease is often complex and dynamic, and no single diagnostic label should ever hope to supplant a deeper understanding of each patient's unique mix of normality and abnormality. A diagnosis using ML may be a useful starting point in treatment, but never should be the endpoint.

      • PaulDavisThe1st 5 years ago

        > no biologist or chemist here would entertain for a minute any model that can't explain its mechanisms of action.

        Entertain? Who even really knows what that means at this point. But I'm fairly convinced that you'd be quite happy to have a theory-free "intuition pump" that could tell you "if you slow down binding with the following 3 membrane proteins, you see roughly double the effect on overall energy use by the cell".

        The tool that generates this prediction may be completely unable to give you a "theory" about why this should be so, but then neither will the experiment(s) you do that confirm it to be true.

        So, while indeed ML-style stuff "should never be the endpoint", it can act as an incredibly useful intuition pump/launchpad for ideas and approaches that would otherwise remain inaccessible.

        • throwawaygh 5 years ago

          That's the mode of use for ML in most industries -- flagging stuff for follow-up by humans. Basically anything that's not real-time works like this.

          Most uses of ML in real-time settings look more like hybrid systems -- a little dusting of ML on top of a whole heap of more traditional mathematical modeling/software engineering (a toy sketch of that pattern follows below).

          Outside of a few very niche settings, we're still a long way off from "trusting" ML in any meaningful sense.
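
          A minimal sketch of that hybrid pattern (the field names, thresholds, and routing logic here are hypothetical, chosen only to show the shape, not any particular production system): explainable hard rules decide most cases outright, and the ML score only pushes borderline cases into a human review queue.

```python
# Toy sketch of "a little ML on top of traditional rules/engineering":
# hard business rules handle the clear-cut cases; the ML risk score is
# only consulted to flag borderline items for human follow-up.
# All names and thresholds are hypothetical.

from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    country: str
    ml_risk_score: float  # produced elsewhere by some model, in [0, 1]


BLOCKED_COUNTRIES = {"XX"}  # hypothetical hard rule


def route(txn: Transaction) -> str:
    # Traditional, explainable rules carry most of the weight.
    if txn.country in BLOCKED_COUNTRIES:
        return "reject"
    if txn.amount < 10.0:
        return "approve"
    # The ML "dusting": only used to flag borderline cases for humans.
    if txn.ml_risk_score > 0.8:
        return "human_review"
    return "approve"


print(route(Transaction(amount=250.0, country="US", ml_risk_score=0.92)))  # human_review
print(route(Transaction(amount=120.0, country="US", ml_risk_score=0.15)))  # approve
print(route(Transaction(amount=40.0, country="XX", ml_risk_score=0.10)))   # reject
```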

  • dumb1224 5 years ago

    > For example, what’s the end game for the cellular signaling modeling outlined in the article?

    I think the article meant to refer to systems biology (as in a new field). It's not exactly a single 'model' but rather a methodology, as far as I know. Also, the 'end game' in bioinformatics, IMO, is mostly to discover new knowledge rather than to produce a 'production ready' model. Through 'big data' science one can uncover hidden biological effects, new mechanisms, new insights, etc. Each of these big data modelling exercises is really about pushing the biology to a deeper level. In a way it is comparable to astrophysics (is it a coincidence that many people working in bioinformatics have an astrophysics background?).

  • xg15 5 years ago

    > It just seems more efficient to focus on what outcomes we really want and then put the models to work in pursuit of those outcomes.

    So which outcomes do we want? - and who is "we" anyway?

    Figuring that out may be a hard problem by itself.

    • redelbee 5 years ago

      I was thinking of the human “we.” I think that’s my point: It’s hard work whether you work on models or focus on the bigger picture of what outcomes would be best for humanity. I think it makes more sense to work hard on the latter, or at least to work hard on it first and then build the models.

  • ilophu 5 years ago

    Going to plug a couple of relevant things here.

    - A book I saw recommended here called "The Sciences of the Artificial," which talks about the purpose and practice of modeling with computers.

    - An old post of mine, where I wrote that "creating knowledge is a philosophical act that businesses mostly didn't realize they were getting into when they got on the data science bandwagon."

    - A post by HN user "wenc," a practicing data scientist. I'm going to copy-paste the whole thing because I think it's that good and relevant:

    ---

    Data science is correctly valued when you realize how relatively unimportant it is. It is a small cog in a larger machinery (or at least it ought to be). You see, decision-making involves (1) getting data, (2) summarizing and predicting, and (3) taking action. Continuous decision-making -- the kind that leads to impact -- involves doing this repeatedly in a principled fashion, which means creating a system around the decision process. For systems thinkers, this is analogous to a feedback control loop which includes sensor measurements + filters, controllers and actuators.

    (1) involves programmers/data engineers who have to create/manage/monitor data pipelines (that often break). This is the sensor + filters part, which is ~40% of the system.

    (2) involves data scientists creating a model that guides the decision-making process. This is the model of the controller (not even the controller itself!), which is ~20% of the system. Having the right model is great, but as most control engineers will tell you, even having the wrong model is not as terrible as most people think because the feedback loop is self-correcting. A good-enough model is all you need.

    (3) involves business/front-line people who actually implement decisions in real life. This is where impact is delivered. ~40% of the system. This is the controller + actuator part, which makes the decisions and carries them out.

    Most data scientists think their value is in creating the most accurate model possible in Jupyter. This is nice, but in real life not really that critical, because the feedback loop inherently moderates the error when deployed in a complex, stochastic environment. The right level of optimization would be to optimize the entire decision-making control feedback loop instead of just the small part that is "data science".

    p.s. data scientists who have particularly low impact are those who focus on producing once-off reports (like consultant reports). Reports are rarely read, and often forgotten. Real impact comes from continuous decision-making and implementing actions with feedback.

    Source: practicing data scientist
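
    Since the comment above is essentially describing a feedback control loop, here is a minimal, self-contained sketch of that framing (every name and number is made up for illustration; this is not anyone's actual system): a noisy "sensor", a deliberately wrong "model", and an "actuator" run in a loop, and the loop still settles near the target -- the "a good-enough model is all you need" point.

```python
# Minimal sketch of the feedback-loop framing: (1) sensor, (2) model/controller,
# (3) actuator, repeated. The model's gain estimate is deliberately wrong,
# but the closed loop still settles near the target.
# All names and numbers are hypothetical.

import random

TARGET = 100.0       # desired output of the real process
GAIN_ESTIMATE = 1.0  # the "model": our guess of how actions map to outcomes


def sense(true_output: float) -> float:
    """(1) Data pipeline / sensor: a noisy measurement of the real process."""
    return true_output + random.gauss(0.0, 2.0)


def decide(measurement: float) -> float:
    """(2) Model/controller: choose an action using the (possibly wrong) model."""
    error = TARGET - measurement
    return error / GAIN_ESTIMATE  # proportional correction


def actuate(state: float, action: float) -> float:
    """(3) Actuator: the real process responds with its *true* gain (unknown to the model)."""
    true_gain = 1.7  # the model assumed 1.0
    return state + true_gain * action


state = 40.0
for step in range(15):
    measurement = sense(state)
    action = decide(measurement)
    state = actuate(state, action)
    print(f"step {step:2d}  measured {measurement:6.1f}  action {action:6.1f}  state {state:6.1f}")
```

    The point is the shape, not the numbers: most of the engineering effort lives in sense() and actuate(), while the model inside decide() only has to be good enough for the loop to converge.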

trabant00 5 years ago

From my point of view there are 2 possible ways:

- we simply acknowledge we don't understand something enough and keep looking into it until we do. I mean everything we now understand (at an acceptable level by our standards) has gone through an intermediary phase - see alchemy for example.

- we declare some things (prematurely?) as forever escaping our grasp and accept we may never have a simple model of them.

What bothers me is the 3rd way:

- we don't know why but the computer model gave this result so let's go ahead with putting it into production. We make money, the user/consumer may have a nice experience or die, fingers crossed.

  • elliekelly 5 years ago

    I think there's another (perhaps worse?) way:

    - we think we understand a complex model completely and we actually don't

reedwolf (OP) 5 years ago

Bottom line:

"With the new database-based science, there is often no moment when the complex becomes simple enough for us to understand it. The model does not reduce to an equation that lets us then throw away the model. You have to run the simulation to see what emerges. For example, a computer model of the movement of people within a confined space who are fleeing from a threat--they are in a panic--shows that putting a column about one meter in front of an exit door, slightly to either side, actually increases the flow of people out the door. Why? There may be a theory or it may simply be an emergent property. We can climb the ladder of complexity from party games to humans with the single intent of getting outside of a burning building, to phenomena with many more people with much more diverse and changing motivations, such as markets. We can model these and perhaps know how they work without understanding them. They are so complex that only our artificial brains can manage the amount of data and the number of interactions involved."

  • throwawaygh 5 years ago

    > The model does not reduce to an equation that lets us then throw away the model. You have to run the simulation to see what emerges.

    This is true of simulation in general, not just data-driven models. E.g., a lot of applied mathematics uses PDE models that don't have closed-form solutions, and so you just run a ton of simulations sweeping a parameter space (a toy sketch of such a sweep follows below).

    > For example, a computer model of the movement of people within a confined space who are fleeing from a threat--they are in a panic--shows that putting a column about one meter in front of an exit door, slightly to either side, actually increases the flow of people out the door.

    The crux of this type of science is that you don't know whether the computer simulations are telling you anything about reality. You just have to run real-world experiments and see what happens. And even if the experiment turns out to work, you still don't know for sure that your model was reasonable.
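
    A toy sketch of the "no closed form, so sweep the parameter space" workflow mentioned above (the equation, grid, and numbers are purely illustrative; a real study would worry about units, boundary conditions, and convergence): an explicit finite-difference run of the 1D heat equation, repeated for several diffusivities to see how the behavior changes.

```python
# Toy "sweep the parameter space" workflow: explicit finite differences for
# the 1D heat equation u_t = D * u_xx, re-run for several diffusivities D
# to compare how fast an initial spike decays. Illustrative numbers only.

import numpy as np


def simulate(D, nx=101, length=1.0, t_end=0.05):
    dx = length / (nx - 1)
    dt = 0.4 * dx**2 / D  # keep the explicit scheme stable (dt <= dx^2 / (2*D))
    u = np.zeros(nx)
    u[nx // 2] = 1.0      # initial spike in the middle of the domain
    for _ in range(int(t_end / dt)):
        u[1:-1] += D * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
        u[0] = u[-1] = 0.0  # fixed (Dirichlet) boundaries
    return u


for D in [0.1, 0.5, 1.0, 2.0]:  # the parameter sweep
    print(f"D = {D:4.1f}   peak height after t_end = {simulate(D).max():.4f}")
```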

    • mturmon 5 years ago

      It's also worth saying that there's an emerging discipline devoted to formalizing and developing tools for this kind of problem (complex computer models of real-world systems, say).

      One decent place to start is this National Academies report [1] on Verification, Validation, and Uncertainty Quantification.

      Verification = did you implement the math correctly in the computer;

      Validation = does the implemented mathematical model agree with the real system in controlled experiments;

      Uncertainty Quantification = analysis and prediction of the accuracy of the model approximation (a toy Monte Carlo sketch follows below).

      This work was given a big push by the nuclear test ban treaty - you have to really validate the model predictions in this case.

      [1] https://www.nap.edu/catalog/13395/assessing-the-reliability-...
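
      As a toy illustration of the uncertainty quantification step (this is only a Monte Carlo sketch with made-up numbers, not the methodology of the report above): sample the uncertain input, push each sample through the model, and summarize the spread of the prediction.

```python
# Toy Monte Carlo uncertainty quantification: the "model" is the drag-free
# projectile range formula, the uncertain input is launch speed, and the
# quantification is just the spread of the predicted range over the samples.
# All numbers are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
G = 9.81  # gravitational acceleration, m/s^2


def predicted_range(speed, angle_deg=45.0):
    """Model prediction: horizontal range of a drag-free projectile."""
    angle = np.radians(angle_deg)
    return speed**2 * np.sin(2 * angle) / G


speeds = rng.normal(loc=30.0, scale=1.5, size=10_000)  # uncertain input (m/s)
ranges = predicted_range(speeds)

lo, hi = np.percentile(ranges, [2.5, 97.5])
print(f"mean predicted range: {ranges.mean():.1f} m")
print(f"std deviation:        {ranges.std():.1f} m")
print(f"95% interval:         [{lo:.1f}, {hi:.1f}] m")
```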

    • newqer 5 years ago

      So you are suggesting we need to do a double-blind test, by means of throwing a Molotov cocktail at a few gatherings of people?

      • nxpnsv 5 years ago

        Double-blind so that neither the thrower nor the recipients of said cocktail know whether it is real or a placebo? It's the only way to know if they are panicking because of a bottle or a fireball...

      • throwawaygh 5 years ago

        There are probably slightly more ethical ways of designing that study...

andrewla 5 years ago

Completely off topic -- one of my favorite stories from a friend at Google was that they saw someone writing what looked like a giant AWK script, went over to the guy, and told him "look over at that desk, that's Brian Kernighan, the 'K' in AWK" only to be met with a scornful "I'm the W" from Weinberger.
