What does a neural network actually do?

moalquraishi.wordpress.com

151 points by zercool 12 years ago · 38 comments

dlsym 12 years ago

"What does a neural network actually do?"

This is a fundamental question: can we really say and predict what a neural network does?

Unlike an engineered / constructed algorithm, a neural network is 'trained'.

Whenever we present a 'known' input pattern, it will respond with a 'learned' response.

This, however, introduces interesting problems: How can we _debug_ a neural network? How can we debug a correlation? Sure, we can tune its parameters, we can train it some more until it again shows the desired response. But at that point we have abandoned knowing how the intrinsic algorithm works in favor of just focusing on the result.

Okay - now if we follow this argument, it leads to: if we simulate the whole brain by simulating the neural network, we won't gain any knowledge about the intrinsic workings of the brain. We won't find any enlightenment about the innermost algorithm _represented_ by the neural network we call our brain.

  • joe_the_user 12 years ago

    I think you have hit upon a problem of present day AI.

    Neural networks, Support Vector Machines, Hidden Markov Models and other stuff (Markov Networks, etc.) do something like linear regression on some huge space - they draw a curve/plane between groups of things in this feature space. The tendency is for this division to make sense and to correspond to our common-sense categorization of these things.

    The problem is that once that happens, you can't really reason about the division you've drawn. It's just there. You can tweak for various purposes but that's a manual process.

    You can categorize animals by shape, or by a particular adaptation, or by genetic makeup. You can teach one of these algorithms each of these categorizations. But you can't do something like have the thing categorize for one purpose and then tell it to "change its outlook" and categorize for a different purpose.

    In this sense, despite seeming impressive, the products of these processes are dead-ends that we can't reason about, that lack the flexible intelligence of a human being.

    • judk 12 years ago

      Your criticism is fair, but you fail to explain how an NN or SVM is any worse than how a human mind actually operates.

      In other words, the incomprehensibility of a modern AI model is not a failing of AI, it is a failing of (AI) psychology and (AI) neuroscience.

      The artificially constructed intelligence works whether or not we understand how. The frontier of AI science is now open to AI Psychology. Psychologists and Neuroscientists will replace the data scientists! Such fun!

      • joe_the_user 12 years ago

        I could hardly contrast the operations of a NN to those of the human mind because we most definitely don't understand the latter. I have already described apparent properties of human minds that NNs and SVMs definitely don't have. But I'll repeat and expand:

        > The artificially constructed intelligence works whether or not we understand how

        The operation of human intelligence is very dependent on the fact that we humans have an operational understanding of each other's mental processes. Moreover, it is well known that human beings process language and can re-evaluate past experience in light of present understanding.

        Conversely, we know enough about the properties of these various non-linear recognizers to know that such constructs can't do, and won't ever be able to do, these things.

  • ShardPhoenix 12 years ago

    I can't find an example on Google right now, but I've seen demonstrations that it's possible to visualize the intermediate layers of a neural network - for example you can see how an image recognition network is first breaking down an image into horizontal and vertical lines, then combining those into more complex shapes, etc.
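
    For what it's worth, here's a rough sketch of the usual trick for a first layer (Python/numpy, with a random stand-in for a trained weight matrix, shaped as if the inputs were 28x28 images):

      import numpy as np
      import matplotlib.pyplot as plt

      # Stand-in for a trained first-layer weight matrix: one column per hidden
      # unit, one row per input pixel of a 28x28 image. With real trained
      # weights, each reshaped column shows the pattern (edge, stroke, blob)
      # that unit responds to most strongly.
      W1 = np.random.randn(28 * 28, 16)

      fig, axes = plt.subplots(4, 4, figsize=(6, 6))
      for ax, column in zip(axes.flat, W1.T):
          ax.imshow(column.reshape(28, 28), cmap="gray")
          ax.axis("off")
      plt.show()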

    • joe_the_user 12 years ago

      But visualizing is quite a ways from debugging.

      To debug a program you actually verify that its logic is correct (at least the good kind of debugging).

      Consider a spectrum:

      1. Natural language - we humans combine fragments of natural language easily and on an ad-hoc basis. We can get a fair amount of use from reusing Shakespeare quotes and neologisms while spending rather little effort.

      2. Trained programmers can reuse and combine general purpose libraries - with difficulty and often after considerable debugging.

      3. AI algorithms like Neural Networks. These are just plopped in and tweaked; no combining seems possible.

      It seems like "intelligent behavior" should be moving more towards #1, but the process of Machine Learning seems to move things more towards #3. The "learn once, understand never" approach means that for each significant case, you'll need to do a re-tweaking and re-learning. The potential to get harder rather than easier over time might well be there.

      • judk 12 years ago

        Can you debug a human brain? I can't.

        Is a human brain intelligent? I believe so.

        • joe_the_user 12 years ago

          Well,

          Admittedly, all this is in a manner of speaking, but still, I would claim that most if not all of the time when you debug a program you are also debugging your mental concept of what the program does. By the fact that we can change our concepts, our minds are very "debuggable."

    • agibsonccc 12 years ago

      These are called filters. This is from my deep learning framework: Debugging a net visually: http://deeplearning4j.org/debug.html

      An example of doing facial reconstruction: http://deeplearning4j.org/facial-reconstruction-tutorial.htm...

    • aidos 12 years ago

      There's an interesting example in one of the coursera courses (Neural Networks for Machine Learning) - you just need to watch through the intro video to see it in action.

      https://www.coursera.org/course/neuralnets

csense 12 years ago

I've found that, in practice, traditional neural networks tend to be prone to overfitting and are finicky about their parameters (in particular the topology and number of nodes you choose).

I use the word "traditional" to describe the NN architecture discussed in the article. Recent NN research has been promising [1], but this article strictly discusses traditional NN's. I don't really have much experience with the newer NN algorithms, so I'm not sure to what extent they suffer from the same problems as traditional NN's.

[1] http://en.wikipedia.org/wiki/Neural_network#Recent_improveme...

  • billderose 12 years ago

    Hinton's DropOut [1] and Wan's DropConnect [2] have ameliorated some of the overfitting issues present in traditional NN's. In fact, DropConnect in conjunction with deep learning is responsible for new records being set on classical datasets such as MNIST.

    [1] http://arxiv.org/pdf/1207.0580.pdf [2] http://cs.nyu.edu/~wanli/dropc/

    • agibsonccc 12 years ago

      Dropout is actually a knob you can turn on any neural network. It's used in image recognition as well as in text and other areas.

      The fuzzing creates an effect very similar to convolutional nets, where the network can learn different poses of an image.
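
      As a minimal sketch of that knob (inverted dropout applied to one dense layer's activations; the shapes and rate below are arbitrary, not tied to any particular framework):

        import numpy as np

        def dense_with_dropout(x, W, b, rate=0.5, training=True):
            """One dense layer with (inverted) dropout applied to its activations."""
            h = np.maximum(0, x @ W + b)                   # ReLU activation
            if training:
                mask = np.random.rand(*h.shape) > rate     # randomly zero out units
                h = h * mask / (1.0 - rate)                # rescale to keep expectations
            return h

        # Hypothetical shapes: 4 inputs, 8 hidden units
        x = np.random.randn(1, 4)
        W, b = np.random.randn(4, 8), np.zeros(8)
        print(dense_with_dropout(x, W, b))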

    • im3w1l 12 years ago

      It's pretty funny, I saw DropConnect described in a stackoverflow answer that predated the paper you reference. It was an incorrect answer on how to do dropout. I shall try to find it tomorrow.

  • niels_olson 12 years ago

    Is it safe to say that in ML, use of NNs is more about writing code that designs NNs, evaluates the results, and modifies the designs to optimize some desired meta-values, like accuracy, efficiency, etc?

nullc 12 years ago

Another limit they don't address is that the training normally used is purely local— just a gradient descent. So even when the network can model your function well, there is no guarantee that it will find the solution.

For me ANN's always seem to get stuck on not very helpful local minima— they're not one of the first tools in my bag of tricks by far.

Often I associate them as being the sort of thing that someone who doesn't really know what they're talking about talks about. (Esp. if it's clear that in their minds NNs have magical powers. :) Maybe they'll also mention something about "genetic algorithms".)

  • alkonaut 12 years ago

    > So even when the network can model your function well, there is no guarantee that it will find the solution.

    If it models the function over the input domain, then it is properly trained. If it is trained to a local minimum, then it doesn't model the underlying function well over the whole input domain. If you have good/representative training and validation sets, you will be able to tell.

    > Esp. if it's clear that in their minds NNs have magical powers

    I know that type. When dealing with ANN's you realize quickly (just like in all data science) that all of the "magic" relies on the manual work and thought that goes into cleaning and adapting the data. Not very sexy work, and work that requires a fair bit of knowledge about the problem domain.

    > For me ANN's always seem to get stuck on not very helpful local minima

    It isn't the ANN that gets stuck, it's the training algorithm (using gradient descent) that gets stuck :) Training is orthogonal to the operation of the network itself (which is just a nonlinear function in the end!). Gradient descent via error backpropagation is the most common training method for MLP's, but you could imagine a random/brute-force algorithm that is significantly simpler to implement, but slower. Since a network is often trained once and then used repeatedly, it is often acceptable to train it for several weeks if needed! A pure random search is usually not feasible, but adding randomization to a gradient descent will help. There are many ways to avoid local minima for a gradient descent, if you have time to wait.

    > maybe they'll also mention something about "genetic algorithms"

    The simple error backpropagation methods only work well for normal feed-forward networks. Other topologies, e.g. recurrent networks, require more exotic methods. In my (limited) experience, genetic algorithms are rarely efficient as a training method though.
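
    As a toy illustration of the "add randomization and wait" idea - plain gradient descent restarted from several random starting points on a deliberately bumpy 1-D loss (not a real network, just the optimization picture):

      import numpy as np

      def restarted_descent(grad, loss, n_restarts=20, steps=500, lr=0.01):
          """Run plain gradient descent from several random starting points and
          keep whichever run ends at the lowest loss - a crude way to dodge
          bad local minima."""
          best_w, best_loss = None, np.inf
          for _ in range(n_restarts):
              w = np.random.uniform(-5, 5)    # random initialisation
              for _ in range(steps):
                  w -= lr * grad(w)           # one descent step
              if loss(w) < best_loss:
                  best_w, best_loss = w, loss(w)
          return best_w, best_loss

      # Toy non-convex loss with several local minima
      loss = lambda w: np.sin(3 * w) + 0.1 * w ** 2
      grad = lambda w: 3 * np.cos(3 * w) + 0.2 * w
      print(restarted_descent(grad, loss))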

  • nanidin 12 years ago

    Well, you could use an EA to take a stab at finding better minima :)

    And correct me if I'm wrong, but isn't the cost function for a feed-forward neural network that uses a sigmoid activation function convex w.r.t. the parameters being trained, i.e. gradient descent is guaranteed to find the global minimum when a small enough step size is used?

    • chestervonwinch 12 years ago

      Mostly, no. Hidden units introduce non-convexity to the cost. How about a simple counter-example?

      Take a simple classifier network with one input, one hidden unit and one output and no biases. To make things even simpler, tie the two weights, i.e. make the first weight equal to the second. Now, mathematically the output of the network can be written: z=f(w * f(w * x)) where f() is the sigmoid.

      Next, consider a dataset with two items: [(x_1, y_1), (x_2, y_2)] where x_i is the input and y_i is the class label, 0 or 1. Take as values: [(0.9, 1), (0.1, 0)]. The cost function (log-likelihood in this case) is:

      L(w) = sum_i { y_i * log( f(w * f(w * x_i)) ) + (1-y_i) * log( 1-f(w * f(w * x_i)) ) }

      or

      L(w) = log( f(w * f(w * 0.9)) ) + log( 1-f(w * f(w * 0.1)) )

      Plot that last guy replacing f with the sigmoid, and you'll see the result is non-convex - there's a kink near zero.
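
      If you'd rather see it than take my word for it, plotting that L(w) only takes a few lines (a numpy/matplotlib sketch of the expression above):

        import numpy as np
        import matplotlib.pyplot as plt

        f = lambda z: 1.0 / (1.0 + np.exp(-z))    # the sigmoid

        # Log-likelihood from above: tied weights, no biases, dataset [(0.9, 1), (0.1, 0)]
        L = lambda w: np.log(f(w * f(w * 0.9))) + np.log(1 - f(w * f(w * 0.1)))

        w = np.linspace(-20, 20, 1000)
        plt.plot(w, L(w))
        plt.xlabel("w"); plt.ylabel("L(w)")
        plt.show()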

oldspiceman 12 years ago

A less mathy explanation with some real examples: http://neuralnetworksanddeeplearning.com/chap1.html

Coding a digit recognizer using a neural network is an extremely rewarding exercise and there's a lot of help on the web to get you started.

  • agibsonccc 12 years ago

    This is a great example of a hello world application. Keep in mind there are several kinds of neural nets that let you do this, including convolutional RBMs (which recognize parts of an image) and normal RBMs (which learn everything at once).

robert_tweed 12 years ago

This is a pretty good article, but I'm seeing a lot of confusion in this thread because the article is maybe one step ahead of the basic intuition needed to understand why ANNs are not magical and are not artificial intelligence (at least not feed-forward networks).

Perhaps a simpler way to look at it is to understand that a feed-forward ANN is basically just a really fancy transformation matrix.

OK, so unless you know linear algebra, you're probably now asking: what's a transformation matrix? Without the full explanation, the important thing to understand is why they are so useful in 3D graphics: they can perform essentially arbitrary operations (translation, rotation, scaling) on points/vectors. Once you have set up your matrix, it will dutifully perform the same transformations on every point/vector you give it. In graphics programming, we use 4x4 matrices to perform these transformations on 3D points (vertices), but the same principle works in any number of dimensions - you just need a matrix that is one bigger than the number of dimensions in your data*.

Edit: For NNs the matrices don't always have to be square. For instance you might want your output space to have far fewer dimensions than your input. If you want a simple yes/no decision then your output space is one-dimensional. The only reason the matrices are square in 3D graphics is because the vertices are always 3-dimensional.

What a neural network does is take a bunch of "points" (the input data) in some arbitrary, high number of dimensions and perform the same transformation on all of them, so as to distort that space. The reason it does this is so that the points go from being some complex intertwining that might appear random or intractable, into something where the points are linearly separable: i.e., we can now draw a series of planes in between the data that segment it into the classifications we care about.

The only difference between a transformation matrix and a neural network is that a neural network has at least two layers. In other words, it is two (or more) transformation matrices bolted together. For reasons that are a bit too complex to get into here, this allows an NN to perform more complex transformations than a single matrix can. In fact, it turns out that an arbitrarily large NN can perform any polynomial-based transformation on the data.

The reason this is often seen as somewhat magic is that although you can tell what transformations a neural network is doing in trivial cases, NNs are generally used where the number of dimensions is so large that reasoning about what it is doing is difficult. Different training methods can give wildly different networks that seemingly give much the same results, or fairly similar networks that give wildly different results. How easy it is to understand the various convolutions that are taking place rather depends on what the input data represents. In the case of computer vision it can be quite easy to visualise the features that each neuron in the hidden layer is looking for. In cases where the data is more arbitrary, it can be much harder to reason about, so if your training algorithm isn't performing as you'd like, it can be difficult to understand why it isn't working, even if you already understand that the basic principle of a feed-forward network is just a bunch of simple algebra.
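
To make this concrete, here is roughly what "two transformation matrices bolted together" looks like in code - a minimal numpy sketch with made-up sizes, squashing three inputs down to a single yes/no score (the non-linearity applied between the layers is discussed in the replies below):

  import numpy as np

  sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

  # Hypothetical sizes: 3 input dimensions, 4 hidden units, 1 output (yes/no)
  W1, b1 = np.random.randn(3, 4), np.zeros(4)   # first "transformation matrix"
  W2, b2 = np.random.randn(4, 1), np.zeros(1)   # second one, bolted on after

  def forward(x):
      h = sigmoid(x @ W1 + b1)   # distort the input space
      y = sigmoid(h @ W2 + b2)   # squash down to a single yes/no score
      return y

  print(forward(np.array([0.2, -1.0, 0.7])))    # a score between 0 and 1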

  • sergiosgc 12 years ago

    > The only difference between a transformation matrix and a neural network is that a neural network has at least two layers. In other words, it is two (or more) transformation matrices bolted together. For reasons that are a bit too complex to get into here, this allows an NN to perform more complex transformations than a single matrix can. In fact, it turns out that an arbitrarily large NN can perform any polynomial-based transformation on the data.

    Nice explanation. I need one clarification, though. Isn't matrix multiplication associative? Isn't any transformation defined by two matrices thus representable by a single matrix that is the product of the two?

    I am probably misunderstanding how NNs bolt matrices together.

    • tfgg 12 years ago

      You apply a non-linear function (usually some sigmoid) to the output vector after each matrix product. Otherwise, you'd be correct and any multi-layer ANN could be expressed as a single-layer network.
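
      A quick numpy check of both halves of that, with toy shapes and random matrices:

        import numpy as np

        x = np.random.randn(3)
        W1 = np.random.randn(3, 4)
        W2 = np.random.randn(4, 2)

        # Without a non-linearity the two "layers" collapse into one matrix,
        # precisely because matrix multiplication is associative:
        print(np.allclose((x @ W1) @ W2, x @ (W1 @ W2)))   # True

        # With a sigmoid between the products, no single matrix M satisfies
        # sigmoid(sigmoid(x @ W1) @ W2) == x @ M for all x.
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        print(sigmoid(sigmoid(x @ W1) @ W2))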

      • sergiosgc 12 years ago

        Thanks. It makes sense. The sigmoid is the activation function of the output "neuron". Unfortunately, matrix algebra here is not as useful as in computer graphics.

        • tfgg 12 years ago

          No problem. Actually, I personally found that a pretty intuitive understanding of linear algebra & vector calculus makes quite a lot of ML straightforward to approach geometrically.

    • joe_the_user 12 years ago

      Well,

      I suspect some kind of transformation could be used to make a two-level NN into a one-level one. The thing is, the resulting one-level network might be more complex and less useful than the original two-level network. Still, I think this does illustrate the limitations of multilevel networks.

      Another way to see this is to notice that NNs and SVMs[1] are (approximately or exactly) equivalent [2] because they both involve the fairly simple linear and non-linear transformations we've been looking at.

      [1] http://en.wikipedia.org/wiki/Support_vector_machine [2] http://www.staff.ncl.ac.uk/peter.andras/PAnpl2002.pdf

    • dwiel 12 years ago

      Interesting to note, though, that even with a linear network that can be represented by a single matrix, it can be faster, easier, and converge to better results with multiple layers, because of the different gradient and parameter space that is presented to the optimization algorithm.

  • joe_the_user 12 years ago

    A nice, cogent explanation.

    It's good to remember that the ANN's input comes as vector data. The ANN isn't transforming those vectors directly; rather, it transforms the inputs into a higher-dimensional "feature" space and performs the linear transform there. If you take the separating plane that's drawn in the feature space and reverse the map, you'll see the ANN has drawn a complex surface between the points it wants to recognize and those it rejects.

    So it's basically a heuristic and no more intelligent than a Taylor series.

  • lloeki 12 years ago

    So, IIUC creating a NN basically follows this process (a toy code version is sketched after the list):

    - define an input vector space (i.e. choose the dimensions you want to operate on with the input data)

    - define your categories in another space (or another basis in the same space?)

    - set up a transformation pipeline between the two spaces (with at least two stages)

    - devise an algorithm that takes categorised elements and produces new transformation matrices

    - train the NN (i.e. feed input and categorise the result so that, through some algorithm, the transformation matrices converge)
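
    In code, those steps might look something like this (a toy sketch learning XOR with plain backpropagation; the layer sizes, learning rate and iteration count are arbitrary, and a bad random start can still get stuck, as discussed elsewhere in the thread):

      import numpy as np

      rng = np.random.default_rng(0)
      sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

      # 1./2. Input space: 2-D points; category space: a single 0/1 label (XOR)
      X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
      y = np.array([[0], [1], [1], [0]], dtype=float)

      # 3. A two-stage transformation pipeline (2 -> 8 -> 1)
      W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
      W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))

      # 4./5. Training: backpropagation nudges the matrices until the outputs converge
      lr = 1.0
      for _ in range(10000):
          h = sigmoid(X @ W1 + b1)                      # hidden layer
          out = sigmoid(h @ W2 + b2)                    # output layer
          d_out = (out - y) * out * (1 - out)           # gradient at the output
          d_h = (d_out @ W2.T) * h * (1 - h)            # gradient pushed back one layer
          W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
          W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

      print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))   # should be ~ [0, 1, 1, 0]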

neuralnet 12 years ago

http://www.dkriesel.com/en/science/neural_networks - a roughly 200-page bilingual ebook about neural networks containing lots of illustrations, in case some of you want to read further. There's also a nice Java framework to try out for the coders.

thegeomaster 12 years ago

Looks like an interesting text, but to be honest I didn't understand a substantial portion of it.

  • sillysaurus3 12 years ago

    You're 17; get to work understanding it! The more you learn now, the more you'll be able to do for the rest of your life. Plus, diving into random topics and understanding them more deeply than anyone else is a ton of fun.

nsxwolf 12 years ago

Title made me think it would be something for beginners. Either it is not or I am very dim.

ctdavies 12 years ago

They write comments on HN.

  • jacquesm 12 years ago

    Those are Markov-chain based text generators. (yes, I got the joke).

    • Houshalter 12 years ago

      Well, recently a lot of work has gone into using NNs for natural language processing. Typically the network is trained to do something like predict the next word or character in a sequence. Using that, you could possibly create a far better generative model than Markov chains, and create more realistic sentences. Perhaps you could even combine them (the NN gets the output of the Markov chain to help make its prediction).

    • ctdavies 12 years ago

      :( I wish everyone else did.
