Black box optimization competition

bbcomp.ini.rub.de

32 points by silentvoice 11 years ago · 25 comments

darkmighty 11 years ago

I really dislike the term "Black box optimization". There's no such thing. You have to make assumptions about your function, so in the end this is just rewarding people whose optimizers happen to match the chosen functions; but those functions are not made explicit whatsoever. That doesn't make any sense.

For example, if the output/input are floating point numbers then you can assume the domain/range is [-M,M]. Otherwise, with even the most clever optimizer you have no guarantee of ever approaching the optimum, even if the function is continuous. Now even with a limited range there are no guarantees if the function is not well behaved -- so you have to again assume the function is well behaved. And for any assumption you make there is a class of functions for which it is terrible. There is no best assumption, or best algorithm, then. You could, for instance, assume the function is adversarial (trying to make your life difficult), for which the best algorithm is perhaps just sampling the range at random, which is really a terrible algorithm -- but that's of course just another assumption, and a terrible one.

I would much prefer 'Typical function optimization', if you're optimizing unlabeled functions so frequently, or at least not try to hide the inevitable assumptions.

TL;DR: The contest may be useful, but the concept of "Black box optimization" is nonsense.

  • rer0tsaz 11 years ago

    The domain is the unit cube [0, 1]^d using double precision floating point, see the documentation.

    Making assumptions and testing them is very much part of the contest. You are even allowed to do this interactively.
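To make the setting concrete, here is a minimal sketch of the weakest possible baseline on the unit cube: pure random search under a fixed evaluation budget. It assumes minimization; the objective `sphere`, the budget, and the seed are hypothetical stand-ins and are not taken from the competition.

```python
import random

def random_search(f, dim, budget, seed=0):
    """Pure random search on [0, 1]^dim; returns (best_x, best_value)."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(budget):
        # Sample one point uniformly from the unit cube.
        x = [rng.random() for _ in range(dim)]
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

def sphere(x):
    # Hypothetical test objective with its minimum at (0.5, ..., 0.5).
    return sum((xi - 0.5) ** 2 for xi in x)

best_x, best_val = random_search(sphere, dim=2, budget=1000)
print(best_x, best_val)
```

Any optimizer that encodes a useful assumption about the objective should beat this baseline for the same budget, which gives a simple way to test such assumptions.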

  • jpfr 11 years ago

    Yes, there is such a thing.

    There exist many more techniques than trivially assuming some "template" function and fitting the function parameters against the data.

    Have a look at nonparametric modelling techniques, for example kernel regression or Gaussian processes. You either don't make any assumptions, or you take an uninformative prior that distributes over all possible results.

    This competition evokes modelling, optimisation and the exploration/exploitation tradeoff. I'm sure there will be very interesting theory behind the winning entries...
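For readers unfamiliar with kernel regression, a minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel follows. It is restricted to one-dimensional inputs for brevity; the sample data and the `bandwidth` value are hypothetical. No parametric template is fitted, although the bandwidth still has to be chosen.

```python
import math

def kernel_regression(xs, ys, query, bandwidth=0.1):
    """Nadaraya-Watson estimate at `query` using a Gaussian kernel (1-D inputs)."""
    # Each observation is weighted by how close it lies to the query point.
    weights = [math.exp(-0.5 * ((query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)  # would be 0.0 if every sample were extremely far away
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical samples of an unknown function on [0, 1].
xs = [i / 10 for i in range(11)]
ys = [(x - 0.3) ** 2 for x in xs]

estimate = kernel_regression(xs, ys, 0.35)
print(estimate)
```

The estimate is always a weighted average of the observed values, so it can never leave the range of the data it has seen.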

    • darkmighty 11 years ago

      The point is, I don't even need to look up your techniques (although I did out of respect) to know there really isn't such a case; what I stated is a simple, almost trivial principle (apparently it has a name [1] as some pointed out).

      Mathematics models data, and you can't model without assumptions. It's like developing a theory which can't have axioms. For example, the probabilistic model behind kernel regression is a terrible model (assumption) with very large error for a large class of distributions[2], and so on. We're talking about picking the best technique; this technique is going to pick some assumptions arbitrarily that will or will not work well based on an unclear choice of the organizers. That's why I would prefer if they stated instead "Functions with some real world relevance", or "Typical functions", or maybe "Poorly behaved functions", and so on.

      [1] http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_opt...

      [2] On the wikipedia page you can see they do make assumptions on f to minimize the squared error for choosing the kernel. It's inevitable.

      • jpfr 11 years ago

        You are fighting a mathematically pure interpretation of black boxes as making no assumptions at all. Your observations are correct. But nobody actually interprets the term "black box" the way you deem wrong.

        Taken from here [1]:

        White-box models: This is the case when a model is perfectly known; it has been possible to construct it entirely from prior knowledge and physical insight.

        Grey-box models: This is the case when some physical insight is available, but several parameters remain to be determined from observed data. It is useful to consider two subcases.

        1. Physical modeling: A model structure can be built on physical grounds, which has a certain number of parameters to be estimated from data. This could, for example, be a state-space model of given order and structure.

        2. Semiphysical modeling. Physical insight is used to suggest certain nonlinear combinations of measured data signal. These new signals are then subjected to model structures of black-box character.

        Black-box models: No physical insight is available or used, but the chosen model structure belongs to families that are known to have good flexibility and have been 'successful in the past'.

        [1] http://www.sciencedirect.com/science/article/pii/00051098950...

        • darkmighty 11 years ago

          Fair enough. I wasn't familiar with the literature to be honest, it was just a remark.

          I still dislike the term and concept, but it's hard to argue with a conventional definition. I believe assumptions should be made as clear as possible and the term seems like a futile attempt at hiding them.

      • nullc 11 years ago

        Look at it this way: Many interesting problems in engineering have expensive to evaluate objectives with generally unknown structure and noisy multi-modal results, but are still piece-wise smooth. It's true that in the space of all possible functions virtually none meet these criteria, but many practically interesting ones do.

        If your function really is a random oracle, then, indeed, no optimizer will do well against it. OTOH, none will do (relatively) poorly either.

        Effective optimization techniques can explore a function generally and exploit similarities to known models or at least any smoothness they can find. Ineffective techniques will just get caught in local minima or fail to exploit smoothness or "obvious" structure.

        Powerful "generic" optimizers are a tool which is important for industry. But the common ways they are benchmarked potentially allow for overfitting in the design phase. This contest is intended to correct that, and to provide a potentially better assessment of how general these optimizers are.

  • silentvoice (OP) 11 years ago

    Maybe a better term would be "blind" rather than black-box. I think the goal is simply to hold optimization to the same level of reproducibility that is expected of most scientific fields today, and if a researcher is allowed to introduce a hundred tunable parameters that make their algorithm converge on all the standard test cases then they haven't created a reproducible optimizer - they have created a benchmark solver.

  • enkico 11 years ago

    What is "typical" for one can be "rare" for another. "Black box" suggests that the participants do not know what is inside; the organizers, for their part, should make sure that the content is of some interest for "real-world problems/applications".

    What you are describing is related to the "no free lunch theorem", something one can attempt to deal with to get things working "in practice".

    • darkmighty 11 years ago

      The organizers making sure it has some real world relevance is what I would equate with problems being "typical". In practice, you may find ill-characterized "Typical problems" and solve them, but as I said, a truly "Black box optimization" would not make sense; hence I dislike the term (and the general problem statement).

murbard2 11 years ago

It's a little strange that they do not have a track that gives gradient information, given that it is often a real world possibility. Also, this basically allows unlimited time between eval... So this becomes a contest about:

- coming up with a distribution over R^n -> R functions
- finding the optimal evaluation points to do Bayesian updates

I predict the winner will use a mixture of Gaussian processes with various kernels and stochastic control (with a limited look ahead, otherwise it blows up) to pick the test points.

  • pilooch 11 years ago

    You can compute the gradient, it just has a high budget cost.

    The usual winner is a flavor of CMA-ES, though they may have picked the functions to avoid this.
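To illustrate why a gradient is expensive in budget terms when only function values are available, here is a minimal sketch of a forward-difference gradient with an evaluation counter. `CountingObjective`, `quadratic`, and the step size are hypothetical; near the upper boundary of a bounded domain the step would have to be taken in the other direction.

```python
class CountingObjective:
    """Wraps an objective and counts how many times it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.evaluations = 0

    def __call__(self, x):
        self.evaluations += 1
        return self.f(x)

def forward_difference_gradient(f, x, step=1e-6):
    """Approximate the gradient of f at x using len(x) + 1 evaluations."""
    base = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step  # perturb one coordinate at a time
        grad.append((f(shifted) - base) / step)
    return grad

def quadratic(x):
    # Hypothetical smooth objective; its exact gradient is 2 * x.
    return sum(xi * xi for xi in x)

objective = CountingObjective(quadratic)
grad = forward_difference_gradient(objective, [0.1, 0.2, 0.3])
print(grad, objective.evaluations)
```

One gradient in d dimensions costs d + 1 evaluations with this scheme, so the share of the budget spent on gradients grows linearly with the dimension.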

    • murbard2 11 years ago

      You're missing my point. In many real world problems, it is cheap to compute the gradient. Thus, black box optimization methods which can use gradient information are inherently valuable, and it is surprising that they do not have a track that would allow showcasing those.

      • nullc 11 years ago

        In a great many real world problems, including most of the most expensive ones, gradients are _not_ available, or can only be computed expensively... even if your objective is differentiable, automatic differentiation isn't cheap on non-trivial functions.

        Experiences differ, but in mine the most common place to find objectives with gradients is in optimizer challenges.

        That said; sure, there should be a track that gives you the gradients. I agree that it would be nice if there were another track.

obstinate 11 years ago

Seems really interesting. Too mathy for my skillset.

If I may, I propose that the organizers remove the restriction on disassembling the client library or intercepting network connections. This restriction seems like it cannot benefit the organizers, unless the protocol is insecure. People are going to ignore this rule anyway, and you can't stop them or even detect them doing it. So why put it in there? It's only going to generate ill will.

  • rer0tsaz 11 years ago

    It's probably insecure, because you don't want to do 75,850,000 sequential evaluations over a network. It would take over a week for a single track with even just 10ms response time.
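The estimate above is easy to check. The sketch below uses only the two figures given in the comment (75,850,000 evaluations and a 10 ms response time) and assumes strictly sequential requests with no other overhead.

```python
evaluations = 75_850_000       # evaluations in a single track, as quoted above
latency_seconds = 0.010        # 10 ms per round trip, as quoted above

total_seconds = evaluations * latency_seconds
total_days = total_seconds / 86_400  # seconds per day

print(f"{total_seconds:.0f} s is about {total_days:.1f} days")
```

That works out to roughly 8.8 days, consistent with "over a week".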

    • obstinate 11 years ago

      Presumably a secure implementation would just batch up evaluations into groups of ten or a hundred. You could make something like this work.

    • nullc 11 years ago

      They run over the network (no, I wasn't trying to cheat, but while stracing to debug my own code, I could see it sending for each query).

      Each track is split up into 1000 tests, and you can run the separate tests concurrently, which more or less eliminates the latency problem, as the largest test permits 'only' about 400k queries.
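A minimal sketch of running independent tests concurrently is shown below. `run_test` is a hypothetical stand-in that only simulates network latency with `time.sleep`; it is not the competition's client library, and the counts are scaled down so the example finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_test(test_id, queries=5, latency=0.01):
    # Hypothetical stand-in for one test: each query waits on the network,
    # so the thread is idle most of the time.
    for _ in range(queries):
        time.sleep(latency)
    return test_id, queries

# Run 8 simulated tests at once; queries within one test stay sequential.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_test, range(8)))

print(results)
```

With the 10 ms figure quoted earlier in the thread, the largest test (about 400k queries) would take about 4,000 seconds, a little over an hour, if all tests ran fully in parallel.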

darklajid 11 years ago

Whoa. The servers for this competition are about 8km away. That's the most 'local' content I've ever seen on HN.

Unfortunately I have to agree with obstinate here. The pure math is too much for me and reverse engineering (still daunting, but interesting/possible) is not acceptable. If any HN person wins this contest, I offer beers close to the black box :)

  • pilooch 11 years ago

    Shameless plug: anyone interested should be able to get baseline results and above easily by using libcmaes. I am one of the authors with no time to compete, but am interested in reports on how it goes. Also if you are a researcher or a student the lib should let you experiment easily with various custom strategies.

    https://github.com/beniz/libcmaes

cshimmin 11 years ago

Wish I had seen something about this sooner. The competition began in January and ends on the 30th of this month.

  • nullc 11 years ago

    That was my thought too! But instead of even clicking on the HN comments I went and wrote a contestant. Within a couple hours all my runs will have completed; assuming no big power failure I'll make the deadline! :)

    (Uh, no doubt I won't do well, since I had no time to ... like.. actually test my code on any functions except a couple trivial trials. :P ... I hope they put up some kind of ranking information as soon as it closes. I have no idea if my results are awful or merely bad :) (and I probably shouldn't share best numbers before it closes) )

  • enkico 11 years ago

    There are two tracks; the second has its deadline at the end of June.

ramgorur 11 years ago

1. You do not know what the function looks like; there is not even gradient information.

2. You have a fixed number of probes, M.

3. Among the M, you have N probes to get the silhouette of the function (exploration).

4. Then with the remaining (M - N) trials, you need to find the optimum (exploitation).

Sounds more like a pseudo-science than a math problem to me.
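The budget split described in the list above can be sketched as follows: N uniform samples for exploration, then M - N local perturbations around the best point found. This is only one naive way to divide the budget, assuming minimization on the unit cube; `bowl`, the step size, and all other names are hypothetical.

```python
import random

def explore_then_exploit(f, dim, total_budget, explore_budget, step=0.05, seed=0):
    """Spend explore_budget (N) probes on uniform sampling and the remaining
    total_budget - explore_budget (M - N) on local search. Needs N >= 1."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")

    # Exploration: sample the unit cube uniformly.
    for _ in range(explore_budget):
        x = [rng.random() for _ in range(dim)]
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val

    # Exploitation: Gaussian perturbations of the incumbent, clipped to [0, 1].
    for _ in range(total_budget - explore_budget):
        cand = [min(1.0, max(0.0, xi + rng.gauss(0.0, step))) for xi in best_x]
        val = f(cand)
        if val < best_val:
            best_x, best_val = cand, val

    return best_x, best_val

evaluated = []  # every value the objective has returned, to check the budget

def bowl(x):
    # Hypothetical objective with its minimum at (0.25, 0.25, 0.25).
    val = sum((xi - 0.25) ** 2 for xi in x)
    evaluated.append(val)
    return val

best_x, best_val = explore_then_exploit(bowl, dim=3, total_budget=200, explore_budget=50)
print(best_x, best_val, len(evaluated))
```

How to choose N for a given M is itself part of the problem; stronger methods interleave exploration and exploitation rather than fixing the split in advance.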

  • Houshalter 11 years ago

    Huh? Who said it was a math problem? And pseudoscience? Most real world optimization problems are like this. Sometimes you don't get gradient information or unlimited trials.

    The point of the task is to reward methods that work efficiently with limited trials and domain information, rather than who can run hillclimbing on the biggest computer or hand tune the parameters the best.
