Pareto Front Sort for Python

41 points by kummappp 6 years ago · 13 comments

Reader

kummapppOP 6 years ago

I did not find pareto front sorting library so I made one. It took a weekend. Save your weekend and use my module instead.

phijFTW 6 years ago

What about pymoo's [1] Non-dominated Sorting? [2]
I used it recently since I had the same problem.
[1]: https://pymoo.org/
[2]: https://github.com/msu-coinlab/pymoo/blob/master/pymoo/util/...
anonymoushn 6 years ago

In the first example, why does (None, 0, 1) dominate (0, 0, 0)? The latter has a higher value in the first position.
- kummapppOP 6 years ago
  
  Oh, it is a bug. I'll fix it. How embarrassing and thanks. I have mainly used the None means the whole row is inferior -thing.
klyrs 6 years ago

This seems pretty cool, and something I might use.
Echoing another commenter: can you add a brief intro to what's meant by a Pareto front? A picture would do wonders.

phijFTW 6 years ago

Generally, I would also recommend using pymoo [1] if you have multi-objective optimization problems. They also have an implementation of Non-dominated Sorting [2], which should serve the same purpose.

[1]: https://pymoo.org/

[2]: https://github.com/msu-coinlab/pymoo/blob/master/pymoo/util/...

kummapppOP 6 years ago

Thanks. The 'Non-dominated sorting' was the magic phrase for google i was missing.

anamexis 6 years ago

Can you explain, for the uninitiated, what Pareto front searching is?

hansvm 6 years ago

The general idea is that if you have multiple things you care about (objectives) it's generally not possible to optimize them all at once (e.g. in computer science you often can't minimize runtime and memory usage in the same algorithm). However, there are some objectively bad answers. If you have an algorithm running in O(n) time and O(n) space (assuming no other objectives of interest), one running in O(n) time and O(n^2) space is worse.
The Pareto front is a formalization of that idea. It's a collection of all "non-dominated" points. A point is dominated if there is some other point that's at least as good in all dimensions and better in at least one. A point is "non-dominated" if there are no points that dominate it.
Going back to the [O(n), O(n)] example, this would not be dominated by, e.g., an [O(1), O(n^2)] algorithm since even though the time complexity is better, the space complexity is worse. Out of the three examples I've given, those two would be in the pareto front.
I haven't read through the whole library yet, but it looks like the way this works is it finds the pareto front (all non-empty, finite collections have a non-empty sub-collection of non-dominated points), finds the pareto front of the remainder, and so on till it's exhausted the input. Your output is a collection of "waves" of the original data.
Edit: I used algorithmic complexity just because it would be serve as a familiar example to Hacker News. In general you just need a comparison function in each dimension (and often you'll make a choice for each dimension as to whether you care about maximums or minimums -- they're directly interchangeable, so that's just for end user convenience). E.g., you might want to maximize crop yield and minimize input costs, or in a dimensionality reduction algorithm you might want to maximize separation while minimizing how different your reduction is from the original data by some metric. It's an extremely general purpose paradigm that sidesteps some of the issues that arise when you have multiple things you care about that can't be cleanly converted into a single cost function (all the hyperparameters in a machine learning model come to mind as well). You still need a mechanism for choosing things from the Pareto front, and there are whole fields of study in how people interact with a Pareto solver, but being able to restrict your inputs to _just_ the Pareto front is still a big win.
- anamexis 6 years ago
  
  That is very helpful, thank you!

cochleari_major 6 years ago

For low dimensions, it might be useful to look at the convex properties of the Pareto front. If a point P is on Pareto front, it is not in the interior of the convex hull of undominated points. In two dimensions, one can compute the convex hull of N points in O(N log N) time. This typically allows for faster Pareto front computation, but not in the worst case.

contravariant 6 years ago

Yeah I'm not sure this works. As far as I can tell the Pareto front doesn't have to be convex and can therefore contain points that are in the interior of the convex hull.
Of course if you accept convex combinations of options then the pareto front is part of the convex hull.
Jargerhaben 6 years ago

> If a point P is on Pareto front, it is not in the interior of the convex hull of undominated points.
Can you elaborate on what you mean by this? Is this under the assumption of a convex Pareto front? I may be misunderstanding.

Settings

Pareto Front Sort for Python

Keyboard Shortcuts