Ask HN: Data Matching and Reconciliation machine learning algorithms suggestions

3 points by maddy1512 5 years ago · 3 comments · 1 min read

I am trying to solve a data reconciliation problem using ML and need suggestions on which algorithm would be suitable. Follow the link for more detail: https://www.kaggle.com/questions-and-answers/171307

Imanari 5 years ago

In your example it seems the primary clue for finding matches is the name, i.e. 'ABC' + Corp/Des/etc. So how about doing some fuzzy string matching? Once you have done this, you can identify edge cases and further group by dates or other fields.
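A minimal sketch of the fuzzy name matching suggested above, using the standard-library `difflib.SequenceMatcher`. The suffix list (seeded from the Corp/Des variants in the comment), the helper names `normalize`, `name_similarity` and `candidate_matches`, and the 0.8 threshold are all hypothetical choices for illustration. Dedicated fuzzy-matching libraries offer other scorers that may suit real company names better.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal-form/description suffixes to strip before comparing;
# extend this set with whatever variants actually appear in your data.
SUFFIXES = {"corp", "corporation", "inc", "ltd", "llc", "des"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove trailing suffix tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(target: str, names: list[str], threshold: float = 0.8) -> list[str]:
    """Return names at least `threshold` similar to target, best first.

    sorted() is stable even with reverse=True, so ties keep input order.
    """
    scored = [(name_similarity(target, n), n) for n in names]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [n for score, n in ranked if score >= threshold]


if __name__ == "__main__":
    print(candidate_matches("ABC", ["ABC Corp", "ABC Des", "XYZ Ltd"]))  # ['ABC Corp', 'ABC Des']
```

The surviving candidates are what you would then group by dates or other fields to pick out edge cases.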

So you would have 'ABC' in L and a selection of candidate matches in S. If not all of those matches actually belong to the ABC in L, you are faced with a variant of the Knapsack Problem[0] (essentially subset sum), which you can solve with different methods (sorry, no expert here).
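Choosing which candidates in S actually belong to the entry in L can be framed as subset sum, the special case of knapsack where each item's weight equals its value: find the candidates whose amounts add up exactly to the amount in L. Below is a sketch of one exact dynamic-programming approach, assuming amounts are stored as signed integer cents so that equality is exact; `find_netting_subset` is a hypothetical helper name. The number of reachable sums can grow quickly, so this is meant for the short candidate list left after name matching, not for all of S.

```python
from typing import Optional


def find_netting_subset(target: int, amounts: list[int]) -> Optional[list[int]]:
    """Return indices of a subset of `amounts` summing exactly to `target`, or None.

    Amounts are assumed to be integers (e.g. cents); negative values are allowed.
    """
    # Map each achievable sum to one list of indices that produces it.
    reachable: dict[int, list[int]] = {0: []}
    for i, amount in enumerate(amounts):
        # Iterate over a snapshot so each amount is used at most once.
        for total, picked in list(reachable.items()):
            new_total = total + amount
            if new_total not in reachable:
                reachable[new_total] = picked + [i]
        if target in reachable:
            return reachable[target]
    return reachable.get(target)


if __name__ == "__main__":
    # Which candidate amounts make up an entry of 1500 cents?
    print(find_netting_subset(1500, [700, 300, 800, 200]))  # [0, 2]  (700 + 800)
```

Storing only one index list per sum keeps memory bounded by the number of distinct sums; if several subsets are valid, you would need to enumerate them and break ties with other features such as dates.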

[0] https://en.wikipedia.org/wiki/Knapsack_problem

doonesbury 5 years ago

You mean comparing data? For what purpose (to help assess a solution), and why ML? Surely a rules engine is much more practical.

  • maddy1512 (OP) 5 years ago

    Umm... not comparing data, but taking a data point and finding its nearest data points whose amounts net to zero. A rules engine might work where the data is not complex, but here there are a lot of complexities: for example, there are no exactly matching features that would give a rule-based matching engine enough certainty.
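The OP's framing (find an entry's nearest neighbours whose amounts net it to zero, when no single feature is reliable) can be sketched by ranking candidates with a soft similarity score and only then searching small groups for an exact net of zero. Everything here is an assumption for illustration: the `Entry` fields, the 0.7/0.3 weights, the 30-day window, `k` and `max_size`. In an ML setup, the hand-picked weights could be replaced by a model trained on previously confirmed matches.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Entry:
    name: str
    day: date
    amount: int  # signed amount in cents (assumed representation)


def similarity(a: Entry, b: Entry, max_days: int = 30) -> float:
    """Hypothetical score in [0, 1] mixing name closeness and date closeness."""
    name_score = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    day_gap = abs((a.day - b.day).days)
    date_score = max(0.0, 1.0 - day_gap / max_days)
    return 0.7 * name_score + 0.3 * date_score  # placeholder weights


def best_netting_group(anchor: Entry, pool: list[Entry], k: int = 8,
                       max_size: int = 3) -> Optional[tuple[Entry, ...]]:
    """Among the k most similar entries, return the group (up to max_size)
    that nets `anchor` to exactly zero with the highest mean similarity."""
    nearest = sorted(pool, key=lambda e: similarity(anchor, e), reverse=True)[:k]
    best, best_score = None, -1.0
    for size in range(1, max_size + 1):
        for group in combinations(nearest, size):
            if anchor.amount + sum(e.amount for e in group) != 0:
                continue
            score = sum(similarity(anchor, e) for e in group) / size
            if score > best_score:
                best, best_score = group, score
    return best


if __name__ == "__main__":
    anchor = Entry("ABC Corp", date(2020, 5, 1), 150000)
    pool = [
        Entry("ABC Des", date(2020, 5, 3), -100000),
        Entry("ABC Corp", date(2020, 5, 2), -50000),
        Entry("XYZ Ltd", date(2020, 5, 1), -150000),
    ]
    print([e.name for e in best_netting_group(anchor, pool)])  # ['ABC Corp', 'ABC Des']
```

In the example, "XYZ Ltd" nets the anchor to zero on its own, but the two ABC entries win on similarity, which illustrates why amount alone is not enough when features are inexact. The combination search is exponential in `max_size`, so `k` and `max_size` should stay small.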