Regex Generator++, THE Automatic Generator of Regular Expressions from Examples
regex.inginf.units.it
What if I want to extract text based on a pattern that comes before the match?
e.g.:
Hello name: carl Your name: simon Enter your name: mark
I am only interested in the name itself, not in the string "name:". Will it work?
Yes, it works. It works out the 'shape' of the surrounding text and writes a regex that uses lookaround operators to do the job. So yes, it builds an extractor that is able to extract only "simon" and "mark".
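To make the lookaround idea concrete, here is a hand-written sketch in Python. It is not output from the tool; the pattern is my own assumption about what such an extractor could look like for the example line above. The lookbehind checks the text before the match without including it in the result.

```python
import re

text = "Hello name: carl Your name: simon Enter your name: mark"

# Hand-written illustration, not generated by Regex Generator++.
# (?<=our name: ) is a fixed-width lookbehind: it requires "our name: "
# immediately before the match ("Your name: " and "your name: " both
# end that way) but leaves it out of the extracted text.
pattern = r"(?<=our name: )\w+"

names = re.findall(pattern, text)
print(names)
```

"carl" is skipped because it is preceded by "Hello name: ", which does not end in "our name: ".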
Seems an interesting project, but I don't understand why the examples are split into "training" and "validation": sometimes the regex doesn't correctly extract all the strings, and I suspect this is due to the dataset splitting.
Splitting the learning set into training and validation sets is very important. The validation set is used to select the solutions that have really generalized (or understood) the problem. When you use all the examples for training, the algorithm can overfit, producing a solution that performs very well on the training examples but poorly when you use it on unseen text. Splitting into training and validation leads to better solutions.
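A toy sketch of why the split matters, in Python. The two candidate regexes, the examples, and the `accuracy` helper are all made up for illustration; the tool's actual search and scoring are not described in this thread. Both candidates are perfect on the training examples, and only the validation example tells them apart.

```python
import re

# Hypothetical examples: (text, expected extractions).
training = [("Hello name: carl", ["carl"]), ("Your name: simon", ["simon"])]
validation = [("Enter your name: mark", ["mark"])]

# Two hypothetical candidates: one memorizes the training names,
# the other captures the general shape.
overfit = r"(?<=name: )(?:carl|simon)"
general = r"(?<=name: )\w+"
candidates = [overfit, general]

def accuracy(pattern, examples):
    """Fraction of examples where the pattern extracts exactly the expected strings."""
    hits = sum(re.findall(pattern, text) == expected for text, expected in examples)
    return hits / len(examples)

train_scores = {p: accuracy(p, training) for p in candidates}
val_scores = {p: accuracy(p, validation) for p in candidates}

# Select on validation, not training, to prefer candidates that generalize.
best = max(candidates, key=lambda p: val_scores[p])
print(train_scores, val_scores, best)
```

Selecting on training accuracy alone could not distinguish the two candidates; the held-out example is what exposes the overfit one.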
Hmm... "evolving regex", something already seen: http://www.dcs.kcl.ac.uk/technical-reports/papers/TR-09-02.p...
Evolving regexes or programs is not a new topic, but this is the only online tool that is able to find regexes for text extraction, a real-world application, and that actually works. I have not found other sites or applications that can do the job and provide a good solution.
great! ...but now you have 3 problems! :D