A technique to store rich formatting separately from text
frameableinc.com
Works great until the text gets changed.
This strategy works for applying semantic annotations of any kind to text. If you are marking up text to train a machine learning model that works as a "magic magic marker" (one that learns to mark up text the way you do), this is the ultimate way to do it, because it doesn't depend on any particular tokenization of the text and it lets markings overlap.
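A minimal sketch of the standoff idea, assuming labels are stored as Python string offsets (the `Span` class and `spans_covering` helper are hypothetical names). Labels live in a separate list of character ranges, so no tokenizer is involved and ranges can overlap freely:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text
    label: str

def spans_covering(spans, offset):
    """Return every span whose [start, end) range contains the character offset."""
    return [s for s in spans if s.start <= offset < s.end]

text = "Alice met Bob in Paris."
spans = [
    Span(0, 13, "meeting"),  # "Alice met Bob"
    Span(0, 5, "person"),    # "Alice", nested inside "meeting"
    Span(10, 13, "person"),  # "Bob"
    Span(10, 22, "clause"),  # "Bob in Paris", partially overlaps "meeting"
]

for s in spans:
    print(s.label, repr(text[s.start:s.end]))
```

Because the text itself is untouched, partially overlapping ranges such as "meeting" and "clause" coexist without any nesting rules.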
Yes, with this approach you have to make sure never to edit the text independently of its metadata.
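A hypothetical `replace_range` helper illustrates what editing the text together with its metadata can look like: every edit goes through one function that rewrites the string and shifts the character offsets in the same step. How spans that overlap the edited range are treated (stretched over the replacement, dropped if their text is deleted entirely) is an assumed policy, not the only reasonable one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str

def replace_range(text, spans, start, end, new):
    """Replace text[start:end] with `new`; return (new_text, realigned_spans)."""
    delta = len(new) - (end - start)
    new_text = text[:start] + new + text[end:]
    realigned = []
    for s in spans:
        if s.end <= start:
            # Span lies entirely before the edit: offsets unchanged.
            realigned.append(s)
        elif s.start >= end:
            # Span lies entirely after the edit: shift by the length change.
            realigned.append(Span(s.start + delta, s.end + delta, s.label))
        else:
            # Span overlaps the edited range: keep its outer edges and
            # stretch or shrink it over the replacement text.
            new_start = min(s.start, start)
            new_end = s.end + delta if s.end >= end else start + len(new)
            if new_end > new_start:  # drop spans whose text was deleted entirely
                realigned.append(Span(new_start, new_end, s.label))
    return new_text, realigned

text = "Alice met Bob in Paris."
spans = [Span(10, 13, "person"), Span(17, 22, "city")]
new_text, new_spans = replace_range(text, spans, 10, 13, "Robert")
for s in new_spans:
    print(s.label, repr(new_text[s.start:s.end]))
# person 'Robert'
# city 'Paris'
```

An insertion is just `start == end`; with this policy text inserted exactly at a span boundary lands outside the span.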
I was working at a startup that was developing an annotation pipeline for text, and I thought a lot about what set of operators you'd need for things like modifying text that carries index-based labels while keeping the labels aligned, changing the tokenization of per-token labeled data without having to re-label it, etc.
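The retokenization operator can be sketched by going through character offsets: paint each character with the label of the old token that covers it, then give each new token the most common label among its characters. This is a minimal sketch under assumptions: `token_offsets` and `transfer_labels` are hypothetical names, tokens must appear verbatim and in order in the text (no lowercasing or `##` subword markers), and BIO-style tags would need extra repair after transfer.

```python
from collections import Counter

def token_offsets(text, tokens):
    """Locate each token in text left to right; return (start, end) offsets."""
    offsets, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if the token isn't found
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)
    return offsets

def transfer_labels(text, old_tokens, old_labels, new_tokens, default="O"):
    # 1. Paint each character with the label of the old token covering it.
    char_labels = [None] * len(text)
    for (s, e), lab in zip(token_offsets(text, old_tokens), old_labels):
        for i in range(s, e):
            char_labels[i] = lab
    # 2. Each new token takes the most common label among its characters.
    new_labels = []
    for s, e in token_offsets(text, new_tokens):
        counts = Counter(lab for lab in char_labels[s:e] if lab is not None)
        new_labels.append(counts.most_common(1)[0][0] if counts else default)
    return new_labels

text = "Ms. Lee moved to New York."
old_tokens = ["Ms.", "Lee", "moved", "to", "New", "York", "."]
old_labels = ["PER", "PER", "O", "O", "LOC", "LOC", "O"]
new_tokens = ["Ms", ".", "Lee", "moved", "to", "New York", "."]
print(transfer_labels(text, old_tokens, old_labels, new_tokens))
```

Going through characters means the old and new tokenizers never need to agree on boundaries: a token can be split ("Ms." into "Ms" and ".") or merged ("New" and "York" into "New York") and still inherit a label.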