Ask HN: How to Parse and Process BofA Statements?
I have previously seen some discussions on HN where folks have shared open-source or personally-developed solutions to parse reports and statements from various financial institutions.
I need to parse and visualize statements from the last few years to see spending trends. The statements are now available only as PDFs. Is there any way to do this?
Appreciate all pointers!
I have once or twice had luck with the Python Camelot package, which you can read about at https://camelot-py.readthedocs.io/en/master/ . But one can burn a lot of time trying to extract useful information from PDFs, with little to show for it. I wish you luck.
I’ve heard of tools in Python that can extract data and text from PDF files…
My bank offers CSV downloads of the same data. Look for that first! :)
The problem is that this is what BofA considers 'historic' data (older than 18 months, I think). Only PDF statements are available. And the text is encoded as well, which means I cannot just copy and paste all the transaction text and clean it up in Sheets or a text editor (Amex has PDFs with copyable text).
By encoded, do you mean it's an image and not text? That sounds like a deliberately obtuse way to do things. Maybe crop your personal data out of the PDFs using a script, then use Mechanical Turk or some other piecework service to get your data typed in. Or try OCR.
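Once you have plain text out of the PDFs (from Camelot, OCR, or typed in by hand), the remaining work is mostly a regex plus grouping by month. Here is a minimal sketch of that step using only the standard library. The line layout (`MM/DD/YY description amount`), the sample data, and the function names are my own assumptions for illustration, not BofA's actual statement format, so expect to adjust the pattern to whatever your extracted text looks like.

```python
import re
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Hypothetical line layout: "MM/DD/YY  DESCRIPTION  -1,234.56"
# This is an assumption; tune the pattern to your own extracted text.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def parse_transactions(text):
    """Return (date, description, amount) tuples for lines matching LINE_RE."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers, footers, and anything else that is not a transaction
        date_str, description, amount_str = match.groups()
        date = datetime.strptime(date_str, "%m/%d/%y").date()
        amount = Decimal(amount_str.replace(",", ""))  # drop thousands separators
        rows.append((date, description, amount))
    return rows


def monthly_spending(rows):
    """Sum debits (negative amounts) per 'YYYY-MM' month, reported as positive totals."""
    totals = defaultdict(Decimal)
    for date, _description, amount in rows:
        if amount < 0:
            totals[date.strftime("%Y-%m")] += -amount
    return dict(sorted(totals.items()))


# Made-up sample text standing in for what an extraction step might produce.
SAMPLE = """\
Date Description Amount
01/05/21 GROCERY STORE -45.67
01/19/21 PAYROLL DEPOSIT 2,000.00
02/02/21 COFFEE SHOP -4.50
02/10/21 HARDWARE STORE -1,200.00
"""

if __name__ == "__main__":
    for month, total in monthly_spending(parse_transactions(SAMPLE)).items():
        print(month, total)
```

I used `Decimal` rather than floats so the cents add up exactly. The monthly totals come back as a plain dict, which you can paste into a spreadsheet or hand to whatever plotting tool you like to see the trend.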