Abstract
Parser generators have long been a savior for programmers, liberating them from the daunting task of crafting correct and maintainable parsers. Yet, this much-needed simplicity often comes at the expense of efficiency.
We present Paguroidea, a parser generator that harnesses the power of lexer-parser fusion techniques to create parsers that boast user-friendly grammar definitions while delivering performance that rivals specialized parsers. Building upon the foundations of the flap parser, our work introduces a series of extensions.
One of our key contributions is a novel approach to the normalization method. By encoding reduction actions directly into the Deterministic Greibach Normal Form (DGNF), we provide parser generators with flexibility in manipulating semantic actions. This unique approach empowers developers with the freedom to customize their parser generators to their specific needs while maintaining semantic correctness.
Furthermore, we formulate the execution of the parser in substructural logic, providing an elegant way to prove the correctness of the amended normalization procedure. In this exposition, we offer a glimpse into efficacious, user-friendly, and correctness-provable parser generation.