FWIW here is a small, self-contained recursive-descent parser I recently wrote in Python (that recognizes certain C data definitions).
https://github.com/oilshell/oil/blob/master/build/cpython_de...
I think your structure is a bit odd because Nodes are "smart". That is, the style is more object-oriented than functional.
If you clearly separate the input data (lexer), the code for the parser, and the resulting data, it will be better. Pretty much all compilers and interpreters follow this structure, even ones in Java (e.g. in Terence Parr's books). AST nodes should be dumb (no methods except trivial ones for pretty printing).
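To make that separation concrete, here is a rough sketch of the three layers for a toy grammar of sums. It is not taken from the parser linked above; `lex`, `parse_sum`, `Num`, `Add` and `evaluate` are names I made up for illustration.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple, Union

# Layer 1: the lexer turns text into (kind, value) tokens.
def lex(text: str) -> Iterator[Tuple[str, str]]:
    for number, op in re.findall(r"(\d+)|(\S)", text):
        yield ("NUM", number) if number else ("OP", op)
    yield ("EOF", "")

# Layer 3 (the result): dumb AST nodes. Plain data, no behavior.
class Num(NamedTuple):
    value: int

class Add(NamedTuple):
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Add]

# Layer 2: the parser is ordinary code that reads tokens and builds nodes.
def parse_sum(tokens: List[Tuple[str, str]]) -> Expr:
    # Grammar: sum = NUM ('+' NUM)* EOF
    kind, value = tokens[0]
    if kind != "NUM":
        raise SyntaxError("expected a number")
    node: Expr = Num(int(value))
    pos = 1
    while tokens[pos] == ("OP", "+"):
        kind, value = tokens[pos + 1]
        if kind != "NUM":
            raise SyntaxError("expected a number after '+'")
        node = Add(node, Num(int(value)))
        pos += 2
    if tokens[pos][0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return node

# Anything that *uses* the tree lives outside the nodes.
def evaluate(node: Expr) -> int:
    if isinstance(node, Num):
        return node.value
    return evaluate(node.left) + evaluate(node.right)
```

Usage would look like `evaluate(parse_sum(list(lex("1 + 2 + 39"))))`. The point is only that the nodes know nothing about parsing or evaluation.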
I think you might be getting confused between clauses of the grammar and nodes in an AST. They are similar but they don't have a one-to-one correspondence. (Generating a heterogeneous AST is generally more useful than generating a homogeneous parse tree.)
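A small illustration of the difference, under an assumed toy grammar (the grammar, `ParseNode`, `Num`, `Add` and `to_ast` are all hypothetical, not from your code or mine). The parse tree has one generic node per grammar rule applied; the AST has one type per language construct and drops rules and punctuation that carry no meaning.

```python
from typing import NamedTuple, Tuple, Union

# Assumed grammar:
#   expr   = term ('+' term)*
#   term   = factor
#   factor = NUM | '(' expr ')'

# Homogeneous parse tree: a single node type that mirrors the grammar rules.
class ParseNode(NamedTuple):
    rule: str
    children: Tuple[Union["ParseNode", str], ...]

# Heterogeneous AST: a distinct type for each construct.
class Num(NamedTuple):
    value: int

class Add(NamedTuple):
    left: object
    right: object

def leaf(text: str) -> ParseNode:
    # expr -> term -> factor -> NUM: three rule nodes for a single number.
    return ParseNode("term", (ParseNode("factor", (text,)),))

# Parse tree for "(1) + 2", written out by hand.
parse_tree = ParseNode("expr", (
    ParseNode("term", (
        ParseNode("factor", ("(", ParseNode("expr", (leaf("1"),)), ")")),
    )),
    "+",
    leaf("2"),
))

def to_ast(node: ParseNode):
    if node.rule == "expr":
        terms = [to_ast(c) for c in node.children if c != "+"]
        result = terms[0]
        for term in terms[1:]:
            result = Add(result, term)  # left-associative
        return result
    if node.rule == "term":
        return to_ast(node.children[0])
    # factor: either a bare number or a parenthesized expr
    if len(node.children) == 1:
        return Num(int(node.children[0]))
    return to_ast(node.children[1])  # the parentheses disappear
```

Here `to_ast(parse_tree)` gives `Add(Num(1), Num(2))`: eight rule nodes collapse into three AST nodes. In a hand-written parser you would normally build the AST directly and never materialize the parse tree.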
In my case I return tuples; in bigger projects I use Zephyr ASDL. (https://news.ycombinator.com/item?id=17852049)
Parser() is a class because it holds the state of the current token. That is necessary for an LL(1) parser. It feels a bit redundant sometimes, but it is a straightforward and useful structure once you get used to it.
Writing an LL(1) recursive descent parser by hand is a good exercise even if you plan to use a parser generator eventually. IME it helps to write out the grammar in comments next to the code.
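Roughly the shape I mean, as a minimal sketch rather than the code from the link: a `Parser` class that holds one token of lookahead, one method per grammar rule with the rule written in a comment, and tuples as output. The brace-list grammar and every name here are assumptions chosen to keep the example short.

```python
import re

def lex(text):
    # Words (letters, digits, underscore) become WORD tokens;
    # any other non-space character is its own token kind.
    for word, punct in re.findall(r"(\w+)|(\S)", text):
        yield ("WORD", word) if word else (punct, punct)
    yield ("EOF", "")

class Parser:
    def __init__(self, tokens):
        self.tokens = iter(tokens)
        self._next()  # prime the single token of lookahead

    def _next(self):
        # Stay on EOF if the caller keeps asking for more tokens.
        self.kind, self.value = next(self.tokens, ("EOF", ""))

    def _eat(self, kind):
        if self.kind != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.kind!r}")
        self._next()

    def parse(self):
        # start = value EOF
        tree = self.parse_value()
        self._eat("EOF")
        return tree

    def parse_value(self):
        # value = WORD | '{' [ value (',' value)* ] '}'
        if self.kind == "WORD":
            word = self.value
            self._next()
            return ("word", word)
        self._eat("{")
        items = []
        if self.kind != "}":
            items.append(self.parse_value())
            while self.kind == ",":
                self._next()
                items.append(self.parse_value())
        self._eat("}")
        return ("list", tuple(items))
```

For example, `Parser(lex("{a, {b, 1}, c}")).parse()` returns nested `("list", ...)` and `("word", ...)` tuples. Every decision looks only at `self.kind`, which is what makes it LL(1), and each method can be checked against the rule in its comment.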
On the other hand, I have found that which parsing technique works best is very closely related to the actual language. It could be that something like Owl works for your language, but it's not obvious and would require some justification IMO.