* Soon
** scan-code
The default case is scanning char-per-char.

  /* By default, grow the string obstack with the input. */
  .|\n  STRING_GROW ();

Make it more eager?

** Missing tests
commit 2c294c132528ede23d8ae4959783a67e9ff05ac5
Author: Vincent Imbimbo <vmi6@cornell.edu>
Date:   Sat Jan 23 13:25:18 2021 -0500

    cex: fix state-item pruning

    See https://lists.gnu.org/r/bug-bison/2021-01/msg00002.html

** pos_set_set
The current approach is correct, but its performance is poor. Bitsets need
to support 'assign' and 'shift'. And instead of extending POS_SET just for
the out-of-range new values, we need something like doubling the size.
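
The following is a minimal sketch of a growable bitset that supports 'assign' and 'shift' and grows by doubling. It uses a standalone, hypothetical `growset` type, which is neither gnulib's bitset API nor Bison's POS_SET. The shift moves one bit at a time, chosen for clarity rather than speed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bitset (not gnulib's bitset, not POS_SET).  */
typedef struct
{
  unsigned long *words;  /* NWORDS words of bits, low bit first.  */
  size_t nwords;
} growset;

enum { WORD_BITS = sizeof (unsigned long) * CHAR_BIT };

/* Make bit I addressable.  Capacity is doubled rather than grown to the
   exact need, so adding out-of-range values repeatedly costs amortized
   O(1) reallocations.  Returns false on allocation failure.  */
static bool
growset_reserve (growset *s, size_t i)
{
  size_t needed = i / WORD_BITS + 1;
  if (needed <= s->nwords)
    return true;
  size_t n = s->nwords ? s->nwords : 1;
  while (n < needed)
    n *= 2;
  unsigned long *w = realloc (s->words, n * sizeof *w);
  if (!w)
    return false;
  memset (w + s->nwords, 0, (n - s->nwords) * sizeof *w);
  s->words = w;
  s->nwords = n;
  return true;
}

static bool
growset_test (const growset *s, size_t i)
{
  return i / WORD_BITS < s->nwords
         && (s->words[i / WORD_BITS] >> (i % WORD_BITS)) & 1;
}

/* Write BIT at position I, which must already be addressable.  */
static void
growset_put (growset *s, size_t i, bool bit)
{
  assert (i / WORD_BITS < s->nwords);
  unsigned long mask = 1UL << (i % WORD_BITS);
  if (bit)
    s->words[i / WORD_BITS] |= mask;
  else
    s->words[i / WORD_BITS] &= ~mask;
}

static bool
growset_set (growset *s, size_t i)
{
  if (!growset_reserve (s, i))
    return false;
  growset_put (s, i, true);
  return true;
}

/* 'shift': move every bit I to I + K, growing as needed.  Walking from
   the top down ensures no bit is overwritten before it has moved.  */
static bool
growset_shift (growset *s, size_t k)
{
  size_t nbits = s->nwords * WORD_BITS;
  if (k == 0 || nbits == 0)
    return true;
  if (!growset_reserve (s, nbits - 1 + k))
    return false;
  for (size_t i = nbits; i-- > 0;)
    {
      growset_put (s, i + k, growset_test (s, i));
      growset_put (s, i, false);
    }
  return true;
}

/* 'assign': make DST an exact copy of SRC.  */
static bool
growset_assign (growset *dst, const growset *src)
{
  if (src->nwords && !growset_reserve (dst, src->nwords * WORD_BITS - 1))
    return false;
  for (size_t i = 0; i < dst->nwords; ++i)
    dst->words[i] = i < src->nwords ? src->words[i] : 0;
  return true;
}
```

A real implementation would shift whole words at a time. The per-bit loop keeps the sketch obviously correct.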

** glr
There is no test with "Parse on stack %ld rejected by rule %d" in it.

** yyrline etc.
Clarify that rule numbers in the skeletons are 1-based.

** Macros in C++
There are many macros that should obey api.prefix: YY_CPLUSPLUS, YY_MOVE,
etc.

** yyerrok in Java
And add tests in calc.at, to prepare work for D.

** YYERROR and yynerrs
We are missing some cases. Write a test case, and check all the skeletons.

** Cex
*** Improve gnulib
Don't do this (counterexample.c):

  // This is the fastest way to get the tail node from the gl_list API.
  gl_list_node_t
  list_get_end (gl_list_t list)
  {
    gl_list_node_t sentinel = gl_list_add_last (list, NULL);
    gl_list_node_t res = gl_list_previous_node (list, sentinel);
    gl_list_remove_node (list, sentinel);
    return res;
  }

*** Ambiguous rewriting
If the user is stupid enough to have equal rules, then the derivations are
harder to read:

    Reduce/reduce conflict on tokens $end, "+", "⊕":
        2 exp: exp "+" exp .
        3 exp: exp "+" exp .
      Example                  exp "+" exp •
      First derivation         exp ::=[ exp "+" exp • ]
      Example                  exp "+" exp •
      Second derivation        exp ::=[ exp "+" exp • ]

Do we care about this? In color, we use the same color twice here, but we
could try to use the same color for the same rule.

*** XML reports
Show the counterexamples. This is going to be really hard and/or painful,
unless we play it dumb (little structure).

** Bistromathic
- How about not evaluating incomplete lines when the text is not finished
  (as shells do)?

** Questions
*** Java
- Should i18n be part of the Lexer? Currently it's a static method of
  Lexer.

- Is there a migration path that would allow using TokenKinds in
  yylex?

- Define the tokens as an enum too.

- Promote YYEOF rather than EOF.

** YYerror
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/plural.y;h=a712255af4f2f739c93336d4ff6556d932a426a5;hb=HEAD

should be updated to not use YYERRCODE. Returning an undef token is good
enough.

** Java
*** calc.at
Stop hard-coding "Calc". Adjust local.at (look for FIXME).

** doc
I feel it's ugly to use the GNU style to declare functions in the doc. It
generates tons of white space in the page, and may contribute to bad page
breaks.

** consistency
token vs terminal.

** api.token.raw
The YYUNDEFTOK could be assigned a semantic value so that yyerror could be
used to report invalid lexemes.

** push parsers
Consider deprecating impure push parsers. They add a lot of complexity, for
a bad feature. On the other hand, that would make it much harder to sit
push parsers on top of pull parsers, which is currently not relevant, since
push parsers are measurably slower.

** %define parse.error formatted
How about pushing Bistromathic's yyreport_syntax_error as another standard
way to generate the error message, and leave to the user the task of
providing the message formats? Currently in bistro, it reads:

  const char *
  error_format_string (int argc)
  {
    switch (argc)
      {
      default: /* Avoid compiler warnings. */
      case 0: return _("%@: syntax error");
      case 1: return _("%@: syntax error: unexpected %u");
        // TRANSLATORS: '%@' is a location in a file, '%u' is an
        // "unexpected token", and '%0e', '%1e'... are expected tokens
        // at this point.
        //
        // For instance on the expression "1 + * 2", you'd get
        //
        // 1.5: syntax error: expected - or ( or number or function or variable before *
      case 2: return _("%@: syntax error: expected %0e before %u");
      case 3: return _("%@: syntax error: expected %0e or %1e before %u");
      case 4: return _("%@: syntax error: expected %0e or %1e or %2e before %u");
      case 5: return _("%@: syntax error: expected %0e or %1e or %2e or %3e before %u");
      case 6: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e before %u");
      case 7: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e before %u");
      case 8: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e or %6e before %u");
      }
  }

The message would have to be generated in a string, and pushed to yyerror,
which will be a pain in the neck in yacc.c.

If we want to do that, we should think very carefully about the syntax of
the format string.
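
As a starting point for that discussion, here is a minimal sketch that expands such a format into a buffer, which could then be handed to yyerror. The directive set is an assumption read off the strings above (`%@`, `%u`, `%0e` to `%9e`, plus `%%` for a literal percent sign). `expand_error_format` is a hypothetical name and is not part of any skeleton.

```c
#include <assert.h>
#include <string.h>

/* Append SRC to BUF (capacity SIZE, current length *LEN), truncating if
   needed; BUF always stays NUL-terminated.  */
static void
append (char *buf, size_t size, size_t *len, const char *src)
{
  for (; *src && *len + 1 < size; ++src)
    buf[(*len)++] = *src;
  buf[*len] = '\0';
}

/* Expand FMT into BUF.  Assumed directives, modeled on bistro's strings:
   "%@" is LOC, "%u" is UNEXPECTED, "%Ne" (single digit N) is EXPECTED[N],
   and "%%" is a literal '%'.  Returns 0 on success, -1 on an unknown
   directive or an out-of-range index.  */
static int
expand_error_format (char *buf, size_t size, const char *fmt,
                     const char *loc, const char *unexpected,
                     const char *const *expected, int nexpected)
{
  size_t len = 0;
  assert (0 < size);
  buf[0] = '\0';
  for (const char *p = fmt; *p; ++p)
    {
      if (*p != '%')
        {
          char c[2] = { *p, '\0' };
          append (buf, size, &len, c);
        }
      else if (p[1] == '@' || p[1] == 'u' || p[1] == '%')
        {
          ++p;
          append (buf, size, &len,
                  *p == '@' ? loc : *p == 'u' ? unexpected : "%");
        }
      else if ('0' <= p[1] && p[1] <= '9' && p[2] == 'e')
        {
          int i = p[1] - '0';
          if (nexpected <= i)
            return -1;
          append (buf, size, &len, expected[i]);
          p += 2;
        }
      else
        return -1;
    }
  return 0;
}
```

Restricting N to one digit is enough for bistro's strings, which stop at `%6e`. A real syntax would also have to decide how to handle larger indices and how translators escape a literal `%`.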

** yyclearin does not invoke the lookahead token's %destructor
https://lists.gnu.org/r/bug-bison/2018-02/msg00000.html
Rici:

> Modifying yyclearin so that it calls yydestruct seems like the simplest
> solution to this issue, but it is conceivable that such a change would
> break programs which already perform some kind of workaround in order to
> destruct the lookahead symbol. So it might be necessary to use some kind of
> compatibility %define, or to create a new replacement macro with a
> different name such as yydiscardin.
>
> At a minimum, the fact that yyclearin does not invoke the %destructor
> should be highlighted in the documentation, since it is not at all obvious.
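
The sketch below contrasts the two macros in a heavily simplified parser whose only state is its lookahead. `yyclearin_sketch`, `yydiscardin_sketch`, `struct parser` and `destroy_lookahead` are all hypothetical names. In the real skeletons, the user's %destructor code is dispatched through yydestruct based on the symbol kind.

```c
#include <assert.h>
#include <stdlib.h>

enum { YYEMPTY = -2 };   /* "No lookahead" marker for this sketch.  */

/* Hypothetical, heavily simplified parser state: only the lookahead.  */
struct parser
{
  int yychar;       /* Lookahead token kind, or YYEMPTY.  */
  char *yylval;     /* Its semantic value, owning heap memory here.  */
  int ndestroyed;   /* Destructor calls, for illustration.  */
};

/* Stand-in for the user's %destructor; not yacc.c's yydestruct.  */
static void
destroy_lookahead (struct parser *p)
{
  assert (p->yychar != YYEMPTY);
  free (p->yylval);
  p->yylval = NULL;
  p->ndestroyed++;
}

/* Current behavior: forget the lookahead; its value leaks.  */
#define yyclearin_sketch(P) ((P)->yychar = YYEMPTY)

/* Proposed: destroy the lookahead's value first, then forget it.  */
#define yydiscardin_sketch(P)                   \
  do                                            \
    {                                           \
      if ((P)->yychar != YYEMPTY)               \
        destroy_lookahead (P);                  \
      (P)->yychar = YYEMPTY;                    \
    }                                           \
  while (0)
```

Because the new macro has its own name, programs that already destroy the lookahead by hand before calling yyclearin keep working unchanged, which is the compatibility concern raised above.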

** Issues in i18n
In the French translation of 'bison --help':

  Les catégories d'avertissements incluent :
    conflicts-sr      conflits S/R (activé par défaut)
    conflicts-rr      conflits R/R (activé par défaut)
    dangling-alias    l'alias chaîne n'est pas attaché à un symbole
    deprecated        construction obsolète
    empty-rule        règle vide sans %empty
    midrule-values    valeurs de règle intermédiaire non définies ou inutilisées
    precedence        priorité et associativité inutiles
    yacc              incompatibilités avec POSIX Yacc
    other             tous les autres avertissements (activé par défaut)
    all               tous les avertissements sauf « dangling-alias » et « yacc »
    no-CATEGORY       désactiver les avertissements dans CATEGORIE
    none              désactiver tous les avertissements
    error[=CATEGORY]  traiter les avertissements comme des erreurs

The last line and the third line from the end should mention CATEGORIE,
not CATEGORY.

* Bison 3.9
** Rewrite glr.cc (currently glr2.cc)
*** custom error messages

*** Remove jumps
We can probably replace setjmp/longjmp with exceptions. That would
tremendously help other languages such as D and Java, which probably have no
similar feature. If we remove jumps, we probably no longer need _Noreturn,
so simplify `b4_attribute_define([noreturn])` into `b4_attribute_define`.

After discussing with Valentin, it was decided that it's better to stay with
jumps, since exceptions are ruled out in some C++ environments.

*** Coding style
Move to our coding conventions. In particular, use names such as
yy_glr_stack, not yyGLRStack.

*** yydebug
It should be a member of the parser object, see lalr1.cc. Let the parser
object decide what the debug stream is, rather than open coding std::cerr.

*** Avoid pointers
There are many places where pointers should be replaced with references.
Some occurrences were fixed, but now some have improper names:

  -yygetToken (int *yycharp, ]b4_namespace_ref[::]b4_parser_class[& yyparser][]b4_pure_if([, glr_stack* yystackp])[]b4_user_formals[)
  +yygetToken (int& yycharp, ]b4_namespace_ref[::]b4_parser_class[& yyparser][]b4_pure_if([, glr_stack* yystackp])[]b4_user_formals[)

yycharp is no longer a pointer. And yystackp should probably also be a
reference.

*** parse.assert
Currently all the assertions are enabled. Once we are confident in glr2.cc,
let parse.assert use the same approach as in lalr1.cc.

*** debug_stream
Stop using std::cerr everywhere.

*** glr.c
When glr2.cc fully replaces glr.cc, get rid of the glr.cc scaffolding in
glr.c.

* Chains
** Unit rules / Injection rules (Akim Demaille)
Maybe we could expand unit rules (or "injections", see
https://homepages.cwi.nl/~daybuild/daily-books/syntax/2-sdf/sdf.html), i.e.,
transform

  exp: arith | bool;
  arith: exp '+' exp;
  bool: exp '&' exp;

into

  exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some grammars.
I can't find the papers. In particular the book 'LR parsing: Theory and
Practice' is impossible to find, but according to 'Parsing Techniques: a
Practical Guide', it includes information about this issue. Does anybody
have it?

** clean up (Akim Demaille)
Do not work on these items now, as I (Akim) have branches with a lot of
changes in this area (hitting several files), and no desire to have to fix
conflicts. Addressing these items will happen after my branches have been
merged.

*** lalr.c
Introduce a goto struct, and use it in place of from_state/to_state.
Rename states1 as path, length as pathlen.
Introduce inline functions for things such as nullable[*rp - ntokens]
where we need to map from symbol number to nterm number.
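
The following is a minimal sketch of such an inline helper. It assumes, as the expression above implies, that nonterminals are numbered right after the ntokens tokens. The globals are redeclared here so the example stands alone, and `symbol_to_nterm` and `symbol_nullable` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Bison's globals: symbols 0 .. ntokens-1
   are tokens, and ntokens .. nsyms-1 are nonterminals.  */
typedef int symbol_number;
typedef int nterm_number;
static int ntokens;
static int nsyms;
static bool *nullable;  /* Indexed by nterm number.  */

/* Map a (nonterminal) symbol number to its nterm number.  */
static inline nterm_number
symbol_to_nterm (symbol_number sym)
{
  assert (ntokens <= sym && sym < nsyms);
  return sym - ntokens;
}

/* Replaces open-coded expressions such as nullable[*rp - ntokens].  */
static inline bool
symbol_nullable (symbol_number sym)
{
  return nullable[symbol_to_nterm (sym)];
}
```

Besides naming the intent, the helper gives a single place to check that a token was never passed where a nonterminal is expected.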

A significant part of the relations management should probably be migrated
on top of a bitsetv.

*** closure
It should probably take a "state*" instead of two arguments.

*** traces
The "automaton" and "set" categories are not so useful. We should probably
introduce lr(0) and lalr, just the way we have ielr categories. The
"closure" function is too verbose; it should probably have its own category.

"set" can still be used for summarizing the important sets. That would make
tests easy to maintain.

*** complain.*
Rename these guys as "diagnostics.*" (or "diagnose.*"), since that's the
name they have in GCC, clang, etc. Likewise for the complain_* series of
functions.

*** ritem
states/nstates, rules/nrules, ..., ritem/nritems
Fix the latter.

*** m4: slot, type, type_tag
The meaning of type_tag varies depending on api.value.type. We should avoid
that and use clear definitions with stable semantics.

* D programming language
There are a number of features that are missing, here sorted in _suggested_
order of implementation.

When copying code from other skeletons, keep the comments exactly as they
are. Keep the same variable names. If you change the wording in one place,
do it in the others too. In other words: make sure to keep the
maintenance *simple* by avoiding any gratuitous difference.

** CI
Check with gdc and ldc.

** GLR Parser
This is very ambitious. That's the final boss. There is currently no
"clean" implementation to get inspiration from.

glr.c is very clean but:
- is low-level C
- is a different skeleton from yacc.c

glr.cc is (currently) an ugly hack: a C++ shell around glr.c. Valentin
Tolmer is currently rewriting glr.cc to be clean C++, but he is not
finished. There will be a lot of common code between lalr1.cc and glr.cc, so
eventually I would like them to be fused into a single skeleton, supporting
both deterministic and generalized parsing.

It would be great for D to also support this.

The basic ideas of GLR are explained here:

https://www.codeproject.com/Articles/5259825/GLR-Parsing-in-Csharp-How-to-Use-The-Most-Powerful

* Better error messages
The users are not provided with enough tools to forge their error messages.
See for instance "Is there an option to change the message produced by
YYERROR_VERBOSE?" by Simon Sobisch, on bison-help.

See also
https://www.cs.tufts.edu/~nr/cs257/archive/clinton-jefferey/lr-error-messages.pdf
https://research.swtch.com/yyerror
http://gallium.inria.fr/~fpottier/publis/fpottier-reachability-cc2016.pdf

* Modernization
Fix data/skeletons/yacc.c so that it defines YYPTRDIFF_T properly for modern
and older C++ compilers. Currently the code defaults to defining it to
'long' for non-GCC compilers, but it should use the proper C++ magic to
define it to the same type as the C ptrdiff_t type.
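
Here is a minimal sketch of one possible approach. It assumes that relying on the standard headers is acceptable for every compiler Bison supports, which has not been verified here. In C++, `<cstddef>` declares std::ptrdiff_t, so no compiler-specific fallback is needed. The sketch covers only the type, not its maximum value, and `yy_distance` is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: both C and C++ provide ptrdiff_t through the standard
   library, so there is no need to fall back to 'long'.  */
#ifndef YYPTRDIFF_T
# ifdef __cplusplus
#  include <cstddef>
#  define YYPTRDIFF_T std::ptrdiff_t
# else
#  define YYPTRDIFF_T ptrdiff_t
# endif
#endif

/* Example use (hypothetical helper): distance between two pointers into
   the same array.  */
static YYPTRDIFF_T
yy_distance (const char *from, const char *to)
{
  assert (from && to);
  return to - from;
}
```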

* Completion
Several features are not available in all the back-ends.

- push parsers: glr.c, glr.cc, lalr1.cc (not very difficult)
- token constructors: Java, C, D (a bit difficult)
- glr: D, Java (super difficult)

* Bugs
** Autotest has quotation issues
tests/input.at:1730:AT_SETUP([%define errors])

->

$ ./tests/testsuite -l | grep errors | sed q
  38: input.at:1730      errors

* Short term
** Better design for diagnostics
The current implementation of diagnostics is ad hoc; it grew organically.
It works as a series of calls to several functions, where later calls
depend on earlier ones. For instance:

    complain (&sym->location,
              sym->content->status == needed ? complaint : Wother,
              _("symbol %s is used, but is not defined as a token"
                " and has no rules; did you mean %s?"),
              quote_n (0, sym->tag),
              quote_n (1, best->tag));
    if (feature_flag & feature_caret)
      location_caret_suggestion (sym->location, best->tag, stderr);

We should rewrite this in a more FP way:

1. build a rich structure that denotes the (complete) diagnostic.
   "Complete" in the sense that it also contains the suggestions, the list
   of possible matches, etc.

2. send this to the pretty-printing routine. The diagnostic structure
   should be sufficient so that we can generate all the 'formats' of
   diagnostics, including the fixits.
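
The sketch below covers step 1 and one instance of step 2. It assumes a single suggestion per diagnostic, whereas a real structure would hold the list of candidates, fix-it ranges, and so on. All names (`diagnostic`, `diag_format_text`, etc.) are hypothetical, and the location format is simplified to line.column.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic record: everything needed to render the
   message in any format is gathered first, then printed once.  */
typedef struct
{
  const char *file;
  int line;
  int column;
} diag_location;

typedef enum { DIAG_WARNING, DIAG_ERROR } diag_severity;

typedef struct
{
  diag_location loc;
  diag_severity severity;
  const char *message;     /* Already formatted and translated.  */
  const char *suggestion;  /* Best replacement (fix-it), or NULL.  */
} diagnostic;

/* One renderer among possibly several (plain text, caret display,
   machine-readable output...).  Same contract as snprintf: returns the
   length it needed, or a negative value on error.  */
static int
diag_format_text (const diagnostic *d, char *buf, size_t size)
{
  assert (d);
  const char *kind = d->severity == DIAG_ERROR ? "error" : "warning";
  if (d->suggestion)
    return snprintf (buf, size, "%s:%d.%d: %s: %s; did you mean %s?",
                     d->loc.file, d->loc.line, d->loc.column, kind,
                     d->message, d->suggestion);
  return snprintf (buf, size, "%s:%d.%d: %s: %s",
                   d->loc.file, d->loc.line, d->loc.column, kind,
                   d->message);
}
```

With this design, other renderers take the same `diagnostic` value and never need to call back into the code that detected the problem.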

If properly done, this diagnostic module can be detached from Bison and be
put in gnulib. It could be used, for instance, for errors caught by
xgettext.

There's certainly already something similar in GCC. At least that's the
impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
page:

https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html

** Graphviz display code thoughts
The code for the --graph option is spread over two files: print_graph, and
graphviz. This is because Bison used to also produce VCG graphs, but since
this is no longer true, maybe we could consider these files for fusion.

Another consideration worth noting is that print_graph.c (correct me if I
am wrong) should contain generic functions, whereas graphviz.c and other
potential files should contain just the specific code for that output
format. It will probably prove difficult to tell if the implementation is
actually generic whilst only having support for a single format, but it
would be nice to keep stuff a bit tidier: right now, the construction of the
bitset used to show reductions is in the graphviz-specific code, and on the
opposite side we have some use of \l, which is graphviz-specific, in what
should be generic code.

Little effort seems to have been given to factoring these files and their
print{,-xml} counterpart. We would very much like to re-use the pretty format
of states from .output for the graphs, etc.

Since graphviz dies on medium-to-big grammars, maybe consider another tool?

** push-parser
Check it too when checking the different kinds of parsers. And be
sure to check that the initial-action is performed once per parse.

** m4 names
b4_shared_declarations is no longer what its name says. Make it
b4_parser_declaration for instance.

** yychar in lalr1.cc
There is a large difference between maint and master on the handling of
yychar (which was removed in lalr1.cc). See what needs to be
back-ported.

    /* User semantic actions sometimes alter yychar, and that requires
       that yytoken be updated with the new translation. We take the
       approach of translating immediately before every use of yytoken.
       One alternative is translating here after every semantic action,
       but that translation would be missed if the semantic action
       invokes YYABORT, YYACCEPT, or YYERROR immediately after altering
       yychar. In the case of YYABORT or YYACCEPT, an incorrect
       destructor might then be invoked immediately. In the case of
       YYERROR, subsequent parser actions might lead to an incorrect
       destructor call or verbose syntax error message before the
       lookahead is translated. */

    /* Make sure we have latest lookahead translation. See comments at
       user semantic actions for why this is necessary. */
    yytoken = yytranslate_ (yychar);

** Get rid of fake #lines [Bison: ...]
Possibly as simple as checking whether the column number is nonnegative.

I have seen messages like the following from GCC.

  <built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory

** Discuss %printer/%destructor in the case of C++.
It would be very nice to provide the symbol classes with an operator<<
and a destructor. Unfortunately the syntax we have chosen for
%destructor and %printer makes them hard to reuse. For instance, the user
is invited to write something like

  %printer { debug_stream() << $$; } <my_type>;

which is hard to reuse elsewhere since it wants to use
"debug_stream()" to find the stream to use. The same applies to
%destructor: we told the user she could use the members of the Parser
class in the printers/destructors, which is not good for an operator<<
since it is no longer bound to a particular parser, it's just a
standalone symbol.

* Various
** Rewrite glr.cc in C++ (Valentin Tolmer)
As a matter of fact, it would be very interesting to see how much we can
share between lalr1.cc and glr.cc. Most of the skeletons should be common.
It would be a very nice source of inspiration for the other languages.

Valentin Tolmer is working on this.

* From lalr1.cc to yacc.c
** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted other
improvements and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.

(Some time later): it's also very nice to have three stacks: it's denser,
as we don't lose bits to padding. For instance the typical stack for states
will use 8 bits, while it is likely to consume 32 bits in a struct.
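
This small sketch illustrates the padding argument. It assumes a signed char state, matching the 8-bit state stack mentioned above, and an int semantic value. The types and functions are hypothetical and only measure storage; they are not yacc.c's stack code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element types: an 8-bit state number (enough for a small
   automaton) and an int semantic value.  */
typedef signed char state_type;
typedef int value_type;

/* Single stack: one array of these.  */
struct stack_entry
{
  state_type state;
  value_type value;
};

/* Bytes used by N entries on a single struct-based stack.  */
static size_t
single_stack_bytes (size_t n)
{
  assert (n <= SIZE_MAX / sizeof (struct stack_entry));
  return n * sizeof (struct stack_entry);
}

/* Bytes used by N entries on separate state and value stacks: no
   padding between a state and its value.  */
static size_t
split_stacks_bytes (size_t n)
{
  assert (n <= SIZE_MAX / (sizeof (state_type) + sizeof (value_type)));
  return n * (sizeof (state_type) + sizeof (value_type));
}
```

On common ABIs, where int is 4 bytes with 4-byte alignment, this typically gives 5 bytes per entry for split stacks against 8 for the struct. The trade-off is the single allocation and simpler code that merging brought to lalr1.cc.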

We need trustworthy benchmarks for Bison, for all our backends. Akim has a
few things scattered around; we need to put them in the repo, and make them
more useful.

* Report

** Figures
Some statistics about the grammar and the parser would be useful,
especially when asking the user to send some information about the
grammars she is working on. We should probably also include some
information about the variables (I'm not sure for instance we even
specify what LR variant was used).

** GLR
How would Paul like to display the conflicted actions? In particular,
what about when two reductions are possible on a given lookahead token, but
one is part of $default? Should we make the two reductions explicit, or just
keep $default? See the following point.

** Disabled Reductions
See 'tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
what we want to do.

** Documentation
Extend with error productions. The hard part will probably be finding
the right rule so that a single state does not exhibit too many yet
undocumented ''features''. Maybe an empty action ought to be
presented too. Shall we try to make a single grammar with all these
features, or should we have several very small grammars?

* Extensions
** More languages?
Well, only if there is really some demand for it.

*** PHP
https://github.com/scfc/bison-php/blob/master/data/lalr1.php

*** Python
https://lists.gnu.org/r/bison-patches/2013-09/msg00000.html and following

** Multiple start symbols
Revert a70e75b8a41755ab96ab211a0ea111ac68a4aadd.
Revert tests: disable "Multistart reports".

This would be very useful when parsing closely related languages. The idea
is to declare several start symbols, for instance

  %start stmt expr
  %%
  stmt: ...
  expr: ...

and to generate parse(), parse_stmt() and parse_expr(). Technically, the
above grammar would be transformed into

  %start yy_start
  %token YY_START_STMT YY_START_EXPR
  %%
  yy_start: YY_START_STMT stmt | YY_START_EXPR expr

so that there are no new conflicts in the grammar (as would undoubtedly
happen with yy_start: stmt | expr). Then adjust the skeletons so that this
initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
corresponding parse function.
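
Until the skeletons do this, users can get a similar effect with a lexer that returns the start token before any real token. The following is a minimal sketch of that workaround, assuming a plain (impure) C parser that calls `yylex ()`. The token numbers, `scan` and `select_start` are hypothetical, and the real scanner is replaced by a stub.

```c
#include <assert.h>

/* Hypothetical token kinds for the transformed grammar:
     yy_start: YY_START_STMT stmt | YY_START_EXPR expr  */
enum { YYEOF = 0, YY_START_STMT = 258, YY_START_EXPR, NUM };

/* The real scanner, stubbed here: returns NUM once, then YYEOF.  */
static int scanner_calls;
static int
scan (void)
{
  return scanner_calls++ == 0 ? NUM : YYEOF;
}

/* Start token to return before anything else, or 0 once it was sent.  */
static int pending_start;

/* The lexer the parser calls: first the start token that selects the
   entry point, then the real tokens.  */
static int
yylex (void)
{
  if (pending_start)
    {
      int t = pending_start;
      pending_start = 0;
      return t;
    }
  return scan ();
}

/* What a parse_expr () wrapper could do before calling yyparse ().  */
static void
select_start (int start_token)
{
  assert (start_token == YY_START_STMT || start_token == YY_START_EXPR);
  pending_start = start_token;
}
```

Doing it in the skeletons instead would hide the extra tokens from the user's scanner and from error messages.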

*** Number of useless symbols
  AT_TEST(
  [[%start exp;
  exp: exp;]],
  [[input.y: warning: 2 nonterminals useless in grammar [-Wother]
  input.y: warning: 2 rules useless in grammar [-Wother]
  input.y:2.8-10: error: start symbol exp does not derive any sentence]])

We should say "1 nonterminal": the other one is $accept, which should not
participate in the count.

*** Tokens
Do we want to disallow terminal start symbols? The limitation is not
technical. Can it be useful to someone to "parse" a token?

** %include
This is a popular demand. We already made many changes in the parser that
should make this reasonably easy to implement.

Bruce Mardle <marblypup@yahoo.co.uk>
https://lists.gnu.org/r/bison-patches/2015-09/msg00000.html

However, there are many other things to do before having such a feature,
because I don't want a % equivalent to #include (which we all learned to
hate). I want something that builds "modules" of grammars, and assembles
them together, paying attention to keep separate bits separated, in pseudo
name spaces.

** Push parsers
There is demand for push parsers in C++.

** Generate code instead of tables
This is certainly quite a lot of work. See
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.4539.

** $-1
We should find a means to provide access to values deep in the
stack. For instance, instead of

  baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }

we should be able to have:

  foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }

Or something like this.

** %if and the like
It should be possible to have %if/%else/%endif. The implementation is
not clear: should it be lexical or syntactic? Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

(Later): I'm sure there's actually a good case for this. People who need
that feature can use m4/cpp on top of Bison. I don't think it is worth the
trouble in Bison itself.

** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility to have some code triggered
for each reduction. As a matter of fact, such hooks could also be
used to generate the yydebug traces. Some generic scheme probably
exists in there.

XML output for GNU Bison and gcc
   http://www.cs.may.ie/~jpower/Research/bisonXML/

XML output for GNU Bison
   http://yaxx.sourceforge.net/

* Coding system independence
Paul notes:

   Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
   255). It also assumes that the 8-bit character encoding is
   the same for the invocation of 'bison' as it is for the
   invocation of 'cc', but this is not necessarily true when
   people run bison on an ASCII host and then use cc on an EBCDIC
   host. I don't think these topics are worth our time
   addressing (unless we find a gung-ho volunteer for EBCDIC or
   PDP-10 ports :-) but they should probably be documented
   somewhere.

   More importantly, Bison does not currently allow NUL bytes in
   tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
   the source code. This should get fixed.
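
The sketch below shows why NUL-terminated strings are the obstacle. Once an explicit length is stored, "x\0y" and "x\0z" become distinct, whereas strlen and strcmp stop at the first NUL. The `counted_string` type and `COUNTED` macro are hypothetical and are not Bison's internal representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length is stored, so '\0' is an
   ordinary byte rather than a terminator.  */
typedef struct
{
  const char *bytes;
  size_t len;
} counted_string;

/* Build from a string literal, keeping embedded NULs (but not the
   terminating one).  LIT must be an actual literal or array.  */
#define COUNTED(Lit) ((counted_string) { (Lit), sizeof (Lit) - 1 })

static bool
counted_equal (counted_string a, counted_string b)
{
  assert ((a.bytes || !a.len) && (b.bytes || !b.len));
  return a.len == b.len
         && (a.len == 0 || memcmp (a.bytes, b.bytes, a.len) == 0);
}
```

Any code path that passes token strings through C string functions would need the same treatment, including the generated yytname-style tables.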

* Broken options?
** %token-table
** Skeleton strategy
Must we keep %token-table?

* Precedence

** Partial order
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).

This is a prerequisite for modules.
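
The following minimal sketch shows what a partial order could look like. It assumes precedence levels are nodes linked by explicit "is above" edges rather than ordered by declaration. The names are hypothetical, the search is a naive transitive walk that assumes the relation has no cycles, and the policy for PREC_UNORDERED (for example, keep reporting the conflict) is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: precedence levels as nodes of a DAG.  */
enum { NLEVELS = 8 };
static bool higher[NLEVELS][NLEVELS];  /* higher[a][b]: a directly above b.  */

/* Is A above B, directly or transitively?  Depth-first walk over the
   "higher" edges; assumes the relation is acyclic.  */
static bool
above (int a, int b)
{
  assert (0 <= a && a < NLEVELS && 0 <= b && b < NLEVELS);
  for (int c = 0; c < NLEVELS; ++c)
    if (higher[a][c] && (c == b || above (c, b)))
      return true;
  return false;
}

typedef enum { PREC_LOWER, PREC_EQUAL, PREC_HIGHER, PREC_UNORDERED } prec_cmp;

/* Unlike with a total order, levels declared in unrelated modules can
   be incomparable.  */
static prec_cmp
prec_compare (int a, int b)
{
  if (a == b)
    return PREC_EQUAL;
  if (above (a, b))
    return PREC_HIGHER;
  if (above (b, a))
    return PREC_LOWER;
  return PREC_UNORDERED;
}
```

Two modules can each declare their own levels without being forced into one global ranking. Linking them later only requires adding edges.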

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.
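
This minimal sketch shows where such hooks would sit in a reduction. It assumes a value stack whose top is *yyvsp, as in yacc.c, and uses the argument shapes from the message above without locations. The macros, `reduce` and the rule numbering are hypothetical. Here the user defines YYACT_EPILOGUE to record the computed $$, which is the point where a parse-tree node could receive its S-attribute.

```c
#include <assert.h>

typedef int YYSTYPE;

/* User side (hypothetical): count epilogues and remember the last $$,
   e.g., to attach it to a parse-tree node.  */
static int epilogue_calls;
static YYSTYPE last_result;
#define YYACT_PROLOGUE(Rule, Rhs, Len) ((void) 0)
#define YYACT_EPILOGUE(Val, Rhs, Len) \
  ((void) (epilogue_calls++, last_result = (Val)))

/* Skeleton side (hypothetical): reduce by rule YYRULE of length YYLEN
   on a value stack whose top is *YYVSP.  Rule 1 is "exp: exp '+' exp".
   Returns the new stack top.  */
static YYSTYPE *
reduce (YYSTYPE *yyvsp, int yyrule, int yylen)
{
  assert (0 <= yylen);
  YYSTYPE yyval = yylen ? yyvsp[1 - yylen] : 0;  /* Default $$ = $1.  */
  YYACT_PROLOGUE (yyrule, yyvsp - yylen, yylen);
  switch (yyrule)
    {
    case 1:
      yyval = yyvsp[-2] + yyvsp[0];  /* $$ = $1 + $3.  */
      break;
    default:
      break;
    }
  YYACT_EPILOGUE (yyval, yyvsp - yylen, yylen);
  yyvsp -= yylen;   /* Pop the right-hand side...  */
  *++yyvsp = yyval; /* ...and push $$.  */
  return yyvsp;
}
```

Because the epilogue runs after the user action but before the right-hand side is popped, it sees both the final $$ and the values it was computed from.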

* Better graphics
Equip the parser with a means to create the (visual) parse tree.

-----

# LocalWords: Cex gnulib gl Bistromathic TokenKinds yylex enum YYEOF EOF
# LocalWords: YYerror gettext af hb YYERRCODE undef calc FIXME dev yyerror
# LocalWords: Autoconf YYUNDEFTOK lexemes parsers Bistromathic's yyreport
# LocalWords: const argc yacc yyclearin lookahead destructor Rici incluent
# LocalWords: yydestruct yydiscardin catégories d'avertissements sr activé
# LocalWords: conflits défaut rr l'alias chaîne n'est attaché un symbole
# LocalWords: obsolète règle vide midrule valeurs de intermédiaire ou avec
# LocalWords: définies inutilisées priorité associativité inutiles POSIX
# LocalWords: incompatibilités tous les autres avertissements sauf dans rp
# LocalWords: désactiver CATEGORIE traiter comme des erreurs glr Akim bool
# LocalWords: Demaille arith lalr goto struct pathlen nullable ntokens lr
# LocalWords: nterm bitsetv ielr ritem nstates nrules nritems yysymbol EQ
# LocalWords: SymbolKind YYEMPTY YYUNDEF YYTNAME NUM yyntokens yytname sed
# LocalWords: nonterminals yykind yycode YYNAMES yynames init getName conv
# LocalWords: TokenKind ival yychar yylval yylexer Tolmer hoc
# LocalWords: Sobisch YYPTRDIFF ptrdiff Autotest toknum yytoknum
# LocalWords: sym Wother stderr FP fixits xgettext fdiagnostics Graphviz
# LocalWords: graphviz VCG bitset xml bw maint yytoken YYABORT deps
# LocalWords: YYACCEPT yytranslate nonnegative destructors yyerrlab repo
# LocalWords: backends stmt expr yy Mardle baz qux Vadim Maslow CPP cpp
# LocalWords: yydebug gcc UCHAR EBCDIC gung PDP NUL Pre Florian Krohm utf
# LocalWords: YYACT YYLLOC YYLSP yyval yyvsp yylen yyloc yylsp endif
# LocalWords: ispell american

Local Variables:
mode: outline
coding: utf-8
fill-column: 76
ispell-dictionary: "american"
End:

Copyright (C) 2001-2004, 2006, 2008-2015, 2018-2021 Free Software
Foundation, Inc.

This file is part of Bison, the GNU Compiler Compiler.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the "GNU Free
Documentation License" file as part of this distribution.