724 lines
27 KiB
Plaintext
724 lines
27 KiB
Plaintext
* Soon
|
||
** scan-code
|
||
The default case is scanning char-per-char.
|
||
|
||
/* By default, grow the string obstack with the input. */
|
||
.|\n STRING_GROW ();
|
||
|
||
make it more eager?
|
||
|
||
** Missing tests
|
||
commit 2c294c132528ede23d8ae4959783a67e9ff05ac5
|
||
Author: Vincent Imbimbo <vmi6@cornell.edu>
|
||
Date: Sat Jan 23 13:25:18 2021 -0500
|
||
|
||
cex: fix state-item pruning
|
||
|
||
See https://lists.gnu.org/r/bug-bison/2021-01/msg00002.html
|
||
|
||
** pos_set_set
|
||
The current approach is correct, but with poor performances. Bitsets need
|
||
to support 'assign' and 'shift'. And instead of extending POS_SET just for
|
||
the out-of-range new values, we need something like doubling the size.
|
||
|
||
** glr
|
||
There is no test with "Parse on stack %ld rejected by rule %d" in it.
|
||
|
||
** yyrline etc.
|
||
Clarify that rule numbers in the skeletons are 1-based.
|
||
|
||
** Macros in C++
|
||
There are many macros that should obey api.prefix: YY_CPLUSPLUS, YY_MOVE,
|
||
etc.
|
||
|
||
** yyerrok in Java
|
||
And add tests in calc.at, to prepare work for D.
|
||
|
||
** YYERROR and yynerrs
|
||
We are missing some cases. Write a test case, and check all the skeletons.
|
||
|
||
** Cex
|
||
*** Improve gnulib
|
||
Don't do this (counterexample.c):
|
||
|
||
// This is the fastest way to get the tail node from the gl_list API.
|
||
gl_list_node_t
|
||
list_get_end (gl_list_t list)
|
||
{
|
||
gl_list_node_t sentinel = gl_list_add_last (list, NULL);
|
||
gl_list_node_t res = gl_list_previous_node (list, sentinel);
|
||
gl_list_remove_node (list, sentinel);
|
||
return res;
|
||
}
|
||
|
||
*** Ambiguous rewriting
|
||
If the user is stupid enough to have equal rules, then the derivations are
|
||
harder to read:
|
||
|
||
Reduce/reduce conflict on tokens $end, "+", "⊕":
|
||
2 exp: exp "+" exp .
|
||
3 exp: exp "+" exp .
|
||
Example exp "+" exp •
|
||
First derivation exp ::=[ exp "+" exp • ]
|
||
Example exp "+" exp •
|
||
Second derivation exp ::=[ exp "+" exp • ]
|
||
|
||
Do we care about this? In color, we use twice the same color here, but we
|
||
could try to use the same color for the same rule.
|
||
|
||
*** XML reports
|
||
Show the counterexamples. This is going to be really hard and/or painful.
|
||
Unless we play it dumb (little structure).
|
||
|
||
** Bistromathic
|
||
- How about not evaluating incomplete lines when the text is not finished
|
||
(as shells do).
|
||
|
||
** Questions
|
||
*** Java
|
||
- Should i18n be part of the Lexer? Currently it's a static method of
|
||
Lexer.
|
||
|
||
- is there a migration path that would allow to use TokenKinds in
|
||
yylex?
|
||
|
||
- define the tokens as an enum too.
|
||
|
||
- promote YYEOF rather than EOF.
|
||
|
||
** YYerror
|
||
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/plural.y;h=a712255af4f2f739c93336d4ff6556d932a426a5;hb=HEAD
|
||
|
||
should be updated to not use YYERRCODE. Returning an undef token is good
|
||
enough.
|
||
|
||
** Java
|
||
*** calc.at
|
||
Stop hard-coding "Calc". Adjust local.at (look for FIXME).
|
||
|
||
** doc
|
||
I feel it's ugly to use the GNU style to declare functions in the doc. It
|
||
generates tons of white space in the page, and may contribute to bad page
|
||
breaks.
|
||
|
||
** consistency
|
||
token vs terminal.
|
||
|
||
** api.token.raw
|
||
The YYUNDEFTOK could be assigned a semantic value so that yyerror could be
|
||
used to report invalid lexemes.
|
||
|
||
** push parsers
|
||
Consider deprecating impure push parsers. They add a lot of complexity, for
|
||
a bad feature. On the other hand, that would make it much harder to sit
|
||
push parsers on top of pull parser. Which is currently not relevant, since
|
||
push parsers are measurably slower.
|
||
|
||
** %define parse.error formatted
|
||
How about pushing Bistromathic's yyreport_syntax_error as another standard
|
||
way to generate the error message, and leave to the user the task of
|
||
providing the message formats? Currently in bistro, it reads:
|
||
|
||
const char *
|
||
error_format_string (int argc)
|
||
{
|
||
switch (argc)
|
||
{
|
||
default: /* Avoid compiler warnings. */
|
||
case 0: return _("%@: syntax error");
|
||
case 1: return _("%@: syntax error: unexpected %u");
|
||
// TRANSLATORS: '%@' is a location in a file, '%u' is an
|
||
// "unexpected token", and '%0e', '%1e'... are expected tokens
|
||
// at this point.
|
||
//
|
||
// For instance on the expression "1 + * 2", you'd get
|
||
//
|
||
// 1.5: syntax error: expected - or ( or number or function or variable before *
|
||
case 2: return _("%@: syntax error: expected %0e before %u");
|
||
case 3: return _("%@: syntax error: expected %0e or %1e before %u");
|
||
case 4: return _("%@: syntax error: expected %0e or %1e or %2e before %u");
|
||
case 5: return _("%@: syntax error: expected %0e or %1e or %2e or %3e before %u");
|
||
case 6: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e before %u");
|
||
case 7: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e before %u");
|
||
case 8: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e or %6e before %u");
|
||
}
|
||
}
|
||
|
||
The message would have to be generated in a string, and pushed to yyerror.
|
||
Which will be a pain in the neck in yacc.c.
|
||
|
||
If we want to do that, we should think very carefully about the syntax of
|
||
the format string.
|
||
|
||
** yyclearin does not invoke the lookahead token's %destructor
|
||
https://lists.gnu.org/r/bug-bison/2018-02/msg00000.html
|
||
Rici:
|
||
|
||
> Modifying yyclearin so that it calls yydestruct seems like the simplest
|
||
> solution to this issue, but it is conceivable that such a change would
|
||
> break programs which already perform some kind of workaround in order to
|
||
> destruct the lookahead symbol. So it might be necessary to use some kind of
|
||
> compatibility %define, or to create a new replacement macro with a
|
||
> different name such as yydiscardin.
|
||
>
|
||
> At a minimum, the fact that yyclearin does not invoke the %destructor
|
||
> should be highlighted in the documentation, since it is not at all obvious.
|
||
|
||
** Issues in i18n
|
||
|
||
Les catégories d'avertissements incluent :
|
||
conflicts-sr conflits S/R (activé par défaut)
|
||
conflicts-rr conflits R/R (activé par défaut)
|
||
dangling-alias l'alias chaîne n'est pas attaché à un symbole
|
||
deprecated construction obsolète
|
||
empty-rule règle vide sans %empty
|
||
midrule-values valeurs de règle intermédiaire non définies ou inutilisées
|
||
precedence priorité et associativité inutiles
|
||
yacc incompatibilités avec POSIX Yacc
|
||
other tous les autres avertissements (activé par défaut)
|
||
all tous les avertissements sauf « dangling-alias » et « yacc »
|
||
no-CATEGORY désactiver les avertissements dans CATEGORIE
|
||
none désactiver tous les avertissements
|
||
error[=CATEGORY] traiter les avertissements comme des erreurs
|
||
|
||
Line -1 and -3 should mention CATEGORIE, not CATEGORY.
|
||
|
||
* Bison 3.9
|
||
** Rewrite glr.cc (currently glr2.cc)
|
||
*** custom error messages
|
||
|
||
*** Remove jumps
|
||
We can probably replace setjmp/longjmp with exceptions. That would help
|
||
tremendously other languages such as D and Java that probably have no
|
||
similar feature. If we remove jumps, we probably no longer need _Noreturn,
|
||
so simplify `b4_attribute_define([noreturn])` into `b4_attribute_define`.
|
||
|
||
After discussing with Valentin, it was decided that it's better to stay with
|
||
jumps, since in some places exceptions are ruled out from C++.
|
||
|
||
*** Coding style
|
||
Move to our coding conventions. In particular names such as yy_glr_stack,
|
||
not yyGLRStack.
|
||
|
||
*** yydebug
|
||
It should be a member of the parser object, see lalr1.cc. Let the parser
|
||
object decide what the debug stream is, rather than open coding std::cerr.
|
||
|
||
*** Avoid pointers
|
||
There are many places where pointers should be replaced with references.
|
||
Some occurrences were fixed, but now some have improper names:
|
||
|
||
-yygetToken (int *yycharp, ]b4_namespace_ref[::]b4_parser_class[& yyparser][]b4_pure_if([, glr_stack* yystackp])[]b4_user_formals[)
|
||
+yygetToken (int& yycharp, ]b4_namespace_ref[::]b4_parser_class[& yyparser][]b4_pure_if([, glr_stack* yystackp])[]b4_user_formals[)
|
||
|
||
yycharp is no longer a Pointer. And yystackp should probably also be a reference.
|
||
|
||
*** parse.assert
|
||
Currently all the assertions are enabled. Once we are confident in glr2.cc,
|
||
let parse.assert use the same approach as in lalr1.cc.
|
||
|
||
*** debug_stream
|
||
Stop using std::cerr everywhere.
|
||
|
||
*** glr.c
|
||
When glr2.cc fully replaces glr.cc, get rid of the glr.cc scaffolding in
|
||
glr.c.
|
||
|
||
* Chains
|
||
** Unit rules / Injection rules (Akim Demaille)
|
||
Maybe we could expand unit rules (or "injections", see
|
||
https://homepages.cwi.nl/~daybuild/daily-books/syntax/2-sdf/sdf.html), i.e.,
|
||
transform
|
||
|
||
exp: arith | bool;
|
||
arith: exp '+' exp;
|
||
bool: exp '&' exp;
|
||
|
||
into
|
||
|
||
exp: exp '+' exp | exp '&' exp;
|
||
|
||
when there are no actions. This can significantly speed up some grammars.
|
||
I can't find the papers. In particular the book 'LR parsing: Theory and
|
||
Practice' is impossible to find, but according to 'Parsing Techniques: a
|
||
Practical Guide', it includes information about this issue. Does anybody
|
||
have it?
|
||
|
||
** clean up (Akim Demaille)
|
||
Do not work on these items now, as I (Akim) have branches with a lot of
|
||
changes in this area (hitting several files), and no desire to have to fix
|
||
conflicts. Addressing these items will happen after my branches have been
|
||
merged.
|
||
|
||
*** lalr.c
|
||
Introduce a goto struct, and use it in place of from_state/to_state.
|
||
Rename states1 as path, length as pathlen.
|
||
Introduce inline functions for things such as nullable[*rp - ntokens]
|
||
where we need to map from symbol number to nterm number.
|
||
|
||
There are probably a significant part of the relations management that
|
||
should be migrated on top of a bitsetv.
|
||
|
||
*** closure
|
||
It should probably take a "state*" instead of two arguments.
|
||
|
||
*** traces
|
||
The "automaton" and "set" categories are not so useful. We should probably
|
||
introduce lr(0) and lalr, just the way we have ielr categories. The
|
||
"closure" function is too verbose, it should probably have its own category.
|
||
|
||
"set" can still be used for summarizing the important sets. That would make
|
||
tests easy to maintain.
|
||
|
||
*** complain.*
|
||
Rename these guys as "diagnostics.*" (or "diagnose.*"), since that's the
|
||
name they have in GCC, clang, etc. Likewise for the complain_* series of
|
||
functions.
|
||
|
||
*** ritem
|
||
states/nstates, rules/nrules, ..., ritem/nritems
|
||
Fix the latter.
|
||
|
||
*** m4: slot, type, type_tag
|
||
The meaning of type_tag varies depending on api.value.type. We should avoid
|
||
that and using clear definitions with stable semantics.
|
||
|
||
* D programming language
|
||
There's a number of features that are missing, here sorted in _suggested_
|
||
order of implementation.
|
||
|
||
When copying code from other skeletons, keep the comments exactly as they
|
||
are. Keep the same variable names. If you change the wording in one place,
|
||
do it in the others too. In other words: make sure to keep the
|
||
maintenance *simple* by avoiding any gratuitous difference.
|
||
|
||
** CI
|
||
Check when gdc and ldc.
|
||
|
||
** GLR Parser
|
||
This is very ambitious. That's the final boss. There are currently no
|
||
"clean" implementation to get inspiration from.
|
||
|
||
glr.c is very clean but:
|
||
- is low-level C
|
||
- is a different skeleton from yacc.c
|
||
|
||
glr.cc is (currently) an ugly hack: a C++ shell around glr.c. Valentin
|
||
Tolmer is currently rewriting glr.cc to be clean C++, but he is not
|
||
finished. There will be a lot a common code between lalr1.cc and glr.cc, so
|
||
eventually I would like them to be fused into a single skeleton, supporting
|
||
both deterministic and generalized parsing.
|
||
|
||
It would be great for D to also support this.
|
||
|
||
The basic ideas of GLR are explained here:
|
||
|
||
https://www.codeproject.com/Articles/5259825/GLR-Parsing-in-Csharp-How-to-Use-The-Most-Powerful
|
||
|
||
* Better error messages
|
||
The users are not provided with enough tools to forge their error messages.
|
||
See for instance "Is there an option to change the message produced by
|
||
YYERROR_VERBOSE?" by Simon Sobisch, on bison-help.
|
||
|
||
See also
|
||
https://www.cs.tufts.edu/~nr/cs257/archive/clinton-jefferey/lr-error-messages.pdf
|
||
https://research.swtch.com/yyerror
|
||
http://gallium.inria.fr/~fpottier/publis/fpottier-reachability-cc2016.pdf
|
||
|
||
* Modernization
|
||
Fix data/skeletons/yacc.c so that it defines YYPTRDIFF_T properly for modern
|
||
and older C++ compilers. Currently the code defaults to defining it to
|
||
'long' for non-GCC compilers, but it should use the proper C++ magic to
|
||
define it to the same type as the C ptrdiff_t type.
|
||
|
||
* Completion
|
||
Several features are not available in all the back-ends.
|
||
|
||
- push parsers: glr.c, glr.cc, lalr1.cc (not very difficult)
|
||
- token constructors: Java, C, D (a bit difficult)
|
||
- glr: D, Java (super difficult)
|
||
|
||
* Bugs
|
||
** Autotest has quotation issues
|
||
tests/input.at:1730:AT_SETUP([%define errors])
|
||
|
||
->
|
||
|
||
$ ./tests/testsuite -l | grep errors | sed q
|
||
38: input.at:1730 errors
|
||
|
||
* Short term
|
||
** Better design for diagnostics
|
||
The current implementation of diagnostics is ad hoc, it grew organically.
|
||
It works as a series of calls to several functions, with dependency of the
|
||
latter calls on the former. For instance:
|
||
|
||
complain (&sym->location,
|
||
sym->content->status == needed ? complaint : Wother,
|
||
_("symbol %s is used, but is not defined as a token"
|
||
" and has no rules; did you mean %s?"),
|
||
quote_n (0, sym->tag),
|
||
quote_n (1, best->tag));
|
||
if (feature_flag & feature_caret)
|
||
location_caret_suggestion (sym->location, best->tag, stderr);
|
||
|
||
We should rewrite this in a more FP way:
|
||
|
||
1. build a rich structure that denotes the (complete) diagnostic.
|
||
"Complete" in the sense that it also contains the suggestions, the list
|
||
of possible matches, etc.
|
||
|
||
2. send this to the pretty-printing routine. The diagnostic structure
|
||
should be sufficient so that we can generate all the 'format' of
|
||
diagnostics, including the fixits.
|
||
|
||
If properly done, this diagnostic module can be detached from Bison and be
|
||
put in gnulib. It could be used, for instance, for errors caught by
|
||
xgettext.
|
||
|
||
There's certainly already something alike in GCC. At least that's the
|
||
impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
|
||
page:
|
||
|
||
https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
|
||
|
||
** Graphviz display code thoughts
|
||
The code for the --graph option is over two files: print_graph, and
|
||
graphviz. This is because Bison used to also produce VCG graphs, but since
|
||
this is no longer true, maybe we could consider these files for fusion.
|
||
|
||
An other consideration worth noting is that print_graph.c (correct me if I
|
||
am wrong) should contain generic functions, whereas graphviz.c and other
|
||
potential files should contain just the specific code for that output
|
||
format. It will probably prove difficult to tell if the implementation is
|
||
actually generic whilst only having support for a single format, but it
|
||
would be nice to keep stuff a bit tidier: right now, the construction of the
|
||
bitset used to show reductions is in the graphviz-specific code, and on the
|
||
opposite side we have some use of \l, which is graphviz-specific, in what
|
||
should be generic code.
|
||
|
||
Little effort seems to have been given to factoring these files and their
|
||
print{,-xml} counterpart. We would very much like to re-use the pretty format
|
||
of states from .output for the graphs, etc.
|
||
|
||
Since graphviz dies on medium-to-big grammars, maybe consider an other tool?
|
||
|
||
** push-parser
|
||
Check it too when checking the different kinds of parsers. And be
|
||
sure to check that the initial-action is performed once per parsing.
|
||
|
||
** m4 names
|
||
b4_shared_declarations is no longer what it is. Make it
|
||
b4_parser_declaration for instance.
|
||
|
||
** yychar in lalr1.cc
|
||
There is a large difference bw maint and master on the handling of
|
||
yychar (which was removed in lalr1.cc). See what needs to be
|
||
back-ported.
|
||
|
||
|
||
/* User semantic actions sometimes alter yychar, and that requires
|
||
that yytoken be updated with the new translation. We take the
|
||
approach of translating immediately before every use of yytoken.
|
||
One alternative is translating here after every semantic action,
|
||
but that translation would be missed if the semantic action
|
||
invokes YYABORT, YYACCEPT, or YYERROR immediately after altering
|
||
yychar. In the case of YYABORT or YYACCEPT, an incorrect
|
||
destructor might then be invoked immediately. In the case of
|
||
YYERROR, subsequent parser actions might lead to an incorrect
|
||
destructor call or verbose syntax error message before the
|
||
lookahead is translated. */
|
||
|
||
/* Make sure we have latest lookahead translation. See comments at
|
||
user semantic actions for why this is necessary. */
|
||
yytoken = yytranslate_ (yychar);
|
||
|
||
|
||
** Get rid of fake #lines [Bison: ...]
|
||
Possibly as simple as checking whether the column number is nonnegative.
|
||
|
||
I have seen messages like the following from GCC.
|
||
|
||
<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
|
||
|
||
|
||
** Discuss about %printer/%destroy in the case of C++.
|
||
It would be very nice to provide the symbol classes with an operator<<
|
||
and a destructor. Unfortunately the syntax we have chosen for
|
||
%destroy and %printer make them hard to reuse. For instance, the user
|
||
is invited to write something like
|
||
|
||
%printer { debug_stream() << $$; } <my_type>;
|
||
|
||
which is hard to reuse elsewhere since it wants to use
|
||
"debug_stream()" to find the stream to use. The same applies to
|
||
%destroy: we told the user she could use the members of the Parser
|
||
class in the printers/destructors, which is not good for an operator<<
|
||
since it is no longer bound to a particular parser, it's just a
|
||
(standalone symbol).
|
||
|
||
* Various
|
||
** Rewrite glr.cc in C++ (Valentin Tolmer)
|
||
As a matter of fact, it would be very interesting to see how much we can
|
||
share between lalr1.cc and glr.cc. Most of the skeletons should be common.
|
||
It would be a very nice source of inspiration for the other languages.
|
||
|
||
Valentin Tolmer is working on this.
|
||
|
||
* From lalr1.cc to yacc.c
|
||
** Single stack
|
||
Merging the three stacks in lalr1.cc simplified the code, prompted for
|
||
other improvements and also made it faster (probably because memory
|
||
management is performed once instead of three times). I suggest that
|
||
we do the same in yacc.c.
|
||
|
||
(Some time later): it's also very nice to have three stacks: it's more dense
|
||
as we don't lose bits to padding. For instance the typical stack for states
|
||
will use 8 bits, while it is likely to consume 32 bits in a struct.
|
||
|
||
We need trustworthy benchmarks for Bison, for all our backends. Akim has a
|
||
few things scattered around; we need to put them in the repo, and make them
|
||
more useful.
|
||
|
||
* Report
|
||
|
||
** Figures
|
||
Some statistics about the grammar and the parser would be useful,
|
||
especially when asking the user to send some information about the
|
||
grammars she is working on. We should probably also include some
|
||
information about the variables (I'm not sure for instance we even
|
||
specify what LR variant was used).
|
||
|
||
** GLR
|
||
How would Paul like to display the conflicted actions? In particular,
|
||
what when two reductions are possible on a given lookahead token, but one is
|
||
part of $default. Should we make the two reductions explicit, or just
|
||
keep $default? See the following point.
|
||
|
||
** Disabled Reductions
|
||
See 'tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
|
||
what we want to do.
|
||
|
||
** Documentation
|
||
Extend with error productions. The hard part will probably be finding
|
||
the right rule so that a single state does not exhibit too many yet
|
||
undocumented ''features''. Maybe an empty action ought to be
|
||
presented too. Shall we try to make a single grammar with all these
|
||
features, or should we have several very small grammars?
|
||
|
||
* Extensions
|
||
** More languages?
|
||
Well, only if there is really some demand for it.
|
||
|
||
*** PHP
|
||
https://github.com/scfc/bison-php/blob/master/data/lalr1.php
|
||
|
||
*** Python
|
||
https://lists.gnu.org/r/bison-patches/2013-09/msg00000.html and following
|
||
|
||
** Multiple start symbols
|
||
Revert a70e75b8a41755ab96ab211a0ea111ac68a4aadd.
|
||
Revert tests: disable "Multistart reports".
|
||
|
||
Would be very useful when parsing closely related languages. The idea is to
|
||
declare several start symbols, for instance
|
||
|
||
%start stmt expr
|
||
%%
|
||
stmt: ...
|
||
expr: ...
|
||
|
||
and to generate parse(), parse_stmt() and parse_expr(). Technically, the
|
||
above grammar would be transformed into
|
||
|
||
%start yy_start
|
||
%token YY_START_STMT YY_START_EXPR
|
||
%%
|
||
yy_start: YY_START_STMT stmt | YY_START_EXPR expr
|
||
|
||
so that there are no new conflicts in the grammar (as would undoubtedly
|
||
happen with yy_start: stmt | expr). Then adjust the skeletons so that this
|
||
initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
|
||
corresponding parse function.
|
||
|
||
*** Number of useless symbols
|
||
AT_TEST(
|
||
[[%start exp;
|
||
exp: exp;]],
|
||
[[input.y: warning: 2 nonterminals useless in grammar [-Wother]
|
||
input.y: warning: 2 rules useless in grammar [-Wother]
|
||
input.y:2.8-10: error: start symbol exp does not derive any sentence]])
|
||
|
||
We should say "1 nonterminal": the other one is $accept, which should not
|
||
participate in the count.
|
||
|
||
*** Tokens
|
||
Do we want to disallow terminal start symbols? The limitation is not
|
||
technical. Can it be useful to someone to "parse" a token?
|
||
|
||
** %include
|
||
This is a popular demand. We already made many changes in the parser that
|
||
should make this reasonably easy to implement.
|
||
|
||
Bruce Mardle <marblypup@yahoo.co.uk>
|
||
https://lists.gnu.org/r/bison-patches/2015-09/msg00000.html
|
||
|
||
However, there are many other things to do before having such a feature,
|
||
because I don't want a % equivalent to #include (which we all learned to
|
||
hate). I want something that builds "modules" of grammars, and assembles
|
||
them together, paying attention to keep separate bits separated, in pseudo
|
||
name spaces.
|
||
|
||
** Push parsers
|
||
There is demand for push parsers in C++.
|
||
|
||
** Generate code instead of tables
|
||
This is certainly quite a lot of work. See
|
||
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.4539.
|
||
|
||
** $-1
|
||
We should find a means to provide an access to values deep in the
|
||
stack. For instance, instead of
|
||
|
||
baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
|
||
|
||
we should be able to have:
|
||
|
||
foo($foo) bar($bar) baz($bar): qux($qux) { $baz = $foo + $bar + $qux; }
|
||
|
||
Or something like this.
|
||
|
||
** %if and the like
|
||
It should be possible to have %if/%else/%endif. The implementation is
|
||
not clear: should it be lexical or syntactic. Vadim Maslow thinks it
|
||
must be in the scanner: we must not parse what is in a switched off
|
||
part of %if. Akim Demaille thinks it should be in the parser, so as
|
||
to avoid falling into another CPP mistake.
|
||
|
||
(Later): I'm sure there's actually good case for this. People who need that
|
||
feature can use m4/cpp on top of Bison. I don't think it is worth the
|
||
trouble in Bison itself.
|
||
|
||
** XML Output
|
||
There are couple of available extensions of Bison targeting some XML
|
||
output. Some day we should consider including them. One issue is
|
||
that they seem to be quite orthogonal to the parsing technique, and
|
||
seem to depend mostly on the possibility to have some code triggered
|
||
for each reduction. As a matter of fact, such hooks could also be
|
||
used to generate the yydebug traces. Some generic scheme probably
|
||
exists in there.
|
||
|
||
XML output for GNU Bison and gcc
|
||
http://www.cs.may.ie/~jpower/Research/bisonXML/
|
||
|
||
XML output for GNU Bison
|
||
http://yaxx.sourceforge.net/
|
||
|
||
* Coding system independence
|
||
Paul notes:
|
||
|
||
Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
|
||
255). It also assumes that the 8-bit character encoding is
|
||
the same for the invocation of 'bison' as it is for the
|
||
invocation of 'cc', but this is not necessarily true when
|
||
people run bison on an ASCII host and then use cc on an EBCDIC
|
||
host. I don't think these topics are worth our time
|
||
addressing (unless we find a gung-ho volunteer for EBCDIC or
|
||
PDP-10 ports :-) but they should probably be documented
|
||
somewhere.
|
||
|
||
More importantly, Bison does not currently allow NUL bytes in
|
||
tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
|
||
the source code. This should get fixed.
|
||
|
||
* Broken options?
|
||
** %token-table
|
||
** Skeleton strategy
|
||
Must we keep %token-table?
|
||
|
||
* Precedence
|
||
|
||
** Partial order
|
||
It is unfortunate that there is a total order for precedence. It
|
||
makes it impossible to have modular precedence information. We should
|
||
move to partial orders (sounds like series/parallel orders to me).
|
||
|
||
This is a prerequisite for modules.
|
||
|
||
* Pre and post actions.
|
||
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
|
||
Subject: YYACT_EPILOGUE
|
||
To: bug-bison@gnu.org
|
||
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
|
||
|
||
The other day I had the need for explicitly building the parse tree. I
|
||
used %locations for that and defined YYLLOC_DEFAULT to call a function
|
||
that returns the tree node for the production. Easy. But I also needed
|
||
to assign the S-attribute to the tree node. That cannot be done in
|
||
YYLLOC_DEFAULT, because it is invoked before the action is executed.
|
||
The way I solved this was to define a macro YYACT_EPILOGUE that would
|
||
be invoked after the action. For reasons of symmetry I also added
|
||
YYACT_PROLOGUE. Although I had no use for that I can envision how it
|
||
might come in handy for debugging purposes.
|
||
All is needed is to add
|
||
|
||
#if YYLSP_NEEDED
|
||
YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
|
||
#else
|
||
YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
|
||
#endif
|
||
|
||
at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
|
||
|
||
I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
|
||
to bison. If you're interested, I'll work on a patch.
|
||
|
||
* Better graphics
|
||
Equip the parser with a means to create the (visual) parse tree.
|
||
|
||
|
||
-----
|
||
|
||
# LocalWords: Cex gnulib gl Bistromathic TokenKinds yylex enum YYEOF EOF
|
||
# LocalWords: YYerror gettext af hb YYERRCODE undef calc FIXME dev yyerror
|
||
# LocalWords: Autoconf YYUNDEFTOK lexemes parsers Bistromathic's yyreport
|
||
# LocalWords: const argc yacc yyclearin lookahead destructor Rici incluent
|
||
# LocalWords: yydestruct yydiscardin catégories d'avertissements sr activé
|
||
# LocalWords: conflits défaut rr l'alias chaîne n'est attaché un symbole
|
||
# LocalWords: obsolète règle vide midrule valeurs de intermédiaire ou avec
|
||
# LocalWords: définies inutilisées priorité associativité inutiles POSIX
|
||
# LocalWords: incompatibilités tous les autres avertissements sauf dans rp
|
||
# LocalWords: désactiver CATEGORIE traiter comme des erreurs glr Akim bool
|
||
# LocalWords: Demaille arith lalr goto struct pathlen nullable ntokens lr
|
||
# LocalWords: nterm bitsetv ielr ritem nstates nrules nritems yysymbol EQ
|
||
# LocalWords: SymbolKind YYEMPTY YYUNDEF YYTNAME NUM yyntokens yytname sed
|
||
# LocalWords: nonterminals yykind yycode YYNAMES yynames init getName conv
|
||
# LocalWords: TokenKind ival yychar yylval yylexer Tolmer hoc
|
||
# LocalWords: Sobisch YYPTRDIFF ptrdiff Autotest toknum yytoknum
|
||
# LocalWords: sym Wother stderr FP fixits xgettext fdiagnostics Graphviz
|
||
# LocalWords: graphviz VCG bitset xml bw maint yytoken YYABORT deps
|
||
# LocalWords: YYACCEPT yytranslate nonnegative destructors yyerrlab repo
|
||
# LocalWords: backends stmt expr yy Mardle baz qux Vadim Maslow CPP cpp
|
||
# LocalWords: yydebug gcc UCHAR EBCDIC gung PDP NUL Pre Florian Krohm utf
|
||
# LocalWords: YYACT YYLLOC YYLSP yyval yyvsp yylen yyloc yylsp endif
|
||
# LocalWords: ispell american
|
||
|
||
Local Variables:
|
||
mode: outline
|
||
coding: utf-8
|
||
fill-column: 76
|
||
ispell-dictionary: "american"
|
||
End:
|
||
|
||
Copyright (C) 2001-2004, 2006, 2008-2015, 2018-2021 Free Software
|
||
Foundation, Inc.
|
||
|
||
This file is part of Bison, the GNU Compiler Compiler.
|
||
|
||
Permission is granted to copy, distribute and/or modify this document
|
||
under the terms of the GNU Free Documentation License, Version 1.3 or
|
||
any later version published by the Free Software Foundation; with no
|
||
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
|
||
Texts. A copy of the license is included in the "GNU Free
|
||
Documentation License" file as part of this distribution.
|