
The Scanner and The Parser

Akim Demaille


EPITA, École Pour l'Informatique et les Techniques Avancées
April 2, 2005

Symbols
    cstats
    Symbols
Semantic Values
    Scanner
    Parser
Locations
    Location tracking in the Scanner
    Location tracking in the Parser
Improving the Scanner/Parser
    Error Recovery
    Pure Parser
    Two Grammars in One
    Reentrancy

Counting Symbols

    gcc -E -P "$@" \
      | tr -cs '[:alnum:]_' '[\n*]' \
      | grep '[[:alpha:]]' \
      | grep -v -E -w "$cxx_keywords" > $tmp.1
    total=$(wc -lc < $tmp.1 | awk '{print $1 " (" $2 " chars)"}')
    sort $tmp.1 \
      | uniq -c \
      | sed 's/^ //; s/\t/ /' \
      | sort -rn > $tmp.2
    unique=$(wc -lc < $tmp.2 | awk '{print $1 " (" $2 " chars)"}')
    echo $total occurrences of $unique symbols.
    sed 42q $tmp.2 \
      | pr --page-width=60 --column=3 --omit-header
    rm -f $tmp.*
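The counting steps of the pipeline above (split into words, keep those containing a letter, drop keywords, count, sort by decreasing frequency) can also be sketched in C++. This is only an illustration under assumptions: the names `split_symbols` and `count_symbols` are hypothetical, the keyword list is passed in by the caller, and preprocessing (the `gcc -E -P` step) is assumed to have happened already.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Split `text` into maximal runs of [A-Za-z0-9_] and keep only the runs
// that contain at least one letter, so that plain numbers are dropped.
std::vector<std::string> split_symbols(const std::string& text)
{
  std::vector<std::string> res;
  std::string word;
  bool has_alpha = false;
  auto flush = [&] {
    if (has_alpha)
      res.push_back(word);
    word.clear();
    has_alpha = false;
  };
  for (unsigned char c : text)
    if (std::isalnum(c) || c == '_')
      {
        word += static_cast<char>(c);
        has_alpha = has_alpha || std::isalpha(c);
      }
    else
      flush();
  flush();
  return res;
}

typedef std::pair<std::size_t, std::string> Entry;

// Count the symbols of `text` that are not in `keywords` and return
// (count, symbol) pairs, most frequent first. Ties stay in alphabetical
// order because the map is sorted and the sort is stable.
std::vector<Entry> count_symbols(const std::string& text,
                                 const std::set<std::string>& keywords)
{
  std::map<std::string, std::size_t> freq;
  for (const std::string& s : split_symbols(text))
    if (!keywords.count(s))
      ++freq[s];
  std::vector<Entry> res;
  for (const auto& p : freq)
    res.push_back(Entry(p.second, p.first));
  std::stable_sort(res.begin(), res.end(),
                   [](const Entry& a, const Entry& b)
                   { return a.first > b.first; });
  return res;
}
```

Calling `count_symbols(source, keywords)` and printing the first 42 entries would roughly correspond to the `sed 42q` step of the script.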


    9501 (58762 chars) occurrences of 1280 (17814 chars) symbols.
     454 i                 111 sp                 69 ht
     264 lemp              104 lineno             61 a
     230 psp                94 next               60 lem
     193 rp                 94 name               60 filename
     151 __restrict         91 h                  59 rule
     142 cfp                82 np                 59 config
     137 n                  77 c                  57 symbol
     126 __const            74 state              55 lemon
     118 x                  73 j                  54 type
     114 fprintf            72 size               53 __attribute__
     114 cp                 72 FILE               50 data
     113 s                  70 stp                48 tbl
     113 out                70 size_t             48 errorcnt
     112 ap                 70 array              46 __extension__

GCC's C Parser

    22714 (234415 chars) occurrences of 6720 (134544 chars) symbols.
    2676 tree              138 __c                58 GTY
    1579 ttype             138 _Bool              55 __stream
    1123 yyvsp             124 __attribute__      53 __s
     909 yyval             123 FILE               46 identifier
     358 ftype             118 __extension__      46 __i
     306 __const            97 rtx                45 __fd
     285 __restrict         95 type               44 __n
     248 __t                89 new_type_flag      43 error
     247 t                  70 cpp_reader         40 cp_global_tree
     206 gt_pointer_ope     69 build_tree_lis     39 yyn
     200 common             67 parse              39 s
     191 size_t             65 y                  39 lookups
     175 code               65 __FUNCTION__       38 TREE_LIST
     171 tree_code          61 obstack            38 build_nt

Tiger Compiler's Driver

    43712 (385190 chars) occurrences of 3807 (62449 chars) symbols.
    1497 _CharT            348 __y               222 __err
    1008 __first           309 __len             211 __a
     921 _Tp               290 basic_string      205 __first2
     848 __x               286 __i               202 _M_node
     709 _Traits           284 iterator          199 __end
     673 ios_base          283 __pos             198 __io
     670 _Alloc            274 __result          190 traits_type
     658 __last            259 __beg             178 __comp
     589 __n               251 locale            178 _RandomAccessI
     465 char_type         251 _Compare          176 __middle
     422 size_type         245 __p               175 std
     419 __s               240 iter_type         170 _Key
     399 __c               237 _ForwardIter      155 __v
     357 size_t            225 __first1          154 value_type

Save Time and Space

One unique occurrence for each identifier:

- Save space: each identifier is stored only once
- Save time: fewer allocations, easier comparisons
- Save nerves: easier memory management

In C, a simple const char*.
In C++, an iterator in a std::set.

A std::set has the important property that inserting a new element into a set does not invalidate iterators that point to existing elements.
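A minimal sketch of this idea, assuming a hypothetical `Symbol` class that is not taken from the slides: each symbol holds an iterator into a shared `std::set<std::string>`, so every spelling is stored once and two symbols are equal exactly when their iterators are equal. It relies on the `std::set` guarantee that insertions do not invalidate existing iterators.

```cpp
#include <cstddef>
#include <set>
#include <string>

// A symbol is a lightweight handle on the unique stored copy of an
// identifier's spelling.
class Symbol
{
public:
  // Insert `s` in the pool if it is new; in both cases keep an iterator
  // to the single stored copy.
  explicit Symbol(const std::string& s)
    : it_(pool().insert(s).first)
  {}

  const std::string& name() const { return *it_; }

  // Same spelling means same element of the set, so comparing the
  // iterators is enough: no character-by-character comparison.
  bool operator==(const Symbol& rhs) const { return it_ == rhs.it_; }
  bool operator!=(const Symbol& rhs) const { return it_ != rhs.it_; }

  // Number of distinct spellings stored so far.
  static std::size_t pool_size() { return pool().size(); }

private:
  typedef std::set<std::string> pool_type;

  // The shared pool, created on first use.
  static pool_type& pool()
  {
    static pool_type instance;
    return instance;
  }

  pool_type::const_iterator it_;
};
```

With this sketch, `Symbol("foo") == Symbol("foo")` holds and both handles refer to the same `std::string` object, while copying a `Symbol` only copies an iterator.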


Semantic Values



Semantic Values in the Scanner

    int     [0-9]+
    string  "\""([^\\"]|\\.)*"\""
    %%
    {int}     yylval->ival = atoi(yytext); return INT;
    {string}  {
                /* Drop the opening and closing quotes. */
                yylval->str = new std::string(yytext + 1, yyleng - 2);
                return STRING;
              }
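To make the two actions concrete without Flex, here is a hand-written sketch of the same behavior. Everything in it is an assumption made for illustration: the `Token` struct stands in for the pair (token kind, `yylval`), and `scan` is a hypothetical helper that recognizes only integers and double-quoted strings. As in the scanner rules, the string value is the lexeme without its two quotes, and escapes are kept undecoded.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

enum TokenKind { END, INT, STRING, ERROR };

// A token kind together with its semantic value.
struct Token
{
  TokenKind kind;
  int ival;         // meaningful when kind == INT
  std::string str;  // meaningful when kind == STRING
};

// Scan one token starting at input[pos] and advance pos past it.
Token scan(const std::string& input, std::size_t& pos)
{
  while (pos < input.size()
         && std::isspace(static_cast<unsigned char>(input[pos])))
    ++pos;
  if (pos == input.size())
    return {END, 0, ""};

  const std::size_t begin = pos;
  if (std::isdigit(static_cast<unsigned char>(input[pos])))
    {
      while (pos < input.size()
             && std::isdigit(static_cast<unsigned char>(input[pos])))
        ++pos;
      // Semantic value: the integer denoted by the lexeme.
      return {INT, std::atoi(input.substr(begin, pos - begin).c_str()), ""};
    }
  if (input[pos] == '"')
    {
      ++pos;
      // Skip ordinary characters one by one, escaped characters by pairs.
      while (pos < input.size() && input[pos] != '"')
        pos += (input[pos] == '\\' && pos + 1 < input.size()) ? 2 : 1;
      if (pos >= input.size())
        return {ERROR, 0, ""};  // unterminated string
      ++pos;                    // the closing quote
      // Semantic value: the lexeme minus both quotes, the counterpart of
      // std::string(yytext + 1, yyleng - 2).
      return {STRING, 0, input.substr(begin + 1, pos - begin - 2)};
    }
  ++pos;
  return {ERROR, 0, ""};
}
```

Calling `scan` repeatedly on the same `pos` yields the successive tokens until `END` is returned.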


Semantic Values in the Parser

    %union
    {
      int ival;
      const std::string* str;
      // ...
    }
    // Storing and printing integers.
    %token <ival> INT "integer"
    %printer { debug_stream() [...]

Location tracking in the Scanner

    [...] step();
    %}
    /* Locations of blanks are ignored. */
    [ \t]+    yylloc->step();
    /* Newlines change the current line number, but are ignored too. */
    \n+       yylloc->lines(yyleng); yylloc->step();
    /* ... */
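The scanner fragment relies on a location object offering `step` and a way to advance over lines and columns. The following is a simplified stand-in written for illustration, not the class generated by Bison: the `Position` and `Location` structs, the 1-based numbering, and the `consume` helper are all assumptions.

```cpp
#include <string>

// A point in the input. Lines and columns are 1-based in this sketch.
struct Position
{
  unsigned line = 1;
  unsigned column = 1;
};

// A range of the input, from `begin` (included) to `end` (excluded).
struct Location
{
  Position begin;
  Position end;

  // Forget the previous token: the next one starts where it ended.
  void step() { begin = end; }

  // The current token extends over `count` more columns.
  void columns(unsigned count = 1) { end.column += count; }

  // The current token extends over `count` more lines; the column
  // restarts at the beginning of the line.
  void lines(unsigned count = 1)
  {
    end.line += count;
    end.column = 1;
  }
};

// Move `loc` over one lexeme, as a scanner would do after each match.
void consume(Location& loc, const std::string& lexeme)
{
  loc.step();
  for (char c : lexeme)
    if (c == '\n')
      loc.lines();
    else
      loc.columns();
}
```

After `consume(loc, "foo")` on a fresh location, the token spans columns 1 to 4 of line 1; consuming `"\n\n"` next moves the end to line 3, column 1.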


Location tracking in the Parser



Using the Location in the Parser

    %define "filename_type" "const symbol::Symbol"
    %locations
    %initial-action
    {
      // The initial location.
      @$.begin.filename = @$.end.filename
        = &symbol::Symbol::create(filename);
    };
    %%
    biglvalue:
      ID "[" exp "]"
        { $$ = new SubscriptVar(@$, new SimpleVar(@1, $1), $3); }
    | biglvalue "[" exp "]"
        { $$ = new SubscriptVar(@$, $1, $3); }
    ;
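In the actions above, `@1` is the location of the first right-hand side symbol and `@$` is the location of the whole construct. By default, Bison computes `@$` as the range from the beginning of the first symbol to the end of the last one, and for an empty rule as an empty range at the end of the preceding symbol. The sketch below mimics that default; the `Position` and `Location` structs and the `rule_location` function are hypothetical names used only for illustration.

```cpp
#include <vector>

struct Position
{
  unsigned line;
  unsigned column;
};

struct Location
{
  Position begin;
  Position end;
};

// Location of a left-hand side, computed from the locations of the
// right-hand side symbols. `previous` is the location of the symbol
// parsed just before the rule; it is only used for empty rules.
Location rule_location(const std::vector<Location>& rhs,
                       const Location& previous)
{
  if (rhs.empty())
    // Empty rule: an empty range right after the previous symbol.
    return {previous.end, previous.end};
  // Otherwise: from the start of the first symbol to the end of the last.
  return {rhs.front().begin, rhs.back().end};
}
```

For a rule such as `ID "[" exp "]"`, this gives a location that starts where `ID` starts and ends where `"]"` ends, which is what the `SubscriptVar` node receives as `@$`.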


Error Messages

    %error-verbose
    %%
    // ...
    %%
    void
    yy::parser::error(const location_type& l, const std::string& m)
    {
      std::cerr