
The Scanner and The Parser

Akim Demaille


EPITA  École Pour l'Informatique et les Techniques Avancées April 2, 2005


Outline

1 Symbols
    cstats
    Symbols
2 Semantic Values
    Scanner
    Parser
3 Locations
    Location tracking in the Scanner
    Location tracking in the Parser
4 Improving the Scanner/Parser
    Error Recovery
    Pure Parser
    Two Grammars in One
    Reentrancy


Symbols


cstats


Counting Symbols

    gcc -E -P "$@" \
      | tr -cs '[:alnum:]_' '[\n*]' \
      | grep '[[:alpha:]]' \
      | grep -v -E -w "$cxx_keywords" > $tmp.1
    total=$(wc -lc < $tmp.1 | awk '{print $1 " (" $2 " chars)"}')
    sort $tmp.1 \
      | uniq -c \
      | sed 's/^ //; s/\t/ /' \
      | sort -rn > $tmp.2
    unique=$(wc -lc < $tmp.2 | awk '{print $1 " (" $2 " chars)"}')
    echo $total occurrences of $unique symbols.
    sed 42q $tmp.2 \
      | pr --page-width=60 --column=3 --omit-header
    rm -f $tmp.*


Lemon

9501 (58762 chars) occurrences of 1280 (17814 chars) symbols.

    454 i             111 sp            69 ht
    264 lemp          104 lineno        61 a
    230 psp            94 next          60 lem
    193 rp             94 name          60 filename
    151 __restrict     91 h             59 rule
    142 cfp            82 np            59 config
    137 n              77 c             57 symbol
    126 __const        74 state         55 lemon
    118 x              73 j             54 type
    114 fprintf        72 size          53 __attribute__
    114 cp             72 FILE          50 data
    113 s              70 stp           48 tbl
    113 out            70 size_t        48 errorcnt
    112 ap             70 array         46 __extension__


GCC's C Parser

22714 (234415 chars) occurrences of 6720 (134544 chars) symbols.

    2676 tree             138 __c               58 GTY
    1579 ttype            138 _Bool             55 __stream
    1123 yyvsp            124 __attribute__     53 __s
     909 yyval            123 FILE              46 identifier
     358 ftype            118 __extension__     46 __i
     306 __const           97 rtx               45 __fd
     285 __restrict        95 type              44 __n
     248 __t               89 new_type_flag     43 error
     247 t                 70 cpp_reader        40 cp_global_tree
     206 gt_pointer_ope    69 build_tree_lis    39 yyn
     200 common            67 parse             39 s
     191 size_t            65 y                 39 lookups
     175 code              65 __FUNCTION__      38 TREE_LIST
     171 tree_code         61 obstack           38 build_nt


Tiger Compiler's Driver

43712 (385190 chars) occurrences of 3807 (62449 chars) symbols.

    1497 _CharT       348 __y             222 __err
    1008 __first      309 __len           211 __a
     921 _Tp          290 basic_string    205 __first2
     848 __x          286 __i             202 _M_node
     709 _Traits      284 iterator        199 __end
     673 ios_base     283 __pos           198 __io
     670 _Alloc       274 __result        190 traits_type
     658 __last       259 __beg           178 __comp
     589 __n          251 locale          178 _RandomAccessI
     465 char_type    251 _Compare        176 __middle
     422 size_type    245 __p             175 std
     419 __s          240 iter_type       170 _Key
     399 __c          237 _ForwardIter    155 __v
     357 size_t       225 __first1        154 value_type


Symbols


Save Time and Space

One unique occurrence for each identifier:

    In C, a simple const char*
    In C++, an iterator in a std::set

Save space: fewer allocations

Save time: fewer allocations, easier comparisons

Save nerves: easier memory management

Set has the important property that inserting a new element into a set does not invalidate iterators that point to existing elements.


Semantic Values


Using yylval in the scanner

    int     [0-9]+
    string  "\""([^\\]|\\.)*"\""
    %%
    {int}     yylval->ival = atoi (yytext); return INT;
    {string}  {
                yylval->str = new std::string (yytext + 1, yyleng - 2);
                return STRING;
              }


Using yylval in the parser

    %union
    {
      int ival;
      const std::string* str;
      // ...
    }
    // Storing and printing integers.
    %token INT "integer"
    %printer { debug_stream () ...

Locations

Location tracking in the Scanner

    ... step ();
    %}
    /* Locations of blanks are ignored. */
    [ \t]+   yylloc->step ();
    /* Newlines change the current line number, but are ignored too. */
    \n+      yylloc->line (yyleng); yylloc->step ();
    /* ... */


Location tracking in the Parser


Using the Location in the Parser

    %define "filename_type" "const symbol::Symbol"
    %locations
    %initial-action
    {
      // The initial location.
      @$.begin.filename = @$.end.filename
        = &symbol::Symbol::create (filename);
    };
    %%
    biglvalue:
      ID "[" exp "]"
        { $$ = new SubscriptVar (@$, new SimpleVar (@1, $1), $3); }
    | biglvalue "[" exp "]"
        { $$ = new SubscriptVar (@$, $1, $3); }
    ;


Error Messages

    %error-verbose
    %%
    // ...
    %%
    void
    yy::parser::error (const location_type& l, const std::string& m)
    {
      std::cerr ...