T3 lexer

40 %
60 %
Information about T3 lexer
Entertainment

Published on November 28, 2007

Author: Mikhail

Source: authorstream.com

Modern Compiler Design:  Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University gretay@post.tau.ac.il http://www.cs.tau.ac.il/~gretay Modern Compiler Design T3 - Lexer Today:  Today Goals: how to use FLEX ? learn how to use flex output Today: understand flex input structure of PA2 Lexical Analysis FLEX Syntax Analysis Parsing AST Symbol Table etc. Inter. Rep. (IR) Code Gen. Designing Lexer:  Designing Lexer choose finite set of tokens programming language, parser a token corresponds to sets of strings rules to describe a lexeme IF token “if” or “i” followed by “f” Lexical Analysis with FLEX:  Lexical Analysis with FLEX FLEX is Fast Lexical Analyzer Generator recognized lexical patterns in text lexer, scanner Input: spec file Output: a lexical analyzer C++ program FLEX g++ spec Lexical analyzer cool source code tokens c++ source Structure of PA2:  Structure of PA2 FLEX g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc lextest.cc cool-parse.h FLEX Spec File:  FLEX Spec File definitions %% rules %% user code FLEX Spec File:  Possible source of g++ errors down the road FLEX Spec File %{ declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% DIGIT [0-9] LETTER [a-zA-Z] ID {LETTER}({LETTER}|{DIGIT})* INITIAL {LETTER} ({LETTER}|{DIGIT})* user code - functions main() { … yylex(); … } error_handler() {…} %{ declarations – user code local to yylex() %} Remarks:  Remarks %{ %} definitions section declare global variable rule section before the first rule declare variables local to yylex() executed whenever the scanning routine is entered Any text not matched by a flex rule is copied to the output User Code - definitions:  User Code - definitions %{ #include <cool-parse.h> #include <stringtab.h> #include <utilities.h> #define MAX_STR_CONST 1025 char string_buf[MAX_STR_CONST]; extern int curr_lineno; %} Structure of PA2:  Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.cc lextest.cc stringtab.h User Code - definitions:  User Code - definitions %{ extern FILE *fin; #undef YY_INPUT #define YY_INPUT(buf,result,max_size) \ if ( (result = fread( (char*)buf,\ sizeof(char), \ max_size, fin)) \ < 0) \ YY_FATAL_ERROR( "read() in flex failed"); #define yylval cool_yylval #define yylex cool_yylex extern YYSTYPE cool_yylval; %} FLEX Spec File:  FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %} FLEX Definitions:  FLEX Definitions Definitions of start condition %x list-of-start-conditions %s list-of-start-conditions Macro definitions Macro-name regular-expr Start Conditions:  Start Conditions %s inclusive start conditions rules with no start conditions are also activated %x exclusive start conditions only rules with the start condition are activated INITIAL BEGIN start-condition in action definition activates rules with start condition until next BEGIN EOF retains within the same start condition %s vs. %x by example:  %s vs. %x by example %s example %% <example>foo do_something(); bar something_else(); is equivalent to %x example %% <example>foo do_something(); <INITIAL,example>bar something_else(); is equivalent to %x example %% <example>foo do_something(); <*>bar something_else(); Why using start conditions ?:  Why using start conditions ? %x independent mini-scanner inside scanner scan syntactically different portion of the input tokenize according to context comments (* *) quote string “ ” Example Macros:  Example Macros ALPHA [A-Za-z_] DIGIT [0-9] ALPHA_NUMERIC {ALPHA}|{DIGIT} ID {ALPHA}({ALPHA_NUMERIC})* NUMBER ({DIGIT})+ WHITE_SPACE ([\ \n\r\t\f])+ Regular Expressions:  Regular Expressions FLEX Spec File:  FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %} Lexical Analysis Rules:  Lexical Analysis Rules Rule structure <states> regexp { C++ code action } Priority for rule matching longest string More than one match for same length – priority for rule appearing first ! Important: rules given in a flex specification should match all possible input ! Action Body:  Action Body C++ code Can use special methods and vars ECHO - copy yytext to the output REJECT - proceed on to the "second best" rule which matched the input or a prefix of the input #define yywrap() 1 yytext, yyleng, yyin, yyterminate(), yymore(), etc. Lexer state transition BEGIN, YY_START INITIAL FLEX Spec File:  FLEX Spec File {% declarations - user code copied directly to C++ file global declarations of extern variables, include directives %} definitions macro name, definition names of start conditions rules for lexical analysis optional state, regular expression, action as C++ code how to break input to tokens, action when token matched %% %% user code - functions %{ declarations – user code local to yylex() %} Running the Lexer:  Running the Lexer %% main( int argc, char *argv ) { ++argv, --argc; /* skip over program name */ if ( argc > 0 ) yyin = fopen( argv[0], "r" ); else yyin = stdin; yylex(); } (Just for testing Lexer as stand-alone program) Putting it all together…:  Putting it all together… %{ #include “tokens.h” int line_count = 0; %} Letter [a-zA-Z_] Digit [0-9] %% [ \t] {;} [\n] {line_count++;} “;” { return SemiColumn;} “++” { return PlusPlus ;} “+=” { return PlusEqual ;} “+” { return Plus} “while” { return While ; } {Letter}({Letter}|{Digit})* { yylval.sym = strdup( yytext ); return Id ;} “<=” { return LessOrEqual;} “<” { return LessThen ;} %% main (){ yylex(); } Structure of PA2:  Structure of PA2 flex g++ cool.flex Lexical analyzer test.cl tokens cool-lex.cc cool-parse.h utilities.h utilities.cc stirngtab.h YYSTYPE cool_yylval; int curr_lineno; FILE *fin; int cool_yylex(); lextest.cc stringtab.h cool-parse.h:  cool-parse.h Defines token constant ids Tells parser what is the token returned by lexer Actual value doesn’t matter Actually, generated by bison ?

Add a comment

Related presentations

Related pages

F Sharp Programming/Lexing and Parsing - Wikibooks, open ...

Lexing and parsing is a very handy way to convert source-code (or other human-readable input which has a well-defined syntax) into an abstract syntax tree ...
Read more

ANTLR Specification: Meta Language

ANTLR Meta-Language. ANTLR accepts three types of grammar specifications -- parsers, lexers, and tree-parsers (also called tree-walkers). Because ANTLR ...
Read more

nsiqcppstyle/RULE_4_1_E_align_conditions.py at master ...

nsiqcppstyle - Cpp style checker in python ... Align condition list in 'if' and 'while' clause if they are splitted in multiple lines.
Read more

ANTLR Specification: Meta Language - MIT

ANTLR. jGuru: ANTLR 2 Meta-Language. ANTLR 2 accepts three types of grammar specifications -- parsers, lexers, and tree-parsers (also called tree-walkers).
Read more

nsiqcppstyle/RULE_4_2_A_A_space_around_operator.py at ...

nsiqcppstyle - Cpp style checker in python ... Switch branches/tags. Branches; Tags
Read more

SWTOR Conquest-T3 M4-i-lexer der nacht

Total Estimated Points: 3.66 Million: For total points this guild is in 3519th place across all servers: For total points this guild is in 295th place on T3 M4
Read more

START: A | B; A: C | D; B: C | D; C: T1 | T2; D: T3 | T4 ...

T3: 't3'; T4: 't4'; S: NP VP | NPA VP; NPA: D N | NP PP; NP: D N | NP PP | NPA; VP: V NP | VP PP; PP: P NP; D: "the" | "a"; P: "in" | "with"; V: "saw";
Read more

pr0gramm.com

Meine T3 hat ein so großes Spiel in der Lenkung, ... Oct 6, 2015 - 10:27 by LEXer tzzz bis 30km/h kann das meiner ebenso ! Oct 6, 2015 ...
Read more

ABAP-Befehle protokollieren. - SAP ABAP | XING

T3 = ( t2 - t1 ) / 1000 . "Laufzeitmessung ASSERT ID ... Hallo Alex , dass lässt sich eigentlich nur sauber über einen Parser und Lexer abwickeln.
Read more

Tolexo - Online Marketplace for Industrial Goods, Safety ...

Buy online from over 1 million business goods & industrial supplies on Tolexo.com, from categories like power tools, safety equipment, office supplies etc ...
Read more