Developing A Procedural Language For Postgre Sql

0 %
100 %
Information about Developing A Procedural Language For Postgre Sql
Technology

Published on March 17, 2009

Author: linuxpoet

Source: slideshare.net

Description

Presentation presented by Joshua Tolley and PostgreSQL Conference West 08. Corresponding video is here: http://www.vimeo.com/3728119

Developing a Procedural Language for PostgreSQL  

I CAN HAZ P.L.Z PLZ?  

MKAY... WHY? To learn how

MKAY... WHY? To learn how To get to speak at conferences

MKAY... WHY? To learn how To get to speak at conferences Achieve world domination

MKAY... WHY? To learn how To get to speak at conferences Achieve world domination

UM, WHAT?

WHAT'S LOLCODE? 1 Oh hai. In teh beginnin 1 In the beginning, some Guy Ceiling Cat maded teh skiez made an Image of a cat. And An da Urfs, but he did not the picture was without eated dem. caption, and void. 2 Da Urfs no had shapez An 2 And said Guy did make a haded dark face, An Ceiling caption unto his Image. And Cat rode invisible bike over it was funny, and the Guy did teh waterz. call it good. 3 At start, no has lyte. An 3 And this guy's Image did Ceiling Cat sayz, i can haz become an internet meme, all lite? An lite wuz. Web 2.0-ish and stuff. 4 An Ceiling Cat sawed teh 4 There was much rejoicing. lite, to seez stuffs...

WHAT'S LOLCODE A lolcat is an image combining a photograph, most frequently a cat, with a humorous and idiosyncratic caption in (often) broken English—a dialect which is known as quot;lolspeakquot;, or quot;kittehquot;. The name quot;lolcatquot; is a compound word of quot;LOLquot; and quot;catquot;.[1] Another name is cat macro, being a type of image macro.[2] Lolcats are created for photo sharing imageboards and other internet forums. Lolcats are similar to other anthropomorphic animal-based image macros such as the O RLY?  owl.[3]

UM... OKAY...

SO, LOLCODE? Adam Lindsay had a revelation. From Ceiling Cat. quot;This is a love letter to very clever people who are slightly bored. I had no idea there were so many of us out there.quot;   - FAQ, lolcode.com

CHEEZBURGER  

CHEEZBURGER   SPEEK LOLCODE

CHEEZBURGER   SPEEK LOLCODE ANATUMMY OF A PL

CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ ANATUMMY OF A PL

CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ MAEK UN INTURPRETR ANATUMMY OF A PL

CHEEZBURGER   SPEEK LOLCODE

I CAN SPEEK LOLCODE Data types NUMBR: Integer values NUMBAR: Floating point values YARN: String values NOOB: Null values TROOF: Boolean values (WIN / FAIL)

I CAN SPEEK LOLCODE Operators Arithmetic SUM OF x AN y, DIFF OF x AN y PRODUKT OF x AN y, QUOSHUNT OF x AN y MOD OF x AN y BIGGR OF x AN y, SMALLR OF x AN y   Boolean BOTH OF x AN y EITHER OF x AN y WON OF x AN y ALL OF x AN y [AN ...] MKAY ANY OF x AN y [AN ...] MKAY NOT x

I CAN SPEEK LOLCODE Operators (cont.) Comparison BOTH SAEM x AN y WIN iff x == y DIFFRINT x AN y FAIL iff x == y Concatenation SMOOSH x y z p d q ... MKAY  Concatenates infinitely many YARNs Casting MAEK x A <type> x IS NOW A <type>

I CAN SPEEK LOLCODE Comparison   BOTH SAEM ANIMAL AN quot;CATquot;, O RLY? YA RLY, VISIBLE quot;J00 HAV A CATquot; MEBBE BOTH SAEM ANIMAL AN quot;MAUSquot; VISIBLE quot;JOO HAV A MAUS? WTF?quot; NO WAI, VISIBLE quot;J00 SUXquot; OIC

I CAN SPEEK LOLCODE Case COLOR, WTF? OMG quot;Rquot; VISIBLE quot;RED FISHquot; GTFO OMG quot;Yquot; VISIBLE quot;YELLOW FISHquot; OMG quot;Gquot; OMG quot;Bquot; VISIBLE quot;FISH HAS A FLAVORquot; GTFO OMGWTF VISIBLE quot;FISH IS TRANSPARENTquot; OIC

I CAN SPEEK LOLCODE Loops   IM IN YR <label> <operation> YR <variable> [TIL|WILE <expression>] <code block> IM OUTTA YR <label>   IT variable The default when nothing else is specified Similar to Perl's $_

I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) Function arguments are named LOL1, LOL2, etc. automatically PL/LOLCODE isn't smart enough yet to let you give them arbitrary names

I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) LIAR LIAR PANTZ Function arguments are named LOL1, LOL2, etc. automatically ON FIRE!!!1 PL/LOLCODE isn't smart enough yet to let you give them arbitrary names

I CAN SPEEK PL/LOLCODE PL/LOLCODE-specific stuff VISIBLE == PL/pgSQL RAISE Alt. VISIBLE <level> quot;Messagequot; FOUND YR <val> == PL/pgSQL RETURN GIMMEH <var> OUTTA DATABUKKIT quot;queryquot; Database interface (SPI) Function arguments are named LOL1, LOL2, etc. automatically PL/LOLCODE isn't smart enough yet to let you give them arbitrary names As of a few nights ago, this is no longer true. W00T!!1

I CAN SPEEK PL/LOLCODE HAI I HAS A X ITZ SMOOSH quot;MONORAILquot; quot;KITTYquot; MKAY I HAS A Y ITZ 3 I HAS A Z ITZ BIGGR OF Y AN 15 BOTH SAEM Z AN Y, O RLY? YA RLY, VISIBLE quot;SUMTHINZ RONG!!quot; NO WAI, VISIBLE quot;THX. CAN I HAZ CHEEZBURGR?quot; OIC IM IN YR MYLOOP Y R DIFF OF Y AN 1 DIFFRINT Y AN BIGGR OF Y AN 5, O RLY? YA RLY, GTFO OIC IM OUTTA YR MYLOOP KTHXBYE

CHEEZBURGER   SPEEK LOLCODE ANATUMMY OF A PL

ANATUMMY OF A PL Function call handler plpgsql_call_handler, plperl_call_handler, etc. This function is responsible for executing each function call for a particular language CREATE LANGUAGE Must be written in a compiled language Must take language_handler as a parameter Function validator plpgsql_validator, plperl_validator Optional Validates functions when they're created Takes an OID argument TRUSTED vs. UNTRUSTED

ANATUMMY OF A PL Where's the interpreter? Some PLs come with their own interpreter PL/pgSQL, SQL/PSM, PL/LOLCODE, PL/Proxy Others use an interpreter from a shared library PL/Perl, PL/Python, PL/R, most everything else Some do other strange things PL/J Talks to a Java virtual machine via RMI calls PLs with built-in interpreters can use PostgreSQL's internal data types to avoid conversion overhead

ANATUMMY OF A PL Where's the interpreter? Some PLs come with their own interpreter PL/pgSQL, SQL/PSM, PL/LOLCODE, PL/Proxy Others use an interpreter from a shared library PL/Perl, PL/Python, PL/R, most everything else Some do other strange things PL/J Talks to a Java virtual machine via RMI calls PLs with built-in interpreters can use PostgreSQL's internal data types to avoid conversion overhead PL/LOLCODE isn't that smart yet

ANATUMMY OF A PL Compilation Commonly a function will be quot;compiledquot; the first time it is run in a session The function call handler looks up argument and return types only once Sometimes keeps the parse tree in memory Avoids having to re-parse the function next time it's called

ANATUMMY OF A PL Compilation Commonly a function will be quot;compiledquot; the first time it is run in a session The function call handler looks up argument and return types only once Sometimes keeps the parse tree in memory Avoids having to re-parse the function next time it's called PL/LOLCODE isn't that smart yet

CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ ANATUMMY OF A PL

FUNKSHUN KALL HANDLRZ Job of the function handler Figure out if it's a TRIGGER function Determine argument values Determine return type Find the function source ...alternatively, find a pre-stored parse tree Execute the function Return the proper values

FUNKSHUN KALL HANDLRZ PG_MODULE_MAGIC; Datum pl_lolcode_call_handler (PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1(pl_lolcode_call_handler); Datum pl_lolcode_call_handler(PG_FUNCTION_ARGS) { /* ... */ }

FUNKSHUN KALL HANDLRZ  Get function information Form_pg_proc from server/catalog/pg_proc.h   Form_pg_proc procStruct; HeapTuple procTup; procTup = SearchSysCache(PROCOID, ObjectIdGetDatum(fcinfo- >flinfo->fn_oid), 0, 0, 0); if (!HeapTupleIsValid(procTup)) elog(ERROR, quot;Cache lookup failed for procedure %uquot;, fcinfo->flinfo->fn_oid); procStruct = (Form_pg_proc) GETSTRUCT(procTup); ReleaseSysCache(procTup);

FUNKSHUN KALL HANDLRZ Get function source   procsrcdatum = SysCacheGetAttr(PROCOID, procTup, Anum_pg_proc_prosrc, &isnull); if (isnull) elog(ERROR, quot;Function source is nullquot;); proc_source = DatumGetCString (DirectFunctionCall1(textout, procsrcdatum));

FUNKSHUN KALL HANDLRZ Get return type   typeTup = SearchSysCache(TYPEOID, ObjectIdGetDatum (procStruct->prorettype), 0, 0, 0); if (!HeapTupleIsValid(typeTup)) elog(ERROR, quot;Cache lookup failed for type %uquot;, procStruct->prorettype); typeStruct = (Form_pg_type) GETSTRUCT(typeTup); resultTypeIOParam = getTypeIOParam(typeTup); fmgr_info_cxt(typeStruct->typinput, &flinfo, TopMemoryContext);

FUNKSHUN KALL HANDLRZ Get arguments and their types   for (i = 0; i < procStruct->pronargs; i++) { snprintf(arg_name, 255, quot;LOL%dquot;, i+1); lolDeclareVar(arg_name); LOLifyDatum(fcinfo->arg[i], fcinfo->argnull[i], procStruct->proargtypes.values[i], arg_name); } LOLifyDatum() gets the typeStruct for the argument type, gets the argument value as a C string, and sticks it in PL/LOLCODE's variable table

FUNKSHUN KALL HANDLRZ Pass the function source to the interpreter   pllolcode_yy_scan_string(proc_source); pllolcode_yyparse(NULL); pllolcode_yylex_destroy();   More on this later

FUNKSHUN KALL HANDLRZ Return the proper value if (returnTypeOID != VOIDOID) { if (returnVal->type == ident_NOOB) fcinfo->isnull = true; else { if (returnTypeOID == BOOLOID) retval = InputFunctionCall(&flinfo, lolVarGetTroof(returnVal) == lolWIN ? quot;Tquot; : quot;Fquot;, resultTypeIOParam, -1); else { /* ... */ retval = InputFunctionCall(&flinfo, rettmp, resultTypeIOParam, -1); } } }

FUNKSHUN KALL HANDLRZ Note the function manager interface (fmgr.[h|c]) InputFunctionCall Calls a previously determined datatype input function OutputFunctionCall Opposite of InputFunctionCall DirectFunctionCall[1..9] Call a specific, named function with various numbers of arguments Require FmgrInfo structures to describe the functions to be called Read through fmgr.h and fmgr.c for more details

CHEEZBURGER   SPEEK LOLCODE FUNKSHUN KALL HANDLRZ MAEK UN INTURPRETR ANATUMMY OF A PL

MAEK UN INTURPRETR Don't write the parser on your own. It's painful It's a lot of work It's easy to do wrong Someone else already has Writing the interpreter on your own is ok, but use a tool I wrote an LOLCODE interpreter to ... Learn how Be sure it was licensed the way I wanted it licensed (BSD)

MAEK UN INTURPRETR Tools for writing parsers and interpreters There are a fair number of them ANTLR Lex + Yacc Flex + Bison Parsers, language definitions, etc. are complex business Top-down vs. Bottom-up, Associativity rules... I chose Flex + Bison, because that's what PostgreSQL uses I've no particular expertise to choose based on merits I forgot most of my formal languages class :)

MAEK UN INTURPRETR Parser generation tools require a language definition Generally in a variant of BNF (Backus-Naur Form)   <symbol1> ::= <expression> <postal-address> ::= <name-part> <street-address> <zip-part> <name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part> <personal-part> ::= <first-name> | <initial> quot;.quot; <street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL> <zip-part> ::= <town-name> quot;,quot; <state-code> <ZIP-code> <EOL> <opt-jr-part> ::= quot;Sr.quot; | quot;Jr.quot; | <roman-numeral> | quot;quot;Thanks, Wikipedia

MAEK UN INTURPRETR Flex + Bison == two parts  Flex = lexer Divides input into a set of labeled tokens Bison = parser Given a stream of tokens, interpret them and act accordingly

MAEK UN INTURPRETR Flex input is mostly straightforward     /* Arithmetic functions */ quot;SUM OFquot; { return SUMOF; } quot;DIFF OFquot; { return DIFFOF; } quot;PRODUKT OFquot; { return PRODUKTOF; } quot;QUOSHUNT OFquot; { return QUOSHUNTOF; } Some regular expressions [A-Za-z][A-Za-z0-9_]+ { return IDENTIFIER; }

MAEK UN INTURPRETR State machine allows for processing blocks   OBTW { BEGIN LOL_BLOCKCMT; } <LOL_BLOCKCMT>TLDR { BEGIN 0; lol_lex_debug(quot;yylex: block commentquot;); } <LOL_BLOCKCMT>n { lineNumber++; } <LOL_BLOCKCMT>. {}

MAEK UN INTURPRETR Some code to make memory allocation work   #define YYSTACK_FREE pfree #define YYSTACK_MALLOC palloc #define YYFREE pfree #define YYMALLOC palloc ... %option noyyalloc noyyrealloc noyyfree PostgreSQL uses memory contexts, so the parser has to use them too.

MAEK UN INTURPRETR Flex makes a big stream of tokens Bison turns them into something useful   Define the tokens:   %token HAI KTHXBYE EXPREND %token YARN IDENTIFIER NUMBR NUMBAR %token TROOFWIN TROOFFAIL TROOF Define grammar and associated actions %% lol_program: HAI EXPREND lol_cmdblock KTHXBYE EXPREND { executeProgram($3); }

MAEK UN INTURPRETR Bison's job (in this case) is to build a parse tree We don't actually execute anything until the tree is built In some future day, we'll save the parse tree after the first execution, and avoid re-parsing each time a function is called Once the tree is built, we'll pass it to a function that will execute it Specifically, the data structure is a linked list where each node can point to the next node in the list as well as to a new list This is similar to the structure PL/pgSQL uses

MAEK UN INTURPRETR typedef struct lolNodeList { int length; lolNode *head; lolNode *tail; } lolNodeList; struct lolNode { NodeType node_type; struct lolNode *next; /* list for sub-nodes */ lolNodeList *list; NodeVal nodeVal; };

MAEK UN INTURPRETR Each lolNode is a particular statement VISIBLE ORLY IHASA ... Also, constants and variables are nodes YARN NUMBAR ... How each node uses the members of the lolNode struct depends on the node

MAEK UN INTURPRETR TROOF node Really simple -- handles a TROOF (Boolean) constant NodeType node_type = TROOF NodeVal nodeVal = WIN or FAIL When executed, this node simply sets IT to the WIN or FAIL value stored in nodeVal

MAEK UN INTURPRETR VISIBLE nodes Remember, VISIBLE rasies an exception Structure members: NodeType node_type = VISIBLE lolNodeList *list = pointer to a list of nodes containing: A constant string (a 1-node list) An expression evaluating to a string (possibly many nodes) NodeVal nodeVal = a number representing the message level (NOTICE, WARNING, etc.) When executed, this node runs the list, which puts a value in IT, and raises IT as an exception

MAEK UN INTURPRETR SUMOF node SUM OF x AN y = x + y Structure members: lolNodeList *list == a two-element list, where each node represents an operand The result is returned in IT

MAEK UN INTURPRETR ORLY node O RLY? == if, YA RLY == then, MEBBE == else if, NO WAI == else lolNodeList *list points to a set of nodes, one per YA RLY, MEBBE, and NO WAI block. Each YA RLY, MEBBE, and NO WAI block has a sub-list of its own, containing the instructions in the block MEBBE blocks also contain a list of nodes containing the MEBBE expression

MAEK UN INTURPRETR So what does Bison need to make this work?

MAEK UN INTURPRETR Token definitions These describe the data Flex generates   %token FOUNDYR IHASA ITZ R %token SUMOF DIFFOF PRODUKTOF QUOSHUNTOF MODOF BIGGROF SMALLROF AN %token BOTHOF EITHEROF WONOF NOT ALLOF ANYOF MKAY %token BOTHSAEM DIFFRINT These will turn into #define statements (e.g. #define DIFFRINT 123)

MAEK UN INTURPRETR A definition of the grammar   lol_cmd: lol_expression EXPREND { $$ = $1; } | ORLY EXPREND lol_orly_block OIC EXPREND { $$ = lolMakeNode(ORLY, tmpnval, $3); } | WTF EXPREND lol_wtf_block OIC EXPREND { $$ = lolMakeNode(WTF, tmpnval, $3); } | IMINYR IDENTIFIER EXPREND lol_cmdblock IMOUTTAYR IDENTIFIER EXPREND { /* TODO: Check $2 and $7 for equality */ $$ = lolMakeNode(IMINYR, tmpnval, $4); }

MAEK UN INTURPRETR A definition of the data type used in Bison's stack   %union { int numbrVal; double numbarVal; char *yarnVal; struct lolNodeList *nodeList; struct lolNode *node; } This is pretty much always a union

MAEK UN INTURPRETR Type declarations to map various declared expressions to union members   %type <nodeList> lol_program lol_cmdblock lol_orly_block %type <node> lol_const lol_var lol_cmd lol_xaryoprgroup %type <yarnVal> YARN %type <yarnVal> IDENTIFIER

MAEK UN INTURPRETR As the parser sees token streams it can interpret as a particular object in the grammar, it runs the code associated with that object, using $1, $2, etc. to identify objects on the stack   lol_expression: ... | IHASA lol_var ITZ lol_expression { $$ = lolMakeNode (DECL, ($2)->nodeVal, lolMakeList($4)); }

MISELANEEUS DEVELOPMENT   ENVIRONMENT INTRO TO SPI

SPI GIMMEH <var> OUTTA DATABUKKIT quot;<query>quot; Note that this only returns one value PL/LOLCODE uses a special parse tree node for SPI The expression that generates the query is stored in the node's sub-list Executing that list puts the text of the query in IT SPI sends that query to the server for processing The first column of the first row is stuck in the variable in the GIMMEH statement SPIval = SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1); LOLifyString(SPIval, SPI_gettypeid(SPI_tuptable->tupdesc, 1), node- >nodeVal.yarnVal);

Datums Everything is a Datum What a Datum actually is is architecture-dependant sizeof(Datum) == sizeof(long) >= sizeof(void *) >= 4 Different data types are stored in Datums in different ways You need to know what data type the datum is supposed to store SPI_tuptable is a global pointer to a structure that describes the last tuple returned by SPI Based on a Datum and knowledge of the type it represents, you can get useful values from it There are lots of functions, macros, etc. for processing Datums

Development Environment Configure with --enable-debug, --enable-cassert  --enable-debug Include debugging symbols in final library This allows things like gdb to know where in the program you are, and tell you about it Disables optimizations Makes executables bigger --enable-cassert Adds (in)sanity checks Makes things slower More likely to tell you if you do something really stupid cf. memory context problems

I CAN HAZ NAP NOW?

U HAZ KWESCHUNZ?

KTHXBYE

Development Environment jtolley@uber:~/devel/pgsql$ ps -ef | grep postgres jtolley 7465 6283 1 13:23 ? 00:00:00 postgres: jtolley jtolley [local] idle jtolley@uber:~/devel/pgsql$ gdb postgres 7465 GNU gdb 6.8-debian ... Attaching to program: /home/jtolley/devel/pgdb/bin/postgres, process 7465 Reading symbols from /usr/lib/libxml2.so.2...done. Loaded symbols for /usr/lib/libxml2.so.2 ... Loaded symbols for /home/jtolley/devel/pgdb/lib/postgresql/pllolcode.so 0xb8003430 in __kernel_vsyscall () (gdb)

Add a comment

Related presentations

Related pages

PostgreSQL: Documentation: 9.4: Procedural Languages

There are currently four procedural languages available in the standard PostgreSQL distribution: ... the basics of developing a new procedural language are ...
Read more

Chapter 39. Procedural Languages - PostgreSQL: The world's ...

Chapter 39. Procedural Languages. PostgreSQL allows user ... called procedural languages ... of developing a new procedural language are ...
Read more

Capítulo 36. Procedural Languages

Capítulo 36. Procedural Languages. PostgreSQL allows user ... called procedural languages ... of developing a new procedural language are ...
Read more

Developing a PostgreSQL Procedural Language

Developing a PostgreSQL Procedural Language Joshua Tolley End Point Corporation eggyknap@gmail.com. ... – Executes each stored procedure in a given language
Read more

33.5. Procedural Language Functions - PostgreSQL ...

Procedural languages aren't built into ... available in the standard PostgreSQL ... of developing a new procedural language are covered ...
Read more

PostgreSQL: Documentation: Manuals: PostgreSQL 7.4: PL ...

Tips for Developing in PL/pgSQL 37.3. Structure of PL/pgSQL 37.4. ... PL/pgSQL is a loadable procedural language for the PostgreSQL database system.
Read more

PostgreSQL Programming Guide - Procedural Languages

8 Procedural Languages. PostgreSQL ... For a function written in a procedural language, ... the basics of developing a new procedural language ...
Read more

Procedural Languages - 한국 포스트그레스큐엘 ...

There are currently four procedural languages available in the standard PostgreSQL ... the basics of developing a new procedural language are ...
Read more

PostgreSQL

A small system just for developing, ... side programming languages, i.e. SQL or PostgreSQL procedural languages like ... have your own procedural language at
Read more