Article from the Technology category, 2019-04-17

PostgreSQL Source Code Walkthrough (168) - Query #88 (Lexical Definitions in PG)

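Given a SQL statement, how does PostgreSQL parse the input and identify the statement type, the base tables, the columns, and so on? The next several sections work through this step by step. This section covers PostgreSQL's lexical definition file (the Flex input file), src/backend/parser/scan.l. As described earlier, a Flex input file consists of four parts: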

%{
Declarations
%}
Definitions
%%
Rules
%%
User subroutines

This section covers the first part, Declarations.

I. Declarations

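The part enclosed by %{ and %} is the Declarations part. It consists entirely of C code and is copied verbatim into the generated C file (lex.yy.c by default). Among the more important definitions is YYSTYPE: Bison uses a union to store the values of all possible types, and the variable yylval has type YYSTYPE.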
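To make the role of YYSTYPE more concrete, here is a minimal sketch in plain C of a lexer routine that returns a token code and hands the token's value back through a union. All names here (demo_YYSTYPE, demo_token, demo_lex_word) are hypothetical and are not taken from PostgreSQL; the real core_YYSTYPE is defined in the PostgreSQL headers, and real token codes are generated by Bison.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token codes; in a real parser Bison generates these. */
enum demo_token { DEMO_ICONST = 258, DEMO_IDENT = 259 };

/* Hypothetical semantic-value union, standing in for YYSTYPE. */
typedef union demo_YYSTYPE
{
	int			ival;		/* value of an integer literal */
	const char *str;		/* text of an identifier */
} demo_YYSTYPE;

/*
 * Toy "lexer" for one already-isolated word: decide which kind of token
 * it is, store its semantic value in *lval, and return the token code.
 * The caller reads the union member that matches the returned code.
 */
enum demo_token
demo_lex_word(const char *word, demo_YYSTYPE *lval)
{
	char	   *end;
	long		v;

	assert(word != NULL && lval != NULL);

	v = strtol(word, &end, 10);
	if (*word != '\0' && *end == '\0')
	{
		/* The whole word parsed as a decimal integer. */
		lval->ival = (int) v;
		return DEMO_ICONST;
	}

	/* Anything else is treated as an identifier in this sketch. */
	lval->str = word;
	return DEMO_IDENT;
}
```

The token code tells the parser which member of the union is valid, which is why one variable of type YYSTYPE is enough to carry every kind of token value.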

%top{
/*-------------------------------------------------------------------------
 *
 * scan.l
 *	  lexical scanner for PostgreSQL
 *
 * NOTE NOTE NOTE:
 *
 * The rules in this file must be kept in sync with src/fe_utils/psqlscan.l!
 *
 * The rules are designed so that the scanner never has to backtrack,
 * in the sense that there is always a rule that can match the input
 * consumed so far (the rule action may internally throw back some input
 * with yyless(), however).  As explained in the flex manual, this makes
 * for a useful speed increase --- about a third faster than a plain -CF
 * lexer, in simple testing.  The extra complexity is mostly in the rules
 * for handling float numbers and continued string literals.  If you change
 * the lexical rules, verify that you haven't broken the no-backtrack
 * property by running flex with the "-b" option and checking that the
 * resulting "lex.backup" file says that no backing up is needed.  (As of
 * Postgres 9.2, this check is made automatically by the Makefile.)
 *
 * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/parser/scan.l
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include <ctype.h>
#include <unistd.h>

#include "common/string.h"
#include "parser/gramparse.h"
#include "parser/parser.h"		/* only needed for GUC variables */
#include "parser/scansup.h"
#include "mb/pg_wchar.h"
}

/* ------------------ Declarations part ------------------ */
%{

/* LCOV_EXCL_START */

/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
#undef fprintf
#define fprintf(file, fmt, msg)  fprintf_to_ereport(fmt, msg)

static void
fprintf_to_ereport(const char *fmt, const char *msg)
{
	ereport(ERROR, (errmsg_internal("%s", msg)));
}

/*
 * GUC variables.  This is a DIRECT violation of the warning given at the
 * head of gram.y, ie flex/bison code must not depend on any GUC variables;
 * as such, changing their values can induce very unintuitive behavior.
 * But we shall have to live with it until we can remove these variables.
 */
int			backslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;
bool		escape_string_warning = true;
bool		standard_conforming_strings = true;

/*
 * Set the type of YYSTYPE.
 *
 * Note: in Bison, yylval has type YYSTYPE, which defaults to int.
 * Internally, bison declares each value as a C union that includes all of
 * the types.  You list all of the types in %union declarations.
 * Bison turns them into a typedef for a union type called YYSTYPE.
 */
#define YYSTYPE core_YYSTYPE

/*
 * Set the type of yyextra.  All state variables used by the scanner should
 * be in yyextra, *not* statically allocated.
 */
#define YY_EXTRA_TYPE core_yy_extra_type *

/*
 * Each call to yylex must set yylloc to the location of the found token
 * (expressed as a byte offset from the start of the input text).
 * When we parse a token that requires multiple lexer rules to process,
 * this should be done in the first such rule, else yylloc will point
 * into the middle of the token.
 */
#define SET_YYLLOC()  (*(yylloc) = yytext - yyextra->scanbuf)

/*
 * Advance yylloc by the given number of bytes.
 */
#define ADVANCE_YYLLOC(delta)  ( *(yylloc) += (delta) )

#define startlit()	( yyextra->literallen = 0 )
static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
static char *litbufdup(core_yyscan_t yyscanner);
static char *litbuf_udeescape(unsigned char escape, core_yyscan_t yyscanner);
static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
static int	process_integer_literal(const char *token, YYSTYPE *lval);
static bool is_utf16_surrogate_first(pg_wchar c);
static bool is_utf16_surrogate_second(pg_wchar c);
static pg_wchar surrogate_pair_to_codepoint(pg_wchar first, pg_wchar second);
static void addunicode(pg_wchar c, yyscan_t yyscanner);
static bool check_uescapechar(unsigned char escape);

#define yyerror(msg)  scanner_yyerror(msg, yyscanner)

#define lexer_errposition()  scanner_errposition(*(yylloc), yyscanner)

static void check_string_escape_warning(unsigned char ychar, core_yyscan_t yyscanner);
static void check_escape_warning(core_yyscan_t yyscanner);

/*
 * Work around a bug in flex 2.5.35: it emits a couple of functions that
 * it forgets to emit declarations for.  Since we use -Wmissing-prototypes,
 * this would cause warnings.  Providing our own declarations should be
 * harmless even when the bug gets fixed.
 */
extern int	core_yyget_column(yyscan_t yyscanner);
extern void core_yyset_column(int column_no, yyscan_t yyscanner);

%}
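The SET_YYLLOC() and ADVANCE_YYLLOC() macros in the listing above store a token's location as a byte offset from the start of the input buffer. The following is a minimal standalone sketch of that pointer arithmetic. The names demo_extra, DEMO_SET_LOC, DEMO_ADVANCE_LOC, and demo_token_location are hypothetical, and strstr() merely stands in for the scanner finding the start of a token; this is not how the PostgreSQL scanner locates tokens.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the scanner state that lives in yyextra. */
typedef struct demo_extra
{
	const char *scanbuf;	/* start of the input text */
} demo_extra;

/* Same shape as SET_YYLLOC(): location = token start minus buffer start. */
#define DEMO_SET_LOC(loc, tokstart, extra) \
	(*(loc) = (int) ((tokstart) - (extra)->scanbuf))

/* Same shape as ADVANCE_YYLLOC(): move the location forward by delta bytes. */
#define DEMO_ADVANCE_LOC(loc, delta)  (*(loc) += (delta))

/*
 * Return the byte offset of the first occurrence of token in input,
 * or -1 if the token does not occur.
 */
int
demo_token_location(const char *input, const char *token)
{
	demo_extra	extra = { input };
	const char *tokstart;
	int			loc = -1;

	assert(input != NULL && token != NULL);

	tokstart = strstr(input, token);
	if (tokstart != NULL)
		DEMO_SET_LOC(&loc, tokstart, &extra);
	return loc;
}
```

Because the location is a plain byte offset, an error reporter can later turn it back into a cursor position in the original query text. This is also why, for a token built up by several rules, the offset has to be recorded by the first rule.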

II. References

Flex&Bison
