30.5 tokenize -- Tokenizer for Python source

The tokenize module provides a lexical scanner for Python source code, implemented in Python. The scanner also returns comments as tokens, which makes it useful for implementing "pretty-printers," including colorizers for on-screen displays.

The primary entry point is a generator:

generate_tokens(readline)
    The generate_tokens() generator requires one argument, readline, which must be a callable object that provides the same interface as the readline() method of built-in file objects (see section 3.9). Each call to the function should return one line of input as a string.

    The generator produces 5-tuples with these members: the token type; the token string; a 2-tuple (srow, scol) of ints specifying the row and column where the token begins in the source; a 2-tuple (erow, ecol) of ints specifying the row and column where the token ends in the source; and the line on which the token was found (the text of the line, not its number). The line passed is the logical line; continuation lines are included. New in version 2.2.
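As a rough illustration of the 5-tuples described above, the sketch below tokenizes a one-line string and prints the first four fields of each token. The sample source and the StringIO import fallback (added so the snippet also runs on newer interpreters, where the StringIO module is replaced by io.StringIO) are assumptions and not part of the original documentation.

```python
import token
from tokenize import generate_tokens

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

source = "x = 1 + 2\n"

rows = []
for toktype, tokstr, start, end, line in generate_tokens(StringIO(source).readline):
    # token.tok_name maps the numeric token type to a readable name such as 'NAME'
    name = token.tok_name[toktype]
    rows.append((name, tokstr, start, end))
    print("%-10s %-6r %s -> %s" % (name, tokstr, start, end))
```

Rows are 1-based and columns are 0-based, so the first token of the first line starts at (1, 0).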

An older entry point is retained for backward compatibility:

tokenize(readline[, tokeneater])
    The tokenize() function accepts two parameters: one representing the input stream, and one providing an output mechanism for tokenize().

    The first parameter, readline, must be a callable object that provides the same interface as the readline() method of built-in file objects (see section 3.9). Each call to the function should return one line of input as a string. Alternatively, readline may be a callable object that signals completion by raising StopIteration. Changed in version 2.5: Added StopIteration support.

    The second parameter, tokeneater, must also be a callable object. It is called once for each token, with five arguments, corresponding to the tuples generated by generate_tokens().
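The callback style can be imitated on top of generate_tokens(), which may be handy on interpreters where tokenize() does not accept a tokeneater argument. This is only a sketch: feed_tokens, make_readline and collect are hypothetical names introduced here, not part of the module, and the readline callable signals the end of input by raising StopIteration as described above.

```python
from tokenize import generate_tokens


def feed_tokens(readline, tokeneater):
    """Hypothetical helper: call tokeneater once per token with five arguments."""
    for tok in generate_tokens(readline):
        tokeneater(*tok)


def make_readline(lines):
    """Hypothetical helper: build a readline-style callable from a list of lines.

    The callable raises StopIteration once the lines are exhausted.
    """
    iterator = iter(lines)

    def readline():
        return next(iterator)

    return readline


seen = []


def collect(toktype, tokstr, start, end, line):
    # A tokeneater receives the same five values as one generate_tokens() tuple
    seen.append((tokstr, start))


feed_tokens(make_readline(["a = 1\n", "b = a\n"]), collect)
print(seen[:3])
```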

All constants from the token module are also exported from tokenize, as are two additional token type values that might be passed to the tokeneater function by tokenize():

COMMENT
    Token value used to indicate a comment.

NL
    Token value used to indicate a non-terminating newline. The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines.
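To see the two extra token types in practice, the sketch below counts them in a list literal that is split over two physical lines and carries a comment. The sample source is an assumption chosen for illustration; the line break inside the brackets should be reported as NL, while the end of the statement is reported as NEWLINE.

```python
from tokenize import generate_tokens, COMMENT, NL, NEWLINE

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

# One logical line spread over two physical lines, with a trailing comment
source = "values = [1,  # first item\n          2]\n"

kinds = [tok[0] for tok in generate_tokens(StringIO(source).readline)]

comment_count = kinds.count(COMMENT)
nl_count = kinds.count(NL)            # break inside the brackets
newline_count = kinds.count(NEWLINE)  # end of the logical line

print("COMMENT: %d, NL: %d, NEWLINE: %d" % (comment_count, nl_count, newline_count))
```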

Another function is provided to reverse the tokenization process. This is useful for creating tools that tokenize a script, modify the token stream, and write back the modified script.

untokenize(iterable)
    Converts tokens back into Python source code. The iterable must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored.

    The reconstructed script is returned as a single string. The result is guaranteed to tokenize back to match the input, so the conversion is lossless and round-trips are assured. The guarantee applies only to the token type and token string, because the spacing between tokens (column positions) may change. New in version 2.5.
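The round-trip guarantee can be checked with a small sketch like the one below. It keeps only the (type, string) pair of each token, rebuilds the source with untokenize(), and tokenizes the result again. The helper name type_string_pairs and the sample source are assumptions made for illustration.

```python
from tokenize import generate_tokens, untokenize

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3


def type_string_pairs(text):
    """Hypothetical helper: return (type, string) pairs for every token in text."""
    return [(tok[0], tok[1]) for tok in generate_tokens(StringIO(text).readline)]


source = "total = price * (1 + rate)  # gross\n"
pairs = type_string_pairs(source)
rebuilt = untokenize(pairs)

# Spacing may differ from the original, but the token stream should match
print(repr(rebuilt))
print(type_string_pairs(rebuilt) == pairs)
```

Comparing the rebuilt text with the original character by character is not a reliable check, since only token types and strings are covered by the guarantee.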

Example of a script re-writer that transforms float literals into Decimal objects:

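from StringIO import StringIO
from tokenize import generate_tokens, untokenize, NAME, NUMBER, OP, STRING

def decistmt(s):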
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print +21.3e-5*-.1234/81.7'
    >>> decistmt(s)
    "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

    >>> exec(s)
    -3.21716034272e-007
    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7

    """
    result = []
    g = generate_tokens(StringIO(s).readline)   # tokenize the string
    for toknum, tokval, _, _, _  in g:
        if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result)

Translated and compiled by zkfarmer; source: Python China.