
What's with the integer cache maintained by the interpreter?

codestyles 2020. 11. 2. 07:59


After diving into Python's source code, I found that it maintains an array of PyInt_Objects ranging from int(-5) to int(256) (@src/Objects/intobject.c).

A little experiment proves it:

>>> a = 1
>>> b = 1
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False
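The same experiment can be repeated in Python 3 by building the ints at run time instead of writing them as literals, so that compile-time constant sharing (discussed below) plays no part. This is a minimal sketch that assumes CPython, whose small-int cache still covers [-5, 256]. Other implementations such as PyPy may treat `is` on ints differently. The helper names `fresh_int` and `is_cached` are illustrative only.

```python
def fresh_int(n: int) -> int:
    # Parsing a string creates the int at run time, so no
    # compile-time constant sharing is involved.
    return int(str(n))


def is_cached(n: int) -> bool:
    # Two live ints with the same value are the same object only if
    # the interpreter returned a pre-allocated (cached) instance.
    a = fresh_int(n)
    b = fresh_int(n)
    return a is b


if __name__ == "__main__":
    for n in (-6, -5, 0, 256, 257):
        print(n, is_cached(n))
```

On CPython this prints True for -5, 0 and 256, and False for -6 and 257.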

However, if I run this code together in a .py file, or join the statements with semicolons, the result is different:

>>> a = 257; b = 257; a is b
True
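The interactive interpreter compiles each input line in "single" mode, so everything on one line, semicolons included, becomes one code object. This minimal Python 3 sketch imitates that with the built-in `compile()`. The helper `run_like_repl` is a hypothetical name, and the identity result assumes CPython.

```python
import types


def run_like_repl(line: str, namespace: dict) -> types.CodeType:
    # Compile one input line the way the interactive prompt does
    # ("single" mode), then execute it in the given namespace.
    code = compile(line, "<stdin>", "single")
    exec(code, namespace)
    return code


ns: dict = {}
code = run_like_repl("a = 257; b = 257", ns)

# Both assignments load the same entry of the code object's constants.
print(code.co_consts.count(257))  # 1
print(ns["a"] is ns["b"])         # True on CPython
```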

I'm wondering why they are still the same object, so I dug deeper into the syntax tree and the compiler and came up with the call hierarchy listed below:

PyRun_FileExFlags() 
    mod = PyParser_ASTFromFile() 
        node *n = PyParser_ParseFileFlagsEx() //source to cst
            parsetoke() 
                ps = PyParser_New() 
                for (;;)
                    PyTokenizer_Get() 
                    PyParser_AddToken(ps, ...)
        mod = PyAST_FromNode(n, ...)  //cst to ast
    run_mod(mod, ...)
        co = PyAST_Compile(mod, ...) //ast to CFG
            PyFuture_FromAST()
            PySymtable_Build()
            co = compiler_mod()
        PyEval_EvalCode(co, ...)
            PyEval_EvalCodeEx()
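The C call chain above has a rough Python-level counterpart in Python 3:

- `ast.parse()` produces the AST. The CST step is internal, and since Python 3.9 the PEG parser builds the AST directly.
- `compile()` turns an AST into a code object.
- `exec()` evaluates the code object.

The C function names in the comments are my own approximate mapping, not something stated by the CPython sources.

```python
import ast
import types

SOURCE = "a = 257\nb = 257\n"

# Source -> AST (compare PyParser_ASTFromFile / PyAST_FromNode).
tree = ast.parse(SOURCE, filename="test.py", mode="exec")

# AST -> code object (compare PyAST_Compile).
code = compile(tree, filename="test.py", mode="exec")

# Code object -> execution (compare PyEval_EvalCode).
namespace: dict = {}
exec(code, namespace)

print(type(tree).__name__)               # Module
print(isinstance(code, types.CodeType))  # True
print(namespace["a"], namespace["b"])    # 257 257
```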

Then I added some debug code in PyInt_FromLong and before/after PyAST_FromNode, and ran this test.py:

a = 257
b = 257
print "id(a) = %d, id(b) = %d" % (id(a), id(b))

The output looks like this:

DEBUG: before PyAST_FromNode
name = a
ival = 257, id = 176046536
name = b
ival = 257, id = 176046752
name = a
name = b
DEBUG: after PyAST_FromNode
run_mod
PyAST_Compile ok
id(a) = 176046536, id(b) = 176046536
Eval ok

This means that during the CST-to-AST transform, two different PyInt_Objects are created (this actually happens in the ast_for_atom() function), but they are merged later.
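The same observation can be reproduced from Python 3 without patching the interpreter. The AST still holds one `Constant` node per literal, while the compiled code object holds a single merged constant. This sketch assumes Python 3.8 or later, where numeric literals appear as `ast.Constant`.

```python
import ast

SOURCE = "a = 257\nb = 257\n"

tree = ast.parse(SOURCE)

# One Constant node per occurrence of the literal in the source text.
literal_nodes = [
    node
    for node in ast.walk(tree)
    if isinstance(node, ast.Constant) and node.value == 257
]

code = compile(tree, "<test>", "exec")

print(len(literal_nodes))         # 2: nothing merged yet in the AST
print(code.co_consts.count(257))  # 1: merged by the compiler
```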

I find it hard to follow the source of PyAST_Compile and PyEval_EvalCode, so I'm here to ask for help. I'd appreciate it if someone could give me a hint.


Python caches integers in the range [-5, 256], so integers in that range are expected to be identical anyway.

What you are seeing is the Python compiler optimizing identical literals when they are part of the same text.

When you type in the Python shell, each line is a completely separate statement, parsed at a different moment, thus:

>>> a = 257
>>> b = 257
>>> a is b
False

But if you put the same code into a file:

$ echo 'a = 257
> b = 257
> print a is b' > testing.py
$ python testing.py
True
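The file case can also be checked from Python 3 without a shell. This sketch writes the three lines to a temporary file and runs it with `runpy.run_path()`. That function compiles the whole file as a single module and returns its globals, so both literals end up in the same code object. The temporary file name is arbitrary, and the result assumes CPython.

```python
import os
import runpy
import tempfile

SOURCE = "a = 257\nb = 257\nsame = a is b\n"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(SOURCE)
    path = f.name

try:
    # run_path() compiles the file as one module and returns its globals.
    module_globals = runpy.run_path(path)
finally:
    os.remove(path)

print(module_globals["same"])  # True on CPython
```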

This happens whenever the parser has a chance to analyze where the literals are used, for example when you define a function in the interactive interpreter:

>>> import dis
>>> def test():
...     a = 257
...     b = 257
...     print a is b
... 
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (257)
              3 STORE_FAST               0 (a)

  3           6 LOAD_CONST               1 (257)
              9 STORE_FAST               1 (b)

  4          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 COMPARE_OP               8 (is)
             21 PRINT_ITEM          
             22 PRINT_NEWLINE       
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        
>>> test()
True
>>> test.func_code.co_consts
(None, 257)

Note how the compiled code contains a single constant for 257.
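For comparison, here is the same check in Python 3, where the attribute is `__code__` instead of `func_code` and `print` is a function. The function is renamed `same_literal` (an arbitrary name) and returns the result instead of printing it. The disassembly format and the exact layout of `co_consts` differ between versions, so only the number of 257 entries is checked. The identity result assumes CPython.

```python
import dis


def same_literal() -> bool:
    a = 257
    b = 257
    return a is b


dis.dis(same_literal)  # output format varies by Python version
print(same_literal.__code__.co_consts.count(257))  # 1
print(same_literal())                              # True on CPython
```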

In conclusion, the Python bytecode compiler cannot perform massive optimizations (like those of statically typed languages), but it does more than you might think. One of these things is analyzing the usage of literals and avoiding duplicating them.

Note that this has nothing to do with the cache, because it also works for floats, which do not have a cache:

>>> a = 5.0
>>> b = 5.0
>>> a is b
False
>>> a = 5.0; b = 5.0
>>> a is b
True
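In Python 3, floats built at run time are distinct objects even for equal values, because there is no float cache to fall back on. A single compilation unit still shares the literal. This sketch assumes CPython.

```python
# Two live floats created at run time: equal, but separate objects.
x = float("5.0")
y = float("5.0")
print(x == y, x is y)  # True False

# One compilation unit: the compiler stores 5.0 once in co_consts.
ns: dict = {}
exec(compile("a = 5.0; b = 5.0", "<stdin>", "single"), ns)
print(ns["a"] is ns["b"])  # True
```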

For more complex literals, like tuples, it "doesn't work":

>>> a = (1,2)
>>> b = (1,2)
>>> a is b
False
>>> a = (1,2); b = (1,2)
>>> a is b
False

But the literals inside the tuple are shared:

>>> a = (257, 258)
>>> b = (257, 258)
>>> a[0] is b[0]
False
>>> a[1] is b[1]
False
>>> a = (257, 258); b = (257, 258)
>>> a[0] is b[0]
True
>>> a[1] is b[1]
True

Regarding why you see two PyInt_Objects being created, I'd guess this is done to avoid comparing literals. For example, the number 257 can be expressed by multiple literals:

>>> 257
257
>>> 0x101
257
>>> 0b100000001
257
>>> 0o401
257

The parser has two choices:

  • Convert the literals to some common base before creating the integer and check whether the literals are equivalent; then create a single integer object.
  • Create the integer objects and check whether they are equal. If they are, keep only a single value and assign it to all the literals; otherwise, you already have the integers to assign.

The Python parser probably uses the second approach, which avoids rewriting the conversion code and is also easier to extend (for example, it works with floats as well).
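Either way, the effect is visible from Python 3. The four spellings of 257 above collapse into a single constant when they appear in the same compilation unit. This is a quick check (the variable names are arbitrary, and the identity result assumes CPython).

```python
SOURCE = "a = 257; b = 0x101; c = 0b100000001; d = 0o401"

code = compile(SOURCE, "<stdin>", "exec")
ns: dict = {}
exec(code, ns)

# Four different literals, but only one 257 in the constant table.
print(code.co_consts.count(257))                  # 1
print(ns["a"] is ns["b"] is ns["c"] is ns["d"])   # True on CPython
```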


Reading the Python/ast.c file, the function that parses all numbers is parsenumber, which calls PyOS_strtoul to obtain the integer value (for integers) and eventually calls PyLong_FromString:

    x = (long) PyOS_strtoul((char *)s, (char **)&end, 0);
    if (x < 0 && errno == 0) {
        return PyLong_FromString((char *)s,
                                 (char **)0,
                                 0);
    }

As you can see here, the parser does not check whether it has already found an integer with the given value. This explains why you see two int objects being created, and it also means my guess was correct: the parser first creates the constants and only afterward optimizes the bytecode to use the same object for equal constants.
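The integer branch of the parsenumber snippet above can be approximated in Python as follows. This is a hypothetical model, not CPython code, and it rests on these assumptions:

- `parse_number` is an invented name.
- `LONG_MAX` assumes a 64-bit C long.
- `int(text, 0)` stands in for `PyOS_strtoul(..., 0)`, because base 0 likewise lets the prefix select the base.

As in the C code, nothing here checks whether the same value was parsed before.

```python
from typing import Tuple

LONG_MAX = 2**63 - 1  # assumption: a 64-bit C long


def parse_number(text: str) -> Tuple[str, int]:
    """Hypothetical model of the integer branch of parsenumber()."""
    # Base 0: "0x", "0o" and "0b" prefixes select the base, like strtoul(..., 0).
    value = int(text, 0)
    if value > LONG_MAX:
        # Does not fit in a C long (the x < 0 case in the C snippet):
        # Python 2 falls back to PyLong_FromString.
        return ("long", value)
    return ("int", value)


for literal in ("257", "0x101", "0b100000001", "0o401", "9223372036854775808"):
    print(literal, parse_number(literal))
```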

The code that does this check must be somewhere in Python/compile.c or Python/peephole.c, since these are the files that transform the AST into bytecode.

In particular, the compiler_add_o function seems to be the one that does it. There is this comment in compiler_lambda:

/* Make None the first constant, so the lambda can't have a
   docstring. */
if (compiler_add_o(c, c->u->u_consts, Py_None) < 0)
    return 0;

So it seems that compiler_add_o is used to insert constants for functions, lambdas, etc. The compiler_add_o function stores the constants in a dict object, from which it immediately follows that equal constants fall into the same slot, resulting in a single constant in the final bytecode.
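Here is a toy model of that dict-based deduplication, written in Python. It is a simplification, and the class and method names are hypothetical. The `(value, type)` key reflects my reading of CPython 2.7's compiler_add_o for non-float constants: it keeps equal values of different types, such as 1 and 1.0, apart. The real function also special-cases floats (for example, to keep 0.0 and -0.0 apart), which this model does not do.

```python
from typing import Dict, List, Tuple


class ConstTable:
    """Hypothetical model of a per-code-object constant table."""

    def __init__(self) -> None:
        # Maps (value, type) -> index into the final constants tuple.
        self._index: Dict[Tuple[object, type], int] = {}
        self._values: List[object] = []

    def add(self, value: object) -> int:
        # Including the type keeps 1, 1.0 and True in separate slots,
        # even though they compare equal and hash the same.
        key = (value, type(value))
        if key not in self._index:
            self._index[key] = len(self._values)
            self._values.append(value)
        return self._index[key]

    def as_tuple(self) -> tuple:
        return tuple(self._values)


table = ConstTable()
first = int("257")   # two distinct objects, as the AST step produces them
second = int("257")
print(table.add(first), table.add(second))  # 0 0
print(table.add(1.0), table.add(1))         # 1 2
print(table.as_tuple()[0] is first)         # True: the first object wins
```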

Reference URL: https://stackoverflow.com/questions/15171695/whats-with-the-integer-cache-maintained-by-the-interpreter
