
"while (i ++ <n) {}"이 "while (++ i <n) {}"보다 훨씬 느린 이유

codestyles 2020. 10. 21. 08:07

"while (i ++ <n) {}"이 "while (++ i <n) {}"보다 훨씬 느린 이유


On my Windows 8 laptop with HotSpot JDK 1.7.0_45 (all compiler/VM options set to their defaults), the loop below

final int n = Integer.MAX_VALUE;
int i = 0;
while (++i < n) {
}

runs at least two orders of magnitude faster (~10 ms vs. ~5000 ms) than the following:

final int n = Integer.MAX_VALUE;
int i = 0;
while (i++ < n) {
}

I came across this while writing a loop to evaluate an unrelated performance issue, and the difference between ++i < n and i++ < n was large enough to significantly affect the results.

Looking at the bytecode, the loop body of the faster version is:

iinc
iload
ldc
if_icmplt

and for the slower version:

iload
iinc
ldc
if_icmplt

So for ++i < n, the local variable i is first incremented by 1 and then pushed onto the operand stack, while i++ < n performs these two steps in the reverse order. But that does not seem to explain why the former is so much faster. Is there a temporary copy involved in the latter case? Or is there something beyond the bytecode (the VM implementation, the hardware, etc.) that accounts for the performance difference?
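
For reference, a minimal, self-contained way to reproduce such bytecode listings is to compile a small class and disassemble it with javap; the class and method names below are my own illustrative choices, not part of the original post:

// LoopBytecode.java -- compile and disassemble with:
//   javac LoopBytecode.java
//   javap -c LoopBytecode
public class LoopBytecode {

    static void preIncrement() {
        final int n = Integer.MAX_VALUE;   // compile-time constant, hence the ldc
        int i = 0;
        while (++i < n) { }                // condition: iinc, iload, ldc, if_icmplt
    }

    static void postIncrement() {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (i++ < n) { }                // condition: iload, iinc, ldc, if_icmplt
    }
}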

I have read other discussions about ++i and i++ (though not exhaustively), but I did not find an answer that is Java-specific and directly addresses the case where ++i or i++ is involved in a value comparison.


As others have pointed out, the test has several flaws.

You did not tell us exactly how you performed this test, but I tried to implement a "naive" test (no offense) like the following:

class PrePostIncrement
{
    public static void main(String args[])
    {
        for (int j=0; j<3; j++)
        {
            for (int i=0; i<5; i++)
            {
                long before = System.nanoTime();
                runPreIncrement();
                long after = System.nanoTime();
                System.out.println("pre  : "+(after-before)/1e6);
            }
            for (int i=0; i<5; i++)
            {
                long before = System.nanoTime();
                runPostIncrement();
                long after = System.nanoTime();
                System.out.println("post : "+(after-before)/1e6);
            }
        }
    }

    private static void runPreIncrement()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (++i < n) {}
    }

    private static void runPostIncrement()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (i++ < n) {}
    }
}

When running this with default settings, there seems to be a small difference. But the real flaw of the benchmark becomes obvious when you run it with the -server flag. In my case, the results are:

...
pre  : 6.96E-4
pre  : 6.96E-4
pre  : 0.001044
pre  : 3.48E-4
pre  : 3.48E-4
post : 1279.734543
post : 1295.989086
post : 1284.654267
post : 1282.349093
post : 1275.204583

Obviously, the pre-increment version is completely optimized away. The reason is rather simple: the result is not used. Whether the loop is executed or not does not matter at all, so the JIT simply removes it.
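
One common way to make such a micro-test less prone to this kind of dead-code elimination is to let the loop result escape the method, for example along these lines (a minimal sketch of the general idea, not code from the original answer):

// Sketch: return the final value of i and accumulate it in the caller,
// so the JIT can no longer prove that the loop has no observable effect.
private static int runPreIncrementConsumed()
{
    final int n = Integer.MAX_VALUE;
    int i = 0;
    while (++i < n) {}
    return i;
}

// In main, something like:
//   long sink = 0;
//   sink += runPreIncrementConsumed();
//   System.out.println("sink = " + sink);   // forces the values to be used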

This is confirmed by looking at the HotSpot disassembly. The pre-increment version produces the following code:

[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x0000000055060500} 'runPreIncrement' '()V' in 'PrePostIncrement'
  #           [sp+0x20]  (sp of caller)
  0x000000000286fd80: sub    $0x18,%rsp
  0x000000000286fd87: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - PrePostIncrement::runPreIncrement@-1 (line 28)

  0x000000000286fd8c: add    $0x10,%rsp
  0x000000000286fd90: pop    %rbp
  0x000000000286fd91: test   %eax,-0x243fd97(%rip)        # 0x0000000000430000
                                                ;   {poll_return}
  0x000000000286fd97: retq   
  0x000000000286fd98: hlt    
  0x000000000286fd99: hlt    
  0x000000000286fd9a: hlt    
  0x000000000286fd9b: hlt    
  0x000000000286fd9c: hlt    
  0x000000000286fd9d: hlt    
  0x000000000286fd9e: hlt    
  0x000000000286fd9f: hlt    

The post-increment version produces the following code:

[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x00000000550605b8} 'runPostIncrement' '()V' in 'PrePostIncrement'
  #           [sp+0x20]  (sp of caller)
  0x000000000286d0c0: sub    $0x18,%rsp
  0x000000000286d0c7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - PrePostIncrement::runPostIncrement@-1 (line 35)

  0x000000000286d0cc: mov    $0x1,%r11d
  0x000000000286d0d2: jmp    0x000000000286d0e3
  0x000000000286d0d4: nopl   0x0(%rax,%rax,1)
  0x000000000286d0dc: data32 data32 xchg %ax,%ax
  0x000000000286d0e0: inc    %r11d              ; OopMap{off=35}
                                                ;*goto
                                                ; - PrePostIncrement::runPostIncrement@11 (line 36)

  0x000000000286d0e3: test   %eax,-0x243d0e9(%rip)        # 0x0000000000430000
                                                ;*goto
                                                ; - PrePostIncrement::runPostIncrement@11 (line 36)
                                                ;   {poll}
  0x000000000286d0e9: cmp    $0x7fffffff,%r11d
  0x000000000286d0f0: jl     0x000000000286d0e0  ;*if_icmpge
                                                ; - PrePostIncrement::runPostIncrement@8 (line 36)

  0x000000000286d0f2: add    $0x10,%rsp
  0x000000000286d0f6: pop    %rbp
  0x000000000286d0f7: test   %eax,-0x243d0fd(%rip)        # 0x0000000000430000
                                                ;   {poll_return}
  0x000000000286d0fd: retq   
  0x000000000286d0fe: hlt    
  0x000000000286d0ff: hlt    

It's not entirely clear to me why it seemingly does not remove the post-increment version. (In fact, I am considering asking this as a separate question.) But at least, this explains why you might see differences of an "order of magnitude"...


EDIT: Interestingly, when changing the upper limit of the loop from Integer.MAX_VALUE to Integer.MAX_VALUE-1, both versions are optimized away and require "zero" time. Somehow this limit (which still appears as 0x7fffffff in the assembly) prevents the optimization. Presumably, this has something to do with the comparison being mapped to a (signed!) cmp instruction, but I cannot give a profound reason beyond that. The JIT works in mysterious ways...
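
A simple way to try this observation yourself is to lower the bound in one of the test methods (my own sketch, not code from the answer):

// Hypothetical variant of runPostIncrement with the bound lowered by one;
// according to the observation above, this version also gets optimized away.
private static void runPostIncrementSmallerBound()
{
    final int n = Integer.MAX_VALUE - 1;
    int i = 0;
    while (i++ < n) {}
}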


The difference between ++i and i++ is that ++i effectively increments the variable and 'returns' the new value. i++, on the other hand, effectively creates a temporary variable to hold the current value of i, then increments the variable and 'returns' the temporary variable's value. This is where the extra overhead comes from.

// i++ evaluates to something like this
// Imagine though that somehow i was passed by reference
int temp = i;
i = i + 1;
return temp;

// ++i evaluates to
i = i + 1;
return i;

In your case it appears that the increment won't be optimized away by the JVM because you are using the result in an expression. The JVM can, on the other hand, optimize a loop like this:

for( int i = 0; i < Integer.MAX_VALUE; i++ ) {}

This is because the result of i++ is never used. In a loop like this you should be able to use both ++i and i++ with the same performance.
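
For example, the following two loops compile to the same bytecode, because the value of the increment expression in the update clause is discarded either way:

// Identical bytecode for both: the update clause is a statement,
// so neither form needs the expression's value.
for (int i = 0; i < Integer.MAX_VALUE; ++i) { }
for (int i = 0; i < Integer.MAX_VALUE; i++) { }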


EDIT 2

You should really look here:

http://hg.openjdk.java.net/code-tools/jmh/file/f90aef7f1d2c/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_11_Loops.java

EDIT: The more I think about it, the more I realise that this test is somewhat flawed: the loop will get heavily optimized by the JVM.

I think that you should just drop the @Param and let n=2.

This way you will test the performance of the while loop itself. The results I get in this case:

o.m.t.WhileTest.testFirst      avgt         5        0.787        0.086    ns/op
o.m.t.WhileTest.testSecond     avgt         5        0.782        0.087    ns/op

There is almost no difference.
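
To make that concrete, a minimal sketch of the "drop the @Param and let n=2" variant could look like this (my own illustration; the full benchmark with @Param follows below):

// Hypothetical variant: a fixed, tiny bound so that mostly the loop
// overhead itself is measured, rather than the iteration count.
@Benchmark
public int testFirstFixedBound() {
    final int n = 2;
    int i = 0;
    while (++i < n) { }
    return i;
}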

The very first question you should ask yourself is how you test and measure this. This is micro-benchmarking, which in Java is an art, and a casual user (like me) will almost always get the results wrong. You should rely on a proper benchmark harness and a very good tool for that. I used JMH to test this:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Measurement(iterations=5, time=1, timeUnit=TimeUnit.MILLISECONDS)
@Fork(1)
@Warmup(iterations=5, time=1, timeUnit=TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@State(Scope.Benchmark)
public class WhileTest {
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
            .include(".*" + WhileTest.class.getSimpleName() + ".*")
            .threads(1)
            .build();

        new Runner(opt).run();
    }


    @Param({"100", "10000", "100000", "1000000"})
    private int n;

    /*
    @State(Scope.Benchmark)
    public static class HOLDER_I {
        int x;
    }
    */


    @Benchmark
    public int testFirst(){
        int i = 0;
        while (++i < n) {
        }
        return i;
    }

    @Benchmark
    public int testSecond(){
        int i = 0;
        while (i++ < n) {
        }
        return i;
    }
}

Someone far more experienced in JMH might correct these results (I really hope so, since I am not that well versed in JMH yet), but the results show that the difference is pretty darn small:

Benchmark                        (n)   Mode   Samples        Score  Score error    Units
o.m.t.WhileTest.testFirst        100   avgt         5        1.271        0.096    ns/op
o.m.t.WhileTest.testFirst      10000   avgt         5        1.319        0.125    ns/op
o.m.t.WhileTest.testFirst     100000   avgt         5        1.327        0.241    ns/op
o.m.t.WhileTest.testFirst    1000000   avgt         5        1.311        0.136    ns/op
o.m.t.WhileTest.testSecond       100   avgt         5        1.450        0.525    ns/op
o.m.t.WhileTest.testSecond     10000   avgt         5        1.563        0.479    ns/op
o.m.t.WhileTest.testSecond    100000   avgt         5        1.418        0.428    ns/op
o.m.t.WhileTest.testSecond   1000000   avgt         5        1.344        0.120    ns/op

The Score field is the one you are interested in.


Probably this test is not enough to draw conclusions from, but I would say that if this is the case, the JVM can optimize this expression by changing i++ to ++i, since the stored value of i++ (the pre-increment value) is never used in this loop.


I suggest that (whenever possible) you always use ++c rather than c++, as the former will never be slower: conceptually, a deep copy of c has to be taken in the latter case in order to return the previous value.

Indeed many optimisers will optimise away an unnecessary deep copy but they can't easily do that if you're making use of the expression value. And you're doing just that in your case.

Many folks disagree, though: they see it as a micro-optimisation.

Reference URL: https://stackoverflow.com/questions/25322679/why-is-while-i-n-significantly-slower-than-while-i-n
