code

flatMap () 후 filter ()가 Java 스트림에서 "완전히"게으르지 않은 이유는 무엇입니까?

codestyles 2020. 11. 2. 08:00
반응형

flatMap () 후 filter ()가 Java 스트림에서 "완전히"게으르지 않은 이유는 무엇입니까?


다음 샘플 코드가 있습니다.

System.out.println(
       "Result: " +
        Stream.of(1, 2, 3)
                .filter(i -> {
                    System.out.println(i);
                    return true;
                })
                .findFirst()
                .get()
);
System.out.println("-----------");
System.out.println(
       "Result: " +
        Stream.of(1, 2, 3)
                .flatMap(i -> Stream.of(i - 1, i, i + 1))
                .flatMap(i -> Stream.of(i - 1, i, i + 1))
                .filter(i -> {
                    System.out.println(i);
                    return true;
                })
                .findFirst()
                .get()
);

출력은 다음과 같습니다.

1
Result: 1
-----------
-1
0
1
0
1
2
1
2
3
Result: -1

여기에서 첫 번째 경우 stream실제로 느리게 동작 하는 것을 볼 수 findFirst()있습니다. 첫 번째 요소가 있으면 필터링 람다가 호출되지 않습니다. 그러나 flatMaps 를 사용하는 두 번째 경우 에는 필터 조건을 충족하는 첫 번째 요소가 발견 되었음에도 불구하고 (람다가 항상 true를 반환하는 첫 번째 요소 임) 스트림의 추가 콘텐츠가 여전히 필터링 함수를 통해 공급되고 있음을 알 수 있습니다.

첫 번째 경우와 같이 첫 번째 요소가 계산 된 후 포기하지 않고 왜 이렇게 동작하는지 이해하려고합니다. 유용한 정보를 주시면 감사하겠습니다.


요컨대 , 이것은 JDK-8075939 에서 해결되었으며 Java 10에서 수정되었습니다 (및 JDK-8225328 에서 Java 8백 포트 ).

구현 ( ReferencePipeline.java)을 살펴보면 [ 링크 ] 메소드가 있습니다 .

@Override
final void forEachWithCancel(Spliterator<P_OUT> spliterator, Sink<P_OUT> sink) {
    do { } while (!sink.cancellationRequested() && spliterator.tryAdvance(sink));
}

findFirst작업을 위해 호출 됩니다. 주의해야 할 특별한 점 sink.cancellationRequested()은 첫 번째 경기에서 루프를 종료 할 수 있다는 것입니다. [ 링크 ]와 비교

@Override
public final <R> Stream<R> flatMap(Function<? super P_OUT, ? extends Stream<? extends R>> mapper) {
    Objects.requireNonNull(mapper);
    // We can do better than this, by polling cancellationRequested when stream is infinite
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                 StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT | StreamOpFlag.NOT_SIZED) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
            return new Sink.ChainedReference<P_OUT, R>(sink) {
                @Override
                public void begin(long size) {
                    downstream.begin(-1);
                }

                @Override
                public void accept(P_OUT u) {
                    try (Stream<? extends R> result = mapper.apply(u)) {
                        // We can do better that this too; optimize for depth=0 case and just grab spliterator and forEach it
                        if (result != null)
                            result.sequential().forEach(downstream);
                    }
                }
            };
        }
    };
}

한 항목을 진행하는 방법은 forEach이전 종료 가능성없이 하위 스트림을 호출 하는 것으로 끝나고 flatMap메서드 시작 부분의 주석은 이 부재 기능에 대해 알려줍니다.

Since this is more than just an optimization thing as it implies that the code simply breaks when the sub-stream is infinite, I hope that the developers soon prove that they “can do better than this”…


To illustrate the implications, while Stream.iterate(0, i->i+1).findFirst() works as expected, Stream.of("").flatMap(x->Stream.iterate(0, i->i+1)).findFirst() will end up in an infinite loop.

Regarding the specification, most of it can be found in the

chapter “Stream operations and pipelines” of the package specification:

Intermediate operations return a new stream. They are always lazy;

… Laziness also allows avoiding examining all the data when it is not necessary; for operations such as "find the first string longer than 1000 characters", it is only necessary to examine just enough strings to find one that has the desired characteristics without examining all of the strings available from the source. (This behavior becomes even more important when the input stream is infinite and not merely large.)

Further, some operations are deemed short-circuiting operations. An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.

It’s clear that a short-circuiting operation doesn’t guaranty a finite time termination, e.g. when a filter doesn’t match any item the processing can’t complete, but an implementation which doesn’t support any termination in finite time by simply ignoring the short-circuiting nature of an operation is far off the specification.


The elements of the input stream are consumed lazily one by one. The first element, 1, is transformed by the two flatMaps into the stream -1, 0, 1, 0, 1, 2, 1, 2, 3, so that entire stream corresponds to just the first input element. The nested streams are eagerly materialized by the pipeline, then flattened, then fed to the filter stage. This explains your output.

The above does not stem from a fundamental limitation, but it would probably make things much more complicated to get full-blown laziness for nested streams. I suspect it would be an even greater challenge to make it performant. For comparison, Clojure's lazy seqs get another layer of wrapping for each such level of nesting. Due to this design the operations may even fail with StackOverflowError when nesting is exercised to the extreme.


With regard to breakage with infinite sub-streams, the behavior of flatMap becomes still more surprising when one throws in an intermediate (as opposed to terminal) short-circuiting operation.

While the following works as expected, printing out the infinite sequence of integers

Stream.of("x").flatMap(_x -> Stream.iterate(1, i -> i + 1)).forEach(System.out::println);

the following code prints out only the "1", but still does not terminate:

Stream.of("x").flatMap(_x -> Stream.iterate(1, i -> i + 1)).limit(1).forEach(System.out::println);

I cannot imagine a reading of the spec in which that were not a bug.


In my free StreamEx library I introduced the short-circuiting collectors. When collecting sequential stream with short-circuiting collector (like MoreCollectors.first()) exactly one element is consumed from the source. Internally it's implemented in quite dirty way: using a custom exception to break the control flow. Using my library your sample could be rewritten in this way:

System.out.println(
        "Result: " +
                StreamEx.of(1, 2, 3)
                .flatMap(i -> Stream.of(i - 1, i, i + 1))
                .flatMap(i -> Stream.of(i - 1, i, i + 1))
                .filter(i -> {
                    System.out.println(i);
                    return true;
                })
                .collect(MoreCollectors.first())
                .get()
        );

The result is the following:

-1
Result: -1

Unfortunately .flatMap() is not lazy. However, a custom flatMap workaround is available here: Why .flatMap() is so inefficient (non lazy) in java 8 and java 9


I agree with other people this is a bug opened at JDK-8075939. And since it's still not fixed more than one year later. I would like to recommend you: AbacusUtil

N.println("Result: " + Stream.of(1, 2, 3).peek(N::println).first().get());

N.println("-----------");

N.println("Result: " + Stream.of(1, 2, 3)
                        .flatMap(i -> Stream.of(i - 1, i, i + 1))
                        .flatMap(i -> Stream.of(i - 1, i, i + 1))
                        .peek(N::println).first().get());

// output:
// 1
// Result: 1
// -----------
// -1
// Result: -1

Disclosure: I'm the developer of AbacusUtil.

참고URL : https://stackoverflow.com/questions/29229373/why-filter-after-flatmap-is-not-completely-lazy-in-java-streams

반응형