Java Streams Interview Questions - BEHIND JAVA



How do you convert a list of objects into a map using Java streams, where the key of the map is a property of the object?

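You can use Collectors.toMap for that. Pass it a key mapper that extracts the property and a value mapper, for example Function.identity() to keep the object itself as the value. Note that the two-argument toMap throws an IllegalStateException if two elements map to the same key.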
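As a minimal sketch, assuming a hypothetical Person class with an id and a name (neither comes from the question itself), a list can be keyed by id like this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapExample {

    // Hypothetical value class, used only for illustration.
    static final class Person {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Builds a map whose key is the id property and whose value is the object itself.
    // The two-argument toMap throws IllegalStateException on duplicate keys.
    static Map<Integer, Person> byId(List<Person> people) {
        return people.stream()
                .collect(Collectors.toMap(p -> p.id, Function.identity()));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person(1, "Ann"), new Person(2, "Bob"));
        Map<Integer, Person> map = byId(people);
        System.out.println(map.get(2).name); // prints "Bob"
    }
}
```

If duplicate keys are possible, the three-argument overload of Collectors.toMap accepts a merge function that decides which value to keep.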

How to convert a stream into an array?

The easiest method is to use the toArray(IntFunction<A[]> generator) method with an array constructor reference.

String[] stringArray = stringStream.toArray(String[]::new);

What it does is find a method that takes in an integer (the size) as argument, and returns a String[], which is exactly what (one of the overloads of) new String[] does.

You could also write your own IntFunction:

Stream<String> stringStream = ...;
String[] stringArray = stringStream.toArray(size -> new String[size]);

The purpose of the IntFunction<A[]> generator is to convert an integer, the size of the array, to a new array.

Example code:
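A minimal, self-contained sketch of both forms follows. The class name and sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ToArrayExample {

    // Uses an array constructor reference as the IntFunction<String[]> generator.
    static String[] withConstructorReference(Stream<String> stream) {
        return stream.toArray(String[]::new);
    }

    // Equivalent form with an explicit lambda that receives the required size.
    static String[] withLambda(Stream<String> stream) {
        return stream.toArray(size -> new String[size]);
    }

    public static void main(String[] args) {
        String[] a = withConstructorReference(Stream.of("a", "b", "c"));
        String[] b = withLambda(Stream.of("a", "b", "c"));
        System.out.println(Arrays.toString(a));  // prints [a, b, c]
        System.out.println(Arrays.equals(a, b)); // prints true
    }
}
```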

What's the difference between map and flatMap methods in Java Stream?

Both map and flatMap can be applied to a Stream and they both return a Stream. The difference is that the map operation produces exactly one output value for each input value, whereas the flatMap operation produces an arbitrary number (zero or more) of values for each input value.

This is reflected in the arguments to each operation.

The map operation takes a Function, which is called for each value in the input stream and produces one result value, which is sent to the output stream.

The flatMap operation takes a function that conceptually wants to consume one value and produce an arbitrary number of values. However, in Java, it's cumbersome for a method to return an arbitrary number of values, since methods can return only zero or one value. One could imagine an API where the mapper function for flatMap takes a value and returns an array or a List of values, which are then sent to the output. Given that this is the streams library, a particularly apt way to represent an arbitrary number of return values is for the mapper function itself to return a stream! The values from the stream returned by the mapper are drained from the stream and are passed to the output stream. The "clumps" of values returned by each call to the mapper function are not distinguished at all in the output stream, thus the output is said to have been "flattened."

Typical use is for the mapper function of flatMap to return Stream.empty() if it wants to send zero values, or something like Stream.of(a, b, c) if it wants to return several values. But of course any stream can be returned.

Here are two examples that give a more practical point of view:

The first example uses map:
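The original listing is not available here, so the following is only a sketch of what such an example could look like. The input strings are arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapExample {

    // map applies the function to each element and yields exactly one result per input.
    static List<String> toUpperCase(List<String> input) {
        return input.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toUpperCase(Arrays.asList("a", "b", "hello"))); // prints [A, B, HELLO]
    }
}
```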

Nothing special in the first example, a Function is applied to return the String in uppercase.

The second example uses flatMap:
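Again the original listing is not available, so this is a sketch under the assumption that the input is a stream of integer lists and that the transformation adds 1 to each value. The sample numbers are arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapExample {

    // Flattens a Stream<List<Integer>> into a Stream<Integer>, then adds 1 to each value.
    static List<Integer> flattenAndIncrement(Stream<List<Integer>> lists) {
        return lists
                .flatMap(List::stream) // Stream<List<Integer>> -> Stream<Integer>
                .map(n -> n + 1)       // + 1 is now applied to an Integer, not to a List
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<List<Integer>> lists = Stream.of(Arrays.asList(1, 2), Arrays.asList(3, 4));
        System.out.println(flattenAndIncrement(lists)); // prints [2, 3, 4, 5]
    }
}
```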

In the second example, a Stream of List<Integer> is passed. It is NOT a Stream of Integer!

If a transformation Function has to be used (through map), then first the Stream has to be flattened to something else (a Stream of Integer).

If flatMap is removed, the compiler reports the following error: The operator + is undefined for the argument type(s) List<Integer>, int.

It is NOT possible to apply + 1 on a List of Integers!

How do you convert a stream to a list in Java?

Pass Collectors.toList() to collect() to gather the elements of the stream into a List.
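A short sketch follows. The helper name toList and the sample values are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListExample {

    // Gathers every element of the stream into a List, in encounter order.
    static <T> List<T> toList(Stream<T> stream) {
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toList(Stream.of("x", "y", "z"))); // prints [x, y, z]
    }
}
```

Collectors.toList() makes no guarantee about the type or mutability of the returned list. When a specific implementation is needed, Collectors.toCollection(ArrayList::new) is an option.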

What is the difference between sequential and parallel streams?

Parallel streams split the provided task into many subtasks and run them on different threads, making use of multiple CPU cores. Sequential streams, on the other hand, process elements one at a time on a single thread, much like a for loop.

The tasks provided to the streams are typically the iterative operations performed on the elements of a collection or array or from other dynamic sources. Parallel execution of a stream runs multiple iterations simultaneously on different available cores.

In parallel execution, if there are more tasks than available cores at a given time, the remaining tasks are queued until a running task finishes.

It is also important to know that iterations are only performed at a terminal operation. The streams are designed to be lazy.

Let's test sequential and parallel behavior with an example.
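The original listing is not available, so the following is a sketch that matches the description: it prints the time, the value and the thread name from forEach() and sleeps between elements. The helper name run and the returned set of thread names are additions made for illustration.

```java
import java.time.LocalTime;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class SequentialParallelComparison {

    // Prints time, value and thread name for each element, sleeping so that the
    // difference between the two modes is visible. Returns the names of the
    // threads that did the work.
    static Set<String> run(IntStream stream, long sleepMillis) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        stream.forEach(value -> {
            String thread = Thread.currentThread().getName();
            threads.add(thread);
            System.out.println(LocalTime.now() + " - value: " + value + " - thread: " + thread);
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return threads;
    }

    public static void main(String[] args) {
        System.out.println("-------\nRunning sequential\n-------");
        run(IntStream.rangeClosed(1, 10).sequential(), 200);
        System.out.println("-------\nRunning parallel\n-------");
        run(IntStream.rangeClosed(1, 10).parallel(), 200);
    }
}
```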

In the example above we print the time, the element value and the thread name from the forEach() terminal operation. Apart from parallel() and sequential() we use no intermediate operations, but that does not matter as long as both runs use the same ones. Each iteration also sleeps for 200 ms so that the time taken by the sequential and parallel runs can be compared clearly.

Output:
The following output is from a machine with 8 logical processors (4 cores).

-------
Running sequential
-------
02:29:02.817 - value: 1 - thread: main
02:29:03.022 - value: 2 - thread: main
02:29:03.223 - value: 3 - thread: main
02:29:03.424 - value: 4 - thread: main
02:29:03.624 - value: 5 - thread: main
02:29:03.824 - value: 6 - thread: main
02:29:04.025 - value: 7 - thread: main
02:29:04.225 - value: 8 - thread: main
02:29:04.426 - value: 9 - thread: main
02:29:04.626 - value: 10 - thread: main
-------
Running parallel
-------
02:29:04.830 - value: 7 - thread: main
02:29:04.830 - value: 3 - thread: ForkJoinPool.commonPool-worker-1
02:29:04.830 - value: 8 - thread: ForkJoinPool.commonPool-worker-4
02:29:04.830 - value: 2 - thread: ForkJoinPool.commonPool-worker-3
02:29:04.830 - value: 9 - thread: ForkJoinPool.commonPool-worker-2
02:29:04.830 - value: 5 - thread: ForkJoinPool.commonPool-worker-5
02:29:04.830 - value: 1 - thread: ForkJoinPool.commonPool-worker-6
02:29:04.831 - value: 10 - thread: ForkJoinPool.commonPool-worker-7
02:29:05.030 - value: 4 - thread: ForkJoinPool.commonPool-worker-3
02:29:05.030 - value: 6 - thread: ForkJoinPool.commonPool-worker-2

Why can't you use parallel streams everywhere?

A parallel stream has a much higher overhead compared to a sequential one. Coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if

  • I have a massive amount of items to process (or the processing of each item takes time and is parallelizable)
  • I have a performance problem in the first place
  • I don't already run the process in a multi-thread environment (for example: in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)

If the work done per element is little more than a call to System.out.println(), the performance will be driven by the synchronized access to that method, and making the process parallel will have no effect, or even a negative one.

Moreover, remember that parallel streams don't magically solve all the synchronization problems. If a shared resource is used by the predicates and functions used in the process, you'll have to make sure that everything is thread-safe. In particular, side effects are things you really have to worry about if you go parallel.
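As an illustration of the side-effect problem, here is a sketch that contrasts an unsafe pattern (shown only as a comment) with a safe one. The method name squares is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelSideEffects {

    // Unsafe pattern (shown only as a comment): adding to a shared ArrayList
    // from a parallel forEach can lose elements or throw, because ArrayList
    // is not thread-safe.
    //
    //   List<Integer> shared = new ArrayList<>();
    //   IntStream.range(0, n).parallel().forEach(shared::add);

    // Safe alternative: let the stream build the result with collect, so the
    // lambdas touch no shared mutable state.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squares(1000).size()); // prints 1000
    }
}
```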

In any case, measure, don't guess! Only a measurement will tell you if the parallelism is worth it or not.

What is the difference between intermediate and terminal operations?

Stream operations are combined into pipelines to process streams. All operations are either intermediate or terminal.

Intermediate operations are those operations that return Stream itself allowing for further operations on a stream.

These operations are always lazy: they do not process the stream at the call site, and data is processed only once a terminal operation is invoked. Some of the intermediate operations are filter, map and flatMap.

Terminal operations terminate the pipeline and initiate stream processing. The stream is passed through all intermediate operations during the terminal operation call. Terminal operations include forEach, reduce, collect and, on primitive streams such as IntStream, sum.

To drive this point home, let us look at an example with side effects:
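The original listing is not available. The sketch below is one way to produce the output shown afterwards; the log helper and the returned event list are additions made so that the order of events can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class LazyEvaluationExample {

    // Records each message and prints it, so the order of events is visible.
    static void log(List<String> events, String message) {
        events.add(message);
        System.out.println(message);
    }

    static List<String> run() {
        List<String> events = new ArrayList<>();

        log(events, "Stream without terminal operation");
        // Only an intermediate operation: the lambda is never invoked.
        IntStream.of(1, 2, 3).map(i -> {
            log(events, "doubling " + i);
            return i * 2;
        });

        log(events, "Stream with terminal operation");
        // sum() is terminal, so the map lambda now runs for every element.
        IntStream.of(1, 2, 3).map(i -> {
            log(events, "doubling " + i);
            return i * 2;
        }).sum();

        return events;
    }

    public static void main(String[] args) {
        run();
    }
}
```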

The output will be as follows:

Stream without terminal operation
Stream with terminal operation
doubling 1
doubling 2
doubling 3

As you can see, the intermediate operations are only triggered when a terminal operation exists.

What is stream pipelining in Java 8?

Stream pipelining is the concept of chaining operations together. This is done by splitting the operations that can happen on a stream into two categories: intermediate operations and terminal operations.

Each intermediate operation returns a Stream, so an arbitrary number of intermediate operations can be chained to form a processing pipeline.

There must then be a terminal operation which returns a final value and terminates the pipeline.

What does the peek() method do? When should you use it?

The peek() method of Stream class allows you to see through a Stream pipeline. You can peek through each step and print meaningful messages on the console. It's generally used for debugging issues related to lambda expression and Stream processing.

The Stream.peek() method is mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline.

The Stream.peek() method returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream.

For parallel stream pipelines, the action may be called at whatever time and in whatever thread the element is made available by the upstream operation. If the action modifies shared state, it is responsible for providing the required synchronization.
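The original listing is not available. The following sketch is consistent with the output listed afterwards, assuming the input contained two shorter words that the filter drops. The sample words and the length filter are assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PeekExample {

    static List<String> run() {
        // Sample values are assumptions; "bycle" is spelled as in the output listing.
        return Stream.of("bus", "car", "bycle", "flight", "train")
                .filter(e -> e.length() > 3)
                .peek(e -> System.out.println("Filtered value: " + e))
                .map(String::toUpperCase)
                .peek(e -> System.out.println("Mapped value: " + e))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        run();
    }
}
```

Each element travels through the whole pipeline before the next one is pulled, which is why the "Filtered" and "Mapped" lines alternate.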

Output:

Filtered value: bycle
Mapped value: BYCLE
Filtered value: flight
Mapped value: FLIGHT
Filtered value: train
Mapped value: TRAIN

What does the map() function do? Why would you use it?

The map() function performs the functional map operation in Java: it transforms each element into another value, possibly of a different type, by applying a function.

For example, if you have a List of String and you want to convert that to a List of Integer, you can use map() to do so.

Just supply a function that converts a String to an Integer, e.g. Integer::parseInt, to map() and it will apply that function to every element of the stream; collecting the result gives you a List of Integer. In other words, map converts each object into another.
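A minimal sketch of that conversion follows. The method name parseAll is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapToIntegerExample {

    // Converts a List<String> to a List<Integer> by mapping each element with Integer.parseInt.
    // A NumberFormatException is thrown if an element is not a valid integer.
    static List<Integer> parseAll(List<String> numbers) {
        return numbers.stream()
                .map(Integer::parseInt)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("1", "2", "3"))); // prints [1, 2, 3]
    }
}
```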

What is a Predicate interface?

A Predicate is a functional interface which represents a function, which takes an Object and returns a boolean. It is used in several Stream methods e.g. filter() which uses Predicate to filter unwanted elements.

Here is what the Predicate interface looks like (default and static helper methods omitted):

@FunctionalInterface
public interface Predicate<T> {
    boolean test(T t);
}

You can see it has a single abstract test() method, which takes an object and returns a boolean. The method tests a condition: it returns true if the condition passes and false otherwise.
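As a small sketch, a Predicate can be written as a lambda and passed to filter(). The constant NON_EMPTY and the sample values are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateExample {

    // A Predicate<String> whose test() returns true for non-empty strings.
    static final Predicate<String> NON_EMPTY = s -> !s.isEmpty();

    // filter() keeps only the elements for which the predicate returns true.
    static List<String> keepNonEmpty(List<String> input) {
        return input.stream()
                .filter(NON_EMPTY)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(NON_EMPTY.test(""));                        // prints false
        System.out.println(keepNonEmpty(Arrays.asList("a", "", "b"))); // prints [a, b]
    }
}
```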

Is there any way to iterate over a stream with indices?

The Java 8 Streams API has no built-in way to get the index of a stream element.

There are often workarounds, however. Usually this can be done by "driving" the stream with an integer range, and taking advantage of the fact that the original elements are often in an array or in a collection accessible by index. For example,
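The original listing is not available. In the sketch below, the names and the filter condition (name length no greater than its index) are assumptions chosen to agree with the result stated afterwards.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexedStreamExample {

    // Drives the stream with the index range and looks the element up by index.
    // Keeps the names whose length is less than or equal to their index.
    static List<String> namesNoLongerThanIndex(String[] names) {
        return IntStream.range(0, names.length)
                .filter(i -> names[i].length() <= i)
                .mapToObj(i -> names[i])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] names = {"Sam", "Pamela", "Dave", "Pascal", "Erik"};
        System.out.println(namesNoLongerThanIndex(names)); // prints [Erik]
    }
}
```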

The resulting list contains "Erik" only.

How to combine multiple streams?

To combine several streams, you can wrap them in a stream of streams and flatten it:

Stream.of(stream1, stream2, Stream.of(element)).flatMap(Function.identity());

You can also use Stream.concat to combine exactly two streams:

Stream.concat(s1, s2)
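Both approaches can be sketched as small helpers. The method names concatTwo and combineAll are hypothetical.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CombineStreamsExample {

    // Concatenates exactly two streams.
    static <T> List<T> concatTwo(Stream<T> s1, Stream<T> s2) {
        return Stream.concat(s1, s2).collect(Collectors.toList());
    }

    // Flattens any number of streams into one, preserving their order.
    @SafeVarargs
    static <T> List<T> combineAll(Stream<T>... streams) {
        return Stream.of(streams)
                .flatMap(Function.identity())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(concatTwo(Stream.of(1, 2), Stream.of(3))); // prints [1, 2, 3]
        System.out.println(combineAll(Stream.of("a"), Stream.of("b", "c"), Stream.of("d"))); // prints [a, b, c, d]
    }
}
```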

What are the basic features of streams as compared with collections?

Parallelism

Parallelism utilizes hardware capabilities at their best, as nowadays, more CPU cores are available on a computer, so it doesn't make sense to have a single thread in a multi-core system. Designing and writing multi-threaded applications is challenging and error-prone, hence Streams has two implementations: sequential and parallel. Using parallel Streams is easy and no expertise is needed for thread handling.

Laziness

As we know, Java 8 Streams have two types of operations, intermediate and terminal, which are meant for processing and for providing the end result, respectively. You might have seen that intermediate operations are not executed unless a terminal operation follows them.

In summary, intermediate operations just create another stream, but won't perform any processing until the terminal operation is called. Once the terminal operation is called, traversal of streams begins and the associated function is applied one by one. Intermediate operations are lazy operations, so Streams supports laziness.

Short-Circuit Behavior

This is another way of optimizing stream processing. A short-circuiting operation stops processing as soon as its condition is met, without visiting the remaining elements. There are a number of short-circuiting operations available, e.g. anyMatch, allMatch, findFirst, findAny and limit.
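To make the effect visible, the sketch below counts how many elements of an infinite sequential stream are inspected before anyMatch stops. The counter and the method name are illustrative additions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuitExample {

    // Counts how many elements of an infinite stream are inspected before
    // anyMatch finds a value greater than the threshold and stops.
    static int elementsInspected(int threshold) {
        AtomicInteger inspected = new AtomicInteger();
        Stream.iterate(1, n -> n + 1) // infinite: 1, 2, 3, ...
                .peek(n -> inspected.incrementAndGet())
                .anyMatch(n -> n > threshold);
        return inspected.get();
    }

    public static void main(String[] args) {
        System.out.println(elementsInspected(4)); // prints 5
    }
}
```

Without short-circuiting, a terminal operation on this infinite stream would never return.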

How do the limit and skip methods compare in Java streams?

The limit(long n) method of java.util.stream.Stream returns a stream truncated to at most the first n elements. It throws an IllegalArgumentException if n is negative.

The skip(long n) method of java.util.stream.Stream returns a stream of the remaining elements after discarding the first n elements; if the stream has fewer than n elements, the result is empty. It throws an IllegalArgumentException if n is negative.

The following example demonstrates the use of the limit() and skip() methods.
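The original listing is not available. The sketch below reproduces the output listed afterwards, assuming the source holds the numbers 1 to 6, limit(3) and skip(2); those values are inferred from the output rather than taken from the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LimitSkipExample {

    // Keeps at most the first n elements.
    static List<Integer> firstN(List<Integer> numbers, long n) {
        return numbers.stream().limit(n).collect(Collectors.toList());
    }

    // Discards the first n elements and keeps the rest.
    static List<Integer> afterSkipping(List<Integer> numbers, long n) {
        return numbers.stream().skip(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        System.out.println("--------Stream elements after limiting----------");
        firstN(numbers, 3).forEach(System.out::println);

        System.out.println("--------Stream elements after skipping----------");
        afterSkipping(numbers, 2).forEach(System.out::println);
    }
}
```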

Output

--------Stream elements after limiting----------
1
2
3
--------Stream elements after skipping----------
3
4
5
6

What are the advantages and disadvantages of streams?

It is worth covering both, because streams have advantages as well as disadvantages.

Streams are a more declarative style. Or a more expressive style. It may be considered better to declare your intent in code, than to describe how it's done:

 return people.stream()
     .filter(p -> p.age() < 19)
     .collect(toList());

says quite clearly that you're filtering matching elements from a list, whereas:

 List<Person> filtered = new ArrayList<>();
 for(Person p : people) {
     if(p.age() < 19) {
         filtered.add(p);
     }
 }
 return filtered;
 

Says "I'm doing a loop". The purpose of the loop is buried deeper in the logic.

Streams are often terser. The same example shows this. Terser isn't always better, but if you can be terse and expressive at the same time, so much the better.

Streams have a strong affinity with functions. Java 8 introduces lambdas and functional interfaces, which opens a whole toybox of powerful techniques. Streams provide the most convenient and natural way to apply functions to sequences of objects.

Streams encourage less mutability. This is sort of related to the functional programming aspect -- the kind of programs you write using streams tend to be the kind of programs where you don't modify objects.

Streams encourage looser coupling. Your stream-handling code doesn't need to know the source of the stream, or its eventual terminating method.

Streams can succinctly express quite sophisticated behaviour. For example:

 stream.filter(myfilter).findFirst();
 

Might look at first glance as if it filters the whole stream, then returns the first element. But in fact findFirst() drives the whole operation, so it efficiently stops after finding one item.

Streams provide scope for future efficiency gains. Some people have benchmarked and found that single-threaded streams from in-memory Lists or arrays can be slower than the equivalent loop. This is plausible because there are more objects and overheads in play.

But streams scale. As well as Java's built-in support for parallel stream operations, there are a few libraries for distributed map-reduce using Streams as the API, because the model fits.

Disadvantages

Performance: A for loop through an array is extremely lightweight both in terms of heap and CPU usage. If raw speed and memory thriftiness are a priority, using a stream is worse.

Familiarity: The world is full of experienced procedural programmers, from many language backgrounds, for whom loops are familiar and streams are novel. In some environments, you want to write code that's familiar to that kind of person.

Cognitive overhead: Because of its declarative nature, and increased abstraction from what's happening underneath, you may need to build a new mental model of how code relates to execution. Actually you only need to do this when things go wrong, or if you need to deeply analyse performance or subtle bugs. When it "just works", it just works.

Debugging: Debuggers are improving, but even now, when you're stepping through stream code in a debugger, it can be harder work than the equivalent loop, because a simple loop is very close to the variables and code locations that a traditional debugger works with.
