The difference between Highland.js's map and flatMap

Definition

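The flatMap function applies the transformation function to each piece of data in the stream, expects that function to return a stream, and returns a new stream that emits the values of those returned streams, flattened into a single sequence. The transformation function is applied to each piece of data one at a time, and because its result is itself a stream, the transformed data is not emitted to the new stream until it is available. That makes flatMap the natural fit for asynchronous transformations.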

This is different from the map function, which applies the transformation function to each piece of data in the stream and returns a new stream that emits the function's return value synchronously. The key difference is that word, synchronously: the transformation function is applied to each piece of data one at a time, and whatever it returns is emitted to the new stream as soon as it returns. If the function returns a stream or a Promise, map emits that object itself rather than the values it will eventually produce.
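To make the two definitions concrete, here is a minimal, dependency-free sketch that models the operators with plain JavaScript generators rather than with Highland itself. The names mapModel, flatMapModel and splitWords are made up for illustration. It shows that a map-style operator emits whatever the function returns, unchanged, while a flatMap-style operator expects the function to return a collection (a stream in Highland, an array in this model) and emits that collection's items one by one.

```javascript
// Dependency-free model: generators stand in for Highland streams.
// mapModel, flatMapModel and splitWords are illustrative names, not Highland APIs.

// map: emit fn(item) exactly as returned, one output per input.
function* mapModel(source, fn) {
  for (const item of source) {
    yield fn(item);
  }
}

// flatMap: fn(item) returns a collection; emit each of its items in order.
function* flatMapModel(source, fn) {
  for (const item of source) {
    yield* fn(item);
  }
}

const splitWords = (line) => line.split(' ');
const lines = ['paint wall', 'lay floor'];

// Two outputs, each still an array of words.
const mapped = [...mapModel(lines, splitWords)];

// Four outputs, one per word.
const flattened = [...flatMapModel(lines, splitWords)];

console.log(mapped, flattened);
```

In Highland the function passed to flatMap would return a Highland stream instead of an array, for example by wrapping the array as _(line.split(' ')).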

Performance

If the transformation function takes a long time to complete for each piece of data because it does heavy synchronous work, using the map function could cause performance issues: nothing is emitted to the new stream until the function returns, and no other JavaScript can run in the meantime. The stream pipeline is blocked for the duration. Switching such a function to flatMap does not help by itself, because the same synchronous work still has to run.

For example, if you have a CSV file with 100,000 rows being streamed through a parser step in a pipeline and you use the map function with a slow synchronous parser, each row holds up the pipeline until it has been parsed. If the parser is asynchronous instead and returns a stream or a Promise for each row, map is the wrong tool for a different reason: it emits 100,000 pending stream or Promise objects rather than parsed rows.

On the other hand, if you use the flatMap function with a parser that returns a stream, the parsed data is emitted to the new stream as soon as it is available, and the event loop stays free for other work while each row is pending. This prevents the process from becoming blocked.

Note that flatMap reads the returned streams one at a time, in source order. It is equivalent to map followed by sequence, so the second row is not started until the first has finished. When the per-item work is slow and independent, Highland's parallel(n) or merge(), applied to the stream of streams that map produces, is what lets several items be in flight at once.
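The difference between waiting for each item in turn and having several items in flight can be sketched without Highland, using Promises and timers. This is a model under stated assumptions, not Highland's implementation: parseRow, sequential, concurrent and the row delays are all hypothetical. The sequential function mirrors how Highland documents flatMap (map followed by sequence, one inner stream at a time), and the concurrent function mirrors the idea behind map followed by parallel (start the work up front, deliver results in source order).

```javascript
// Hypothetical slow async step: resolves with the row id after `row.ms` milliseconds.
// The log records when each row starts and finishes.
function parseRow(row, log) {
  log.push(`start ${row.id}`);
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(`done ${row.id}`);
      resolve(row.id);
    }, row.ms)
  );
}

// Model of flatMap-style sequencing: the next row is not started
// until the previous row's result has arrived.
async function sequential(rows, log) {
  const out = [];
  for (const row of rows) {
    out.push(await parseRow(row, log));
  }
  return out;
}

// Model of starting every row up front and collecting results in source order.
function concurrent(rows, log) {
  return Promise.all(rows.map((row) => parseRow(row, log)));
}

// Row "a" is deliberately slower than row "b".
const rows = [{ id: 'a', ms: 30 }, { id: 'b', ms: 10 }];

async function demo() {
  const seqLog = [];
  const seqOut = await sequential(rows, seqLog);
  const conLog = [];
  const conOut = await concurrent(rows, conLog);
  return { seqLog, seqOut, conLog, conOut };
}

demo().then((result) => console.log(result));
```

In the sequential log, row b only starts after row a is done. In the concurrent log, both rows start immediately and b finishes first, yet both result arrays come back in source order.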

Analogy: Friends building a house

Imagine that you have a group of friends who are working together to build a house. Each friend is responsible for completing a task, such as painting a wall or laying the foundation. So let's say the tasks are: paint a wall, hang paintings, install a sink, lay the floor and install plumbing. Now, where do map and flatMap fit in, and which choice affects how fast the house gets built?

The map function is like a step you can finish on the spot yourself, such as writing a label on each tin of paint. You go down the list, each item is done the moment you touch it, and the results come out in the same order. What map cannot do is wait: if the step is really a job for a friend, all you get back from map is the friend's promise to do it, not the finished work.

The flatMap function is like handing a task to a friend and waiting for them to finish before handing the next task to the next friend. For example, if you have five friends and you assign them each a task using the flatMap function, you do get five finished tasks, but they are completed sequentially, one after the other. If painting a wall takes a long time to complete, all of the other tasks are delayed until the wall is finished. Similarly, if installing plumbing takes a long time, everything after it waits. You are free to do other things while you wait (the event loop is not blocked), but the house is not finished any sooner.

Letting all of your friends work at the same time is a separate decision. In Highland it is expressed by using map to hand out the tasks and then parallel(n) or merge() to collect the results: parallel(n) lets up to n friends work at once and reports results in the original order, while merge() reports each result as soon as that friend finishes.

For example, if painting a wall takes a long time to complete, the other tasks can still be completed while the wall is being painted. Similarly, if installing plumbing takes a long time, the other tasks can still be completed while the plumbing is being installed. This improves overall performance because the tasks are completed concurrently rather than sequentially: the total time is closer to that of the slowest task than to the sum of all five.

Is there a point in using .map?

Yes. The map function and the flatMap function solve different problems, so neither one is generally better when working with streams in Highland.js.

The map function is best used when you want to apply a transformation function to each piece of data in the stream and emit its return value to a new stream synchronously. This is the right choice when the transformation function is quick to complete and returns a plain value, such as parsing a number, picking a field or reformatting a string. Both map and flatMap emit results in the same order as the original data, so ordering is not a reason to prefer one over the other.

However, if the transformation function needs to wait for something, such as a file read, a database query or an HTTP request, map cannot wait for it. It emits the pending stream or Promise object that the function returns, and the next step in the pipeline receives that object instead of the data.

On the other hand, the flatMap function is suited to exactly that situation, because it reads each returned stream and emits the values to the new stream as soon as they are available. It is also the tool for a transformation function that turns one piece of data into zero, one or many pieces, such as splitting a line into words.

In general, use the map function when the transformation function returns a value, and the flatMap function when it returns a stream. If the per-item work is slow and independent, combine map with parallel(n) or merge() so that several items can be processed at once.
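As a rough illustration of using both operators in one pipeline, the sketch below chains a synchronous map-style step and an asynchronous flatMap-style step. It is a dependency-free model that uses async generators as stand-ins for Highland streams. Every name in it (mapModel, flatMapModel, lookupCost, estimate) and the cost formula are hypothetical.

```javascript
// Dependency-free model: async generators stand in for Highland streams.
// All names here are illustrative; none of them are Highland APIs.

// map step: fn is synchronous and returns a plain value, which is emitted as-is.
async function* mapModel(source, fn) {
  for await (const item of source) {
    yield fn(item);
  }
}

// flatMap step: fn returns an inner stream (an async iterable);
// drain it completely before moving on to the next item.
async function* flatMapModel(source, fn) {
  for await (const item of source) {
    yield* fn(item);
  }
}

// Hypothetical async lookup, modelled as an inner stream that
// emits a single made-up cost after a short delay.
async function* lookupCost(task) {
  await new Promise((resolve) => setTimeout(resolve, 5));
  yield { task, cost: task.length * 10 };
}

async function estimate(tasks) {
  // Quick, synchronous clean-up: a map-style step is enough.
  const normalized = mapModel(tasks, (task) => task.trim().toLowerCase());
  // Slow, asynchronous lookup: a flatMap-style step waits for each result.
  const costed = flatMapModel(normalized, lookupCost);

  const results = [];
  for await (const result of costed) {
    results.push(result);
  }
  return results;
}

estimate([' Paint wall ', 'Lay floor']).then((results) => console.log(results));
```

With Highland itself, the same shape would be a map call for the clean-up step followed by a flatMap call whose function returns a stream. If the lookup returns a Promise, it can be wrapped with Highland's stream constructor, which accepts a Promise.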