Java 8: Grouping and Partitioning

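In Java 8, the Stream API supports a range of collection operations, including grouping (groupingBy) and partitioning (partitioningBy). Grouping splits elements into any number of groups according to a classifier function. Partitioning is the special case that uses a predicate (Predicate) to split elements into exactly two parts: those that satisfy the predicate and those that do not.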

Grouping (groupingBy)

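The groupingBy method groups the elements of a collection by some property or condition. It returns a Map whose keys are the values produced by the classifier function and whose values are, by default, lists of the elements that produced each key.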

Method definitions:

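// Returns a collector that performs a "group by" operation on input elements of type T,
// grouping elements according to a classification function and returning the results in a Map.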
static <T,K> Collector<T,?,Map<K,List<T>>> groupingBy(
    Function<? super T,? extends K> classifier) 

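// Returns a collector that performs a cascaded "group by" operation on input elements of type T,
// grouping elements according to a classification function and then reducing the values
// associated with each key using the specified downstream collector.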
static <T,K,A,D> Collector<T,?,Map<K,D>> groupingBy(
    Function<? super T,? extends K> classifier, 
    Collector<? super T,A,D> downstream) 

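// Returns a collector that performs a cascaded "group by" operation on input elements of type T,
// grouping elements according to a classification function and then reducing the values
// associated with each key using the specified downstream collector.
// The resulting Map is created by the supplied map factory.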
static <T,K,D,A,M extends Map<K,D>> Collector<T,?,M> groupingBy(
    Function<? super T,? extends K> classifier, 
    Supplier<M> mapFactory, 
    Collector<? super T,A,D> downstream) 

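// Returns a concurrent collector that performs a "group by" operation on input elements of type T,
// grouping elements according to a classification function.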
static <T,K> Collector<T,?,ConcurrentMap<K,List<T>>> groupingByConcurrent(
    Function<? super T,? extends K> classifier) 

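// Returns a concurrent collector that performs a cascaded "group by" operation on input elements
// of type T, grouping elements according to a classification function and then reducing the values
// associated with each key using the specified downstream collector.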
static <T,K,A,D> Collector<T,?,ConcurrentMap<K,D>> groupingByConcurrent(
    Function<? super T,? extends K> classifier, 
    Collector<? super T,A,D> downstream) 

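// Returns a concurrent collector that performs a cascaded "group by" operation on input elements
// of type T, grouping elements according to a classification function and then reducing the values
// associated with each key using the specified downstream collector.
// The resulting ConcurrentMap is created by the supplied map factory.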
static <T,K,A,D,M extends ConcurrentMap<K,D>> Collector<T,?,M> groupingByConcurrent(
    Function<? super T,? extends K> classifier, 
    Supplier<M> mapFactory, 
    Collector<? super T,A,D> downstream)

Example:

Suppose we have a List of strings and want to group them by length:

package com.hxstrive.jdk8.stream_api.group_by;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
* groupingBy method
* @author HuangXin
* @since 1.0.0  2024/6/24 9:23
*/
public class GroupByDemo1 {

   public static void main(String[] args) {
       List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");

       // Group the stream's strings by length
       Map<Integer,List<String>> map = list.stream().collect(Collectors.groupingBy(String::length));
       System.out.println(map);
       // Output:
       //{3=[one, two, six], 4=[four, five], 5=[three]}
   }

}

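The example above uses the groupingBy overload that takes only a classifier function. By default it collects each group's values into a List, as the source code shows: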

public static <T, K> Collector<T, ?, Map<K, List<T>>>
   groupingBy(Function<? super T, ? extends K> classifier) {
       return groupingBy(classifier, toList());
   }

To collect each group's values into a Set or a Map instead, pass a downstream collector as the second argument, for example:

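// Besides the imports of the previous example, this snippet also needs
// java.util.Set and java.util.function.Function.
List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");

// Group the stream's strings by length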
Map<Integer,List<String>> map = list.stream().collect(Collectors.groupingBy(String::length, Collectors.toList()));
System.out.println(map);
// Output:
//{3=[one, two, six], 4=[four, five], 5=[three]}

Map<Integer, Set<String>> map2 = list.stream().collect(Collectors.groupingBy(String::length, Collectors.toSet()));
System.out.println(map2);
// Output (toSet() makes no guarantee about element order):
// {3=[six, one, two], 4=[four, five], 5=[three]}

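// Collect each group's values into a Map: the key is the element, the value is its upper-case form.
// The merge function is only called when two elements of a group produce the same key.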
Map<Integer,Map<String,String>> map3 = list.stream().collect(Collectors.groupingBy(String::length,
       Collectors.toMap(Function.identity(), String::toUpperCase, (a,b)-> a + "," + b)));
System.out.println(map3);
// Output:
//{3={six=SIX, one=ONE, two=TWO}, 4={four=FOUR, five=FIVE}, 5={three=THREE}}
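Because the downstream argument accepts any collector, it can itself be another groupingBy, which is what the "cascaded" wording in the method comments refers to. The following is a minimal sketch of two-level grouping, assuming we want to group by length first and then by first character; the class name NestedGroupingSketch and the method name groupByLengthThenInitial are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original article.
final class NestedGroupingSketch {

    // Groups words by length, then groups each length bucket by first character.
    static Map<Integer, Map<Character, List<String>>> groupByLengthThenInitial(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(String::length,
                        Collectors.groupingBy(s -> s.charAt(0))));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
        Map<Integer, Map<Character, List<String>>> nested = groupByLengthThenInitial(list);

        // Look up individual buckets rather than printing the whole map,
        // since the iteration order of the default map type is not specified.
        System.out.println(nested.get(3).get('t')); // [two]
        System.out.println(nested.get(4).get('f')); // [four, five]
    }
}
```

The inner collector sees only the elements of one outer group, so each level can use a different classifier or a different downstream collector.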

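If you need to control the type of the resulting Map, use the three-argument groupingBy overload and pass a map factory as its second argument, for example: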

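// This snippet also needs java.util.LinkedHashMap.
List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");

// Group the stream's strings by length.
// The second argument sets the result Map type; LinkedHashMap keeps keys in first-encounter order.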
Map<Integer,List<String>> map = list.stream().collect(
       // Collectors.groupingBy(classifier, mapFactory, downstream)
       Collectors.groupingBy(String::length, LinkedHashMap::new, Collectors.toList()));
System.out.println(map);
System.out.println(map.getClass());
// Output:
//{3=[one, two, six], 5=[three], 4=[four, five]}
//class java.util.LinkedHashMap
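The same map-factory parameter also works with other Map implementations. As a small sketch, assuming we want the keys sorted rather than in first-encounter order, TreeMap::new can be passed instead; the class name SortedKeysSketch and the method name groupByLengthSorted are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original article.
final class SortedKeysSketch {

    // Groups words by length and stores the groups in a TreeMap,
    // so the keys iterate in ascending numeric order.
    static Map<Integer, List<String>> groupByLengthSorted(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
        System.out.println(groupByLengthSorted(list));
        // {3=[one, two, six], 4=[four, five], 5=[three]}
    }
}
```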

Notes:

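(1) The groupingBy overloads without a map factory make no guarantee about the type of the returned Map. In practice the implementation uses a HashMap, so the keys come back in no particular order. If you need the keys in first-encounter order or in some other specific order, supply a map factory such as LinkedHashMap::new or TreeMap::new.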

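(2) groupingBy is safe to use with parallel streams: each thread accumulates into its own map and the partial maps are then merged, so for an ordered source the elements within each group keep their encounter order. That key-by-key merge can be expensive, however. If the order of elements within a group does not matter, groupingByConcurrent may offer better parallel performance, because all threads insert into a single ConcurrentMap.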
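To see how groupingBy behaves on a parallel stream, the sketch below runs the length grouping over a parallel stream created from a List. For an ordered source like this, the non-concurrent groupingBy collector keeps encounter order within each group, so the lists come out the same as in the sequential case. The class name ParallelGroupingSketch and the method name groupInParallel are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original article.
final class ParallelGroupingSketch {

    // Groups words by length on a parallel stream using the ordinary
    // (non-concurrent) groupingBy collector.
    static Map<Integer, List<String>> groupInParallel(List<String> words) {
        return words.parallelStream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
        Map<Integer, List<String>> map = groupInParallel(list);

        // Elements inside each group keep the order they had in the source list.
        System.out.println(map.get(3)); // [one, two, six]
        System.out.println(map.get(4)); // [four, five]
    }
}
```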

Partitioning (partitioningBy)

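When the classifier function is a Predicate (a function that returns a boolean), the stream elements are split into two lists: one holding the elements for which the function returns true, the other holding those for which it returns false. For example: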

List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");

// Group the stream's strings by whether their length is greater than 3
Map<Boolean,List<String>> map = list.stream().collect(
       Collectors.groupingBy(s -> s.length() > 3));
System.out.println(map);
// Output:
//{false=[one, two, six], true=[three, four, five]}

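In this case, partitioningBy is the better choice and can be more efficient than groupingBy, because it states the intent directly and does not need a general-purpose hash map for just two keys.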

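The partitioningBy method is similar to groupingBy, but it uses a single predicate (Predicate) to split the elements into two parts: those that satisfy the predicate and those that do not. The result is a Map whose keys are Boolean values (true or false) and whose values are the lists of elements that do or do not satisfy the predicate.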
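One practical difference between the two collectors shows up when every element falls on the same side of the predicate. In the sketch below, which uses only short words, partitioningBy still returns an entry for true (with an empty list), while groupingBy creates only the keys it actually encounters. The Javadoc of later JDK releases documents that the partitioned Map always contains both keys; for Java 8 this is an assumption based on the OpenJDK implementation. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original article.
final class PartitionKeysSketch {

    // Splits words into "longer than 3" (true) and "3 or shorter" (false) with partitioningBy.
    static Map<Boolean, List<String>> partitionByLong(List<String> words) {
        return words.stream().collect(Collectors.partitioningBy(s -> s.length() > 3));
    }

    // Same predicate, but used as a groupingBy classifier.
    static Map<Boolean, List<String>> groupByLong(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(s -> s.length() > 3));
    }

    public static void main(String[] args) {
        List<String> shortOnly = Arrays.asList("one", "two", "six");

        System.out.println(partitionByLong(shortOnly)); // {false=[one, two, six], true=[]}
        System.out.println(groupByLong(shortOnly));     // {false=[one, two, six]}
    }
}
```

With partitioningBy, callers can read map.get(true) without a null check; with groupingBy, a missing key yields null.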

Method definitions:

// Returns a collector that partitions the input elements according to a Predicate
// and organizes them into a Map<Boolean, List<T>>.
static <T> Collector<T,?,Map<Boolean,List<T>>> partitioningBy(
       Predicate<? super T> predicate)

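// Returns a collector that partitions the input elements according to a Predicate,
// reduces the values in each partition with another collector, and organizes them
// into a Map<Boolean, D> whose values are the results of the downstream reduction.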
static <T,D,A> Collector<T,?,Map<Boolean,D>> partitioningBy(
       Predicate<? super T> predicate, Collector<? super T,A,D> downstream)

Example:

Split the strings in a List into two groups by length: longer than 3, and 3 or shorter.

package com.hxstrive.jdk8.stream_api.partitioning_by;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
* partitioningBy method
* @author HuangXin
* @since 1.0.0  2024/6/24 9:23
*/
public class PartitionByDemo5 {

   public static void main(String[] args) {
       List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");

       // Partition the stream's strings by whether their length is greater than 3
       Map<Boolean,List<String>> map = list.stream().collect(
               Collectors.partitioningBy(s -> s.length() > 3));
       System.out.println(map);
       // Output:
       //{false=[one, two, six], true=[three, four, five]}
   }

}
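The two-argument partitioningBy overload listed in the method definitions takes a downstream collector, just like groupingBy. As a brief sketch, the code below counts the elements on each side of the predicate instead of listing them; the class name PartitionCountSketch and the method name countByLong are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original article.
final class PartitionCountSketch {

    // Counts how many words are longer than 3 characters (true) and how many are not (false).
    static Map<Boolean, Long> countByLong(List<String> words) {
        return words.stream().collect(
                Collectors.partitioningBy(s -> s.length() > 3, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
        System.out.println(countByLong(list));
        // {false=3, true=3}
    }
}
```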

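Note: if you call groupingByConcurrent instead of groupingBy, you get a concurrent map, and when the collector is used with a parallel stream, values are inserted into that map concurrently.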
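A minimal sketch of groupingByConcurrent on a parallel stream follows. It uses counting as the downstream collector so that the result does not depend on the order in which threads insert elements; the class name ConcurrentGroupingSketch and the method name countByLength are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original article.
final class ConcurrentGroupingSketch {

    // Counts words per length; all threads of the parallel stream
    // accumulate into one shared ConcurrentMap.
    static ConcurrentMap<Integer, Long> countByLength(List<String> words) {
        return words.parallelStream().collect(
                Collectors.groupingByConcurrent(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
        ConcurrentMap<Integer, Long> counts = countByLength(list);

        // Copy into a TreeMap only to print the keys in a predictable order.
        System.out.println(new TreeMap<>(counts));
        // {3=3, 4=2, 5=1}
    }
}
```

If the downstream collector were toList(), the contents of each group would be the same, but the order of elements inside each list would not be guaranteed.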

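Other methods

Java 8 also provides several other collectors for downstream processing of grouped elements:

The counting method returns the number of elements collected. For example: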

List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
Map<Boolean,Long> map = list.stream().collect(
       Collectors.groupingBy(s -> s.length() > 3, Collectors.counting()));
System.out.println(map);
// Output:
//{false=3, true=3}

The summing(Int|Long|Double) methods take a function as an argument, apply it to each downstream element, and produce the sum of the results. For example:

List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
Map<Boolean,Long> map = list.stream().collect(
       Collectors.groupingBy(s -> s.length() > 3, Collectors.summingLong(String::length)));
System.out.println(map);
// Output:
//{false=9, true=13}

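The maxBy and minBy methods take a comparator and produce the largest and the smallest downstream element respectively, each wrapped in an Optional. For example:

// These snippets also need java.util.Optional.
List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");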
Map<Boolean, Optional<String>> map = list.stream().collect(
       Collectors.groupingBy(s -> s.length() > 3, Collectors.maxBy(String::compareTo)));
System.out.println(map);
// Output:
//{false=Optional[two], true=Optional[three]}

// Separate snippet: the same grouping with minBy
List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
Map<Boolean, Optional<String>> map = list.stream().collect(
       Collectors.groupingBy(s -> s.length() > 3, Collectors.minBy(String::compareTo)));
System.out.println(map);
// Output:
//{false=Optional[one], true=Optional[five]}

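The mapping method applies a function to each downstream element and then passes the mapped values to another collector, which produces the result. For example: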

List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
Map<Boolean, Optional<String>> map = list.stream().collect(
       Collectors.groupingBy(s -> s.length() > 3,
               Collectors.mapping(String::toUpperCase, Collectors.maxBy(String::compareTo))));
System.out.println(map);
// Output:
//{false=Optional[TWO], true=Optional[THREE]}
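The collector passed to mapping does not have to be a reduction such as maxBy. As a small sketch, the code below maps each word to its first character and collects the characters of each group into a List; the class name MappingSketch and the method name initialsByLong are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original article.
final class MappingSketch {

    // Groups words by "longer than 3 characters" and keeps only the first
    // character of each word in the resulting lists.
    static Map<Boolean, List<Character>> initialsByLong(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(s -> s.length() > 3,
                        Collectors.mapping(s -> s.charAt(0), Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("one", "two", "three", "four", "five", "six");
        Map<Boolean, List<Character>> map = initialsByLong(list);

        System.out.println(map.get(false)); // [o, t, s]
        System.out.println(map.get(true));  // [t, f, f]
    }
}
```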
