Result Grouping

Result Grouping groups documents with a common field value into groups and returns the top documents for each group.

For example, if you searched for "DVD" on an electronic retailer’s e-commerce site, you might be returned three categories such as "TV and Video", "Movies", and "Computers" with three results per category. In this case, the query term "DVD" appeared in all three categories, so Solr groups them together in order to increase relevancy for the user.

Note
Prefer Collapse & Expand instead

Solr’s Collapse and Expand feature is newer and mostly overlaps with Result Grouping. There are features unique to both, and they have different performance characteristics. That said, in most cases Collapse and Expand is preferable to Result Grouping.

Result Grouping is separate from Faceting. Though it is conceptually similar, faceting returns all relevant results and allows the user to refine the results based on the facet category. For example, if you search for "shoes" on a footwear retailer’s e-commerce site, Solr would return all results for that query term, along with selectable facets such as "size," "color," "brand," and so on.

You can however combine grouping with faceting. Grouped faceting supports facet.field and facet.range but currently doesn’t support date and pivot faceting. The facet counts are computed based on the first group.field parameter, and other group.field parameters are ignored.

Grouped faceting differs from non grouped facets (sum of all facets) == (total of products with that property) as shown in the following example:

Object 1

  • name: Phaser 4620a

  • ppm: 62

  • product_range: 6

Object 2

  • name: Phaser 4620i

  • ppm: 65

  • product_range: 6

Object 3

  • name: ML6512

  • ppm: 62

  • product_range: 7

If you ask Solr to group these documents by "product_range", then the total amount of groups is 2, but the facets for ppm are 2 for 62 and 1 for 65.

Grouping Parameters

Result Grouping takes the following request parameters. Any number of these request parameters can be included in a single request:

group

If true, query results will be grouped.

group.field

The name of the field by which to group results. The field must be single-valued, and either be indexed or a field type that has a value source and works in a function query, such as ExternalFileField. It must also be a string-based field, such as StrField or TextField

group.func

Group based on the unique values of a function query.

Note
This option does not work with distributed searches.
group.query

Return a single group of documents that match the given query.

rows

The number of groups to return. The default value is 10.

start

Specifies an initial offset for the list of groups.

group.limit

Specifies the number of results to return for each group. The default value is 1.

group.offset

Specifies an initial offset for the document list of each group.

sort

Specifies how Solr sorts the groups relative to each other. For example, sort=popularity desc will cause the groups to be sorted according to the highest popularity document in each group. The default value is score desc.

group.sort

Specifies how Solr sorts documents within each group. The default behavior if group.sort is not specified is to use the same effective value as the sort parameter.

group.format

If this parameter is set to simple, the grouped documents are presented in a single flat list, and the start and rows parameters affect the numbers of documents instead of groups. An alternate value for this parameter is grouped.

group.main

If true, the result of the first field grouping command is used as the main result list in the response, using group.format=simple.

group.ngroups

If true, Solr includes the number of groups that have matched the query in the results. The default value is false.

See below for Distributed Result Grouping Caveats when using sharded indexes.

group.truncate

If true, facet counts are based on the most relevant document of each group matching the query. The default value is false.

group.facet

Determines whether to compute grouped facets for the field facets specified in facet.field parameters. Grouped facets are computed based on the first specified group. As with normal field faceting, fields shouldn’t be tokenized (otherwise counts are computed for each token). Grouped faceting supports single and multivalued fields. Default is false.

Warning
There can be a heavy performance cost to this option.

See below for Distributed Result Grouping Caveats when using sharded indexes.

group.cache.percent

Setting this parameter to a number greater than 0 enables caching for result grouping. Result Grouping executes two searches; this option caches the second search. The default value is 0. The maximum value is 100.

Testing has shown that group caching only improves search time with Boolean, wildcard, and fuzzy queries. For simple queries like term or "match all" queries, group caching degrades performance.

Any number of group commands (e.g., group.field, group.func, group.query, etc.) may be specified in a single request.

Grouping Examples

All of the following sample queries work with Solr’s “bin/solr -e techproducts” example.

Grouping Results by Field

In this example, we will group results based on the manu_exact field, which specifies the manufacturer of the items in the sample dataset.

http://localhost:8983/solr/techproducts/select?fl=id,name&q=solr+memory&group=true&group.field=manu_exact

{
"..."
"grouped":{
  "manu_exact":{
    "matches":6,
    "groups":[{
        "groupValue":"Apache Software Foundation",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"SOLR1000",
              "name":"Solr, the Enterprise Search Server"}]
        }},
      {
        "groupValue":"Corsair Microsystems Inc.",
        "doclist":{"numFound":2,"start":0,"docs":[
            {
              "id":"VS1GB400C3",
              "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]
        }},
      {
        "groupValue":"A-DATA Technology Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"VDBDB1A16",
              "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"}]
        }},
      {
        "groupValue":"Canon Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"0579B002",
              "name":"Canon PIXMA MP500 All-In-One Photo Printer"}]
        }},
      {
        "groupValue":"ASUS Computer Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"EN7800GTX/2DHTV/256M",
              "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
        }
      }]}}}

The response indicates that there are six total matches for our query. For each of the five unique values of group.field, Solr returns a docList for that groupValue such that the numFound indicates the total number of documents in that group, and the top documents are returned according to the implicit default group.limit=1 and group.sort=score desc parameters. The resulting groups are then sorted by the score of the top document within each group based on the implicit sort=score desc, and the number of groups returned is limited to the implicit rows=10.

We can run the same query with the request parameter group.main=true. This will format the results as a single flat document list. This flat format does not include as much information as the normal result grouping query results – notably the numFound in each group – but it may be easier for existing Solr clients to parse.

http://localhost:8983/solr/techproducts/select?fl=id,name,manufacturer&q=solr+memory&group=true&group.field=manu_exact&group.main=true

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"id,name,manufacturer",
      "indent":"true",
      "q":"solr memory",
      "group.field":"manu_exact",
      "group.main":"true",
      "group":"true"}},
  "grouped":{},
  "response":{"numFound":6,"start":0,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr, the Enterprise Search Server"},
      {
        "id":"VS1GB400C3",
        "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"},
      {
        "id":"VDBDB1A16",
        "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"},
      {
        "id":"0579B002",
        "name":"Canon PIXMA MP500 All-In-One Photo Printer"},
      {
        "id":"EN7800GTX/2DHTV/256M",
        "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
  }
}

Grouping by Query

In this example, we will use the group.query parameter to find the top three results for "memory" in two different price ranges: 0.00 to 99.99, and over 100.

http://localhost:8983/solr/techproducts/select?indent=true&fl=name,price&q=memory&group=true&group.query=price:0+TO+99.99&group.query=price:[100+TO+*]&group.limit=3

{
  "responseHeader":{
    "status":0,
    "QTime":42,
    "params":{
      "fl":"name,price",
      "indent":"true",
      "q":"memory",
      "group.limit":"3",
      "group.query":["price:[0 TO 99.99]",
      "price:[100 TO *]"],
      "group":"true"}},
  "grouped":{
    "price:[0 TO 99.99]":{
      "matches":5,
      "doclist":{"numFound":1,"start":0,"docs":[
          {
            "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
            "price":74.99}]
      }},
    "price:[100 TO *]":{
      "matches":5,
      "doclist":{"numFound":3,"start":0,"docs":[
          {
            "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
            "price":185.0},
          {
            "name":"Canon PIXMA MP500 All-In-One Photo Printer",
            "price":179.99},
          {
            "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
            "price":479.95}]
      }
    }
  }
}

In this case, Solr found five matches for "memory," but only returns four results grouped by price. This is because one result for "memory" did not have a price assigned to it.

Distributed Result Grouping Caveats

Grouping is supported for distributed searches, with some caveats:

  • Currently group.func is is not supported in any distributed searches

  • group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations.