Neo4j/Cypher: Combining COUNT and COLLECT in One Query

Curator's Note: Check raw code snippets to see properly formatted tables.

In my continued playing around with football data I wanted to write a cypher query against neo4j which would show me which teams had missed the most penalties this season and who missed them.

I started off with a query that returned all the penalties that have been missed this season and the games those missed happened in:

START player = node:players('name:*')
MATCH player-[:missed_penalty_in]-game, 
      player-[:played|subbed_on]-stats-[:in]-game,
      stats-[:for]-team,
      game-[:home_team]-home,
      game-[:away_team]-away
RETURN player.name, team.name, home.name, away.name

+-------------------------------------------------------------------------------------------------+
| player.name          | team.name              | home.name              | away.name              |
+-------------------------------------------------------------------------------------------------+
| "Papiss Demba Cisse" | "Newcastle United"     | "Newcastle United"     | "Norwich City"         |
| "Wayne Rooney"       | "Manchester United"    | "Manchester United"    | "Arsenal"              |
| "Mikel Arteta"       | "Arsenal"              | "Arsenal"              | "Fulham"               |
| "David Silva"        | "Manchester City"      | "Manchester City"      | "Southampton"          |
| "Frank Lampard"      | "Chelsea"              | "Manchester City"      | "Chelsea"              |
| "Adel Taarabt"       | "Queens Park Rangers"  | "Queens Park Rangers"  | "Norwich City"         |
| "Javier Hernández"   | "Manchester United"    | "Manchester United"    | "Wigan Athletic"       |
| "Robin Van Persie"   | "Manchester United"    | "Southampton"          | "Manchester United"    |
| "Jonathan Walters"   | "Stoke City"           | "Fulham"               | "Stoke City"           |
| "Shane Long"         | "West Bromwich Albion" | "West Bromwich Albion" | "Liverpool"            |
| "Steven Gerrard"     | "Liverpool"            | "Liverpool"            | "West Bromwich Albion" |
| "Lucas Piazon"       | "Chelsea"              | "Chelsea"              | "Aston Villa"          |
+-------------------------------------------------------------------------------------------------+
12 rows

(there should actually be another penalty miss for Jonathan Walters against Chelsea but for some reason the data source has missed it off!

I then grouped the penalty misses by team so that I’d have one row for each team and a collection showing the people who’d missed.

We can use the COLLECT function to do the latter:

START player = node:players('name:*')
MATCH player-[:missed_penalty_in]-game, 
      player-[:played|subbed_on]-stats-[:in]-game,
      stats-[:for]-team
RETURN DISTINCT team.name, COLLECT(player.name) AS players

I wanted to order the teams by the number of penalties they’d missed so Manchester United would be first in the table in this case and initially tried to order the results by a count of players:
START player = node:players('name:*')
MATCH player-[:missed_penalty_in]-game, 
      player-[:played|subbed_on]-stats-[:in]-game,
      stats-[:for]-team
RETURN DISTINCT team.name, COLLECT(player.name) AS players
ORDER BY COUNT(player.name)

which doesn’t actually compile:

SyntaxException: Aggregation expressions must be listed in the RETURN clause to be used in ORDER BY

I tried a few other variations such as the following:

START player = node:players('name:*')
MATCH player-[:missed_penalty_in]-game, 
      player-[:played|subbed_on]-stats-[:in]-game,
      stats-[:for]-team
RETURN DISTINCT team.name, COUNT(player.name) AS numberOfPlayers, 
       COLLECT(player.name) AS players
ORDER BY numberOfPlayers DESC

which again doesn’t compile:

SyntaxException: Aggregation expressions must be listed in the RETURN clause to be used in ORDER BY

I eventually found a post by Andres where he explains that you need to split the query into two and make use of WITH if you want to make use of two aggregation expressions.

I ended up with the following query which does the job:

START player = node:players('name:*')
MATCH player-[:missed_penalty_in]-game, 
      player-[:played|subbed_on]-stats-[:in]-game,
      stats-[:for]-team
WITH DISTINCT team, COLLECT(player.name) AS players
 
MATCH player-[:missed_penalty_in]-game, 
      player-[:played|subbed_on]-stats-[:in]-game,
      stats-[:for]-team
WITH DISTINCT team, COUNT(player) AS numberOfPlayers, players
 
RETURN team.name, players
ORDER BY numberOfPlayers DESC
+---------------------------------------------------------------------------------+
| team.name              | players                                                |
+---------------------------------------------------------------------------------+
| "Manchester United"    | ["Wayne Rooney","Javier Hernández","Robin Van Persie"] |
| "Chelsea"              | ["Frank Lampard","Lucas Piazon"]                       |
| "Liverpool"            | ["Steven Gerrard"]                                     |
| "Manchester City"      | ["David Silva"]                                        |
| "Newcastle United"     | ["Papiss Demba Cisse"]                                 |
| "Queens Park Rangers"  | ["Adel Taarabt"]                                       |
| "Stoke City"           | ["Jonathan Walters"]                                   |
| "Arsenal"              | ["Mikel Arteta"]                                       |
| "West Bromwich Albion" | ["Shane Long"]                                         |
+---------------------------------------------------------------------------------+
9 rows




 

 

 

 

Top