We instrumented a C++ program equipped with our own mark-region collector (using 64kB regions) that is equivalent to the following F# implementation of a list-based n-queens solver in order to gather information about the structures being collected:
let safe x1 y1 x2 y2 = x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1 let ps n = [ for i in 1 .. n do for j in 1 .. n do yield i, j ] let rec filter x0 y0 = function |  ->  | (x1, y1 as q)::xys as list -> let xys' = filter x0 y0 xys if safe x0 y0 x1 y1 then if System.Object.ReferenceEquals(xys', xys) then list else q::xys' else xys' let rec search n nqs qs ps a = match ps with |  -> if nqs=n then a+1 else a | (x, y as q)::ps -> search n nqs qs ps a |> search n (nqs+1) (q::qs) (filter x y ps) let solve n = search n 0  (ps n) 0
Stopping this program on its 1,000th GC cycle when it is solving the 11-queens problem, extracting references between allocated but unmarked cons cells, filtering out connected components with 3 or more vertices and visualizing the result using Mathematica's GraphPlot function gives the following:
In this case, we find that just 6% of the cons cells die alone (i.e. singleton lists) and 84% die in a clump of two cons cells, 6.5% die in a clump of 3 cons cells and 3.4% die in clumps containing four or more cons cells. Only a handful of clumps had interesting structures.
Although these results suggest that it might be easy to obtain a speedup by adding a special case to the list type for lists containing two elements we have found that this is not the case because the required changes also add expensive operations elsewhere, e.g. destructuring the special two-element list now incurs an allocation. Therefore, the next best solution would be to optimize the garbage collector for this case but it would be prudent to analyze the clusters collected during the running of other programs first. Analyzing the garbage generated by a program using red-black trees would be interesting...