[backport from gcc-4.8/trunk r191363 ] From: Richard Sandiford Subject: Re: Finish up PR rtl-optimization/44194 Date: Sun, 16 Sep 2012 11:52:56 +0100 List-Archive: Eric Botcazou writes: > This is the PR about the useless spilling to memory of structures that are > returned in registers. It was essentially addressed last year by Easwaran with > an enhancement of the RTL DSE pass, but Easwaran also noted that we still spill > to memory in the simplest cases, e.g. gcc.dg/pr44194-1.c, because expand_call > creates a temporary on the stack to store the value returned in registers... > > The attached patch solves this problem by copying the value into pseudos > instead by means of emit_group_move_into_temps. This is sufficient to get rid > of the remaining memory accesses for gcc.dg/pr44194-1.c on x86-64 for example, > but not on strict-alignment platforms like SPARC64. Thanks for doing this. It'll obviously help n32 and n64 long doubles too. I hit one problem building libgfortran for mips64-linux-gnu. The calls.c change was: > Index: calls.c > =================================================================== > --- calls.c (revision 191198) > +++ calls.c (working copy) > @@ -3272,16 +3272,8 @@ expand_call (tree exp, rtx target, int i > else if (GET_CODE (valreg) == PARALLEL) > { > if (target == 0) > - { > - /* This will only be assigned once, so it can be readonly. */ > - tree nt = build_qualified_type (rettype, > - (TYPE_QUALS (rettype) > - | TYPE_QUAL_CONST)); > - > - target = assign_temp (nt, 1, 1); > - } > - > - if (! rtx_equal_p (target, valreg)) > + target = emit_group_move_into_temps (valreg); > + else if (!rtx_equal_p (target, valreg)) > emit_group_store (target, valreg, rettype, > int_size_in_bytes (rettype)); > > /* We can not support sibling calls for this case. */ > sibcall_failure = 1; But if we're trying to use a sibcall, we go through this loop twice, and the second iteration has to cope with a PARALLEL target created by the first. How about the patch below? Tested on mips64-linux-gnu, where a full testrun looked good. In some ways it's a bit silly to emit anything at all in the first iteration, given that we then go on to set sibcall_failure. It's not the kind of loop we can just continue out of though. Also, your patch probably means that we only need to set sibcall_failure for the emit_group_store case, although I've not tested that. Richard gcc/ 2012-09-16 Richard Sandiford * calls.c (expand_call): Use emit_group_move for PARALLEL->PARALLEL moves. --- gcc-4.7.2/gcc/calls.c.~1~ 2012-10-21 16:34:07.000000000 +0200 +++ gcc-4.7.2/gcc/calls.c 2012-10-21 16:34:54.000000000 +0200 @@ -3231,7 +3231,13 @@ expand_call (tree exp, rtx target, int i { if (target == 0) target = emit_group_move_into_temps (valreg); - else if (!rtx_equal_p (target, valreg)) + else if (rtx_equal_p (target, valreg)) + ; + else if (GET_CODE (target) == PARALLEL) + /* Handle the result of a emit_group_move_into_temps + call in the previous pass. */ + emit_group_move (target, valreg); + else emit_group_store (target, valreg, rettype, int_size_in_bytes (rettype));