[re-apply gcc-4.8/trunk r192676 which was reverted in r192701 due to
 regressions, had a followup fix for PR55030 in r193802/r193803, which
 itself had a followup fix for PR55511 in r193911; in
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55030#c10 comments #10 and
 #11 Eric Botcazou suggests reinstating r192676 with the dse.c and
 cselib.c hunks of r193802 reverted ]

Date: Thu, 11 Oct 2012 19:34:33 -0400 (EDT)
From: Hans-Peter Nilsson
Subject: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver
List-Archive:

The md.texi entry for nonlocal_goto_receiver says "A typical reason
why you might need this pattern is if some value, such as a pointer to
a global table, must be restored when the frame pointer is restored.
Note that a nonlocal goto only occurs within a unit-of-translation, so
a global table pointer that is shared by all functions of a given
module need not be restored".

One use would be to restore a hardware-register-value saved in the
current frame; the frame where __builtin_setjmp is called (i.e. not a
global context).  This is what the MMIX port does, saving the
register-stack-pointer-register for use when unwinding the register
stack.  I can imagine other similar register-restoring needs that
require something saved in the current frame, but current ports with
that pattern (or setjmp_receiver) but without a nonlocal_goto pattern
(see the HAVE_nonlocal_goto condition in the patch) don't.  (The AVR
port performs what appears to be a cargo-cult song and dance; the bug
below was copied into the port, but the port will not otherwise be
affected by the bug or this patch, as it has a nonlocal_goto pattern.)

But in builtins.c:expand_builtin_setjmp_receiver, the frame-pointer is
*clobbered* for a mysterious and fuddy reason:

      /* This might change the hard frame pointer in ways that aren't
         apparent to early optimization passes, so force a clobber.  */
      emit_clobber (hard_frame_pointer_rtx);

That comment might have been true eons ago, but these days clobbering
the frame-pointer means that its value is void and any restoring insns
emitted before the clobber are deleted *because* of the clobber.  Take
built-in-setjmp.c at -O2, for example.  Before .191r.ud_dce (i.e. in
the .190r.init-regs dump):

(code_label/s 47 115 48 3 3 "" [2 uses])
(note 48 47 49 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 49 48 168 3 (use (reg/f:DI 253 $253)) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1
     (nil))
(insn 168 49 51 3 (set (reg/f:DI 253 $253)
        (plus:DI (reg/f:DI 253 $253)
            (const_int 24 [0x18]))) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1
     (nil))
(insn 51 168 52 3 (clobber (reg/f:DI 253 $253)) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1
     (nil))
(insn 52 51 54 3 (parallel [
            (unspec_volatile [
                    (reg/f:DI 253 $253)
                ] 1)
            (clobber (scratch:DI))
            (clobber (reg:DI 259 rJ))
        ]) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 63 {*nonlocal_goto_receiver_expanded}
     (expr_list:REG_UNUSED (reg:DI 259 rJ)
        (nil)))
(insn 54 52 136 3 (asm_input/v ("") (null):0) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1
     (nil))

After:

(code_label/s 47 115 48 3 3 "" [2 uses])
(note 48 47 49 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 49 48 51 3 (use (reg/f:DI 253 $253)) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1
     (nil))
(insn 51 49 52 3 (clobber (reg/f:DI 253 $253)) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1
     (nil))
(insn 52 51 54 3 (parallel [
            (unspec_volatile [
                    (reg/f:DI 253 $253)
                ] 1)
            (clobber (scratch:DI))
            (clobber (reg:DI 259 rJ))
        ]) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 63 {*nonlocal_goto_receiver_expanded}
     (expr_list:REG_UNUSED (reg:DI 259 rJ)
        (nil)))
(insn 54 52 136 3 (asm_input/v ("") (null):0) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1
     (nil))

Note that insn 168, the frame-pointer adjustment, is deleted, which
seems a logical optimization.  The bug is to emit the clobber, not
that the restoring insn is removed.

While grepping around for other emitters of nonlocal_goto_receiver I
noticed that builtins.c:expand_builtin_setjmp_receiver is identical to
stmt.c:expand_nl_goto_receiver save for two things: the frame-pointer
clobbering(!) and that expand_builtin_setjmp_receiver instead prefers
to emit setjmp_receiver.  I don't see how the frame-pointer-clobbering
would be needed as part of emitting setjmp_receiver.

I suggest eliminating the bug and one copy of the apparently bug-prone
code.  I kept the function in builtins.c for obvious reasons (if not
obvious, consider the name: expand *builtin* setjmp_receiver) with the
setjmp-ness expressed through the label parameter, which is non-NULL
for pre-existing calls.  Note also the fixed clobber-comment,
obviously incorrect in the stmt.c almost-copy, and at least on the
wrong line in expand_builtin_setjmp_receiver.

Tested mmix-knuth-mmixware, x86_64-unknown-linux-gnu (for good
measure) and rl78-elf (a SJLJ target; fixed-up with a patch from the
maintainer for the current breakage in PR54882, but without fortran),
no regressions.

This fixes the following FAILs for mmix-knuth-mmixware:

Running /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
...
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -O2
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -O3 -g
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -Os
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/built-in-setjmp.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
...
Running /tmp/mmiximp2/gcc/gcc/testsuite/gcc.dg/torture/stackalign/stackalign.exp ...
...
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer execution test
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-loops execution test
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -Os execution test
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-loops execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -Os execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-loops execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -Os execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test

Ok to commit?

gcc/
2012-10-22  Hans-Peter Nilsson

        * stmt.c (expand_nl_goto_receiver): Remove almost-copy of
        expand_builtin_setjmp_receiver.
        (expand_label): Adjust, call expand_builtin_setjmp_receiver
        with NULL for the label parameter.
        * builtins.c (expand_builtin_setjmp_receiver): Don't clobber
        the frame-pointer.  Adjust comments.
        [HAVE_builtin_setjmp_receiver]: Emit builtin_setjmp_receiver
        only if LABEL is non-NULL.

--- gcc-4.8.0/gcc/builtins.c.~1~	2013-01-10 21:38:27.000000000 +0100
+++ gcc-4.8.0/gcc/builtins.c	2013-05-26 14:31:15.791534231 +0200
@@ -883,14 +883,15 @@ expand_builtin_setjmp_setup (rtx buf_add
 }
 
 /* Construct the trailing part of a __builtin_setjmp call.  This is
-   also called directly by the SJLJ exception handling code.  */
+   also called directly by the SJLJ exception handling code.
+   If RECEIVER_LABEL is NULL, instead contruct a nonlocal goto handler.  */
 
 void
 expand_builtin_setjmp_receiver (rtx receiver_label ATTRIBUTE_UNUSED)
 {
   rtx chain;
 
-  /* Clobber the FP when we get here, so we have to make sure it's
+  /* Mark the FP as used when we get here, so we have to make sure it's
      marked as used by this function.  */
   emit_use (hard_frame_pointer_rtx);
 
@@ -905,17 +906,28 @@ expand_builtin_setjmp_receiver (rtx rece
 #ifdef HAVE_nonlocal_goto
   if (! HAVE_nonlocal_goto)
 #endif
-    {
-      emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
-      /* This might change the hard frame pointer in ways that aren't
-         apparent to early optimization passes, so force a clobber.  */
-      emit_clobber (hard_frame_pointer_rtx);
-    }
+    /* First adjust our frame pointer to its actual value.  It was
+       previously set to the start of the virtual area corresponding to
+       the stacked variables when we branched here and now needs to be
+       adjusted to the actual hardware fp value.
+
+       Assignments to virtual registers are converted by
+       instantiate_virtual_regs into the corresponding assignment
+       to the underlying register (fp in this case) that makes
+       the original assignment true.
+       So the following insn will actually be decrementing fp by
+       STARTING_FRAME_OFFSET.  */
+    emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
 
 #if !HARD_FRAME_POINTER_IS_ARG_POINTER
   if (fixed_regs[ARG_POINTER_REGNUM])
     {
 #ifdef ELIMINABLE_REGS
+      /* If the argument pointer can be eliminated in favor of the
+         frame pointer, we don't need to restore it.  We assume here
+         that if such an elimination is present, it can always be used.
+         This is the case on all known machines; if we don't make this
+         assumption, we do unnecessary saving on many machines.  */
       size_t i;
       static const struct elims {const int from, to;} elim_regs[] = ELIMINABLE_REGS;
 
@@ -936,7 +948,7 @@ expand_builtin_setjmp_receiver (rtx rece
 #endif
 
 #ifdef HAVE_builtin_setjmp_receiver
-  if (HAVE_builtin_setjmp_receiver)
+  if (receiver_label != NULL && HAVE_builtin_setjmp_receiver)
     emit_insn (gen_builtin_setjmp_receiver (receiver_label));
   else
 #endif
--- gcc-4.8.0/gcc/stmt.c.~1~	2013-02-27 08:26:53.000000000 +0100
+++ gcc-4.8.0/gcc/stmt.c	2013-05-26 14:31:15.791534231 +0200
@@ -104,7 +104,6 @@ extern basic_block label_to_block_fn (st
 
 static int n_occurrences (int, const char *);
 static bool tree_conflicts_with_clobbers_p (tree, HARD_REG_SET *);
-static void expand_nl_goto_receiver (void);
 static bool check_operand_nalternatives (tree, tree);
 static bool check_unique_operand_names (tree, tree, tree);
 static char *resolve_operand_name_1 (char *, tree, tree, tree);
@@ -198,7 +197,7 @@ expand_label (tree label)
 
   if (DECL_NONLOCAL (label))
     {
-      expand_nl_goto_receiver ();
+      expand_builtin_setjmp_receiver (NULL);
       nonlocal_goto_handler_labels
         = gen_rtx_EXPR_LIST (VOIDmode, label_r,
                              nonlocal_goto_handler_labels);
@@ -1554,77 +1553,6 @@ expand_return (tree retval)
     }
 }
 
-/* Emit code to restore vital registers at the beginning of a nonlocal goto
-   handler.  */
-static void
-expand_nl_goto_receiver (void)
-{
-  rtx chain;
-
-  /* Clobber the FP when we get here, so we have to make sure it's
-     marked as used by this function.  */
-  emit_use (hard_frame_pointer_rtx);
-
-  /* Mark the static chain as clobbered here so life information
-     doesn't get messed up for it.  */
-  chain = targetm.calls.static_chain (current_function_decl, true);
-  if (chain && REG_P (chain))
-    emit_clobber (chain);
-
-#ifdef HAVE_nonlocal_goto
-  if (! HAVE_nonlocal_goto)
-#endif
-    /* First adjust our frame pointer to its actual value.  It was
-       previously set to the start of the virtual area corresponding to
-       the stacked variables when we branched here and now needs to be
-       adjusted to the actual hardware fp value.
-
-       Assignments are to virtual registers are converted by
-       instantiate_virtual_regs into the corresponding assignment
-       to the underlying register (fp in this case) that makes
-       the original assignment true.
-       So the following insn will actually be
-       decrementing fp by STARTING_FRAME_OFFSET.  */
-    emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
-
-#if !HARD_FRAME_POINTER_IS_ARG_POINTER
-  if (fixed_regs[ARG_POINTER_REGNUM])
-    {
-#ifdef ELIMINABLE_REGS
-      /* If the argument pointer can be eliminated in favor of the
-         frame pointer, we don't need to restore it.  We assume here
-         that if such an elimination is present, it can always be used.
-         This is the case on all known machines; if we don't make this
-         assumption, we do unnecessary saving on many machines.  */
-      static const struct elims {const int from, to;} elim_regs[] = ELIMINABLE_REGS;
-      size_t i;
-
-      for (i = 0; i < ARRAY_SIZE (elim_regs); i++)
-        if (elim_regs[i].from == ARG_POINTER_REGNUM
-            && elim_regs[i].to == HARD_FRAME_POINTER_REGNUM)
-          break;
-
-      if (i == ARRAY_SIZE (elim_regs))
-#endif
-        {
-          /* Now restore our arg pointer from the address at which it
-             was saved in our stack frame.  */
-          emit_move_insn (crtl->args.internal_arg_pointer,
-                          copy_to_reg (get_arg_pointer_save_area ()));
-        }
-    }
-#endif
-
-#ifdef HAVE_nonlocal_goto_receiver
-  if (HAVE_nonlocal_goto_receiver)
-    emit_insn (gen_nonlocal_goto_receiver ());
-#endif
-
-  /* We must not allow the code we just generated to be reordered by
-     scheduling.  Specifically, the update of the frame pointer must
-     happen immediately, not later.  */
-  emit_insn (gen_blockage ());
-}
 
 /* Emit code to save the current value of stack.  */
 rtx