Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AVR] Wrap out-of-bounds relative jumps #118015

Merged
merged 1 commit into from
Dec 27, 2024
Merged

Conversation

Patryk27
Copy link
Contributor

@Patryk27 Patryk27 commented Nov 28, 2024

This commit improves the relative jumps, so that we are able to emit rjmp that wraps around the memory boundary.

Motivation

Over Rahix/avr-hal#585 it was found out that the AVR codegen sometimes crashes with the out of range branch target message - this turned out to be a fallout from #106722 related to the fact that linkers are smarter than they seem!

The issue stems from the rjmp instruction - even though it can only encode offsets in the +-4 kB range, it can be used to jump into further memory addresses by relying on the fact that AVR memory wraps around the boundaries. That is, for a CPU with 8 kB of flash, rjmp 5000 (illegal) can be actually encoded as rjmp -3192 (legal).

Linkers apply this trick automatically, but since over #106722 we've decided to go with the let's avoid emitting relocations route...

We would've caught this, but since the branch-relaxation-long.ll test generated only the textual assembly, it never went near the assertion - I've changed the test to use -filetype=obj | llvm-objdump -dr now (which causes it to crash on the main branch, but succeeds with my changes here).

@Patryk27
Copy link
Contributor Author

cc @benshi001

@llvmbot llvmbot added the mc Machine (object) code label Nov 28, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 28, 2024

@llvm/pr-subscribers-mc

Author: Patryk Wychowaniec (Patryk27)

Changes

This commit changes shouldForceRelocation() to always return true for indirect jumps, direct jumps and calls.

Motivation

Over Rahix/avr-hal#585 it was found out that the AVR codegen sometimes crashes with the out of range branch target message - this turned out to be a fallout from #106722 related to the fact that linkers are smarter than they seem!

The issue stems from the rjmp instruction - even though it can only encode offsets in the +-4 kB range, it can be used to jump into further memory addresses by relying on the fact that AVR memory wraps around the boundaries. That is, for a CPU with 8 kB of flash, rjmp 5000 (illegal) can be actually encoded as rjmp -3192 (legal).

Linkers apply this trick automatically, but since over #106722 we've decided to go with the let's avoid emitting relocations route...

We would've caught this, but since the branch-relaxation-long.ll test generated only the textual assembly, it never went near the assertion - I've changed the test to use -filetype=obj | llvm-objdump -dr now (which causes it to crash on the main branch, but succeeds with my changes here).

Alternatives

We could re-do the wrapping logic directly within the codegen - in fact, that's what I started with! I've taught AVRDevices.td about the flash sizes (relying on data from avr-gcc):

class Device<string Name, Family Fam, ELFArch Arch, int FlashSizeArg,
             list<SubtargetFeature> ExtraFeatures = []>
    : Processor<Name, NoItineraries, !listconcat([Fam, Arch], ExtraFeatures)> {
  int FlashSize = FlashSizeArg;
}

def : Device<"avr1", FamilyAVR1, ELFArchAVR1, 0x400>;
def : Device<"avr2", FamilyAVR2, ELFArchAVR2, 0x60000>;
def : Device<"avr25", FamilyAVR25, ELFArchAVR25, 0x2000>;
def : Device<"avr3", FamilyAVR3, ELFArchAVR3, 0x6000>;
/* ... */

... created a new TableGen emitter:

static void EmitAVRTargetDef(RecordKeeper &RK, raw_ostream &OS) {
  OS << "// Autogenerated by AVRTargetDefEmitter.cpp\n\n";

  OS << "namespace llvm {\n";
  OS << "namespace AVR {\n";
  OS << "\n";
  OS << "static const std::map<StringRef, uint32_t> FlashSizes = {\n";

  for (const Record *Rec : RK.getAllDerivedDefinitions("Device")) {
    OS << "  { \"" << Rec->getValueAsString("Name") << "\", "
       << Rec->getValueAsInt("FlashSize") << " },\n";
  }

  OS << "};\n";
  OS << "\n";
  OS << "inline uint32_t getFlashSize(StringRef cpu) {\n";
  OS << "  return FlashSizes.at(cpu);\n";
  OS << "}\n";
  OS << "\n";
  OS << "} // end of namespace AVR\n";
  OS << "} // end of namespace llvm\n";
}

static TableGen::Emitter::Opt X("gen-avr-target-def", EmitAVRTargetDef,
                                "Generate the list of CPUs for AVR");

... adjusted adjustRelativeBranch() to access this brand new AVR::getFlashSize():

/// Adjusts the value of a relative branch target before fixup application.
static void adjustRelativeBranch(unsigned Size, const MCFixup &Fixup,
                                 uint64_t &Value, MCContext *Ctx) {
  // Jumps are relative to the current instruction.
  Value -= 2;

  // We have one extra bit of precision because the value is rightshifted by
  // one.
  Size += 1;

  if (!isIntN(Size, Value)) {
    uint64_t FlashSize =
        AVR::getFlashSize(Ctx->getSubtargetInfo()->getCPU().data());

    uint64_t ReverseValue =
        Value > 0 ? (uint64_t)((int32_t)Value - (int32_t)FlashSize)
                  : (uint64_t)((int32_t)FlashSize + (int32_t)Value);

    if (isIntN(Size, ReverseValue)) {
      Value = ReverseValue;
    }
  }

  ensureSignedWidth(Size, Value, std::string("branch target"), Fixup, Ctx);

  AVR::fixups::adjustBranchTarget(Value);
}

... and it worked!

I've eventually settled on relocations, because that's both the simpler approach and what we'd like to have anyway (to resemble avr-gcc, as I remember from discussions from the past).


Patch is 30.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118015.diff

26 Files Affected:

  • (modified) llvm/lib/Target/AVR/AVRInstrInfo.cpp (+14-8)
  • (modified) llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp (+26-129)
  • (modified) llvm/test/CodeGen/AVR/branch-relaxation-long.ll (+38-36)
  • (modified) llvm/test/CodeGen/AVR/jmp.ll (+3-2)
  • (modified) llvm/test/MC/AVR/inst-brbc.s (+8-6)
  • (modified) llvm/test/MC/AVR/inst-brbs.s (+8-6)
  • (modified) llvm/test/MC/AVR/inst-brcc.s (+8-4)
  • (modified) llvm/test/MC/AVR/inst-brcs.s (+8-4)
  • (modified) llvm/test/MC/AVR/inst-breq.s (+8-4)
  • (modified) llvm/test/MC/AVR/inst-brge.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brhc.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brhs.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brid.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brie.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brlo.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brlt.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brmi.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brne.s (+8-4)
  • (modified) llvm/test/MC/AVR/inst-brpl.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brsh.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brtc.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brts.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brvc.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-brvs.s (+6-3)
  • (modified) llvm/test/MC/AVR/inst-rcall.s (+9-5)
  • (modified) llvm/test/MC/AVR/inst-rjmp.s (+25-15)
diff --git a/llvm/lib/Target/AVR/AVRInstrInfo.cpp b/llvm/lib/Target/AVR/AVRInstrInfo.cpp
index 7d58ece95c869f..6f6da56367e0cb 100644
--- a/llvm/lib/Target/AVR/AVRInstrInfo.cpp
+++ b/llvm/lib/Target/AVR/AVRInstrInfo.cpp
@@ -557,17 +557,23 @@ void AVRInstrInfo::insertIndirectBranch(MachineBasicBlock &MBB,
                                         MachineBasicBlock &RestoreBB,
                                         const DebugLoc &DL, int64_t BrOffset,
                                         RegScavenger *RS) const {
-  // This method inserts a *direct* branch (JMP), despite its name.
-  // LLVM calls this method to fixup unconditional branches; it never calls
-  // insertBranch or some hypothetical "insertDirectBranch".
-  // See lib/CodeGen/RegisterRelaxation.cpp for details.
-  // We end up here when a jump is too long for a RJMP instruction.
+  // When an instruction's jump target is too far away¹, the branch relaxation
+  // pass might split an existing, large basic block into two pieces - if that
+  // happens, this function gets called to create a new jump between the blocks.
+  //
+  // Since this new jump might be too large for an `rjmp` anyway, let's
+  // conservatively emit `jmp` and back off to `rjmp` only if the target arch
+  // doesn't support `jmp`.
+  //
+  // TODO maybe we could use `BrOffset` to determine whether it's safe to emit
+  //      `rjmp`? (it takes one byte less to encode, so it'd be a small win)
+  //
+  // ¹ "too far" in the sense of "the instruction encoding doesn't allow for an
+  //   offset that large"
+
   if (STI.hasJMPCALL())
     BuildMI(&MBB, DL, get(AVR::JMPk)).addMBB(&NewDestBB);
   else
-    // The RJMP may jump to a far place beyond its legal range. We let the
-    // linker to report 'out of range' rather than crash, or silently emit
-    // incorrect assembly code.
     BuildMI(&MBB, DL, get(AVR::RJMPk)).addMBB(&NewDestBB);
 }
 
diff --git a/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp b/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp
index 17c48c2fc35ff8..be3ca4cfcd94f5 100644
--- a/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp
+++ b/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp
@@ -27,129 +27,32 @@
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 
-// FIXME: we should be doing checks to make sure asm operands
-// are not out of bounds.
-
 namespace adjust {
 
 using namespace llvm;
 
-static void signed_width(unsigned Width, uint64_t Value,
-                         std::string Description, const MCFixup &Fixup,
-                         MCContext *Ctx = nullptr) {
-  if (!isIntN(Width, Value)) {
-    std::string Diagnostic = "out of range " + Description;
-
-    int64_t Min = minIntN(Width);
-    int64_t Max = maxIntN(Width);
-
-    Diagnostic += " (expected an integer in the range " + std::to_string(Min) +
-                  " to " + std::to_string(Max) + ")";
-
-    if (Ctx) {
-      Ctx->reportError(Fixup.getLoc(), Diagnostic);
-    } else {
-      llvm_unreachable(Diagnostic.c_str());
-    }
-  }
-}
-
-static void unsigned_width(unsigned Width, uint64_t Value,
-                           std::string Description, const MCFixup &Fixup,
-                           MCContext *Ctx = nullptr) {
+static void ensureUnsignedWidth(unsigned Width, uint64_t Value,
+                                std::string Description, const MCFixup &Fixup,
+                                MCContext *Ctx) {
   if (!isUIntN(Width, Value)) {
     std::string Diagnostic = "out of range " + Description;
 
     int64_t Max = maxUIntN(Width);
 
-    Diagnostic +=
-        " (expected an integer in the range 0 to " + std::to_string(Max) + ")";
+    Diagnostic += " (expected an unsigned integer in the range 0 to " +
+                  std::to_string(Max) + ", got " + std::to_string(Value) + ")";
 
-    if (Ctx) {
-      Ctx->reportError(Fixup.getLoc(), Diagnostic);
-    } else {
-      llvm_unreachable(Diagnostic.c_str());
-    }
+    Ctx->reportError(Fixup.getLoc(), Diagnostic);
   }
 }
 
-/// Adjusts the value of a branch target before fixup application.
-static void adjustBranch(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                         MCContext *Ctx = nullptr) {
-  // We have one extra bit of precision because the value is rightshifted by
-  // one.
-  unsigned_width(Size + 1, Value, std::string("branch target"), Fixup, Ctx);
-
-  // Rightshifts the value by one.
-  AVR::fixups::adjustBranchTarget(Value);
-}
-
-/// Adjusts the value of a relative branch target before fixup application.
-static void adjustRelativeBranch(unsigned Size, const MCFixup &Fixup,
-                                 uint64_t &Value, MCContext *Ctx = nullptr) {
-  // Jumps are relative to the current instruction.
-  Value -= 2;
-
-  // We have one extra bit of precision because the value is rightshifted by
-  // one.
-  signed_width(Size + 1, Value, std::string("branch target"), Fixup, Ctx);
-
-  // Rightshifts the value by one.
-  AVR::fixups::adjustBranchTarget(Value);
-}
-
-/// 22-bit absolute fixup.
-///
-/// Resolves to:
-/// 1001 kkkk 010k kkkk kkkk kkkk 111k kkkk
-///
-/// Offset of 0 (so the result is left shifted by 3 bits before application).
-static void fixup_call(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                       MCContext *Ctx = nullptr) {
-  adjustBranch(Size, Fixup, Value, Ctx);
-
-  auto top = Value & (0xf00000 << 6);   // the top four bits
-  auto middle = Value & (0x1ffff << 5); // the middle 13 bits
-  auto bottom = Value & 0x1f;           // end bottom 5 bits
-
-  Value = (top << 6) | (middle << 3) | (bottom << 0);
-}
-
-/// 7-bit PC-relative fixup.
-///
-/// Resolves to:
-/// 0000 00kk kkkk k000
-/// Offset of 0 (so the result is left shifted by 3 bits before application).
-static void fixup_7_pcrel(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                          MCContext *Ctx = nullptr) {
-  adjustRelativeBranch(Size, Fixup, Value, Ctx);
-
-  // Because the value may be negative, we must mask out the sign bits
-  Value &= 0x7f;
-}
-
-/// 12-bit PC-relative fixup.
-/// Yes, the fixup is 12 bits even though the name says otherwise.
-///
-/// Resolves to:
-/// 0000 kkkk kkkk kkkk
-/// Offset of 0 (so the result isn't left-shifted before application).
-static void fixup_13_pcrel(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                           MCContext *Ctx = nullptr) {
-  adjustRelativeBranch(Size, Fixup, Value, Ctx);
-
-  // Because the value may be negative, we must mask out the sign bits
-  Value &= 0xfff;
-}
-
 /// 6-bit fixup for the immediate operand of the STD/LDD family of
 /// instructions.
 ///
 /// Resolves to:
 /// 10q0 qq10 0000 1qqq
-static void fixup_6(const MCFixup &Fixup, uint64_t &Value,
-                    MCContext *Ctx = nullptr) {
-  unsigned_width(6, Value, std::string("immediate"), Fixup, Ctx);
+static void fixup_6(const MCFixup &Fixup, uint64_t &Value, MCContext *Ctx) {
+  ensureUnsignedWidth(6, Value, std::string("immediate"), Fixup, Ctx);
 
   Value = ((Value & 0x20) << 8) | ((Value & 0x18) << 7) | (Value & 0x07);
 }
@@ -160,8 +63,8 @@ static void fixup_6(const MCFixup &Fixup, uint64_t &Value,
 /// Resolves to:
 /// 0000 0000 kk00 kkkk
 static void fixup_6_adiw(const MCFixup &Fixup, uint64_t &Value,
-                         MCContext *Ctx = nullptr) {
-  unsigned_width(6, Value, std::string("immediate"), Fixup, Ctx);
+                         MCContext *Ctx) {
+  ensureUnsignedWidth(6, Value, std::string("immediate"), Fixup, Ctx);
 
   Value = ((Value & 0x30) << 2) | (Value & 0x0f);
 }
@@ -170,9 +73,8 @@ static void fixup_6_adiw(const MCFixup &Fixup, uint64_t &Value,
 ///
 /// Resolves to:
 /// 0000 0000 AAAA A000
-static void fixup_port5(const MCFixup &Fixup, uint64_t &Value,
-                        MCContext *Ctx = nullptr) {
-  unsigned_width(5, Value, std::string("port number"), Fixup, Ctx);
+static void fixup_port5(const MCFixup &Fixup, uint64_t &Value, MCContext *Ctx) {
+  ensureUnsignedWidth(5, Value, std::string("port number"), Fixup, Ctx);
 
   Value &= 0x1f;
 
@@ -183,9 +85,8 @@ static void fixup_port5(const MCFixup &Fixup, uint64_t &Value,
 ///
 /// Resolves to:
 /// 1011 0AAd dddd AAAA
-static void fixup_port6(const MCFixup &Fixup, uint64_t &Value,
-                        MCContext *Ctx = nullptr) {
-  unsigned_width(6, Value, std::string("port number"), Fixup, Ctx);
+static void fixup_port6(const MCFixup &Fixup, uint64_t &Value, MCContext *Ctx) {
+  ensureUnsignedWidth(6, Value, std::string("port number"), Fixup, Ctx);
 
   Value = ((Value & 0x30) << 5) | (Value & 0x0f);
 }
@@ -195,8 +96,8 @@ static void fixup_port6(const MCFixup &Fixup, uint64_t &Value,
 /// Resolves to:
 /// 1010 ikkk dddd kkkk
 static void fixup_lds_sts_16(const MCFixup &Fixup, uint64_t &Value,
-                             MCContext *Ctx = nullptr) {
-  unsigned_width(7, Value, std::string("immediate"), Fixup, Ctx);
+                             MCContext *Ctx) {
+  ensureUnsignedWidth(7, Value, std::string("immediate"), Fixup, Ctx);
   Value = ((Value & 0x70) << 8) | (Value & 0x0f);
 }
 
@@ -213,7 +114,7 @@ namespace ldi {
 /// 0000 KKKK 0000 KKKK
 /// Offset of 0 (so the result isn't left-shifted before application).
 static void fixup(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                  MCContext *Ctx = nullptr) {
+                  MCContext *Ctx) {
   uint64_t upper = Value & 0xf0;
   uint64_t lower = Value & 0x0f;
 
@@ -223,25 +124,25 @@ static void fixup(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
 static void neg(uint64_t &Value) { Value *= -1; }
 
 static void lo8(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                MCContext *Ctx = nullptr) {
+                MCContext *Ctx) {
   Value &= 0xff;
   ldi::fixup(Size, Fixup, Value, Ctx);
 }
 
 static void hi8(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                MCContext *Ctx = nullptr) {
+                MCContext *Ctx) {
   Value = (Value & 0xff00) >> 8;
   ldi::fixup(Size, Fixup, Value, Ctx);
 }
 
 static void hh8(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                MCContext *Ctx = nullptr) {
+                MCContext *Ctx) {
   Value = (Value & 0xff0000) >> 16;
   ldi::fixup(Size, Fixup, Value, Ctx);
 }
 
 static void ms8(unsigned Size, const MCFixup &Fixup, uint64_t &Value,
-                MCContext *Ctx = nullptr) {
+                MCContext *Ctx) {
   Value = (Value & 0xff000000) >> 24;
   ldi::fixup(Size, Fixup, Value, Ctx);
 }
@@ -263,13 +164,9 @@ void AVRAsmBackend::adjustFixupValue(const MCFixup &Fixup,
   default:
     llvm_unreachable("unhandled fixup");
   case AVR::fixup_7_pcrel:
-    adjust::fixup_7_pcrel(Size, Fixup, Value, Ctx);
-    break;
   case AVR::fixup_13_pcrel:
-    adjust::fixup_13_pcrel(Size, Fixup, Value, Ctx);
-    break;
   case AVR::fixup_call:
-    adjust::fixup_call(Size, Fixup, Value, Ctx);
+    // Handled by linker, see `shouldForceRelocation()`
     break;
   case AVR::fixup_ldi:
     adjust::ldi::fixup(Size, Fixup, Value, Ctx);
@@ -330,13 +227,15 @@ void AVRAsmBackend::adjustFixupValue(const MCFixup &Fixup,
     adjust::ldi::ms8(Size, Fixup, Value, Ctx);
     break;
   case AVR::fixup_16:
-    adjust::unsigned_width(16, Value, std::string("port number"), Fixup, Ctx);
+    adjust::ensureUnsignedWidth(16, Value, std::string("port number"), Fixup,
+                                Ctx);
 
     Value &= 0xffff;
     break;
   case AVR::fixup_16_pm:
     Value >>= 1; // Flash addresses are always shifted.
-    adjust::unsigned_width(16, Value, std::string("port number"), Fixup, Ctx);
+    adjust::ensureUnsignedWidth(16, Value, std::string("port number"), Fixup,
+                                Ctx);
 
     Value &= 0xffff;
     break;
@@ -517,8 +416,6 @@ bool AVRAsmBackend::shouldForceRelocation(const MCAssembler &Asm,
     return Fixup.getKind() >= FirstLiteralRelocationKind;
   case AVR::fixup_7_pcrel:
   case AVR::fixup_13_pcrel:
-    // Always resolve relocations for PC-relative branches
-    return false;
   case AVR::fixup_call:
     return true;
   }
diff --git a/llvm/test/CodeGen/AVR/branch-relaxation-long.ll b/llvm/test/CodeGen/AVR/branch-relaxation-long.ll
index bca505e5edd5f1..11c79dbccc1773 100644
--- a/llvm/test/CodeGen/AVR/branch-relaxation-long.ll
+++ b/llvm/test/CodeGen/AVR/branch-relaxation-long.ll
@@ -1,24 +1,25 @@
-; RUN: llc < %s -march=avr -mattr=avr3 | FileCheck %s
-; RUN: llc < %s -march=avr -mattr=avr2 | FileCheck --check-prefix=AVR2 %s
+; RUN: llc < %s -march=avr -mcpu=avr25 -filetype=obj -o - | llvm-objdump --mcpu=avr25 -dr --no-show-raw-insn --no-leading-addr - | FileCheck --check-prefix=AVR25 %s
+; RUN: llc < %s -march=avr -mcpu=avr3 -filetype=obj -o - | llvm-objdump --mcpu=avr3 -dr --no-show-raw-insn --no-leading-addr - | FileCheck --check-prefix=AVR3 %s
 
-; CHECK-LABEL: relax_to_jmp:
-; CHECK: cpi     r{{[0-9]+}}, 0
-; CHECK: brne    [[BB1:.LBB[0-9]+_[0-9]+]]
-; CHECK: jmp     [[BB2:.LBB[0-9]+_[0-9]+]]
-; CHECK: [[BB1]]:
-; CHECK: nop
-; CHECK: [[BB2]]:
+; AVR25: <relax_to_jmp>:
+; AVR25-NEXT: andi r24, 0x1
+; AVR25-NEXT: cpi r24, 0x0
+; AVR25-NEXT: brne .+0
+; AVR25-NEXT: R_AVR_7_PCREL .text+0x8
+; AVR25-NEXT: rjmp .+0
+; AVR25-NEXT: R_AVR_13_PCREL .text+0x100c
+; AVR25: ldi r24, 0x3
+; AVR25-NEXT: ret
 
-;; A `RJMP` is generated instead of expected `JMP` for AVR2,
-;; and it is up to the linker to report 'out of range' or
-;; 'exceed flash maximum size'.
-; AVR2-LABEL: relax_to_jmp:
-; AVR2: cpi     r{{[0-9]+}}, 0
-; AVR2: brne    [[BB1:.LBB[0-9]+_[0-9]+]]
-; AVR2: rjmp    [[BB2:.LBB[0-9]+_[0-9]+]]
-; AVR2: [[BB1]]:
-; AVR2: nop
-; AVR2: [[BB2]]:
+; AVR3: <relax_to_jmp>:
+; AVR3-NEXT: andi r24, 0x1
+; AVR3-NEXT: cpi r24, 0x0
+; AVR3-NEXT: brne .+0
+; AVR3-NEXT: R_AVR_7_PCREL .text+0xa
+; AVR3-NEXT: jmp 0x0
+; AVR3-NEXT: R_AVR_CALL .text+0x100e
+; AVR3: ldi r24, 0x3
+; AVR3-NEXT: ret
 
 define i8 @relax_to_jmp(i1 %a) {
 entry-block:
@@ -2081,24 +2082,25 @@ finished:
   ret i8 3
 }
 
-; CHECK-LABEL: relax_to_jmp_backwards:
-; CHECK: [[BB1:.LBB[0-9]+_[0-9]+]]
-; CHECK: nop
-; CHECK: cpi     r{{[0-9]+}}, 0
-; CHECK: breq    [[BB2:.LBB[0-9]+_[0-9]+]]
-; CHECK: jmp     [[BB1]]
-; CHECK: [[BB2]]:
+; AVR25: <relax_to_jmp_backwards>:
+; AVR25-NEXT: andi r24, 0x1
+; AVR25: cpi r24, 0x0
+; AVR25-NEXT: breq .+0
+; AVR25-NEXT: R_AVR_7_PCREL .text+0x201c
+; AVR25-NEXT: rjmp .+0
+; AVR25-NEXT: R_AVR_13_PCREL .text+0x1012
+; AVR25: ldi r24, 0x3
+; AVR25-NEXT: ret
 
-;; A `RJMP` is generated instead of expected `JMP` for AVR2,
-;; and it is up to the linker to report 'out of range' or
-;; 'exceed flash maximum size'.
-; AVR2-LABEL: relax_to_jmp_backwards:
-; AVR2: [[BB1:.LBB[0-9]+_[0-9]+]]
-; AVR2: nop
-; AVR2: cpi     r{{[0-9]+}}, 0
-; AVR2: breq    [[BB2:.LBB[0-9]+_[0-9]+]]
-; AVR2: rjmp    [[BB1]]
-; AVR2: [[BB2]]:
+; AVR3: <relax_to_jmp_backwards>:
+; AVR3-NEXT: andi r24, 0x1
+; AVR3: cpi r24, 0x0
+; AVR3-NEXT: breq .+0
+; AVR3-NEXT: R_AVR_7_PCREL .text+0x2020
+; AVR3-NEXT: jmp 0x0
+; AVR3-NEXT: R_AVR_CALL .text+0x1014
+; AVR3: ldi r24, 0x3
+; AVR3-NEXT: ret
 
 define i8 @relax_to_jmp_backwards(i1 %a) {
 entry-block:
diff --git a/llvm/test/CodeGen/AVR/jmp.ll b/llvm/test/CodeGen/AVR/jmp.ll
index 95dfff4836b4e8..f3d6724074e845 100644
--- a/llvm/test/CodeGen/AVR/jmp.ll
+++ b/llvm/test/CodeGen/AVR/jmp.ll
@@ -15,10 +15,11 @@ bb2:
 
 declare i8 @bar(i8);
 
-; CHECK: rcall   .-2
+; CHECK: rcall   .+0
 ; CHECK-NEXT: 00000000: R_AVR_13_PCREL bar
 ; CHECK-NEXT: cpi     r24, 0x7b
-; CHECK-NEXT: brne    .+4
+; CHECK-NEXT: brne    .+0
+; CHECK-NEXT: 00000004: R_AVR_7_PCREL .text+0xa
 ; CHECK-NEXT: ldi     r24, 0x64
 ; CHECK-NEXT: ret
 ; CHECK-NEXT: ldi     r24, 0xc8
diff --git a/llvm/test/MC/AVR/inst-brbc.s b/llvm/test/MC/AVR/inst-brbc.s
index 4edbbfd8580248..6df400068f1024 100644
--- a/llvm/test/MC/AVR/inst-brbc.s
+++ b/llvm/test/MC/AVR/inst-brbc.s
@@ -17,9 +17,11 @@ foo:
 ; CHECK-NEXT:                ; fixup A - offset: 0, value: (.Ltmp1-16)+2, kind: fixup_7_pcrel
 
 ; INST-LABEL: <foo>:
-; INST-NEXT: 23 f4   brvc .+8
-; INST-NEXT: c0 f7   brsh .-16
-; INST-NEXT: 59 f7   brne .-42
-; INST-NEXT: 52 f7   brpl .-44
-; INST-NEXT: 4c f7   brge .-46
-; INST-NEXT: c7 f4   brid .+48
+; INST-NEXT: 03 f4 brvc .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0xa
+; INST-NEXT: 00 f4 brsh .+0
+; INST-NEXT: R_AVR_7_PCREL .text-0xc
+; INST-NEXT: 59 f7 brne .-42
+; INST-NEXT: 52 f7 brpl .-44
+; INST-NEXT: 4c f7 brge .-46
+; INST-NEXT: c7 f4 brid .+48
diff --git a/llvm/test/MC/AVR/inst-brbs.s b/llvm/test/MC/AVR/inst-brbs.s
index 3f4b134aef682c..e6c14ce3c255ee 100644
--- a/llvm/test/MC/AVR/inst-brbs.s
+++ b/llvm/test/MC/AVR/inst-brbs.s
@@ -16,9 +16,11 @@ foo:
 ; CHECK-NEXT:                ; fixup A - offset: 0, value: (.Ltmp1-12)+2, kind: fixup_7_pcrel
 
 ; INST-LABEL: <foo>:
-; INST-NEXT: 23 f0   brvs .+8
-; INST-NEXT: d0 f3   brlo .-12
-; INST-NEXT: 59 f3   breq .-42
-; INST-NEXT: 52 f3   brmi .-44
-; INST-NEXT: 4c f3   brlt .-46
-; INST-NEXT: 77 f0   brie .+28
+; INST-NEXT: 03 f0 brvs .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0xa
+; INST-NEXT: 00 f0 brlo .+0
+; INST-NEXT: R_AVR_7_PCREL .text-0x8
+; INST-NEXT: 59 f3 breq .-42
+; INST-NEXT: 52 f3 brmi .-44
+; INST-NEXT: 4c f3 brlt .-46
+; INST-NEXT: 77 f0 brie .+28
diff --git a/llvm/test/MC/AVR/inst-brcc.s b/llvm/test/MC/AVR/inst-brcc.s
index dd1b2b11a6d306..e9482c5cf1e656 100644
--- a/llvm/test/MC/AVR/inst-brcc.s
+++ b/llvm/test/MC/AVR/inst-brcc.s
@@ -22,7 +22,11 @@ bar:
 ; CHECK-NEXT:                ;   fixup A - offset: 0, value: bar, kind: fixup_7_pcrel
 
 ; INST-LABEL: <foo>:
-; INST-NEXT: 08 f5      brsh .+66
-; INST-NEXT: a8 f7      brsh .-22
-; INST-NEXT: 08 f5      brsh .+66
-; INST-NEXT: 00 f4      brsh .+0
+; INST-NEXT: 00 f4 brsh .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x44
+; INST-NEXT: 00 f4 brsh .+0
+; INST-NEXT: R_AVR_7_PCREL .text-0x12
+; INST-NEXT: 00 f4 brsh .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x48
+; INST-NEXT: 00 f4 brsh .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x8
diff --git a/llvm/test/MC/AVR/inst-brcs.s b/llvm/test/MC/AVR/inst-brcs.s
index 3fafccdb49257c..52a20a8e334d28 100644
--- a/llvm/test/MC/AVR/inst-brcs.s
+++ b/llvm/test/MC/AVR/inst-brcs.s
@@ -22,7 +22,11 @@ bar:
 ; CHECK-NEXT:               ;   fixup A - offset: 0, value: bar, kind: fixup_7_pcrel
 
 ; INST-LABEL: <foo>:
-; INST-NEXT: 20 f0      brlo .+8
-; INST-NEXT: 10 f0      brlo .+4
-; INST-NEXT: 20 f0      brlo .+8
-; INST-NEXT: 00 f0      brlo .+0
+; INST-NEXT: 00 f0 brlo .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0xa
+; INST-NEXT: 00 f0 brlo .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x8
+; INST-NEXT: 00 f0 brlo .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0xe
+; INST-NEXT: 00 f0 brlo .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x8
diff --git a/llvm/test/MC/AVR/inst-breq.s b/llvm/test/MC/AVR/inst-breq.s
index 7a6eac6f01ad07..4e10438762daba 100644
--- a/llvm/test/MC/AVR/inst-breq.s
+++ b/llvm/test/MC/AVR/inst-breq.s
@@ -22,7 +22,11 @@ bar:
 ; CHECK-NEXT:                      ;   fixup A - offset: 0, value: bar, kind: fixup_7_pcrel
 
 ; INST-LABEL: <foo>:
-; INST-NEXT: b9 f3      breq .-18
-; INST-NEXT: d1 f3      breq .-12
-; INST-NEXT: b9 f3      breq .-18
-; INST-NEXT: 01 f0      breq .+0
+; INST-NEXT: 01 f0 breq .+0
+; INST-NEXT: R_AVR_7_PCREL .text-0x10
+; INST-NEXT: 01 f0 breq .+0
+; INST-NEXT: R_AVR_7_PCREL .text-0x8
+; INST-NEXT: 01 f0 breq .+0
+; INST-NEXT: R_AVR_7_PCREL .text-0xc
+; INST-NEXT: 01 f0 breq .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x8
diff --git a/llvm/test/MC/AVR/inst-brge.s b/llvm/test/MC/AVR/inst-brge.s
index 6cf79db4dbd659..9657413d14895b 100644
--- a/llvm/test/MC/AVR/inst-brge.s
+++ b/llvm/test/MC/AVR/inst-brge.s
@@ -19,6 +19,9 @@ bar:
 ; CHECK-NEXT:                ;   fixup A - offset: 0, value: bar, kind: fixup_7_pcrel
 
 ; INST-LABEL: <foo>:
-; INST-NEXT: cc f4      brge .+50
-; INST-NEXT: ac f4      brge .+42
-; INST-NEXT: 04 f4      brge .+0
+; INST-NEXT: 04 f4 brge .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x34
+; INST-NEXT: 04 f4 brge .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x2e
+; INST-NEXT: 04 f4 brge .+0
+; INST-NEXT: R_AVR_7_PCREL .text+0x6
diff --git a/llvm/test/MC/AVR/inst-brhc.s b/llvm/test/MC/AVR/inst-brhc.s
index 924895e4bf5df1..f6c2b66a89a66c 100644
--- a/llvm/test/MC/AVR/inst-brhc.s
+++ b/llvm/test/MC/AVR/inst-brhc.s
@@ -19,6 +19,9 @@ bar:
 ; CHECK-NEXT:                ;   fixup A - offset: 0, value: bar, kind: fixup_7_pcrel
 
 ; INST-LABEL: <foo>:
-; INST-NEXT: 35 f4      brhc .+12
-; INST-NEXT: 3d f4      brhc .+14
-; INST-NEXT...
[truncated]

Copy link
Member

@benshi001 benshi001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is so aggressive, just for some special devices with 8KB but without long range JMP/CALL.

I suggest this way:

  1. Introduce a new target feature, something like "RjmpWrap" (or a better name??)
  2. Add this target feature to these special devices.
  3. And then we only perform force relocation with this new target feature specified.

@Patryk27
Copy link
Contributor Author

Hmm, so we'd like to diverge from avr-gcc's behavior, right?

@benshi001
Copy link
Member

Hmm, so we'd like to diverge from avr-gcc's behavior, right?

Is there a strong reason for us to strictly follow avr-gcc? My key concern is that the AVR backend has been stable, we need not do aggressive functional changes.

@benshi001
Copy link
Member

But I also think supporting the flash space wrapping is necessary. If you have no time for my suggested solution, I will try it later this month.

@Patryk27
Copy link
Contributor Author

Patryk27 commented Dec 3, 2024

Is there a strong reason for us to strictly follow avr-gcc?

No, I don't think so - I'm up for avoiding relocations as well 🙂

If you have no time for my suggested solution, I will try it later this month.

I can do that, no problem - I'm just reflecting on the approach:

  • Option A: Create a new feature (like RjmpWrap) and enable it for the smaller (8 kB) microcontrollers,
  • Option B: Teach LLVM flash sizes of all of the AVRs and implement logic that automatically applies the memory wrapping trick.

Option A is simpler to implement, while option B (as described in the Alternatives approach at the top) is potentially more versatile (we'll be able to apply the wrapping trick even for larger microcontrollers, if we happen to generate a branch large enough). Note that for larger uCs this trick is just a program-size optimization, not a requirement.

I'm leaning towards option B, since it feels less hacky to me, but I don't have any strong preference here - care to choose?

@benshi001
Copy link
Member

benshi001 commented Dec 4, 2024

Is there a strong reason for us to strictly follow avr-gcc?

No, I don't think so - I'm up for avoiding relocations as well 🙂

If you have no time for my suggested solution, I will try it later this month.

I can do that, no problem - I'm just reflecting on the approach:

  • Option A: Create a new feature (like RjmpWrap) and enable it for the smaller (8 kB) microcontrollers,
  • Option B: Teach LLVM flash sizes of all of the AVRs and implement logic that automatically applies the memory wrapping trick.

Option A is simpler to implement, while option B (as described in the Alternatives approach at the top) is potentially more versatile (we'll be able to apply the wrapping trick even for larger microcontrollers, if we happen to generate a branch large enough). Note that for larger uCs this trick is just a program-size optimization, not a requirement.

I'm leaning towards option B, since it feels less hacky to me, but I don't have any strong preference here - care to choose?

I prefer A to B, because:

  1. It is simpler, and less code changes.
  2. For larger devices, we use long JMP, and let the linker to do the branch relaxation.

What's more,

  1. we only add this new TargetFeature to devices with larger flash but without long JMP. Devices with flash size less than 4KB or with long JMP are unnecessary.
  2. Generally I prefer force relocation, but if the devices have a unique flash size (for example 8KB), I also accept hard coding it into adjustRelativeBranch .

@Patryk27 Patryk27 changed the title [AVR] Force relocations for jumps and calls [AVR] Wrap out-of-bounds relative jumps Dec 10, 2024
@Patryk27
Copy link
Contributor Author

Okie, ready! I've adjusted the description above as well. cc @benshi001

@Patryk27
Copy link
Contributor Author

Rebased due to a conflict over 806761a.

@benshi001
Copy link
Member

Rebased due to a conflict over 806761a.

Are you now on the Christmas holiday? What's your opinion on my comments? Do you mind I do the modification based on your code?

@Patryk27
Copy link
Contributor Author

What's your opinion on my comments?

Which comments? 👀

You mentioned using the feature-based approach, which I did implement - I don’t see any more comments 😅

@benshi001
Copy link
Member

截屏2024-12-24 09 04 57 截屏2024-12-24 09 05 03 截屏2024-12-24 09 05 59

@benshi001
Copy link
Member

benshi001 commented Dec 24, 2024

Sorry, it is my fault to not post them.

@@ -239,7 +251,7 @@ class Device<string Name, Family Fam, ELFArch Arch,
// in AVR binutils. We do not replicate this.
def : Device<"avr1", FamilyAVR1, ELFArchAVR1>;
def : Device<"avr2", FamilyAVR2, ELFArchAVR2>;
def : Device<"avr25", FamilyAVR25, ELFArchAVR25>;
def : Device<"avr25", FamilyAVR25, ELFArchAVR25, [FeatureWrappingRjmp]>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems not all of avr25 serial devices have a large SRAM, such as attiny13, so I would rather leave device avr25 unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp Outdated Show resolved Hide resolved
@Patryk27
Copy link
Contributor Author

Cool, thanks! I'll take a look at the comments tomorrow 🙂

@Patryk27
Copy link
Contributor Author

Ready!

Copy link
Member

@benshi001 benshi001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@benshi001 benshi001 merged commit c6ea7fb into llvm:main Dec 27, 2024
8 checks passed
@Patryk27 Patryk27 deleted the fix-avr-jumps branch December 27, 2024 06:35
@mbuesch
Copy link

mbuesch commented Dec 27, 2024

Does this fix attiny26, too?

I get this

error: out of range branch target (expected an integer in the range -4096 to 4095)

with the nightly rust compiler on an attiny26 target, too.

Would this change fix this, even though it doesn't apply FeatureWrappingRjmp to attiny26?

@Patryk27
Copy link
Contributor Author

attiny26 has 2 KB of Flash memory, so this trick/fix shouldn't be needed there - if you get that error, it might mean that the generated binary is just too huge.

Is this an open source project, something I/we could compile locally?

@mbuesch
Copy link

mbuesch commented Dec 27, 2024

I get this behavior with recent rust-nightly in this completely empty project:

attiny26test.tar.gz

It does build and work fine with nightly-2024-03-22. See rust-toolchain.toml.

@Patryk27
Copy link
Contributor Author

Thanks for the reproducer, I'll take a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants