Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation with LLVM > 6 #630

Closed
elliottslaughter opened this issue Aug 10, 2023 · 5 comments · Fixed by #631
Closed

Miscompilation with LLVM > 6 #630

elliottslaughter opened this issue Aug 10, 2023 · 5 comments · Fixed by #631

Comments

@elliottslaughter
Copy link
Member

elliottslaughter commented Aug 10, 2023

This reproducer was minimized from a Regent program discussed in StanfordLegion/legion#1385.

https://gist.github.com/elliottslaughter/c8bc2252c92639f8d3618b9bf3d33fb2

Running with Terra compiled with any LLVM version > 6 produces (in this case, with LLVM 13):

$ terra test3_min_terra.t
extern global gamma_table : double[700][18]
0   terra (JIT)                         0x000000010ecd8383 $g3d + 851

Running with Terra compiled with LLVM 6 produces:

$ terra-llvm6 seema_test3_min_terra.t
extern global gamma_table : double[700][18]

The issue seems to have something to do with the array we're bringing in via the header file.

I tested macOS x86_64. I believe the issue also occurs on Linux x86_64.

@elliottslaughter
Copy link
Member Author

elliottslaughter commented Aug 10, 2023

Adding getGammaTableArrayBig:setinlined(false) produces a slightly different error (presumably just moving the same error to a different site):

$ ./regent.py test3_min_terra.t
extern global gamma_table : double[700][18]
0   terra (JIT)                         0x000000010fc74271 $getGammaTableArrayBig + 33 
1   terra                               0x000000010aa72a45 lj_ccall_func + 2101
2   ???                                 0x000000000000001b 0x0 + 27
LLVM 13 disas (optimized)
extern global gamma_table : double[700][18]
definition 	{int32,int32} -> double
; Function Attrs: noinline
define dso_local double @"$getGammaTableArrayBig"(i32 %0, i32 %1) #0 {
entry:
  %2 = sext i32 %0 to i64
  %3 = getelementptr <{ [680 x double], [20 x double] }>, <{ [680 x double], [20 x double] }>* getelementptr inbounds (<{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>, <{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>* @gamma_table, i64 0, i32 0), i64 %2, i32 0
  %4 = sext i32 %1 to i64
  %gval.0..sroa_idx = getelementptr [680 x double], [680 x double]* %3, i64 %4, i64 0
  %gval.0.copyload = load double, double* %gval.0..sroa_idx, align 16
  ret double %gval.0.copyload
}
assembly for function at address 0x1133df000
0x1133df000(+0):		movsxd	rax, edi
0x1133df003(+3):		imul	rax, rax, 5600
0x1133df00a(+10):		movsxd	rcx, esi
0x1133df00d(+13):		imul	rcx, rcx, 5440
0x1133df014(+20):		add	rcx, rax
0x1133df017(+23):		movabs	rax, 4617797632
0x1133df021(+33):		vmovsd	xmm0, qword ptr [rax + rcx]
0x1133df026(+38):		ret
0   terra (JIT)                         0x00000001133df021 $getGammaTableArrayBig + 33 
1   terra                               0x000000010e1dda45 lj_ccall_func + 2101
2   ???                                 0x000000000000001b 0x0 + 27
LLVM 13 disas (unoptimized)
extern global gamma_table : double[700][18]
definition 	{int32,int32} -> double
; Function Attrs: noinline optnone
define dso_local double @"$getGammaTableArrayBig"(i32 %0, i32 %1) #0 {
entry:
  %gval = alloca double, align 8
  %index = alloca i32, align 4
  %j = alloca i32, align 4
  store i32 %0, i32* %j, align 4
  store i32 %1, i32* %index, align 4
  %2 = load i32, i32* %j, align 4
  %3 = sext i32 %2 to i64
  %4 = getelementptr <{ [680 x double], [20 x double] }>, <{ [680 x double], [20 x double] }>* getelementptr inbounds (<{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>, <{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>* @gamma_table, i32 0, i32 0), i64 %3
  %5 = getelementptr <{ [680 x double], [20 x double] }>, <{ [680 x double], [20 x double] }>* %4, i32 0, i32 0
  %6 = load i32, i32* %index, align 4
  %7 = sext i32 %6 to i64
  %8 = getelementptr [680 x double], [680 x double]* %5, i64 %7
  %9 = load [680 x double], [680 x double]* %8, align 8
  %10 = bitcast double* %gval to i8*
  %11 = bitcast [680 x double]* %8 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %10, i8* align 8 %11, i64 5440, i1 false)
  %12 = load double, double* %gval, align 8
  ret double %12

dead:                                             ; No predecessors!
  ret double undef
}
assembly for function at address 0x10baad000
0x10baad000(+0):		sub	rsp, 24
0x10baad004(+4):		mov	dword ptr [rsp + 8], edi
0x10baad008(+8):		mov	dword ptr [rsp + 12], esi
0x10baad00c(+12):		movsxd	rax, dword ptr [rsp + 8]
0x10baad011(+17):		imul	rsi, rax, 5600
0x10baad018(+24):		movabs	rax, 4490715136
0x10baad022(+34):		add	rsi, rax
0x10baad025(+37):		movsxd	rax, dword ptr [rsp + 12]
0x10baad02a(+42):		imul	rax, rax, 5440
0x10baad031(+49):		add	rsi, rax
0x10baad034(+52):		lea	rdi, [rsp + 16]
0x10baad039(+57):		movabs	rax, 140703517103482
0x10baad043(+67):		mov	edx, 5440
0x10baad048(+72):		call	rax
0x10baad04a(+74):		vmovsd	xmm0, qword ptr [rsp + 16]
0x10baad050(+80):		add	rsp, 24
0x10baad054(+84):		ret
0   libsystem_platform.dylib            0x00007ff817276b66 _platform_memmove$VARIANT$Haswell + 422
1   terra (JIT)                         0x000000010baad04a $getGammaTableArrayBig + 74 
2   ???                                 0x3feffffafe1bfd20 0x0 + 4607182397293460768
LLVM 6 disas (optimized)
extern global gamma_table : double[700][18]
definition 	{int32,int32} -> double

; Function Attrs: noinline
define double @"$getGammaTableArrayBig"(i32, i32) #0 {
entry:
  %2 = sext i32 %0 to i64
  %3 = sext i32 %1 to i64
  %4 = getelementptr [18 x [700 x double]], [18 x [700 x double]]* @gamma_table, i64 0, i64 %2, i64 %3
  %5 = load double, double* %4, align 8
  ret double %5
}
assembly for function at address 0x1093a3000
0x1093a3000(+0):		movsxd	rax, edi
0x1093a3003(+3):		movabs	rdx, 4449779712
0x1093a300d(+13):		movsxd	rcx, esi
0x1093a3010(+16):		imul	rax, rax, 5600
0x1093a3017(+23):		lea	rdx, [rdx + rax]
0x1093a301b(+27):		vmovsd	xmm0, qword ptr [rdx + 8*rcx]
0x1093a3020(+32):		ret
LLVM 6 disas (unoptimized)
extern global gamma_table : double[700][18]
definition 	{int32,int32} -> double

; Function Attrs: noinline optnone
define double @"$getGammaTableArrayBig"(i32, i32) #0 {
entry:
  %gval = alloca double
  %index = alloca i32
  %j = alloca i32
  store i32 %0, i32* %j
  store i32 %1, i32* %index
  %2 = load i32, i32* %j
  %3 = sext i32 %2 to i64
  %4 = getelementptr [700 x double], [700 x double]* getelementptr inbounds ([18 x [700 x double]], [18 x [700 x double]]* @gamma_table, i32 0, i32 0), i64 %3
  %5 = getelementptr [700 x double], [700 x double]* %4, i32 0, i32 0
  %6 = load i32, i32* %index
  %7 = sext i32 %6 to i64
  %8 = getelementptr double, double* %5, i64 %7
  %9 = load double, double* %8
  store double %9, double* %gval
  %10 = load double, double* %gval
  ret double %10

dead:                                             ; No predecessors!
  ret double undef
}
assembly for function at address 0x1096f1000
0x1096f1000(+0):		mov	dword ptr [rsp - 16], edi
0x1096f1004(+4):		mov	dword ptr [rsp - 12], esi
0x1096f1008(+8):		movsxd	rax, dword ptr [rsp - 16]
0x1096f100d(+13):		imul	rax, rax, 5600
0x1096f1014(+20):		movabs	rcx, 4453244928
0x1096f101e(+30):		add	rcx, rax
0x1096f1021(+33):		movsxd	rax, dword ptr [rsp - 12]
0x1096f1026(+38):		vmovsd	xmm0, qword ptr [rcx + 8*rax]
0x1096f102b(+43):		vmovsd	qword ptr [rsp - 8], xmm0
0x1096f1031(+49):		vmovsd	xmm0, qword ptr [rsp - 8]
0x1096f1037(+55):		ret
For posterity, the version of the reproducer used in this test
local c = terralib.includec("stdlib.h")
local gamma_header_big = terralib.includec("gamma_table_min.h", {"-I", "./"})

print(gamma_header_big.gamma_table)

terra getGammaTableArrayBig(j : int, index : int) : double
  var gval: double
  gval = gamma_header_big.gamma_table[j][index]
  return gval
end
getGammaTableArrayBig:setinlined(false)
getGammaTableArrayBig:setoptimized(false)
getGammaTableArrayBig:disas()

terra g3d(r_gamma_table : &double, N : int, M : int)
  for x = 0, N do
    for y = 0, M do
      r_gamma_table[x * M + y] =  getGammaTableArrayBig(x, y)
    end
  end
end

terra main()
  var N = 18
  var M = 700
  var r_gamma_table_big = [&double](c.malloc(N*M*terralib.sizeof(double)))
  g3d(r_gamma_table_big, N, M)
end

main()

Edit: I also ran LLVM 7, and the results look very similar to LLVM 13.

Edit 2: all of this on macOS x86_64.

@elliottslaughter
Copy link
Member Author

A couple places in the code possibly worth checking:

terra/src/tcompiler.cpp

Lines 1295 to 1299 in 7ad7756

#if LLVM_VERSION < 70
Value *m = B->CreateMemCpy(addr_dest, addr_source, size_v, a1);
#else
Value *m = B->CreateMemCpy(addr_dest, a1, addr_source, a1, size_v);
#endif

terra/src/tcompiler.cpp

Lines 2526 to 2530 in 7ad7756

#if LLVM_VERSION <= 60
Value *m = B->CreateMemCpy(addr_dst, addr_src, size_v, a1, isVolatile);
#elif LLVM_VERSION <= 90
Value *m = B->CreateMemCpy(addr_dst, a1, addr_src, l->getAlignment(), size_v,
isVolatile);

@elliottslaughter
Copy link
Member Author

Confirmed that LLVM 7 is going through this code path:

terra/src/tcompiler.cpp

Lines 2528 to 2530 in 7ad7756

#elif LLVM_VERSION <= 90
Value *m = B->CreateMemCpy(addr_dst, a1, addr_src, l->getAlignment(), size_v,
isVolatile);

@elliottslaughter
Copy link
Member Author

elliottslaughter commented Aug 17, 2023

The root cause appears to be that starting in LLVM 7, Clang changed the way that it encodes certain global variables. In particular, arrays with trailing zeros appear to be reencoded as a struct containing two sub-arrays, one with the non-zeros and one with zeroes.

This is pretty clear if you dump the full LLVM IR for the complete examples:

main-llvm6.ll:

@gamma_table = local_unnamed_addr constant [18 x [700 x double]] ...

main-llvm7.ll:

@gamma_table = local_unnamed_addr constant <{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }> <{ <{ [680 x double], [20 x double] }> <{ [680 x double] ...

So then there's a question of what to do about it. Some options:

  1. Teach Terra to emit array access expressions for the above types. This seems like a bad idea, particularly because we're not in control of what Clang emits here.
  2. Cast the global to the right type before accessing. So far, all my attempts to do this fail internal asserts in LLVM.
  3. Copy the global, either into another global of the correct type, or an alloca. I'm not a fan of the alloca approach, but it may end up being easiest. It's not immediately obvious to me how to re-type the global if casting doesn't work either.

@elliottslaughter
Copy link
Member Author

Nevermind, I was just doing the cast wrong. Fix in #631.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant