Coding error trying to encode null during livingatlas clustering #1005

Closed
vjrj opened this issue Dec 22, 2023 · 1 comment

vjrj commented Dec 22, 2023

I get this error during clustering. I think it was introduced by #987, in 2.18.0-SNAPSHOT+0~20231207231802.1143~1.gbp8802a4:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalStateException: Error encoding value: ValueInGlobalWindow{value=KV{5051752|38120|-710|1996|7|28, au.org.ala.clustering.HashKeyOccurrence@318a655e}, pane=PaneInfo.NO_FIRING}
        at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:73)                                                                         
        at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:104)                                                                          
        at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)                                                                           
        at au.org.ala.pipelines.beam.ClusteringPipeline.run(ClusteringPipeline.java:373)                                                                                            
        at au.org.ala.pipelines.beam.ClusteringPipeline.main(ClusteringPipeline.java:81)                                                                                            
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                                                              
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)                                                                                            
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)                                                                                    
        at java.lang.reflect.Method.invoke(Method.java:498)                                                                                                                         
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)                                                                                             
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)                                                                  
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)                                                                                                   
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)                                                                                                        
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)                                                                                                       
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)                                                                                              
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)                                                                                                         
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)                                                                                                              
Caused by: java.lang.IllegalStateException: Error encoding value: ValueInGlobalWindow{value=KV{5051752|38120|-710|1996|7|28, au.org.ala.clustering.HashKeyOccurrence@318a655e}, pane=PaneInfo.NO_FIRING}
        at org.apache.beam.runners.spark.coders.CoderHelpers.toByteArray(CoderHelpers.java:60)                                                                                      
        at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions.lambda$groupByKeyAndWindow$c9b6f5c4$1(GroupNonMergingWindowsFunctions.java:87)                 
        at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions.lambda$bringWindowToKey$0(GroupNonMergingWindowsFunctions.java:130)                            
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators$6.transform(Iterators.java:785)                                                               
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)                                                   
        at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)                                                                                               
        at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)                                                                                                              
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:201)                                                                                      
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)                                                                                        
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)                                                                                               
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)                                                                                               
        at org.apache.spark.scheduler.Task.run(Task.scala:123)                                                                                                                      
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)                                                                                      
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)                                                                                                        
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)                                                                                                    
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)                                                                                          
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                                                                          
        at java.lang.Thread.run(Thread.java:750)                                                                                                                                    
Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null String                                                                                                   
        at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:74)                                                                                               
        at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:68)                                                                                               
        at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:37)                                                                                               
        at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:114)                                                                                          
        at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:60)                                                                                           
        at org.apache.beam.sdk.coders.RowCoderGenerator$EncodeInstruction.encodeDelegate(RowCoderGenerator.java:337)                                                                
        at org.apache.beam.sdk.coders.Coder$ByteBuddy$5tgsQW7H.encode(Unknown Source)                                                                                               
        at org.apache.beam.sdk.coders.Coder$ByteBuddy$5tgsQW7H.encode(Unknown Source)                                                                                               
        at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:124)                                                                                                     
        at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)                                                                                                                  
        at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:73)                                                                                                               
        at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:37)                                                                                                               
        at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:591)                                                                             
        at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:582)                                                                             
        at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:542)                                                                             
        at org.apache.beam.runners.spark.coders.CoderHelpers.toByteArray(CoderHelpers.java:58)                                                                                      
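
Reading the innermost cause, the failure appears to come from a null element inside an iterable-of-strings field of the schema-encoded value (apparently `HashKeyOccurrence`): `IterableLikeCoder` delegates each element to `StringUtf8Coder`, which rejects null. The sketch below is a plain-Java model of that behaviour, not Beam code and not the change made in the referenced commits. The class and method names (`NullElementSketch`, `encodeString`, `encodeList`, `dropNulls`) are hypothetical, and dropping nulls is only one possible workaround, assumed here for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NullElementSketch {

    // Strict element encoder: rejects null, mirroring the message in the trace.
    static void encodeString(String value, DataOutputStream out) throws IOException {
        if (value == null) {
            throw new IOException("cannot encode a null String");
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // length prefix
        out.write(bytes);           // UTF-8 payload
    }

    // List encoder: writes the element count, then delegates every element,
    // so a single null element fails the whole value.
    static byte[] encodeList(List<String> values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(values.size());
        for (String value : values) {
            encodeString(value, out);
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Hypothetical sanitising step: remove null elements before encoding.
    static List<String> dropNulls(List<String> values) {
        return values.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<String> ids = Arrays.asList("a", null, "b");
        try {
            encodeList(ids);
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
        System.out.println("encoded bytes: " + encodeList(dropNulls(ids)).length);
    }
}
```

In a real Beam pipeline the equivalent choices would be to keep nulls out of the collection before the `GroupByKey`, or to make the element type nullable in the schema. Which of these fits `HashKeyOccurrence` is not something the trace alone settles.
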

Command:

sudo -u spark la-pipelines clustering all --cluster

cc @adam-collins.

adam-collins self-assigned this Jan 1, 2024
adam-collins pushed a commit that referenced this issue Jan 1, 2024
adam-collins added a commit that referenced this issue Jan 5, 2024
adam-collins commented

Feel free to reopen if it is still an issue.
