marklogic-community / marklogic-spring-batch

Write batch processing applications in MarkLogic

Error Updating Job Repository when Partitioning

rjkennedy98 opened this issue

We are using Spring Batch partitioning and are seeing errors when the job repository is updated. There are no errors when the partition size is 1. As soon as the partition size is greater than 1, the job fails.

Subset of Job Code:

	@Bean
	public Job job(JobBuilderFactory jobBuilderFactory, 
			@Qualifier("partitionInputFlowStep") Step partitionInputFlowStep, 
			@Qualifier("inputHarmonizationStep") Step inputHarmonizationStep) {
		return jobBuilderFactory.get(JOB_NAME)
				.start(partitionInputFlowStep)
				.next(inputHarmonizationStep)
				.incrementer(new RunIdIncrementer()).build();
	}

	@Bean
	@JobScope
	@Qualifier("partitionInputFlowStep")
	public Step partitionInputFlowStep(
			StepBuilderFactory stepBuilderFactory,
			DatabaseClientProvider databaseClientProvider,
			@Value("#{jobParameters['directory']}") String directory,
			@Value("#{jobParameters['batchId']}") String batchId
			){
		EntityPartitioner entityPartitioner = new EntityPartitioner();
		entityPartitioner.setDirectory(directory);
		entityPartitioner.setBatchId(batchId);
		
		return stepBuilderFactory.get("partitionInputFlowStep")
				.partitioner("entityPartitioner1", entityPartitioner)
				.step(inputFlowStep(stepBuilderFactory, databaseClientProvider))
				.taskExecutor(new SimpleAsyncTaskExecutor())
				.build();
	}
	

	@Bean
	@JobScope
	@Qualifier("inputHarmonizationStep")
	public Step partitionHarmonizationFlowStep(
			StepBuilderFactory stepBuilderFactory,
			DatabaseClientProvider databaseClientProvider,
			@Value("#{jobParameters['directory']}") String directory,
			@Value("#{jobParameters['batchId']}") String batchId
			){
		
		EntityPartitioner entityPartitioner = new EntityPartitioner();
		entityPartitioner.setDirectory(directory);
		entityPartitioner.setBatchId(batchId);
		
		return stepBuilderFactory.get("partitionInputFlowStep")
				.partitioner("entityPartitioner2", entityPartitioner)
				.step(harmonizeFlowStep(stepBuilderFactory, databaseClientProvider))
				.taskExecutor(new SimpleAsyncTaskExecutor())
				.build();
	}
	

Stack trace:

16:23:52.309 [SimpleAsyncTaskExecutor-2] INFO  c.m.s.b.c.r.d.MarkLogicStepExecutionDao - update step execution: 4827936691968208743,jobExecution:3310947134256853238
16:23:52.444 [SimpleAsyncTaskExecutor-2] INFO  c.m.s.b.c.r.d.MarkLogicJobExecutionDao - update JobExecution:/projects.spring.io/spring-batch/2242995104553619222.xml,15103526323095394
16:23:52.444 [SimpleAsyncTaskExecutor-2] INFO  c.m.s.b.c.r.d.MarkLogicStepExecutionDao - update step execution: 4827936691968208743,jobExecution:3310947134256853238
16:23:53.523 [pool-1-thread-1] WARN  c.m.c.d.impl.WriteBatcherImpl - Error writing batch: com.marklogic.client.FailedRequestException: Local message: failed to apply resource at documents: Internal Server Error. Server Message: The specified flow Account:Account Load is missing. (MISSING_FLOW): . See the MarkLogic server error log for further detail.
16:23:53.529 [main] ERROR o.s.batch.core.step.AbstractStep - Encountered an error executing step partitionInputFlowStep in job exampleIngestJob
java.util.concurrent.ExecutionException: com.marklogic.client.FailedRequestException: Local message: Content version must match to write document. Server Message: RESTAPI-CONTENTWRONGVERSION: (err:FOER0000) Content version mismatch:  uri /projects.spring.io/spring-batch/2242995104553619222.xml has current version 15103526304335394 that doesn't match if-match: 15103526301705394
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler.doHandle(TaskExecutorPartitionHandler.java:121)
	at org.springframework.batch.core.partition.support.AbstractPartitionHandler.handle(AbstractPartitionHandler.java:61)
	at org.springframework.batch.core.partition.support.PartitionStep.doExecute(PartitionStep.java:106)
	at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:200)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:133)
	at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:121)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
	at com.sun.proxy.$Proxy31.execute(Unknown Source)
	at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
	at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:392)
	at org.springframework.batch.core.job.SimpleJob.doExecute(SimpleJob.java:135)
	at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:306)
	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135)
	at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
	at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:127)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
	at com.sun.proxy.$Proxy24.run(Unknown Source)
	at com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:365)
	at com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner.execute(CommandLineJobRunner.java:568)
	at com.marklogic.spring.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:531)
Caused by: com.marklogic.client.FailedRequestException: Local message: Content version must match to write document. Server Message: RESTAPI-CONTENTWRONGVERSION: (err:FOER0000) Content version mismatch:  uri /projects.spring.io/spring-batch/2242995104553619222.xml has current version 15103526304335394 that doesn't match if-match: 15103526301705394
	at com.marklogic.client.impl.OkHttpServices.putPostDocumentImpl(OkHttpServices.java:1562)
	at com.marklogic.client.impl.OkHttpServices.putDocument(OkHttpServices.java:1223)
	at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:919)
	at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:866)
	at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:805)
	at com.marklogic.spring.batch.core.repository.dao.MarkLogicJobExecutionDao.updateJobExecution(MarkLogicJobExecutionDao.java:116)
	at com.marklogic.spring.batch.core.repository.dao.MarkLogicStepExecutionDao.updateStepExecution(MarkLogicStepExecutionDao.java:114)
	at org.springframework.batch.core.repository.support.SimpleJobRepository.update(SimpleJobRepository.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:282)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
	at com.sun.proxy.$Proxy22.update(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:127)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
	at com.sun.proxy.$Proxy22.update(Unknown Source)
	at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:188)
	at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:139)
	at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:136)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
16:23:53.671 [main] INFO  c.m.s.b.c.r.d.MarkLogicJobExecutionDao - update JobExecution:/projects.spring.io/spring-batch/2242995104553619222.xml,15103526324435394
16:23:53.671 [main] INFO  c.m.s.b.c.r.d.MarkLogicStepExecutionDao - update step execution: 7324074983296106230,jobExecution:3310947134256853238
16:23:53.807 [main] INFO  c.m.s.b.c.r.d.MarkLogicJobExecutionDao - update JobExecution:/projects.spring.io/spring-batch/2242995104553619222.xml,15103526336705394
16:23:53.807 [main] INFO  c.m.s.b.c.r.d.MarkLogicStepExecutionDao - update step execution: 7324074983296106230,jobExecution:3310947134256853238
16:23:53.942 [main] INFO  c.m.s.b.c.r.d.MarkLogicJobExecutionDao - update JobExecution:/projects.spring.io/spring-batch/2242995104553619222.xml,15103526338065394
16:23:54.092 [main] INFO  c.m.s.b.c.r.d.MarkLogicJobExecutionDao - update JobExecution:/projects.spring.io/spring-batch/2242995104553619222.xml,15103526339415394
16:23:54.092 [main] INFO  o.s.b.c.l.support.SimpleJobLauncher - Job: [SimpleJob: [name=exampleIngestJob]] completed with the following parameters: [{batchId=58, directory=C:\Users\RK367\workspace\testData\20171023\}] and the following status: [FAILED]
16:23:54.092 [main] INFO  o.s.c.a.AnnotationConfigApplicationContext - Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@c285f4: startup date [Fri Nov 10 16:23:44 CST 2017]; root of context hierarchy
:runYourJob FAILED
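
The RESTAPI-CONTENTWRONGVERSION message in the trace reads like an optimistic-locking rejection: the write carried an if-match version that no longer matched the document's current version. The following is a minimal sketch of that pattern in plain Java, assuming two partition workers read the same JobExecution document version before either writes. The class and method names (VersionCheckSketch, VersionedDocument, writeIfMatch) are hypothetical and are not part of marklogic-spring-batch or the MarkLogic client API.

```java
public class VersionCheckSketch {

    // Hypothetical stand-in for a document guarded by a version check.
    static class VersionedDocument {
        private long version = 1;
        private String content = "";

        synchronized long currentVersion() {
            return version;
        }

        // Accept the write only if the caller saw the latest version,
        // then bump the version so any stale caller is rejected.
        synchronized boolean writeIfMatch(long expectedVersion, String newContent) {
            if (expectedVersion != version) {
                return false; // analogous to the "content version mismatch" failure
            }
            content = newContent;
            version++;
            return true;
        }
    }

    public static void main(String[] args) {
        VersionedDocument jobExecution = new VersionedDocument();

        // Both workers read the version before either one writes.
        long seenByWorker1 = jobExecution.currentVersion();
        long seenByWorker2 = jobExecution.currentVersion();

        boolean first = jobExecution.writeIfMatch(seenByWorker1, "update from partition 1");
        boolean second = jobExecution.writeIfMatch(seenByWorker2, "update from partition 2");

        System.out.println("first write accepted: " + first);   // true
        System.out.println("second write accepted: " + second); // false
    }
}
```

With a single partition there is only one writer, so the version it read is still current when it writes, which would be consistent with the job succeeding at partition size 1.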

Version 1.3.0 was released this week, and it changes how the MarkLogicJobRepository persists job metadata. Could you upgrade and test it out?

@sastafford Does that also include performance improvements? Doing chunk reader-processor-writer was pretty slow with the ML Job Repository as you mentioned.

It will be faster, IF your chunk size is not a small number.

Spring Batch is going to persist data to the JobRepository upon the completion of each chunk. If your chunk size is 1, a write occurs to your jobRepo for every write to your target db. That's going to slow things down considerably. Start with a chunk size of 100 and see if you notice a difference.
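
To put rough numbers on the chunk-size point, here is a small back-of-the-envelope sketch. It assumes one JobRepository update per completed chunk and ignores the extra updates at step start and end. The item count of 10,000 and the names ChunkCommitEstimate and chunkCommits are made up for illustration.

```java
public class ChunkCommitEstimate {

    // Approximate number of JobRepository updates for a step,
    // assuming one update per completed chunk.
    static long chunkCommits(long itemCount, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (itemCount + chunkSize - 1) / chunkSize; // ceiling division
    }

    public static void main(String[] args) {
        long items = 10_000; // hypothetical item count
        for (int chunkSize : new int[] {1, 10, 100, 1000}) {
            System.out.printf("chunk size %4d -> %5d repository updates%n",
                    chunkSize, chunkCommits(items, chunkSize));
        }
    }
}
```

Under these assumptions, a chunk size of 1 means 10,000 repository updates for 10,000 items, while a chunk size of 100 brings that down to 100.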

Make sure to run the mlJobRepo utility to redeploy your database on the new release.

@rjkennedy98 - any update on this?