[Bug]: Program crashes when there are too many FTP files
Bear-big-code opened this issue
What happened?
Configuration:
{
"job": {
"setting": {
"speed": {
"byte": -1,
"channel": 8
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [
{
"reader": {
"name": "ftpreader",
"parameter": {
"column": ["*"],
"protocol": "ftp",
"host": "127.0.0.1",
"port": "3021",
"username": "admin",
"password": "123456",
"compress": "stream",
"skipDelimiter": true,
"path": "/2024-01-06"
}
},
"writer": {
"name": "ftpwriter",
"parameter": {
"column": ["*"],
"protocol": "ftp",
"host": "127.0.0.1",
"port": "3021",
"username": "admin",
"password": "123456",
"path": "/data-xmh",
"fileName": "101-测试",
"writeMode": "truncate",
"compress": "stream",
"skipDelimiter": true
}
}
}
]
}
}
Version
4.1.3 (Default)
OS Type
No response
Java JDK Version
Oracle JDK 1.8.0
Relevant log output
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_8arwds66]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_46r477u6]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_zfca1d5d]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_72y56sya]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_t3qmgt8q]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_2v9g1mq6]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_u04ha4x0]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_y22h13w2]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- Finished split.
23:56:20.467 [job-0] INFO com.wgzhao.addax.core.job.JobContainer -- The Writer.Job [ftpwriter] is divided into [158632] task(s).
23:56:20.467 [job-0] DEBUG com.wgzhao.addax.core.job.JobContainer -- The transformer configuration:[]
23:56:36.145 [job-0] DEBUG com.wgzhao.addax.core.job.JobContainer --
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
G1 Young Generation | 18 | 18 | 18 | 0.625s | 0.625s | 0.625s
G1 Old Generation | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
Disconnected from the target VM, address: '127.0.0.1:50283', transport: 'socket'
Here is the result of my local test, reading 311 files from the specified directory. Similar log lines have been truncated to keep this short.
2024-03-13 13:32:58.801 [ main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-03-13 13:32:58.810 [ main] INFO Engine -
{
"setting":{
"speed":{
"byte":-1,
"channel":8
},
"errorLimit":{
"record":0,
"percentage":0.02
}
},
"content":{
"reader":{
"name":"ftpreader",
"parameter":{
"column":[
"*"
],
"protocol":"ftp",
"host":"127.0.0.1",
"port":"21",
"username":"wgzhao",
"password":"*****",
"skipDelimiter":true,
"path":"/home/wgzhao/ftptest/ftpreader"
}
},
"writer":{
"name":"ftpwriter",
"parameter":{
"column":[
"*"
],
"protocol":"ftp",
"host":"127.0.0.1",
"port":"21",
"username":"wgzhao",
"password":"*****",
"path":"/home/wgzhao/ftptest/ftpwriter",
"fileName":"101-测试",
"writeMode":"truncate",
"compress":"gz",
"skipDelimiter":true
}
}
}
}
2024-03-13 13:32:58.823 [ main] INFO JobContainer - The jobContainer begins to process the job.
2024-03-13 13:32:58.856 [ job-0] WARN StorageWriterUtil - The item encoding is empty, uses [UTF-8] as default.
2024-03-13 13:32:58.856 [ job-0] WARN StorageWriterUtil - The item delimiter is empty, uses [,] as default.
2024-03-13 13:32:58.874 [ job-0] INFO JobContainer - The Reader.Job [ftpreader] perform prepare work .
2024-03-13 13:32:58.916 [ job-0] INFO FtpReader$Job - The number of files about to be read: [311]
2024-03-13 13:32:58.916 [ job-0] INFO JobContainer - The Writer.Job [ftpwriter] perform prepare work .
2024-03-13 13:32:58.917 [ job-0] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:58.938 [ job-0] INFO FtpWriter$Job - The current writeMode is truncate, begin to cleanup all files with prefix [101-测试] under [/home/wgzhao/ftptest/ftpwriter].
2024-03-13 13:32:58.938 [ job-0] INFO FtpWriter$Job - The following file(s) will be deleted: [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_793_tdwnz8ut.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_783_4rmxunsq.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_817_44veuzg4.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_byy3pc11.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_gcfcw8vm.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_790_3xvm4f94.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_5t4agz8d.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_817_gu145tv1.txt].
2024-03-13 13:32:58.938 [ job-0] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:58.940 [ job-0] INFO JobContainer - Job set Channel-Number to 8 channel(s).
2024-03-13 13:32:58.956 [ job-0] INFO JobContainer - The Reader.Job [ftpreader] is divided into [311] task(s).
2024-03-13 13:32:58.956 [ job-0] INFO StorageWriterUtil - Begin to split...
2024-03-13 13:32:58.963 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_962_x27b8yt1]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_963_2n52sntt]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_e623a1s8]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_hgva8zyv]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_68uqewte]
2024-03-13 13:32:58.964 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_xwreawz9]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_964_6f3q3fyb]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_g3xqcuz4]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_u4ux9ywb]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_cdqacs9e]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_hb98t8ag]
2024-03-13 13:32:58.965 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_965_nshwrf8v]
2024-03-13 13:32:58.966 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133258_966_ct0fr1vx]
......
2024-03-13 13:32:59.000 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_000_ayxeax1h]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_upb2bgwv]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_9v2acmmw]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_bfmfes78]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_efg8bsz3]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_w8yu94xs]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_wtv1v4qp]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_utvqbwwa]
2024-03-13 13:32:59.001 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_uen8emwr]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_001_fexsrc16]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_5rb40g5w]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_9bsxftnm]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_1pqh18dv]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_fpppg6ep]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_n4vx5rq6]
2024-03-13 13:32:59.002 [ job-0] INFO StorageWriterUtil - split write file name:[101-测试__20240313_133259_002_huh199rz]
2024-03-13 13:32:59.010 [ job-0] INFO StorageWriterUtil - Finished split.
2024-03-13 13:32:59.010 [ job-0] INFO JobContainer - The Writer.Job [ftpwriter] is divided into [311] task(s).
2024-03-13 13:32:59.079 [ job-0] INFO JobContainer - The Scheduler launches [1] taskGroup(s).
2024-03-13 13:32:59.092 [ taskGroup-0] INFO TaskGroupContainer - The taskGroupId=[0] started [8] channels for [311] tasks.
2024-03-13 13:32:59.095 [ taskGroup-0] INFO Channel - The Channel set byte_speed_limit to -1, No bps activated.
2024-03-13 13:32:59.095 [ taskGroup-0] INFO Channel - The Channel set record_speed_limit to -1, No tps activated.
2024-03-13 13:32:59.116 [writer-0-284] INFO FtpWriter$Task - begin do write...
2024-03-13 13:32:59.116 [writer-0-284] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133259_007_g1c7ysae.txt]
2024-03-13 13:32:59.116 [reader-0-284] INFO FtpReader$Task - reading file : [/home/wgzhao/ftptest/ftpreader/000851.SZ-300660.SZ.csv]
2024-03-13 13:32:59.116 [writer-0-284] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:32:59.117 [writer-0-284] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:59.118 [writer-0-279] INFO FtpWriter$Task - begin do write...
2024-03-13 13:32:59.118 [writer-0-182] INFO FtpWriter$Task - begin do write...
2024-03-13 13:32:59.119 [writer-0-182] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_995_uqn2751q.txt]
2024-03-13 13:32:59.119 [writer-0-279] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133259_007_25dn2gmv.txt]
2024-03-13 13:32:59.120 [writer-0-182] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
.....
2024-03-13 13:33:03.463 [writer-0-205] INFO FtpWriter$Task - begin do write...
2024-03-13 13:33:03.463 [reader-0-173] WARN StorageReaderUtil - Uses [,] as delimiter by default
2024-03-13 13:33:03.463 [writer-0-205] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_998_f4c0mn89.txt]
2024-03-13 13:33:03.463 [writer-0-205] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:33:03.463 [writer-0-205] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:33:03.463 [writer-0-171] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.464 [writer-0-173] INFO FtpWriter$Task - begin do write...
2024-03-13 13:33:03.464 [writer-0-173] INFO FtpWriter$Task - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_994_s8z8tsdr.txt]
2024-03-13 13:33:03.464 [writer-0-173] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:33:03.464 [writer-0-173] INFO StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:33:03.465 [writer-0-173] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.465 [reader-0-205] INFO FtpReader$Task - reading file : [/home/wgzhao/ftptest/ftpreader/600575.SH-000677.SZ.csv]
2024-03-13 13:33:03.466 [reader-0-205] WARN StorageReaderUtil - Uses [,] as delimiter by default
2024-03-13 13:33:03.466 [writer-0-205] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.467 [ reader-0-42] INFO FtpReader$Task - reading file : [/home/wgzhao/ftptest/ftpreader/688018.SH-000632.SZ.csv]
2024-03-13 13:33:03.467 [ reader-0-42] WARN StorageReaderUtil - Uses [,] as delimiter by default
2024-03-13 13:33:03.468 [ writer-0-42] INFO FtpWriter$Task - end do write
2024-03-13 13:33:03.499 [writer-0-158] INFO FtpWriter$Task - end do write
2024-03-13 13:33:05.096 [ job-0] INFO AbstractScheduler - The scheduler has completed all tasks.
2024-03-13 13:33:05.096 [ job-0] INFO JobContainer - The Writer.Job [ftpwriter] perform post work.
2024-03-13 13:33:05.096 [ job-0] INFO JobContainer - The Reader.Job [ftpreader] perform post work.
2024-03-13 13:33:05.103 [ job-0] INFO StandAloneJobContainerCommunicator - Total 225262 records, 23043447 bytes | Speed 3.66MB/s, 37543 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.974s | All Task WaitReaderTime 0.156s | Percentage 100.00%
2024-03-13 13:33:05.103 [ job-0] INFO JobContainer -
Job start at : 2024-03-13 13:32:58
Job end at : 2024-03-13 13:33:05
Job took secs : 6s
Average bps : 3.66MB/s
Average rps : 37543rec/s
Number of rec : 225262
Failed record : 0
Folders with roughly 1,000–2,000 files are basically fine, but in my case 80,000–200,000 new files arrive every day. The files are small, only a few hundred bytes each, and they are organized into one folder per day.
Does this tool support syncing in batches, e.g. splitting 200,000 files into 200 runs of 1,000 files each?
In the current model, each file maps to one task, i.e. one thread. Reading tens of thousands, or even over a hundred thousand, files at once therefore means spawning that many threads; when I simulated reading 150,000 files locally, the program exited outright. If your file names follow a pattern, you can use a wildcard in the `path`
item to select one batch of files at a time, which should work around the problem for now, for example:
{
"parameter": {
"path": "/home/wgzhao/ftptest/ftpreader/100*.csv"
}
}
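If the file names do not share a convenient wildcard pattern, the same batching idea can be scripted outside Addax. Below is a minimal sketch (not part of Addax itself) that partitions a day's file list into fixed-size batches and builds one job config per batch. The `TEMPLATE` dict and the assumption that the reader's `path` accepts a list of files are illustrative; if your plugin version only accepts a single path, substitute a per-batch wildcard instead.

```python
import json


def make_batch_jobs(files, batch_size, template):
    """Split a file list into fixed-size batches and build one Addax job
    config per batch. `template` is a complete job dict; each generated job
    gets that batch's files as the reader `path` (assumed to accept a list).
    """
    jobs = []
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        job = json.loads(json.dumps(template))  # deep copy via JSON round-trip
        job["job"]["content"][0]["reader"]["parameter"]["path"] = batch
        jobs.append(job)
    return jobs


# Hypothetical template mirroring the config in this issue (credentials elided).
TEMPLATE = {
    "job": {
        "setting": {"speed": {"byte": -1, "channel": 8}},
        "content": [
            {
                "reader": {"name": "ftpreader",
                           "parameter": {"column": ["*"], "path": None}},
                "writer": {"name": "ftpwriter",
                           "parameter": {"column": ["*"]}},
            }
        ],
    }
}

if __name__ == "__main__":
    # 200,000 small files split into batches of 1,000 -> 200 job configs.
    files = [f"/2024-01-06/file_{i:06d}.csv" for i in range(200_000)]
    jobs = make_batch_jobs(files, 1000, TEMPLATE)
    print(len(jobs))  # 200
```

Each generated config can then be written to its own file and run sequentially, one Addax invocation per batch, so only one batch's worth of tasks (and threads) is alive at any time.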