singer-io / tap-s3-csv

Issue while on "Replication options" step in KNOTS

nabinkhadka opened this issue

INFO Starting discover
INFO Sampling records to determine table schema.
INFO Checking bucket "dev.nabin.com" for keys matching "life/Devices/nabintest.csv.gz"
INFO Found 393 files.
INFO Will download key "spark-nlp-resolver-public/tokenizer_std_en_1.3.0_2_1518106877302.zip" as it was last modified 2018-08-22 03:47:28+00:00
INFO Sampling life/Devices/nabintest.csv.gz (1000 records, every 10th record).
CRITICAL 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Traceback (most recent call last):
  File "/usr/local/bin/tap-s3-csv", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/singer/utils.py", line 192, in wrapped
    return fnc(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/__init__.py", line 73, in main
    do_discover(args.config)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/__init__.py", line 17, in do_discover
    catalog = {"streams": discover_streams(config)}
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/discover.py", line 8, in discover_streams
    schema = discover_schema(config, table_spec)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/discover.py", line 13, in discover_schema
    sampled_schema = s3.get_sampled_schema_for_table(config, table_spec)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/s3.py", line 33, in get_sampled_schema_for_table
    samples = sample_files(config, table_spec, s3_files)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/s3.py", line 98, in sample_files
    sample_rate, max_records)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/s3.py", line 71, in sample_file
    iterator = csv_handler.get_row_iterator(table_spec, file_handle, s3_path)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/csv_handler.py", line 37, in get_row_iterator
    headers = set(reader.fieldnames)
  File "/usr/local/lib/python3.6/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
  File "/usr/local/lib/python3.6/site-packages/tap_s3_csv/csv_handler.py", line 33, in <genexpr>
    reader = csv.DictReader((line.replace('\0', '') for line in file_stream), fieldnames=field_names)
  File "/usr/local/lib/python3.6/codecs.py", line 1041, in iterdecode
    output = decoder.decode(input)
  File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

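@nabinkhadka It looks like you are trying to replicate gzipped CSV files, which the tap does not currently support. The byte 0x8b at position 1 in the error is the second byte of the gzip magic number (0x1f 0x8b), so the tap is trying to decode compressed data as UTF-8. We'd be happy to accept a PR that adds support for gzipped files.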
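
For anyone picking up that PR, here is a minimal sketch of one possible approach, not the tap's actual code. It assumes the object's bytes have already been read into memory; `iter_csv_rows` and `GZIP_MAGIC` are hypothetical names, not part of tap-s3-csv. It checks for the gzip magic number, decompresses if needed, and only then decodes and parses. The null-byte stripping mirrors the `csv_handler.py` line shown in the traceback.

```python
import csv
import gzip
import io

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def iter_csv_rows(data, encoding="utf-8"):
    """Yield CSV rows as dicts from raw bytes that may be gzip-compressed.

    Hypothetical helper for illustration; not part of tap-s3-csv.
    """
    # Detect compression from the content rather than the file extension.
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)

    # Decode only after decompression, then let csv handle line endings.
    text = io.StringIO(data.decode(encoding), newline="")

    # Strip null bytes before parsing, as the tap's handler does.
    reader = csv.DictReader(line.replace("\0", "") for line in text)
    for row in reader:
        yield row


if __name__ == "__main__":
    plain = "id,name\n1,a\n2,b\n".encode("utf-8")
    for row in iter_csv_rows(gzip.compress(plain)):
        print(dict(row))
```

Checking the leading bytes instead of the `.gz` suffix also covers compressed objects stored under a plain `.csv` key. A real change would probably stream the S3 body rather than hold the whole object in memory.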