dynobo / PyctureStream

PoC for image processing using Kafka, Spark Streaming & TensorFlow

New Version

SaddamBInSyed opened this issue

@dynobo ,

First of all, thank you for your awesome work.
I gained some real insight into big-data applications here.

By the way, is there any update on this repo? If so, where can I see those implementations?

Also, I have a scenario where I need to run object detection on ~10 CCTV camera streams.
Is it still okay to follow your architecture model in 2021?

Please advise.

Hi @SaddamBInSyed ,

this repository will not receive any updates. It was just a one-time PoC with the main goal of learning something about streaming technology and architectures.

However, I think the proposed architecture is still a solid choice today. The components from the Hadoop ecosystem are in active development, and the underlying Kappa pattern is quite common and useful (depending on the use case, see e.g. Lambda vs. Kappa). One thing I would swap out is the object detection: progress there has been huge in recent years, and there are much better models available today.

But in the end it all depends on your requirements. ;-)
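
For readers unfamiliar with the Kappa pattern mentioned above, here is a minimal sketch of the idea: every event goes into one append-only log, and every view is derived by streaming over that log, so changing the processing logic means replaying the same log rather than maintaining a separate batch layer. This is not code from PyctureStream. `EventLog` and `build_view` are hypothetical names, the event fields are made up for illustration, and the in-memory list only stands in for a real log such as a Kafka topic.

```python
from collections import Counter
from typing import Callable, Iterable, List


class EventLog:
    """In-memory stand-in for an append-only log such as a Kafka topic."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> int:
        """Append an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, from_offset: int = 0) -> Iterable[dict]:
        """Yield events in order, starting at the given offset."""
        yield from self._events[from_offset:]


def build_view(log: EventLog, extract: Callable[[dict], Iterable[str]]) -> Counter:
    """Derive a view (label counts) by streaming over the whole log."""
    view: Counter = Counter()
    for event in log.read():
        view.update(extract(event))
    return view


log = EventLog()
log.append({"camera": "cam-1", "labels": ["person", "car"]})
log.append({"camera": "cam-2", "labels": ["person"]})

# Version 1 of the job counts every detected label.
counts_v1 = build_view(log, lambda e: e["labels"])

# Version 2 changes the logic (cam-1 only); reprocessing is just a replay
# of the same log through the new function, with no separate batch path.
counts_v2 = build_view(log, lambda e: e["labels"] if e["camera"] == "cam-1" else [])
```

In a Lambda-style setup the same change would typically have to be made in two places, a batch job and a streaming job, which is the main trade-off between the two patterns.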

@dynobo
Thank you so much for taking the time to respond.

Yes, I have object detection models from a vendor, which I will use in place of the existing TF code.

I still have some grey areas around the "Establishing Infrastructure" part; however,
I will start setting up the PoC and update you if I get stuck.

But this repo is definitely a kickstarter for anyone who wants to dive deep into big data.

Thank you once again.

@dynobo

Hi,

The Cloudera QuickStart VM is no longer available as it was before.

If possible, could you advise on installing and setting up these software components on an Ubuntu PC?

Is there anything specific I need to do or configure in a VMware Ubuntu image?

Thanks in advance.

@SaddamBInSyed
Are you sure the Quickstart VMs are not available anymore? To me it seems like Cloudera is now just asking for some data before the download...
I'm sorry, but I can't provide details on setting things up with the current version etc. Maybe a tutorial like this one from March 2021 can help you?
Good luck and best regards!

Hi @dynobo ,

I carefully followed your setup guide, but I am getting errors while executing setup_cloudera_vm.sh:

[cloudera@quickstart ~]$ wget https://raw.githubusercontent.com/dynobo/PyctureStream/master/setup_cloudera_vm.sh && chmod +x ./setup_cloudera_vm.sh && ./setup_cloudera_vm.sh
--2021-05-18 05:54:39-- https://raw.githubusercontent.com/dynobo/PyctureStream/master/setup_cloudera_vm.sh
Resolving raw.githubusercontent.com... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5784 (5.6K) [text/plain]
Saving to: “setup_cloudera_vm.sh”

100%[=====================================================================================>] 5,784 --.-K/s in 0.001s

2021-05-18 05:54:40 (5.53 MB/s) - “setup_cloudera_vm.sh” saved [5784/5784]

Loaded plugins: fastestmirror, security
Cleaning repos: base cloudera-cdh5 cloudera-gplextras5 cloudera-kafka cloudera-manager epel extras updates
Cleaning up Everything
Loaded plugins: fastestmirror, security
Setting up Update Process
Determining fastest mirrors
YumRepo Error: All mirror URLs are not using ftp, http[s] or file.
Eg. Invalid release/repo/arch combination/
removing mirrorlist with no valid mirrors: /var/cache/yum/x86_64/6/base/mirrorlist.txt
Error: Cannot find a valid baseurl for repo: base
Loaded plugins: fastestmirror, security
Setting up Install Process
Loading mirror speeds from cached hostfile
YumRepo Error: All mirror URLs are not using ftp, http[s] or file.
Eg. Invalid release/repo/arch combination/
removing mirrorlist with no valid mirrors: /var/cache/yum/x86_64/6/base/mirrorlist.txt
Error: Cannot find a valid baseurl for repo: base
Loaded plugins: fastestmirror, security
Setting up Install Process
Loading mirror speeds from cached hostfile
YumRepo Error: All mirror URLs are not using ftp, http[s] or file.
Eg. Invalid release/repo/arch combination/
removing mirrorlist with no valid mirrors: /var/cache/yum/x86_64/6/base/mirrorlist.txt
Error: Cannot find a valid baseurl for repo: base
sed: can't read /etc/kafka/conf.dist/server.properties: No such file or directory
sed: can't read /etc/kafka/conf.dist/server.properties: No such file or directory
tee: /etc/kafka/conf.dist/server.properties: No such file or directory
Settings for PyctureStream Project
tee: /etc/kafka/conf.dist/server.properties: No such file or directory
listeners=PLAINTEXT://0.0.0.0:9092
tee: /etc/kafka/conf.dist/server.properties: No such file or directory
advertised.listeners=PLAINTEXT://127.0.0.1:9092
kafka-server: unrecognized service
./setup_cloudera_vm.sh: line 62: kafka-topics: command not found
./setup_cloudera_vm.sh: line 63: kafka-topics: command not found

--2021-05-18 05:54:46-- https://repo.continuum.io/archive/Anaconda2-5.0.1-Linux-x86_64.sh
Resolving repo.continuum.io... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c94f, ...
Connecting to repo.continuum.io|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.anaconda.com/archive/Anaconda2-5.0.1-Linux-x86_64.sh [following]
--2021-05-18 05:54:46-- https://repo.anaconda.com/archive/Anaconda2-5.0.1-Linux-x86_64.sh
Resolving repo.anaconda.com... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 532375438 (508M) [application/x-sh]
Saving to: “Anaconda2-5.0.1-Linux-x86_64.sh”

To resolve the "YumRepo Error: All mirror URLs are not using ftp, http[s] or file." issue, I tried the URLs below:

https://arstech.net/centos-6-error-yumrepo-error-all-mirror-urls-are-not-using-ftp-http/

https://stackoverflow.com/questions/66291083/when-using-centos-6-i-cannot-run-yum-update-anymore-i-get-this-error-cannot

https://support.cpanel.net/hc/en-us/articles/1500002629261

But still no luck.

After following the above URLs, I am getting errors like the ones below:

[cloudera@quickstart yum.repos.d]$ yum clean all
Loaded plugins: fastestmirror, security
Cleaning repos: base cloudera-cdh5 cloudera-gplextras5 cloudera-kafka
: cloudera-manager epel extras updates
Cleaning up Everything
[cloudera@quickstart yum.repos.d]$ yum makecache
Loaded plugins: fastestmirror, security
Determining fastest mirrors
epel/metalink | 4.7 kB 00:00

 * epel: d2lzkl7pfhq30w.cloudfront.net
http://vault.centos.org/6/os/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: base. Please verify its path and try again
[cloudera@quickstart yum.repos.d]$ yum update -y
Loaded plugins: fastestmirror, security
You need to be root to perform this command.
[cloudera@quickstart yum.repos.d]$ sudo yum update -y
Loaded plugins: fastestmirror, security
Setting up Update Process
Loading mirror speeds from cached hostfile
epel/metalink | 4.7 kB 00:00
 * epel: d2lzkl7pfhq30w.cloudfront.net
http://vault.centos.org/6/os/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: base. Please verify its path and try again
[cloudera@quickstart yum.repos.d]$

Can you please advise what I am doing wrong here?

Thank you.
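
One possible reading of the 404 in the log above: the request goes to `http://vault.centos.org/6/os/x86_64/`, with a bare major version in the path, whereas the CentOS vault is organised by full point release. The sketch below rewrites such a `baseurl` to a point-release directory. It is only an illustration under assumptions: `6.10` is assumed to be the release wanted, `pin_vault_release` is a hypothetical helper, and the sample text is not the actual repo file from the VM. It also does not touch `mirrorlist=` lines, which would still need to be commented out by hand.

```python
import re

VAULT_RELEASE = "6.10"  # assumption: the CentOS 6 point release to pin to


def pin_vault_release(repo_text: str, release: str = VAULT_RELEASE) -> str:
    """Rewrite vault.centos.org baseurls to use a full point release.

    Replaces a bare major version ("/6/") or "$releasever" directly after
    the host name with the given release; all other lines are unchanged.
    """
    pattern = re.compile(
        r"(baseurl=https?://vault\.centos\.org/)(?:6|\$releasever)/"
    )
    return pattern.sub(lambda m: f"{m.group(1)}{release}/", repo_text)


# Illustrative repo section, modelled on the URL seen in the error message.
sample = """[base]
name=CentOS-6 - Base
baseurl=http://vault.centos.org/6/os/x86_64/
gpgcheck=1
"""

fixed = pin_vault_release(sample)
print(fixed)
```

The function works on text rather than on a file so it can be tried safely first; applying the result to a repo file under `/etc/yum.repos.d/` and then running `yum clean all` would be the manual follow-up.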

Hi @dynobo

I managed to run the repo successfully.

Instead of Cloudera, I installed Spark and Kafka myself in a VM (Ubuntu).

There is a bug in the dashboard code, but apart from that everything works fine.

Thanks for your work.