philipmulcahy / azad

amazon order history reporter chrome extension

Size of 1.9.34 is 1.7MB. Size of Chrome store extension is 64MB. Why?

marksolaris opened this issue · comments

Whilst appraising whether this is safe to run, I unpacked the Chrome extension CRX and saw:

drwx------ 2 me me     4096 Oct 14 14:31 _metadata/
-rw-r--r-- 1 me me 17948354 Oct 14 14:15 alltests.bundle.js
-rw-r--r-- 1 me me 14084024 Oct 14 14:15 background.bundle.js
-rw-r--r-- 1 me me 14742964 Oct 14 14:15 control.bundle.js
-rw-r--r-- 1 me me      527 Oct 14 14:15 datatables_override.css
-rw-r--r-- 1 me me     3581 Oct 14 14:15 icon128.png
-rw-r--r-- 1 me me     1558 Oct 14 14:15 icon48.png
-rw-r--r-- 1 me me 18424289 Oct 14 14:15 inject.bundle.js
-rw-r--r-- 1 me me      820 Oct 14 14:15 inject.css
-rw-r--r-- 1 me me    13900 Oct 14 14:15 jquery.dataTables.min.css
-rw-r--r-- 1 me me     1368 Oct 14 14:31 manifest.json
-rw-r--r-- 1 me me     2084 Oct 14 14:15 popup.css
-rw-r--r-- 1 me me     8782 Oct 14 14:15 popup.html
-rw-r--r-- 1 me me      160 Oct 14 14:15 sort_asc.png
-rw-r--r-- 1 me me      201 Oct 14 14:15 sort_both.png
-rw-r--r-- 1 me me      158 Oct 14 14:15 sort_desc.png

Why are those bundles so huge?

I haven't been exposed to Node.js much, hence the query.

Did the extension get packaged with the 'development' env var still turned on? I can see all the files in tests/ were included.

I can replicate the chonk with a default build:

% du -hs build
63M     build

Setting utils/env.js to 'normal' helps a bit:

% du -hs build
28M     build

-rwxrwxrwx 1 root root 7822147 Oct 22 10:37 alltests.bundle.js*
-rwxrwxrwx 1 root root 6212804 Oct 22 10:37 background.bundle.js*
-rwxrwxrwx 1 root root 6483854 Oct 22 10:37 control.bundle.js*
-rwxrwxrwx 1 root root     527 Oct 22 10:37 datatables_override.css*
-rwxrwxrwx 1 root root    3581 Oct 22 10:37 icon128.png*
-rwxrwxrwx 1 root root    1558 Oct 22 10:37 icon48.png*
-rwxrwxrwx 1 root root 7942361 Oct 22 10:37 inject.bundle.js*
-rwxrwxrwx 1 root root     820 Oct 22 10:37 inject.css*
-rwxrwxrwx 1 root root   13900 Oct 22 10:37 jquery.dataTables.min.css*
-rwxrwxrwx 1 root root    1302 Oct 22 10:37 manifest.json*
-rwxrwxrwx 1 root root    2084 Oct 22 10:37 popup.css*
-rwxrwxrwx 1 root root    8782 Oct 22 10:37 popup.html*
-rwxrwxrwx 1 root root     160 Oct 22 10:37 sort_asc.png*
-rwxrwxrwx 1 root root     201 Oct 22 10:37 sort_both.png*
-rwxrwxrwx 1 root root     158 Oct 22 10:37 sort_desc.png*

I got scared off by the chonk, and after messing around with curl, extensions, and PhantomJS (Amazon's Seige encryption sucks), I settled on coding up a Tampermonkey userscript that tears out the current Orders page HTML and saves it to disk. It's suitable for my low-volume purchasing. I like it because both files add up to 5KB of JavaScript and Perl, with none of the recursive dependencies that Node has.

I like your extension; it's awesome work, and the funds raised for charity are epic. Alas, I'm a "less is more" advocate, so I'll bow out of using it.

My userscript sticks a Download Orders link at the top of each Orders page and saves that HTML scrape to a filename containing the year and pagination number.

// ==UserScript==
// @name            Amazon Order HTML Grab
// @version         1.1
// @include         https://www.amazon.com/your-orders/order*
// @require         https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js
// @run-at          document-idle
// @grant           unsafeWindow
// @description     Grab Amazon orders raw HTML
// ==/UserScript==

(function(){
    function insert_download(order_div) {
        var new_a = document.createElement('A');
        new_a.setAttribute('href', '#');
        new_a.textContent = 'Download Orders';
        new_a.addEventListener("click", download_orders_div);
        order_div.prepend('<BR>');
        order_div.prepend('<BR>');
        order_div.prepend(new_a);
        order_div.prepend('<BR>');
        // Add a hidden anchor which will save the order_div outerHTML
        var filesave_a = document.createElement("A");
        filesave_a.classList.add("filesave");
        filesave_a.style.display = 'none';
        document.body.appendChild(filesave_a);
    }

    function download_orders_div() {
        var target_div = document.getElementsByClassName("your-orders-content-container__content")[0];
        var filesave_a = document.getElementsByClassName("filesave")[0];
        var chosen_year_span = document.getElementsByClassName("a-dropdown-prompt")[0]; // always at the top
        var chosen_year = chosen_year_span.textContent.replace(/[\n\r]+|[\s]{2,}/g, ' ').trim();
        var page = '1';
        var ul_pagination = document.getElementsByClassName("a-pagination")[0];
        if (ul_pagination != null) {
            var selected_li = ul_pagination.getElementsByClassName("a-selected")[0];
            if (selected_li != null) {
                page = selected_li.textContent;
            }
        }
        var filename = window.location.hostname.replace(/\./g, '_') + '_order_scrape_' + chosen_year + '_' + page + '.div';
        if (target_div != null) {
            // Save via a data: URI on the hidden anchor (divs have no select(),
            // so the old select/execCommand("copy") attempt was a no-op and is gone)
            console.log('download_orders_div: saving HTML to ' + filename);
            filesave_a.setAttribute('href', 'data:text/plain;charset=utf-8,' + encodeURIComponent(target_div.outerHTML));
            filesave_a.setAttribute('download', filename);
            filesave_a.click();
        } else {
            console.log('DIV .your-orders-content-container__content not found');
        }
    }

    function waitForKeyElements (selectorTxt, actionFunction, bWaitOnce, iframeSelector) {
        var targetNodes, btargetsFound;
        if (typeof iframeSelector == "undefined") targetNodes = $(selectorTxt);
        else targetNodes = $(iframeSelector).contents().find(selectorTxt);

        if (targetNodes && targetNodes.length > 0) {
            targetNodes.each ( function () {
                var jThis = $(this);
                var alreadyFound = jThis.data ('alreadyFound') || false;

                if (!alreadyFound) {
                    // console.log('waitForKeyELements running ' + jThis);
                    actionFunction (jThis);
                    jThis.data ('alreadyFound', true);
                }
            } );
            btargetsFound = true;
        } else { btargetsFound = false; }
        var controlObj = waitForKeyElements.controlObj || {};
        var controlKey = selectorTxt.replace (/[^\w]/g, "_");
        var timeControl = controlObj [controlKey];
        if (btargetsFound && bWaitOnce && timeControl) {
            clearInterval (timeControl);
            delete controlObj [controlKey];
        } else {
            if ( ! timeControl) {
                timeControl = setInterval ( function () {
                    waitForKeyElements ( selectorTxt, actionFunction, bWaitOnce, iframeSelector );
                }, 500);
                controlObj [controlKey] = timeControl;
            }
        }
        waitForKeyElements.controlObj = controlObj;
    }
    waitForKeyElements ('DIV[class="your-orders-content-container__content js-yo-main-content"]', insert_download);
})();

I end up with:

www_amazon_com_order_scrape_2017_1.div
www_amazon_com_order_scrape_2018_1.div
www_amazon_com_order_scrape_2019_1.div
www_amazon_com_order_scrape_2019_2.div
www_amazon_com_order_scrape_2020_1.div
www_amazon_com_order_scrape_2020_2.div
www_amazon_com_order_scrape_2021_1.div
www_amazon_com_order_scrape_2022_1.div
www_amazon_com_order_scrape_2023_1.div

And with this quick Mojo::DOM extractor I can generate a CSV and/or push the values wherever I need them.

#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;

# Read in Amazon DIV contents
my $div_file = $ARGV[0] // '';
if (not -f $div_file) {
    print STDOUT "usage: $0 amazon_orders_yyyy_x.div\n"; exit(0);
}

# Make a DOM string
my $html = "<HTML><BODY>";
open(my $div_fh, '<', $div_file) or die "cannot open $div_file: $!";
$html .= join("", <$div_fh>); close($div_fh);
$html .= "</BODY></HTML>";
$html =~ s/\s+/ /g;   # collapse all whitespace, newlines included

# Find printables and output
my $dom = Mojo::DOM->new;
$dom->parse($html);
for my $element ($dom->find('*')->each) {
    if ((($element->attr('class') // '') =~ /a-color-secondary/)) {
        my $text = $element->text; $text =~ s/^\s+//;
        if (length($text)) { print STDOUT " $text"; }
    }
    if (($element->tag eq 'bdi') && (($element->attr('dir') // '') eq "ltr")) {
        if (length($element->text)) { printf STDOUT " %s\n", $element->text; }
    }
}

This gives the following initially; there's more work to do to extract the item titles.

 ./dump_orders www.amazon.com_order_scrape_2020_2.div
 Order placed  March 21, 2020  Total  $352.32  Ship to Order #  113-7085813-00000000
 Order placed  February 21, 2020  Total  $13402.28  Ship to Order #  114-9894300-00000000
 Order placed  February 9, 2020  Total  $9926.87  Ship to Order #  113-4279419-00000000
 Order placed  January 27, 2020  Total  $6920.52  Ship to Order #  113-9354114-00000000
 Order placed  January 23, 2020  Total  $17897818.23  Ship to Order #  114-5737246-00000000
 Order placed  January 13, 2020  Total  $415635.32  Ship to Order #  113-5279924-00000000
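Those dump lines are regular enough to turn into CSV rows directly. A hedged sketch of a parser for them (the field layout — date, total, order number — is assumed from the sample output above; the function name is my own):

```javascript
// Parse a dump line like:
//   Order placed  March 21, 2020  Total  $352.32  Ship to Order #  113-7085813-00000000
// into a CSV row: "date",total,order_id. Returns null for lines that don't match.
function dumpLineToCsv(line) {
  const m = line.match(
    /Order placed\s+(\w+ \d+, \d{4})\s+Total\s+\$([\d,.]+)\s+Ship to\s*Order #\s+([\d-]+)/
  );
  if (!m) return null;
  const [, date, total, orderId] = m;
  // Strip thousands separators so the total stays a plain number in the CSV
  return `"${date}",${total.replace(/,/g, '')},${orderId}`;
}
```

Piping the extractor's output through this per line yields one CSV record per order.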

Hello @marksolaris,

What are you hoping to achieve with this ticket?

Yours,

Philip

It can be closed as answered by the reality of the Node.js usage. Initially the ticket was about the extension being 64MB, as that has security implications given the exposure to so much of other people's Node.js code.