
Plexodus-Tools

A collection of tools to process the Google Plus-related data from Google Takeout.

Copyright (C) 2018-2020 Filip H.F. "FiXato" Slagter

Google+ Shutting down April 2019

With the announcement that Google+ would shut its doors for consumers, and the second announcement that the shutdown would be expedited, those of us who've accumulated a large collection of posts and other data over the lifetime of the platform need tools to process the data archives that Google provided us through Google Takeout, and possibly live data through the Google+ API. The shutdown was originally scheduled for August 2019, but following the second announcement it was expedited to April 2nd, 2019, with the Google+ APIs being turned off even sooner, on March 7th, 2019 at the latest. Time has run out.

This repository will hopefully provide some of those tools.

Installation Instructions

This section is divided into two parts:

  1. platform-specific instructions to get the Plexodus-Tools toolset installed

  2. generic platform-independent instructions to set up Plexodus-Tools' dependencies and run its wrapper command.

1. Platform-specific instructions

Select your desired platform below, and follow its instructions:

Android via Termux

While most versions of Android don't come with a terminal emulator, the Google Play Store does have an excellent app called Termux, which allows you to install and run various Linux applications.

  1. Get Termux from the Play Store

  2. Open the Termux App

  3. Update Termux packages, by running on the command-line prompt:

pkg upgrade

  4. Install git and bash, by running on the command-line prompt:

pkg install git bash

  5. You should now be able to continue with 2. Plexodus-Tools Installation Instructions

macOS via Homebrew

  1. Start your preferred Terminal emulator. While I personally prefer iTerm2, macOS itself already comes with Terminal.app, which should work fine as well.

  2. Install Homebrew. Homebrew is the package manager used by Plexodus-Tools, and one of the most popular package managers for installing CLI tools and dependencies on macOS. On Homebrew's homepage you can find the preferred command to install Homebrew on macOS (/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"), but you can also find more advanced installation instructions on Homebrew's Installation page.

  3. Update Bash and install Git: brew install bash git

  4. You should now be able to continue with 2. Plexodus-Tools Installation Instructions

GNU/Linux (Ubuntu)

These instructions are for Ubuntu, but should work on other Linux distributions as well, replacing apt-get with your preferred package manager; equivalents for two other common package managers are shown after these steps.

  1. Install Bash and Git:

sudo apt-get install bash git

  2. That's it. You should now be able to continue with 2. Plexodus-Tools Installation Instructions
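
For reference, the equivalent commands on two other common distributions (the package names bash and git are standard across these package managers):

# Fedora and other dnf-based distributions:
sudo dnf install bash git

# Arch Linux and other pacman-based distributions:
sudo pacman -S bash git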

2. Plexodus-Tools Installation Instructions

On the command-line prompt, run:

  git clone https://github.com/FiXato/Plexodus-Tools && cd Plexodus-Tools

This will clone the source code into the Plexodus-Tools directory, and change the current working directory to this newly created directory.

Next you can run the wrapper script:

  ./bin/plexodus-tools.sh

This should bring up a console-based menu. Press "1" followed by your enter key to set up the required dependencies for Plexodus-Tools.

Once the Setup task has run, you should be able to run all the other scripts without issues.

Extract files from a Zip/Zip64 archive

If your (Takeout) Zip-archive is greater than 2GB and has to be extracted on a platform that doesn't natively support extracting Zip64 files, I suggest you install p7zip, a port of 7-Zip for POSIX systems. On macOS the easiest way is to install it through Homebrew with brew install p7zip. Once (p)7-Zip has been installed, you can extract all files while retaining their directory structure with:

7z x takeout-20181025T124927Z-001.zip

If you want to just extract the JSON files, you can use this instead:

7z x takeout-20181025T124927Z-001.zip '*.json' -r

To extract them to a different directory (while retaining their directory structure):

7z x takeout-20181025T124927Z-001.zip '*.json' -r -o/path/to/output/dir/

To extract them to a different directory (without creating sub-directories):

7z e takeout-20181025T124927Z-001.zip '*.json' -r -o/path/to/output/dir/

Extract all *.json, *.html, *.csv, *.vcf and *.ics files from multi-part Zip-archives:

7z x -an -ai'!takeout-20181111T153533Z-00*.zip' '*.json' '*.html' '*.csv' '*.vcf' '*.ics' -r -oextracted/2018-11-11/

Explanation of commonly used arguments and flags for the 7z command:

  e: Extract archive into the current folder, without retaining folder structure.
  x: eXtract archive while retaining folder structure.
  t: Test (matched) contents of archive.
  l: List (matched) contents of archive.
  -an: No Archive Name matching. Recommended since we're doing a 'wildcard' archive match with -ai!.
  -ai: Use Archive Include to define the input archives. We're wrapping the (masked/wildcarded) filename in quotes to prevent shell interpretation of the exclamation (!) mark. The filename is prefixed with an exclamation (!) mark to allow for wildcards with the asterisk (*) character.
  -r: Recurse through the archive. Needed to match the wildcard filename patterns through the entire archive.
  -o: Specify Output path. Files will be extracted with this folder as their root folder. It should be directly followed by the path; no space between the flag (-o) and the path (extracted/2018-11-11). If the path does not exist yet, it will be automatically created.
  *.json: Apply archive operation on archived files matching the *.json (JavaScript Object Notation) filename pattern.
  *.html: Apply archive operation on archived files matching the *.html (HyperText Markup Language) filename pattern.
  *.csv: Apply archive operation on archived files matching the *.csv (Comma Separated Values) filename pattern.
  *.vcf: Apply archive operation on archived files matching the *.vcf (Virtual Contact File, vCard) filename pattern.
  *.ics: Apply archive operation on archived files matching the *.ics (Internet Calendaring and Scheduling) filename pattern.
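
For example, using the l operation with the same flags and example archive names as above, you can first list which JSON files across the multi-part archives would be matched, before actually extracting them:

7z l -an -ai'!takeout-20181111T153533Z-00*.zip' '*.json' -r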

Filtering JSON data with a jq library

One of those tools is plexodus-tools.jq, a library of filter methods for the excellent command-line JSON processor jq. With this library you can chain filters and sort methods to limit your Google+ Takeout JSON files to a subset of data you can then pass on to other tools or services. For instance, it allows you to limit your Activity data to just public posts, or to those with comments or other interactions with one or more specific users.

Combine all the JSON activity files into a single file

It's useful to combine all the separate JSON activity files into a single JSON file:

jq -s '.' "Takeout/Google+ Stream/Posts/*.json" > combined_activities.json

If you run into the 'argument list too long' error, you can instead use this solution:

gfind 'Takeout/Google+ Stream/Posts/' -iname '*.json' -exec cat {} + | jq -s '.' > combined_activities.json

(Using gfind rather than find to indicate I'm using GNU's find, which I've installed through Homebrew, rather than the default BSD find available on macOS.)
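
On Linux, where find is already the GNU version, the same command with the stock find is simply:

find 'Takeout/Google+ Stream/Posts/' -iname '*.json' -exec cat {} + | jq -s '.' > combined_activities.json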

This way you can directly use this single file for your future jq filter commands.

How to use the library

jq -L /path/to/Plexodus-Tools/ 'include "plexodus-tools"; . ' /path/to/combined_activities.json

You specify the directory in which the plexodus-tools.jq library is located with: -L /path/to/Plexodus-Tools/ and then load it by specifying include "plexodus-tools"; before your actual jq query.

Filter Methods:

  not_empty: Exclude empty results.
  with_comments: Only return Activity results that have Comments.
  without_comments: Only return Activity results that lack any Comments.
  with_image: Only return Activity results that have an Image Attachment.
  with_video: Only return Activity results that have a Video Attachment.
  with_audio: Only return Activity results that have an Audio Attachment.
  with_media: Only return Activity results that have any kind of Media Attachment.
  without_media: Only return Activity results that lack any kind of Media Attachment.
  has_legacy_acl: Only return Activity results whose postAcl contains an isLegacyAcl item.
  has_collection_acl: Only return Activity results whose postAcl contains a collectionAcl item.
  has_community_acl: Only return Activity results whose postAcl contains a communityAcl item.
  has_event_acl: Only return Activity results whose postAcl contains an eventAcl item.
  has_circle_acl: Only return Activity results whose postAcl.visibleToStandardAcl contains a circles item.
  has_public_circles_acl: Only return Public Activity results; i.e. those that have CIRCLE_TYPE_PUBLIC as visibleToStandardAcl 'circle' type.
  is_public: For now an alias for has_public_circles_acl. Note that this might not (yet) include posts that were posted to public Collections or publicly accessible Communities; this may change in the future.
  has_extended_circles_acl: Only return Extended Circled Activity results; i.e. those that have CIRCLE_TYPE_EXTENDED_CIRCLES as visibleToStandardAcl 'circle' type. These are posts that were set to only be visible to your 'Extended Circles'.
  has_own_circles_acl: Only return Private Activity results; i.e. those that have CIRCLE_TYPE_YOUR_CIRCLES as visibleToStandardAcl 'circle' type. These are posts that were set to only be visible to 'Your Circles'.
  has_your_circles_acl: Alias for has_own_circles_acl.
  with_interaction_with(displayNames): Only return Activity items that have some form of interaction with users whose displayName is an exact match for one of the specified displayNames. displayNames can be either a string, or an array of strings.
  with_comment_by(displayNames): Only return Activity items as results when they have Comments by any of the users whose displayName is an exact match for one of the specified displayNames. displayNames can be either a string, or an array of strings.
  url_from_domain(domains): Only return Activity results with url items that match any of the specified domains. domains can be either a string, or an array of strings.
  from_collection(displayNames): Only return Activity results that were posted to a Collection of which the displayName is an exact match for one of the specified displayNames. The supplied displayNames can be either a string, or an array of strings. Note that collections by different owners could have the same name. If you only want to match activities in a specific collection, you'll have to find its resourceName and use that with the from_collection_with_resource_name(resourceNames) filter instead.
  from_collection_with_resource_name(resourceNames): Similar to from_collection, but rather than compare to the displayName of the Collection, compares items to the unique resourceName instead.
  from_collection_with_resource_id(resourceNames): Similar to from_collection_with_resource_name, but only needs a resourceId rather than the resourceName; i.e. it doesn't need the collections/ prefix.
  sort_by_creation_time: Sort results by the Activity's creationTime.
  sort_by_update_time: Sort results by the Activity's updateTime.
  sort_by_last_modified: Alias for sort_by_update_time.
  sort_by_url: Sort results by the Activity's url item.
  sort_activity_log_by_ts: Sort ActivityLog items by their timestampMs timestamp item.
  get_circles: Get a list of all unique circles items from the current results.
  get_all_circle_types: Get a list of all unique circle types from the current results.
  get_all_circle_display_names: Get a list of all unique circle displayNames from the current results.
  get_all_circle_resource_names: Get a list of all unique circle resourceNames from the current results.
  get_all_acl_keys: Get a list of all unique Access Control List keys from the current results.
  get_all_community_names: Get a list of all unique Community displayNames from the current results.
  get_all_collection_names: Get a list of all unique Collection displayNames from the current results.
  get_all_event_resource_names: Get a list of all unique Event resourceNames from the current results.
  get_all_media_content_types: Get a list of all unique media content-types from the current results.

Examples

Return just the activities that are marked as 'public' and have comments by a user whose displayName is FiXato, and sort the results by the creation time of the Activity:

jq -L /path/to/Plexodus-Tools/ 'include "plexodus-tools"; . | is_public | with_comment_by("FiXato") | sort_by_creation_time' combined_activities.json

Return just the activities that have any kind of interaction with users whose displayName is either FiXato or Filip H.F. Slagter and have some form of media attachment, and sort the results by the last modified (updateTime) time of the Activity:

jq -L /path/to/Plexodus-Tools/ 'include "plexodus-tools"; . | with_interaction_with(["FiXato", "Filip H.F. Slagter"]) | with_media | sort_by_last_modified' combined_activities.json

Return just the activities that were posted to a Collection with the name Google Plus Revisited and sort by their creation time:

jq -L /path/to/Plexodus-Tools/ 'include "plexodus-tools"; . | from_collection("Google Plus Revisited") | sort_by_creation_time' combined_activities.json

Get a list of all the unique Circle types in your JSON archive:

jq -L /path/to/Plexodus-Tools/ 'include "plexodus-tools";get_all_circle_types' combined_activities.json

The result will likely look like this:

[
  "CIRCLE_TYPE_EXTENDED_CIRCLES",
  "CIRCLE_TYPE_PUBLIC",
  "CIRCLE_TYPE_USER_CIRCLE",
  "CIRCLE_TYPE_YOUR_CIRCLES"
]

Get all file extensions from archives

Just the last file extension:

7z l -an -ai'!takeout-20181111T153533Z-00*.zip' | gsed -E 's/\s+/ /g' | gcut -d' ' -f1,2,3,4,5 --complement | ggrep -E -o '\.([^.]+)$' | sort -u

Up to the last 3 file extensions, of which the first 2 can be at most 4 characters long, while the last (primary) file extension can be of arbitrary length:

7z l -an -ai'!takeout-20181111T153533Z-00*.zip' | gsed -E 's/\s+/ /g' | gcut -d' ' -f1,2,3,4,5 --complement | ggrep -E -o '(\.[^.]{1,4}){0,2}\.([^.]+)$' | sort -u

Note: I'm using gsed, gcut and ggrep here to indicate I'm using the GNU versions of the utilities, rather than the BSD versions supplied by macOS. These versions can be installed (and linked with g-prefixes) through Homebrew on macOS, for instance with brew install coreutils gnu-sed grep (gcut is part of the coreutils package). On other platforms such as Linux and Windows Cygwin, you're likely installing the GNU versions anyway.

Bash scripts

Aside from the various Ruby scripts, this toolset also contains some self-contained Bash scripts for data retrieval and processing:

Get contact data details for a profile

While the Google Plus APIs were terminated on March 7th, 2019, profile data (at least for numeric IDs) is still available through the newer Google People API. This toolset allows you to retrieve this data.

People API Key

This requires an API key (or OAuth workflow). You can find official instructions on developers.google.com on how to get started with the Google People API, and in particular you can use https://console.developers.google.com/apis/credentials/key to request an API key for your project.

You also need to set the GPLUS_APIKEY environment variable to your Google Plus API key: export GPLUS_APIKEY=Z382rQPoNMlKJIhGfEDc_Ba

(The above key value is obviously just an example; you'll need to replace it with your own actual key.)
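
If you'd rather not keep the key exported in your environment, standard Bash also lets you set the variable for a single invocation, like so (again with the placeholder key value, and the script described below):

GPLUS_APIKEY=Z382rQPoNMlKJIhGfEDc_Ba ./people_api_data_for_gplus_profile.sh 123456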

You just need to pass the (numeric) profile id to the people_api_data_for_gplus_profile.sh script:

Getting people data by numeric ID:

./people_api_data_for_gplus_profile.sh 123456

Getting people data by +CustomProfileHandle:

Passing the ID for a Custom Profile URL (e.g. +YonatanZunger for https://plus.google.com/+YonatanZunger) should also work:

./people_api_data_for_gplus_profile.sh +YonatanZunger

Getting people data by profile URL:

Even passing the URL should work:

./people_api_data_for_gplus_profile.sh https://plus.google.com/112064652966583500522

Retrieving People data from a list of IDs:

If you have a list of user IDs or profile URLs stored in memberslist.txt, with each ID on a separate line, you can use xargs to pass them all to the script, for instance with 3 requests running in parallel, and deleting the target JSON file if a retrieval error occurs:

rm logs/failed-profile-retrievals.txt; cat memberslist.txt | xargs -L 1 -P 3 -I __UID__ ./people_api_data_for_gplus_profile.sh __UID__ --delete-target

(__UID__ will automatically be filled in by xargs)

Or leave the JSON output files intact when a retrieval error occurs, so you can debug more easily, and so profiles that no longer exist (and thus return an HTTP 404 status code) won't be retried:

rm logs/failed-profile-retrievals.txt; cat memberslist.txt | xargs -L 1 -P 3 ./people_api_data_for_gplus_profile.sh

I would not recommend increasing the number of parallel processes beyond 3, as you're then more likely to hit User Rate Limit Exceeded errors.

grep

grep is a CLI tool for regular expression-based filtering of files and data. Since the BSD version of grep that comes with macOS is rather limited, please install GNU grep instead; on macOS you can do so through Homebrew with brew install grep, which installs it with a g-prefix.

Once installed on macOS, you should have access to GNU's version of grep via the g-prefix: ggrep. The Bash scripts in this repository will automatically use ggrep rather than grep if they find it available on your system.
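
A minimal sketch of how such a fallback can be implemented in Bash (illustrative only; the repository's scripts may structure it differently), which applies equally to the gsed fallback described below:

# Prefer GNU grep (ggrep) when available; fall back to the system grep otherwise.
# GREP_CMD, the pattern and the filename are placeholders for this sketch.
if command -v ggrep >/dev/null 2>&1; then
  GREP_CMD=ggrep
else
  GREP_CMD=grep
fi
"$GREP_CMD" -E -o 'some-pattern' some-file.txt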

sed

sed, the Stream EDitor, is used for regular expression substitution. Since the BSD version of sed that comes with macOS is rather limited, you will likely need to install GNU sed instead, with brew install gnu-sed, and replace calls to sed with gsed.

Once installed on macOS, you should have access to GNU's version of sed via the g-prefix: gsed. The scripts in this repository will automatically use gsed rather than sed if they find it available on your system.

curl

curl is a CLI tool for retrieving online resources. For the tools in this project, I use it to send HTTP GET requests to the APIs.
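
For illustration, a request of the kind these scripts send might look like the following; the endpoint and personFields parameter follow the public Google People API documentation, but the exact fields the scripts request may differ (the profile ID is the example numeric ID used earlier):

curl -s "https://people.googleapis.com/v1/people/112064652966583500522?personFields=names,urls&key=$GPLUS_APIKEY"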

jq

jq is an excellent CLI tool for parsing and filtering JSON: https://stedolan.github.io/jq/

It can be downloaded from its website, or through your package manager (for instance on macOS through brew install jq).

Thanks

  • Michael Prescott for having a need for this script, and for bouncing ideas back and forth with me.
  • Abdelghafour Elkaaba, for giving some ideas on how to import this back into Blogger's native comments
  • Edward Morbius, for moderating the Google+ Mass Migration community on Google+
  • Peggy K, for signal boosting and being a soundboard

Relevant Links

You can follow related discussions here:

Export to other formats

Some of the other tools will assist in converting the (filtered) data to other formats, such as HTML, or possibly Atom or JSON-LD, for import into other platforms.

License

This project is licensed under the terms of the GPLv3 license.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
