Add support for merging/concatenating multiple notebooks

Question

fperez opened this issue 8 years ago

This simple gist offers a command-line tool for concatenating/merging multiple notebooks. As requested by @jamespjh, this could be a useful nbconvert feature (it would also make it robust against evolution of the internal API for users, as they'd only have to remember the cmd line call, and we'd update the internals if the nbformat API changes).

M Bussonnier · Answer 1 · Tue Feb 23 2016 03:05:33 GMT+0800 (China Standard Time)

I'm worried about the logic for merging metadata at the notebook level. While in many cases it is obvious what to do, I'm worried about the slippery slope we would get into when the metadata differ.

Fernando Pérez · Answer 2 · Tue Feb 23 2016 10:27:33 GMT+0800 (China Standard Time)

I would simply make an explicit decision: the metadata is loaded so that it basically corresponds to that of the first nb in the list, plus keys from the others if they differ (the algorithm is simply to do meta.update() with all the notebooks in reverse order from the command line).

That's a simple, unambiguous choice with known semantics. If users don't like it, they can edit it back by hand later.

I don't see a problem with the feature having this constraint.

M Bussonnier · Answer 3 · Tue Feb 23 2016 10:32:32 GMT+0800 (China Standard Time)

Ok, I like a strong limitation like that. I came almost to the same conclusion while walking back home.

It might be hard to shoehorn that into the nbconvert structure itself, as right now it's built around the assumption that one exporter converts one notebook, and the looping over all the notebooks is implicit, but we can likely arrange that.

M Bussonnier · Answer 4 · Tue Feb 23 2016 10:48:34 GMT+0800 (China Standard Time)

I propose adding a --merge flag that merges all the notebooks into one before feeding it to the rest of the pipeline. Metadata is handled as you proposed:

metadata = {}
for n in reversed(notebooks):
    metadata.update(n.metadata)

and for the name of the notebook (if needed) we use the first one.

This allows us not only to merge, but to merge (virtually) and generate a PDF/HTML version in one step.
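To make the precedence rule concrete, here is a minimal sketch of that loop applied to plain dicts standing in for notebook metadata. The helper name `merge_metadata` and the sample keys are hypothetical and only for illustration. Note that `dict.update` is shallow, so a nested value such as `kernelspec` is taken whole from the first notebook that defines it rather than merged key by key.

```python
def merge_metadata(metadatas):
    """Merge notebook-level metadata dicts; earlier notebooks take precedence."""
    merged = {}
    # Apply the dicts last-to-first so the first notebook's values are
    # written last and therefore win on any conflicting top-level key.
    for meta in reversed(metadatas):
        merged.update(meta)
    return merged


# Hypothetical metadata for two notebooks given on the command line.
first = {"kernelspec": {"name": "python3"}, "title": "Chapter 1"}
second = {"kernelspec": {"name": "python2"}, "authors": ["A. Author"]}

merged = merge_metadata([first, second])
print(merged)
```

Keys that appear only in later notebooks (here `authors`) are carried over, while conflicting keys keep the value from the first notebook.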

Fernando Pérez · Answer 5 · Tue Feb 23 2016 10:51:53 GMT+0800 (China Standard Time)

+1

James Hetherington · Answer 6 · Wed Mar 02 2016 21:43:58 GMT+0800 (China Standard Time)

This would be great. I'm using @fperez's nbmerge.py script from https://gist.github.com/fperez/e2bbc0a208e82e450f69 at the moment, and would be delighted to replace it with a simple invocation of nbconvert.

Chad Lagore · Answer 7 · Wed Jun 29 2016 07:22:06 GMT+0800 (China Standard Time)

+1 here. Using nbmerge.py fairly frequently as well.

AoBoY · Answer 8 · Thu Dec 15 2016 23:11:53 GMT+0800 (China Standard Time)

I am trying to use fperez's version and I am getting the following error:

Traceback (most recent call last):
  File "nbmerge.py", line 49, in <module>
    merge_notebooks(notebooks)
  File "nbmerge.py", line 38, in merge_notebooks
    print(nbformat.writes(merged))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 2519513: ordinal not in range(128)

Noah Young · Answer 9 · Fri Dec 16 2016 05:07:25 GMT+0800 (China Standard Time)

Happen to be using Python 3, @aoboy? I think you're seeing this issue. If so, there's an easy fix mentioned in that thread.

AoBoY · Answer 10 · Fri Dec 16 2016 05:24:54 GMT+0800 (China Standard Time)

@npyoung I solved it. I'm actually using Python 2.7.
I changed the line print(nbformat.writes(merged)) to
print(nbformat.writes(merged).encode('utf-8'))
Basically, the explicit encoding is what was missing.
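For readers on Python 3, the same idea (encode explicitly instead of relying on the terminal's default codec) might look like the sketch below. The helper `write_utf8` is hypothetical, and the sample string is only a stand-in for the JSON text that `nbformat.writes(merged)` returns in the script above.

```python
import io


def write_utf8(text, stream):
    """Write text to a binary stream as UTF-8 bytes."""
    stream.write(text.encode("utf-8"))


# Hypothetical stand-in for the notebook JSON; it contains the en dash
# (U+2013) that triggered the UnicodeEncodeError in the traceback.
notebook_json = '{"cells": [], "note": "en dash: \u2013"}'

# Demonstrate with an in-memory binary buffer.
buffer = io.BytesIO()
write_utf8(notebook_json, buffer)

# To write to the terminal or a pipe instead, pass the binary layer of
# standard output (needs `import sys`):
# write_utf8(notebook_json, sys.stdout.buffer)
```

Writing bytes directly avoids depending on the locale or on how stdout was opened, which is why redirecting the output to a file behaves the same as printing to a terminal.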

David Ketcheson · Answer 11 · Tue Feb 07 2017 02:57:12 GMT+0800 (China Standard Time)

This capability would be very useful for a book I am currently working on, where each chapter is a Jupyter notebook. This feature would make it simpler to generate the print version.

M Bussonnier · Answer 12 · Tue Feb 07 2017 04:50:26 GMT+0800 (China Standard Time)

@ketch have a look at @takluyver's BookBook

Chris Sewell · Answer 13 · Fri Jul 07 2017 08:21:57 GMT+0800 (China Standard Time)

Hey guys, I've just created a repo (ipypublish), with a simple workflow/scripts for creating/editing 'publication ready' scientific reports from one or more Jupyter Notebooks (containing matplotlib, pandas, scipy, ...), without leaving the browser. Sorry for the spam but, since I used the gist posted here (thanks!), I thought it might be nice to share.

In particular it would be great to get any feedback, especially in the case where future Jupyter versions might break (or enhance) this. Since I intend to write my doctoral thesis with it!

Ta, Chris

M Pacer · Answer 14 · Fri Jul 07 2017 09:11:33 GMT+0800 (China Standard Time)

@chrisjsewell Really cool project!! You might be interested in looking at JupyterLab. It looks like your system is a beautiful application of the kind of workflow it makes possible, and you will be able to influence the development of that interface to ensure that it can support the kinds of features you want going forward.

Chris Sewell · Answer 15 · Fri Jul 07 2017 17:05:48 GMT+0800 (China Standard Time)

@mpacer thanks :) Yes, I've seen a bit about it. It looks good, and I'll definitely be keeping tabs on it. I see you mentioned easier manipulation of metadata (jupyterlab/jupyterlab#902); that's definitely relevant for my repo (chrisjsewell/ipypublish#1).

From the perspective of my research (atomic/quantum level simulations), I'm really interested in the interactive capability that JavaScript bridging is now offering for 3D graphics (ipywidgets, pythreejs, ipyvolume and my other repo pandas3js) and how it can be applied to the exploratory analysis -> publication workflow that Notebooks offer. Being able to 'pop' out a view of such a GUI to a separate window would definitely be pretty neat.

David Ketcheson · Answer 16 · Sun Jul 09 2017 20:36:42 GMT+0800 (China Standard Time)

People interested in this thread may also be interested in this book project, which is a collection of notebooks viewable as PDF, HTML, or executable notebooks and runnable on binder or Microsoft Azure; it's not completely finished but is in an advanced state:

https://github.com/clawpack/riemann_book

We are using bookbook, among several other tools.

Yensan · Answer 17 · Wed Jan 31 2018 10:31:17 GMT+0800 (China Standard Time)

Although I have read the whole thread, I still don't see how to actually do this. And nbmerge.py failed...
😕

Matthias Geier · Answer 18 · Fri Dec 14 2018 21:17:39 GMT+0800 (China Standard Time)

Since it hasn't been mentioned yet in this issue, let me suggest using https://nbsphinx.readthedocs.io/.

It basically concatenates notebooks and creates HTML pages or a LaTeX/PDF from them.

Chris Holdgraf · Answer 19 · Sun Dec 30 2018 21:45:25 GMT+0800 (China Standard Time)

Just a note that this project sort-of exists now: https://github.com/jbn/nbmerge

(FWIW, I think it's better to have a separate tool than to have nbconvert do the merging)

Fixed Gear · Answer 20 · Thu Dec 12 2019 16:39:48 GMT+0800 (China Standard Time)

ipynb files are in JSON format. What I do is open all the files I want to merge in a new Python notebook and convert them to dicts. You can then use the 'cells' key to concatenate all the cells, or do whatever else you want. Finally, you convert the resulting dict back to JSON and export it to a new file.

Here is an example where I import 2 different ipynb files, and merge them into a new ipynb file:

import json

# First file
with open('file1.ipynb', 'r') as file:
    json_1 = file.read()
dict_1 = json.loads(json_1)
cells_1 = dict_1['cells']

# Second file
with open('file2.ipynb', 'r') as file:
    json_2 = file.read()
dict_2 = json.loads(json_2)
cells_2 = dict_2['cells']

# New file (merging the first and second files).
# Everything except the cells (metadata, nbformat version) comes from the first file.
new_dict = dict_1.copy()
# Plain list concatenation is enough here, so numpy is not needed.
new_dict['cells'] = cells_1 + cells_2
with open('new_file.ipynb', 'w') as json_file:
    json.dump(new_dict, json_file)
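The same approach can be generalized to any number of files. The sketch below is one way to do it, not a tested tool: the function name `merge_notebook_files` is hypothetical, it assumes nbformat 4 notebooks (which keep a top-level 'cells' list), and it does no validation of the result.

```python
import json


def merge_notebook_files(paths, out_path):
    """Concatenate the cells of several .ipynb files into one new file.

    Assumes nbformat 4 notebooks with a top-level 'cells' list.
    """
    notebooks = []
    for path in paths:
        # Notebook files are UTF-8 encoded JSON.
        with open(path, "r", encoding="utf-8") as f:
            notebooks.append(json.load(f))

    # Shallow-copy the first notebook so its metadata and nbformat
    # version are kept, then replace the cells with the full sequence.
    merged = dict(notebooks[0])
    merged["cells"] = [cell for nb in notebooks for cell in nb["cells"]]

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=1)
    return merged
```

A call such as `merge_notebook_files(['file1.ipynb', 'file2.ipynb'], 'new_file.ipynb')` would reproduce the two-file example above. Opening the files with an explicit UTF-8 encoding also sidesteps the encoding error reported earlier in the thread.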

Maxim Veksler · Answer 21 · Fri Jun 11 2021 13:39:39 GMT+0800 (China Standard Time)

Does the feature for loading a notebook as a module offer an answer for the use case discussed here? https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Importing%20Notebooks.html