grundic / confluence-page-copier

Python script for creating recursive copy of Confluence pages.

Problem when trying to copy more advanced tree structures

franzhill opened this issue · comments

Hi,
Really appreciate you sharing that piece of code.
I have been playing around with it, trying to copy some page trees on my Confluence. I have successfully made it work on small trees with a simple structure (even cross-space).
However, when trying to copy more 'advanced' structures (with possibly a few dozen pages, and/or pages containing attachments etc.) I run into errors:
E.g.

$ python copier.py --username="*" --password="*" --endpoint="https://*.atlassian.net/wiki" --src-space="DTS" --src-title="Releases Documentation" --dst-space="FT" --dst-title-template="{title} (Copied using copier.py)"
DEBUG:confl-copier:Searching page by space 'DTS' and title 'Releases Documentation'
DEBUG:confl-copier:Found 1 page(s)
DEBUG:confl-copier:Setting ancestor id to 917507
DEBUG:confl-copier:Searching page by space 'FT' and title 'Releases Documentation (Copied using copier.py)'
DEBUG:confl-copier:Found 0 page(s)
INFO:confl-copier:Copying 'DTS/Releases Documentation' => 'FT/Releases Documentation (Copied using copier.py)'
Traceback (most recent call last):
  File "copier.py", line 430, in <module>
    recursion_limit=args.recursion_limit
  File "copier.py", line 95, in copy
    page_copy = self._copy_page(source, ancestor_id, dst_space_key, dst_title)
  File "copier.py", line 272, in _copy_page
    'ancestors': [{'id': ancestor_id}],
  File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 794, in create_new_content
    headers={"Content-Type": "application/json"}, callback=callback)
  File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 140, in _service_post_request
    return self._service_request("POST", *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 116, in _service_request
    response.raise_for_status()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://dexstr.atlassian.net/wiki/rest/api/content

Not the most evocative of error messages...
Have you ever run into this kind of problem or have you got an idea as to why or how to circumvent?

Thanks anyway,
Regards
Francois

Hello,
Thank you for your ticket!

Actually I haven't used it on complex pages, so this is the first time I see this problem.
Error 500 could mean that the wiki failed to process our request and an exception was triggered on the server. If possible, I would recommend taking a look at the Confluence logs right after this error happens; the answer may be there.
You could also try to copy only the single page on which the exception was raised:

INFO:confl-copier:Copying 'DTS/Releases Documentation' => 'FT/Releases Documentation (Copied using copier.py)'

From my side, I will try to reproduce the 500 error from Confluence and handle it appropriately; maybe the response contains a meaningful message.
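
As a rough illustration of what "handling the error" could look like, here is a minimal sketch that pulls the server's message out of a failed response. It relies only on the fact that `requests.exceptions.HTTPError` carries a `response` attribute with `status_code` and `text`. The helper name `describe_http_error` is hypothetical, and the JSON `"message"` field is an assumption about the shape of the error body, not something confirmed in this thread.

```python
import json


def describe_http_error(exc):
    """Return a readable summary of an HTTP error.

    Works with any exception that carries a ``response`` attribute exposing
    ``status_code`` and ``text``, as requests.exceptions.HTTPError does.
    """
    response = getattr(exc, "response", None)
    if response is None:
        # No response attached: fall back to the exception's own message.
        return str(exc)

    body = response.text or ""
    try:
        # Assumption: the server may answer with a JSON object that has a
        # "message" field. Anything else is reported as the raw body.
        payload = json.loads(body)
        detail = payload.get("message", body) if isinstance(payload, dict) else body
    except ValueError:
        detail = body

    return "HTTP {0}: {1}".format(response.status_code, str(detail).strip())
```

In the script, this could wrap the page-creation call in a `try` / `except requests.exceptions.HTTPError as exc` block and log `describe_http_error(exc)` before re-raising, so that the cause shows up next to the bare "500 Server Error" line.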

Hi, @FHSpam.
Some improvements have been made to the script; could you please try to reproduce the issue?

Hi Grigory,
Thank you for your email.
I will be trying it again very soon and will let you know.
Cheers,
François

----- Original message -----
From: "Grigory Chernyshev" notifications@github.com
To: "grundic/confluence-page-copier" confluence-page-copier@noreply.github.com
Cc: "FHSpam" FH.Spam@free.fr, "Mention" mention@noreply.github.com
Sent: Wednesday, 18 May 2016 15:56:34
Subject: Re: [grundic/confluence-page-copier] Problem when trying to copy more advanced tree structures (#1)

Hi, @FHSpam.
Some improvements have been made to the script; could you please try to reproduce the issue?



I can reproduce the above error "requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://dexstr.atlassian.net/wiki/rest/api/content" with both Python 2 and 3, using the current version.

The error happens only when copying a page to ANOTHER space. There is no error, and the copy works fine, when copying to the SAME space. As you can see, FHSpam was also copying to another space.

@rfominych,
Thank you for your comment. I will try to reproduce the issue.

@rfominych,
could you please paste here the title of the source page that you are trying to copy? And, if possible, attach the tail of the Confluence logs with the corresponding exception.

Confirmed, it looks like a problem with copying to another space:

java.lang.IllegalArgumentException: Can't add a parent from another space.

added some debug to PythonConfluenceAPI and got:
C:\Users\roman\Documents\documentation\confluence-page-copier-master>C:\Users\roman\AppData\Local\Programs\Python\Python35\python.exe copier.py --username="*" --password="*" --endpoint="https://*.atlassian.net/wiki" --src-space="~rf" --src-id=73564173 --dst-space="DOC"
DEBUG:confl-copier:Searching page by id '73564173'
DEBUG:confl-copier:Setting ancestor id to 66322435
DEBUG:confl-copier:Searching page by space 'DOC' and title 'test page 1 (1)'
DEBUG:confl-copier:Found 0 page(s)
INFO:confl-copier:Copying '~rf/test page 1' => 'DOC/test page 1 (1)'
DEBUG:api-proxy:content_data is:
DEBUG:api-proxy:{'space': {'key': 'DOC'}, 'body': {'storage': {'representation': 'storage', 'value': '<p>some contents</p>'}}, 'title': 'test page 1 (1)', 'type': 'page', 'ancestors': [{'id': '66322435'}]}
Traceback (most recent call last):
  File "copier.py", line 436, in <module>
    recursion_limit=args.recursion_limit
  File "copier.py", line 102, in copy
    page_copy = self._copy_page(source, ancestor_id, dst_space_key, dst_title)
  File "copier.py", line 277, in _copy_page
    'ancestors': [{'id': ancestor_id}],
  File "C:\Users\roman\AppData\Local\Programs\Python\Python35\lib\site-packages\PythonConfluenceAPI\api.py", line 796, in create_new_content
    headers={"Content-Type": "application/json"}, callback=callback)
  File "C:\Users\roman\AppData\Local\Programs\Python\Python35\lib\site-packages\PythonConfluenceAPI\api.py", line 140, in _service_post_request
    return self._service_request("POST", *args, **kwargs)
  File "C:\Users\roman\AppData\Local\Programs\Python\Python35\lib\site-packages\PythonConfluenceAPI\api.py", line 116, in _service_request
    response.raise_for_status()
  File "C:\Users\roman\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://parspro.atlassian.net/wiki/rest/api/content

Sorry, I cannot check the Confluence logs since I am not a system administrator.

So the problem, as I see it, is:

we are creating the page in the DOC space (the destination space), but supplying 'ancestors': [{'id': '66322435'}], which is the ID of a page in the "~rf" space (the source space).

So we need to send an ancestor ID NOT from the source space, but from the destination space.

Of course, the application works despite this bug when copying pages inside the same space.
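
The diagnosis above can be sketched as a small payload builder. This is only an illustration, not the script's actual code: the function name `build_page_payload` and its parameters are hypothetical, while the payload layout follows the `content_data` dictionary shown in the debug output. The source page's ancestor is reused only when both pages live in the same space. Otherwise the `ancestors` key is left out, on the assumption that the server then picks the default location.

```python
def build_page_payload(title, body, src_space_key, dst_space_key, src_ancestor_id=None):
    """Build the content dictionary for a new page (hypothetical helper)."""
    payload = {
        'type': 'page',
        'title': title,
        'space': {'key': dst_space_key},
        'body': {'storage': {'value': body, 'representation': 'storage'}},
    }

    # An ancestor ID taken from the source page is only meaningful inside the
    # same space; sending it to another space is what triggered the error
    # "Can't add a parent from another space."
    same_space = src_space_key == dst_space_key
    if same_space and src_ancestor_id is not None:
        payload['ancestors'] = [{'id': str(src_ancestor_id)}]

    return payload
```

With the values from the failing run (`'~rf'` as source space, `'DOC'` as destination space, `66322435` as source ancestor), the returned dictionary has no `ancestors` entry at all.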

Yep, already found that out, thanks! I am testing different cases so as not to introduce a regression.

Now it works. I also fixed another bug that could produce a 500 error (if the page title contains special symbols). It would be nice if you guys could experiment with it.
Also, by default I do not set an ancestor when copying to a different space. I could set the first available one, but I thought it was better not to guess and simply to put the page at the root of the space. If needed, the copied page can easily be moved elsewhere later.

If you have any suggestions or ideas, I'm open and would be glad to hear them :)

solution would be:

add an optional key, something like "--dest-ancestor", so that the root is used by default but you can specify where to copy
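
A minimal sketch of that suggestion with `argparse` follows. The flag name `--dest-ancestor` is taken from the proposal above and is not necessarily what the script ended up using; `build_parser` and `ancestors_for` are hypothetical helper names. Leaving the flag out is treated as "copy to the space root".

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch: optional destination parent")
    parser.add_argument("--dst-space", required=True, help="Destination space key")
    # Hypothetical flag, named after the suggestion in this thread.
    parser.add_argument(
        "--dest-ancestor",
        default=None,
        help="ID of the destination page to copy under (default: space root)",
    )
    return parser


def ancestors_for(args):
    """Return the 'ancestors' list for the new page, or None for the space root."""
    if args.dest_ancestor is None:
        return None
    return [{"id": str(args.dest_ancestor)}]
```

Returning `None` rather than an empty list lets the caller skip the `ancestors` key entirely when no parent was requested.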

Yep, already did so :)

On Friday, 17 June 2016, rfominych notifications@github.com wrote:

solution would be:

add optional key smth like "--dest-ancestor", so by default root will be
used, but you can specify where to copy



Best Regards, Grigory

Now, when a page is copied to another space, it becomes an "orphaned page", since it has no ancestor at all.

Second point: the new parameters '--dst-parent-id' and '--dst-parent-title' are only visible in the code, not in the documentation.

@rfominych, yep, nice catch! I will update the readme.
I think we're finished with this ticket. If something else arises, we can always create a new one.