Export space #1458

aleksvujic · 2024-09-26T08:17:59Z

Can you please implement the "Export space" functionality, which is available via the GUI, in your Python library as well? In the GUI it is reached by clicking Space settings in the left sidebar and then Export space in the Manage space section.

gkowalc · 2024-10-01T08:38:25Z

I considered this idea some time ago, but I couldn't find a working solution. Calling the endpoint that triggers the export is feasible; the difficulty comes after the POST request that initiates it. Confluence then displays another form that polls the export status via AJAX requests, and once the status reaches 100%, a further request returns the link to the generated export file. Does anyone have an idea how to capture the AJAX requests that follow the initial POST request?

aleksvujic · 2024-10-01T08:45:04Z

@gkowalc This is how I eventually solved it, by inspecting the HTTP requests that Confluence sends. My script hard-codes HTML as the export format, but I am sure it can be parameterized further if needed. The function returns a direct URL to the zipped content, which can be downloaded by sending an HTTP GET request. I used the get_pdf_download_url_for_confluence_cloud method as inspiration.

import time
from typing import Optional

from bs4 import BeautifulSoup

def __get_space_html_download_url(self, space_key: str) -> Optional[str]:
    try:
        # Load the export form to read the CSRF token (atl_token) from it.
        url = f"spaces/exportspacehtml.action?key={space_key}"
        response = self.confluence_client.get(url, advanced_mode=True)
        parsed_html = BeautifulSoup(response.text, "html.parser")
        atl_token = parsed_html.find("input", {"name": "atl_token"}).get("value")

        form_data = {
            "atl_token": atl_token,
            "exportType": "TYPE_HTML",
            "contentOption": "visibleOnly",
            "includeComments": True,
            "confirm": "Export",
        }
        # Bypass self.confluence_client.post because it serializes form data as JSON, which is wrong here.
        url = self.confluence_client.url_joiner(url=self.confluence_client.url, path=f"spaces/doexportspace.action?key={space_key}")
        response = self.confluence_client.session.post(url, headers=self.confluence_client.form_token_headers, data=form_data)
        parsed_html = BeautifulSoup(response.text, "html.parser")
        poll_url = parsed_html.find("meta", {"name": "ajs-pollURI"}).get("content")

        # Poll the task status once per second until the export is complete.
        # Note: there is no timeout, so this loops for as long as the task is running.
        while True:
            progress_response = self.confluence_client.get(poll_url)
            if progress_response["complete"]:
                # The completion message is an HTML fragment containing the download link.
                parsed_html = BeautifulSoup(progress_response["message"], "html.parser")
                return parsed_html.find("a", {"class": "space-export-download-path"}).get("href")
            time.sleep(1)
    except Exception as e:
        # Any failure (missing token, unexpected HTML, HTTP error) is reported and mapped to None.
        print(e)
        return None

Maybe someone can tweak it further, make it more general (choice of export format and what to export) and create a PR so it becomes a part of the library. 🙂
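For the last step mentioned above (fetching the zip with an HTTP GET), a minimal sketch could look like the following. The helper name download_export and its parameters are hypothetical and not part of the library; session is assumed to behave like requests.Session, for example the client's already authenticated session attribute. Depending on the instance, the href returned by the function may be relative and would then need to be joined with the base URL first.

```python
from pathlib import Path


def download_export(session, download_url, dest_path, chunk_size=64 * 1024):
    """Stream the export archive to disk and return the number of bytes written.

    `session` is assumed to expose get(url, stream=True) like requests.Session.
    """
    dest = Path(dest_path)
    written = 0
    # stream=True avoids loading the whole archive into memory at once.
    response = session.get(download_url, stream=True)
    response.raise_for_status()
    with dest.open("wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written
```

Streaming in chunks is a design choice for large spaces, where the archive may be too big to hold in memory comfortably.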

gkowalc · 2024-10-01T08:53:55Z

Hmm, I was setting up a session with the requests library to make use of the atl_token, but your approach with Beautiful Soup (BS4) looks promising. I will experiment with the code based on your idea, and if I come up with a working solution, I will send a pull request (PR) soon. Thanks for sharing your idea.

aleksvujic · 2024-10-01T08:57:08Z

Using BeautifulSoup is not a must here; it can easily be replaced with simple regular expressions. I used it to make the code more readable.

Feel free to experiment with the snippet.
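As a rough illustration of the regular-expression alternative, the sketch below pulls the atl_token value and the ajs-pollURI content out of the page text. The names ATL_TOKEN_RE, POLL_URI_RE and extract_first are made up for this example, and the patterns assume double-quoted attributes with name appearing before value/content, which may not hold for every Confluence version.

```python
import html
import re
from typing import Optional

# Assumption: attributes are double-quoted and `name` precedes `value`/`content`.
ATL_TOKEN_RE = re.compile(r'<input[^>]*\bname="atl_token"[^>]*\bvalue="([^"]*)"')
POLL_URI_RE = re.compile(r'<meta[^>]*\bname="ajs-pollURI"[^>]*\bcontent="([^"]*)"')


def extract_first(pattern: "re.Pattern[str]", page: str) -> Optional[str]:
    """Return the first captured attribute value, HTML-unescaped, or None."""
    match = pattern.search(page)
    # An HTML parser decodes entities such as &amp; automatically; a regex does not.
    return html.unescape(match.group(1)) if match else None
```

Usage would be along the lines of extract_first(ATL_TOKEN_RE, response.text). This drops the bs4 dependency at the cost of being more sensitive to markup changes.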

gkowalc · 2024-10-15T11:16:03Z

Hello @aleksvujic and others interested in this feature.
I added PR #1466, which implements the method. I ran my tests on Confluence Cloud (it might not work as expected on Confluence Server/Data Center), but feel free to run your own tests, also on bigger spaces.

I think we might hit an issue when the export takes too long (the CSRF token might expire). I ran my tests on relatively small space exports that finished in a couple of minutes.
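One way to keep a long-running export from polling forever is to bound the wait with a deadline. The sketch below is only an illustration: wait_for_export and its parameters are hypothetical and not part of the library or of the PR, and it does not refresh an expired CSRF token; it only turns an endless wait into an explicit error.

```python
import time


def wait_for_export(poll, timeout=600.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call poll() until it returns a dict with a truthy "complete" key.

    Raises TimeoutError if `timeout` seconds pass before completion.
    """
    deadline = clock() + timeout
    while True:
        status = poll()
        if status.get("complete"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"export not finished after {timeout} seconds")
        sleep(interval)
```

With the snippet posted earlier in this thread, poll could be something like lambda: self.confluence_client.get(poll_url). The sleep and clock parameters exist only so the function can be tested without real waiting.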

gkowalc mentioned this issue Oct 15, 2024

[Confluence] added new method get_space_export + docs + examples #1466

Merged
