chore: Add Python Client examples [docs] (#191)
Part of apify/apify-web/issues/3616

Added multiple examples, consulted with @vdusek. 

The first three are the same as the [JavaScript
ones](apify/apify-client-js#548), just
reimplemented in Python. The last one uses the Python-specific Pandas
library for data analysis.
HonzaTuron authored May 22, 2024
1 parent 17d5cf7 commit 55dd033
Showing 2 changed files with 144 additions and 0 deletions.
140 changes: 140 additions & 0 deletions docs/examples.md
@@ -0,0 +1,140 @@
---
sidebar_label: Examples
title: 'Code examples'
---

## Passing an input to the Actor

The fastest way to get results from an Actor is to pass input directly to the `call` function.
We can set up the input, pass it to the `call` function, and get a reference to the running Actor (or wait for it to finish).

```python
from apify_client import ApifyClient

# Client initialization with the API token
apify_client = ApifyClient(token='MY_APIFY_TOKEN')

actor_client = apify_client.actor('apify/instagram-hashtag-scraper')

input_data = { 'hashtags': ['rainbow'], 'resultsLimit': 20 }

# Run the Actor and wait for it to finish. The run itself times out after 60 seconds (`timeout_secs`).
# Input is not persisted for next runs.
run_data = actor_client.call(run_input=input_data, timeout_secs=60)
```
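The value returned by `call` is a dictionary describing the run, and it may be `None`. Below is a minimal sketch of how the result might be checked before its data is read. The helper name `default_dataset_id` is hypothetical, and the `status` and `defaultDatasetId` keys are assumed to follow the run object of the Apify API. A stand-in dictionary is used so that the sketch runs without an API token.

```python
from typing import Optional


def default_dataset_id(run: Optional[dict]) -> str:
    """Return the default dataset ID of a successfully finished run.

    Hypothetical helper: assumes `run` is the dictionary returned by `call`,
    with `status` and `defaultDatasetId` keys as in the Apify API run object.
    """
    if run is None:
        raise RuntimeError('The Actor run was not found.')
    if run.get('status') != 'SUCCEEDED':
        raise RuntimeError(f"The Actor run ended with status {run.get('status')!r}.")
    return run['defaultDatasetId']


# Stand-in for the value returned by `actor_client.call(...)`.
example_run = {'id': 'run-1', 'status': 'SUCCEEDED', 'defaultDatasetId': 'dataset-1'}
print(default_dataset_id(example_run))
```

With the real client, the returned ID could then be passed to `apify_client.dataset(...)` to read the run's results.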

## Working with tasks

To run the same Actor with multiple inputs, the most convenient way is to create multiple [tasks](https://docs.apify.com/platform/actors/running/tasks) with different inputs.
The task input is persisted on the Apify platform when the task is created.

```python

import asyncio

from apify_client import ApifyClientAsync
from apify_client.clients.resource_clients import TaskClientAsync

animal_hashtags = ['zebra', 'lion', 'hippo']


async def run_apify_task(client: TaskClientAsync) -> dict:
    result = await client.call()
    return result or {}


async def main() -> None:
    apify_client = ApifyClientAsync(token='MY_APIFY_TOKEN')

    # Create Apify tasks

    apify_tasks: list[dict] = []
    apify_tasks_client = apify_client.tasks()

    for hashtag in animal_hashtags:
        apify_task = await apify_tasks_client.create(
            name=f'hashtags-{hashtag}',
            actor_id='apify/instagram-hashtag-scraper',
            task_input={'hashtags': [hashtag], 'resultsLimit': 20},
            memory_mbytes=1024,
        )
        apify_tasks.append(apify_task)

    print('Tasks created:', apify_tasks)

    # Create Apify task clients

    apify_task_clients: list[TaskClientAsync] = []

    for apify_task in apify_tasks:
        task_id = apify_task['id']
        apify_task_client = apify_client.task(task_id)
        apify_task_clients.append(apify_task_client)

    print('Task clients created:', apify_task_clients)

    # Execute Apify tasks

    run_apify_tasks = [run_apify_task(client) for client in apify_task_clients]
    task_run_results = await asyncio.gather(*run_apify_tasks)

    print('Task results:', task_run_results)


if __name__ == '__main__':
    asyncio.run(main())
```
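`asyncio.gather` returns results in the same order as the awaitables passed to it, and by default the first exception propagates to the caller. The sketch below shows one way a single failing run could be kept from discarding the results of the others. It uses a stand-in coroutine instead of real task runs: `fake_task_run`, `run_all`, and the simulated failure for `'lion'` are invented for illustration and are not part of the Apify client.

```python
import asyncio


async def fake_task_run(hashtag: str) -> dict:
    """Stand-in for `TaskClientAsync.call()`; does not contact Apify."""
    await asyncio.sleep(0)  # Yield control, as a real network call would.
    if hashtag == 'lion':
        raise RuntimeError(f'Run for {hashtag} failed')
    return {'hashtag': hashtag, 'status': 'SUCCEEDED'}


async def run_all(hashtags: list[str]) -> tuple[list[dict], list[BaseException]]:
    # With return_exceptions=True, exceptions are returned in place of results
    # instead of being raised, so the other runs still report back.
    outcomes = await asyncio.gather(
        *(fake_task_run(hashtag) for hashtag in hashtags),
        return_exceptions=True,
    )
    succeeded = [outcome for outcome in outcomes if not isinstance(outcome, BaseException)]
    failed = [outcome for outcome in outcomes if isinstance(outcome, BaseException)]
    return succeeded, failed


succeeded, failed = asyncio.run(run_all(['zebra', 'lion', 'hippo']))
print('Succeeded:', succeeded)
print('Failed:', failed)
```

Whether to collect failures like this or let the first one abort the whole batch depends on how independent the task runs are.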

## Getting latest data from an Actor, joining datasets

Actor data is stored in [datasets](https://docs.apify.com/platform/storage/dataset). Datasets can be retrieved from Actor runs.
Dataset items can be listed with pagination.
Datasets can also be merged into one, which simplifies later analysis of a single file, since a dataset can be exported to various data formats (CSV, JSON, XLSX, XML).
[Integrations](https://docs.apify.com/platform/integrations) can do the trick as well.

```python
from apify_client import ApifyClient

# Client initialization with the API token
apify_client = ApifyClient(token='MY_APIFY_TOKEN')

actor_client = apify_client.actor('apify/instagram-hashtag-scraper')

actor_runs = actor_client.runs()

# List up to 20 Actor runs. See pagination to understand how to get more
actor_run_list = actor_runs.list(limit=20)

merging_dataset = apify_client.datasets().get_or_create(name='merge-dataset')

for run in actor_run_list.items:
    # Each run stores its results in a default dataset. Dataset items can be paginated
    dataset_items = apify_client.dataset(dataset_id=run['defaultDatasetId']).list_items(limit=1000)

    # Items can be pushed to a single dataset
    apify_client.dataset(merging_dataset['id']).push_items(dataset_items.items)

    # ...
```
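Pagination over dataset items could be sketched as an offset-and-limit loop like the one below. To stay runnable without an API token, it pages through an in-memory list. `fetch_page` and `iterate_all_items` are hypothetical names; with the real client, `fetch_page` would presumably wrap `list_items(offset=..., limit=...)` and return the `items` of the page.

```python
from typing import Callable, Iterator

# Stand-in for the items stored in a dataset.
FAKE_ITEMS = [{'index': i} for i in range(25)]


def fetch_page(offset: int, limit: int) -> list[dict]:
    """Hypothetical page fetcher backed by an in-memory list."""
    return FAKE_ITEMS[offset:offset + limit]


def iterate_all_items(
    fetch: Callable[[int, int], list[dict]],
    page_size: int = 10,
) -> Iterator[dict]:
    """Yield every item by requesting consecutive pages until one comes back empty."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if not page:
            break
        yield from page
        offset += len(page)


all_items = list(iterate_all_items(fetch_page))
print(len(all_items))  # 25
```

Advancing the offset by the number of items actually received, rather than by the requested page size, keeps the loop correct even if a page comes back shorter than requested.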

## Integration with data analysis libraries (Pandas)

The Apify API client for Python can be easily integrated with data analysis libraries.
The following example demonstrates how to load items from the dataset of the last Actor run and pass them to a Pandas DataFrame for further analysis.
Pandas is a data analysis library that provides data structures and functions for efficiently manipulating large datasets.

```python
from apify_client import ApifyClient
import pandas

# Initialize the Apify client
client = ApifyClient(token="MY_APIFY_TOKEN")

# Load items from the dataset of the last Actor run
dataset_data = client.actor('apify/web-scraper').last_run().dataset().list_items()

# Pass dataset items to Pandas DataFrame
data_frame = pandas.DataFrame(dataset_data.items)

# Print a concise summary of the DataFrame
data_frame.info()
```
4 changes: 4 additions & 0 deletions website/sidebars.js
@@ -12,6 +12,10 @@ module.exports = {
type: 'doc',
id: 'usage-concepts',
},
{
type: 'doc',
id: 'examples',
},
{
type: 'doc',
id: 'changelog',