Commit

Merge branch 'main' into fix/differentiate-shorts-lives-normal-videos
arjitcodes authored Nov 14, 2024
2 parents bcc201b + 234e6fa commit c012d97
Showing 17 changed files with 158 additions and 159 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/Tests.yml
@@ -100,7 +100,7 @@ jobs:
env:
YOUTUBE_API_KEY: ${{ secrets.YOUTUBE_API_KEY }}
OPTIMIZATION_CACHE_URL: ${{ secrets.OPTIMIZATION_CACHE_URL }}
run: docker run -v $PWD/output:/output youtube2zim youtube2zim --api-key "$YOUTUBE_API_KEY" --optimization-cache "$OPTIMIZATION_CACHE_URL" --type channel --id "UC8elThf5TGMpQfQc_VE917Q" --name "tests_en_openzim-testing" --zim-file "openZIM_testing.zim" --tags "tEsTing,x-mark:yes"
run: docker run -v $PWD/output:/output youtube2zim youtube2zim --api-key "$YOUTUBE_API_KEY" --optimization-cache "$OPTIMIZATION_CACHE_URL" --id "UC8elThf5TGMpQfQc_VE917Q" --name "tests_en_openzim-testing" --zim-file "openZIM_testing.zim" --tags "tEsTing,x-mark:yes"

- name: Run integration test suite
run: docker run -v $PWD/scraper/tests-integration/integration.py:/src/scraper/tests-integration/integration.py -v $PWD/output:/output youtube2zim bash -c "pip install pytest; pytest -v /src/scraper/tests-integration/integration.py"
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -7,19 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]


### Fixed

- Differentiate user uploaded shorts, lives, & long videos (#367)
- corrected the short video resolution in the UI (#366)

### Fixed

- Check for empty playlists after filtering, and after downloading videos (#375)

## [3.2.1] - 2024-11-01

### Deprecated

- `--type` CLI argument is now deprecated (will be removed in next major)

### Changed

- Raise exception if there are no videos in the playlists (#347)
- Drop `--type` CLI argument and guess `--id` type (#361)
- Always reencode using our presets (even for high quality) and choose best format when downloading from Youtube (#356)

### Fixed

- Filter-out non-public videos and properly cleanup unsuccessful videos (#362)
- Use proper ZIM metadata key for `Scraper` and `Tags` (#369)
- Add missing `playsinline` attribute for Video.JS on iOS (#368)

## [3.2.0] - 2024-10-11

2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -66,7 +66,7 @@ docker build -t local-youtube2zim .
Scrape a channel (here we use the [openZIM_testing](https://www.youtube.com/channel/UC8elThf5TGMpQfQc_VE917Q) channel, but you could use any other one of interest for your UI developments).

```
docker run --rm -it -v "$PWD/output":/output local-youtube2zim youtube2zim --api-key <YOUR-API-KEY> --type channel --id "UC8elThf5TGMpQfQc_VE917Q" --name "openZIM_testing" --zim-file "openZIM_testing"
docker run --rm -it -v "$PWD/output":/output local-youtube2zim youtube2zim --api-key <YOUR-API-KEY> --id "UC8elThf5TGMpQfQc_VE917Q" --name "openZIM_testing" --zim-file "openZIM_testing"
```

Extract interesting ZIM content and move it to `public` folder.
12 changes: 6 additions & 6 deletions README.md
@@ -2,7 +2,7 @@ Youtube2zim
=============

[![CodeFactor](https://www.codefactor.io/repository/github/openzim/youtube/badge)](https://www.codefactor.io/repository/github/openzim/youtube)
[![Docker](https://ghcr-badge.deta.dev/openzim/youtube/latest_tag?label=docker)](https://ghcr.io/openzim/youtube)
[![Docker](https://ghcr-badge.egpl.dev/openzim/youtube/latest_tag?label=docker)](https://ghcr.io/openzim/youtube)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/youtube2zim.svg)](https://pypi.org/project/youtube2zim/)

@@ -78,18 +78,18 @@ To get an API Key:
You can then create a ZIM from a single channel / user / handle like `Vsauce`:

```bash
youtube2zim --api-key "<your-api-key>" --type channel --id "Vsauce" --name "tests_hi_avanti"
youtube2zim --api-key "<your-api-key>" --id "Vsauce" --name "tests_hi_avanti"
```

When `--type channel` is used, you must pass one single value in `--id` and it can be the channel, user or playlist, or even the corresponding technical ID (see [FAQ/FEE](https://github.com/openzim/youtube/wiki/FAQ---FEE) for more details).
When scraping a channel, you must pass one single value in `--id` and it can be the handle, user, or even the corresponding technical ID (see [FAQ/FEE](https://github.com/openzim/youtube/wiki/FAQ---FEE) for more details).

Or you can create a ZIM from two playlists like `PL3rEvTTL-Jm8cBdskZoQaDTlDT4t7F6kp` and `PL3rEvTTL-Jm_OuyYpMfxtJW3Mcr9fFS2Z`:

```bash
youtube2zim --api-key "<your-api-key>" --type playlist --id "PL3rEvTTL-Jm8cBdskZoQaDTlDT4t7F6kp,PL3rEvTTL-Jm_OuyYpMfxtJW3Mcr9fFS2Z" --name "tests_hi_avanti"
youtube2zim --api-key "<your-api-key>" --id "PL3rEvTTL-Jm8cBdskZoQaDTlDT4t7F6kp,PL3rEvTTL-Jm_OuyYpMfxtJW3Mcr9fFS2Z" --name "tests_hi_avanti"
```

When `--type playlist` is used, you can pass multiple playlist IDs separated by a comma in `--id`.
When scraping playlists, you can pass multiple playlist IDs separated by a comma in `--id`.

For more details / advanced usage, see the [Manual](https://github.com/openzim/youtube/wiki/Manual).

@@ -110,7 +110,7 @@ This script is a wrapper around `youtube2zim` and is bundled with the main packa
Sample usage:

```
youtube2zim-playlists --indiv-playlists --api-key XXX --type channel --id Vsauce --playlists-name="vsauce_en_playlist-{playlist_id}"
youtube2zim-playlists --indiv-playlists --api-key XXX --id Vsauce --playlists-name="vsauce_en_playlist-{playlist_id}"
```

Those are the required arguments for `youtube2zim-playlists` but **you can also pass any regular `youtube2zim` argument**. Those will be forwarded to `youtube2zim` (which will be run independently for each playlist).
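With `--type` gone, the README relies on `--id` alone: a channel takes one handle, user, or technical ID, while playlists may be given as a comma-separated list. The sketch below shows one way such a value could be split and roughly classified. It assumes YouTube's usual playlist ID prefixes (`PL`, `UU`, ...), and the helpers `split_ids` and `looks_like_playlist` are hypothetical; youtube2zim's real ID detection is not shown in this diff and may work differently, for instance by querying the YouTube API.

```python
# Hypothetical helpers illustrating the comma-separated `--id` convention
# described in the README; this is NOT youtube2zim's own implementation.

# Assumption: common YouTube playlist ID prefixes (PL = user playlist,
# UU = channel uploads, ...). A real implementation may ask the API instead.
PLAYLIST_PREFIXES = ("PL", "UU", "FL", "LL", "OL", "RD")


def split_ids(raw: str) -> list[str]:
    """Split a comma-separated --id value, dropping whitespace and empty parts."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def looks_like_playlist(youtube_id: str) -> bool:
    """Rough prefix-based guess; channel IDs (UC...) and handles do not match."""
    return youtube_id.startswith(PLAYLIST_PREFIXES)


if __name__ == "__main__":
    ids = split_ids("PL3rEvTTL-Jm8cBdskZoQaDTlDT4t7F6kp,PL3rEvTTL-Jm_OuyYpMfxtJW3Mcr9fFS2Z")
    print(ids)
    print([looks_like_playlist(i) for i in ids])  # [True, True]
    print(looks_like_playlist("Vsauce"))  # False
```

A prefix check is cheap but brittle: a username that happens to start with `PL` would be misclassified, so an API lookup is likely the more robust choice.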
2 changes: 1 addition & 1 deletion scraper/src/youtube2zim/__about__.py
@@ -1 +1 @@
__version__ = "3.2.1-dev0"
__version__ = "3.2.2-dev0"
4 changes: 0 additions & 4 deletions scraper/src/youtube2zim/constants.py
@@ -13,10 +13,6 @@

SCRAPER = f"{NAME} {__version__}"

CHANNEL = "channel"
PLAYLIST = "playlist"
USER = "user"

# Youtube uses some non-standard language codes
YOUTUBE_LANG_MAP = {
"iw": "he", # Hebrew
16 changes: 11 additions & 5 deletions scraper/src/youtube2zim/entrypoint.py
@@ -6,7 +6,7 @@
import os
import sys

from youtube2zim.constants import CHANNEL, NAME, PLAYLIST, SCRAPER, USER, logger
from youtube2zim.constants import NAME, SCRAPER, logger
from youtube2zim.scraper import Youtube2Zim


@@ -16,12 +16,12 @@ def main():
description="Scraper to create a ZIM file from a Youtube Channel or Playlists",
)

# Not used anymore, kept for backward compatibility till next major release
# Also remove trick lines 211-217 to not handle this anymore
parser.add_argument(
"--type",
help="Type of collection",
choices=[CHANNEL, PLAYLIST, USER],
required=True,
dest="collection_type",
dest="not_used_anymore",
)
parser.add_argument(
"--id", help="Youtube ID of the collection", required=True, dest="youtube_id"
@@ -208,7 +208,13 @@ def main():
try:
if args.max_concurrency < 1:
raise ValueError(f"Invalid concurrency value: {args.max_concurrency}")
scraper = Youtube2Zim(**dict(args._get_kwargs()))
scraper = Youtube2Zim(
**{
key: value
for key, value in dict(args._get_kwargs()).items()
if key != "not_used_anymore"
}
)
return scraper.run()
except Exception as exc:
logger.error(f"FAILED. An error occurred: {exc}")
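The entrypoint keeps `--type` only so that existing command lines still parse: it is now optional, stored under a throwaway `dest="not_used_anymore"`, and filtered out before the keyword arguments reach `Youtube2Zim`. A minimal self-contained sketch of the same pattern follows. `FakeScraper` is a hypothetical stand-in for `Youtube2Zim`; the sketch uses the public `vars(args)` instead of the private `args._get_kwargs()`, which is a stylistic substitution rather than what the commit does; and the deprecation warning is an illustration only, since the commit itself does not emit one.

```python
import argparse
import logging

logger = logging.getLogger("sketch")


class FakeScraper:
    """Hypothetical stand-in for Youtube2Zim: accepts only known keyword args."""

    def __init__(self, youtube_id: str, name: str):
        self.youtube_id = youtube_id
        self.name = name


def build_scraper(argv: list[str]) -> FakeScraper:
    parser = argparse.ArgumentParser()
    # Deprecated: still accepted so old command lines parse, but ignored.
    parser.add_argument("--type", help="Deprecated, ignored", dest="not_used_anymore")
    parser.add_argument("--id", required=True, dest="youtube_id")
    parser.add_argument("--name", required=True)
    args = parser.parse_args(argv)

    if args.not_used_anymore is not None:
        # Illustrative only: not part of the commit shown above.
        logger.warning("--type is deprecated and ignored")

    # Drop the deprecated option before forwarding kwargs; otherwise the
    # constructor would raise TypeError for an unexpected keyword argument.
    kwargs = {k: v for k, v in vars(args).items() if k != "not_used_anymore"}
    return FakeScraper(**kwargs)


if __name__ == "__main__":
    scraper = build_scraper(
        ["--type", "channel", "--id", "UC8elThf5TGMpQfQc_VE917Q", "--name", "test"]
    )
    print(scraper.youtube_id)
```

Filtering by key keeps the constructor signature clean, so removing the option in the next major release only requires deleting the `add_argument` call and the filter.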
8 changes: 4 additions & 4 deletions scraper/src/youtube2zim/playlists/entrypoint.py
@@ -5,7 +5,7 @@
import logging
import sys

from youtube2zim.constants import CHANNEL, NAME, PLAYLIST, SCRAPER, USER, logger
from youtube2zim.constants import NAME, SCRAPER, logger
from youtube2zim.utils import has_argument


@@ -19,13 +19,13 @@ def main():
"{creator_id}, {creator_name}.",
)

# Not used anymore, kept for backward compatibility till next major release
parser.add_argument(
"--type",
help="Type of collection",
choices=[CHANNEL, PLAYLIST, USER],
required=True,
dest="collection_type",
dest="not_used_anymore",
)

parser.add_argument(
"--id", help="Youtube ID of the collection", required=True, dest="youtube_id"
)
15 changes: 4 additions & 11 deletions scraper/src/youtube2zim/playlists/scraper.py
@@ -21,7 +21,7 @@
import requests
from zimscraperlib.logging import nicer_args_join

from youtube2zim.constants import NAME, PLAYLIST, YOUTUBE, logger
from youtube2zim.constants import NAME, YOUTUBE, logger
from youtube2zim.youtube import (
REQUEST_TIMEOUT,
credentials_ok,
@@ -40,7 +40,6 @@ def __init__(
self.debug = options["debug"]
self.disable_metadata_checks = options["disable_metadata_checks"]
self.playlists_mode = options["playlists_mode"]
self.collection_type = options["collection_type"]
self.youtube_id = options["youtube_id"]

self.extra_args = extra_args
@@ -76,10 +75,7 @@ def run(self):
shutil.rmtree(self.build_dir, ignore_errors=True) # not needed
return self.handle_single_zim()

logger.info(
f"starting all-playlits {NAME} scraper "
f"for {self.collection_type}#{self.youtube_id}"
)
logger.info(f"starting all-playlists {NAME} scraper for {self.youtube_id}")

# create required sub folders
for sub_folder in ("cache", "videos", "channels"):
@@ -96,7 +92,8 @@ def run(self):
playlists,
main_channel_id,
uploads_playlist_id,
) = extract_playlists_details_from(self.collection_type, self.youtube_id)
is_playlist,
) = extract_playlists_details_from(self.youtube_id)

logger.info(
".. {} playlists:\n {}".format(
@@ -128,8 +125,6 @@ def run_playlist_zim(self, playlist):
playlist_id = playlist.playlist_id
args = [
*self.youtube2zim_exe,
"--type",
PLAYLIST,
"--id",
playlist_id,
"--api-key",
@@ -180,8 +175,6 @@ def handle_single_zim(self):

args = [
*self.youtube2zim_exe,
"--type",
self.collection_type,
"--id",
self.youtube_id,
"--api-key",
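In `playlists/scraper.py`, both `run_playlist_zim` and `handle_single_zim` stop forwarding `"--type"` to the child `youtube2zim` run and pass only `--id` followed by the remaining options. A minimal sketch of assembling such an argument list, assuming a hypothetical `build_child_args` helper and an `extra_args` list of already-formatted options; the real methods append further arguments that are cut off in this diff. `shlex.join` is used here only to render the list for display.

```python
import shlex


def build_child_args(
    exe: list[str], youtube_id: str, api_key: str, extra_args: list[str]
) -> list[str]:
    """Hypothetical helper: argv for one youtube2zim sub-run.

    Mirrors the post-commit shape: no "--type"; the child process is left
    to work out whether the ID is a channel or a playlist.
    """
    return [*exe, "--id", youtube_id, "--api-key", api_key, *extra_args]


if __name__ == "__main__":
    argv = build_child_args(
        ["youtube2zim"], "PL3rEvTTL-Jm8cBdskZoQaDTlDT4t7F6kp", "XXX", ["--name", "demo"]
    )
    print(shlex.join(argv))
```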
8 changes: 2 additions & 6 deletions scraper/src/youtube2zim/processing.py
@@ -28,11 +28,11 @@ def process_thumbnail(thumbnail_path, preset):
return True


def post_process_video(video_dir, video_id, preset, video_format, low_quality):
def post_process_video(video_dir, video_id, preset, video_format):
"""apply custom post-processing to downloaded video
- resize thumbnail
- recompress video if incorrect video_format or low_quality requested"""
- recompress video"""

# find downloaded video from video_dir
files = [
@@ -52,10 +52,6 @@ def post_process_video(video_dir, video_id, preset, video_format, low_quality):
)
src_path = files[0]

# don't reencode if not requesting low-quality and received wanted format
if not low_quality and src_path.suffix[1:] == video_format:
return

dst_path = src_path.with_name(f"video.{video_format}")
logger.info(f"Reencode video to {dst_path}")
success, process = reencode(
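After this change `post_process_video` drops the `low_quality` parameter and no longer returns early when the download already has the requested extension; it always reencodes to `video.{video_format}` in the same directory. The sketch below contrasts the removed skip rule with the new behaviour and shows the `pathlib` call used for the destination name. The helper names and the example path are hypothetical. If the downloaded file were already named `video.<video_format>`, source and destination would be the same path; whether that is handled safely depends on the `reencode` helper, which is not shown here.

```python
from pathlib import PurePosixPath


def reencode_target(src_path: PurePosixPath, video_format: str) -> PurePosixPath:
    """Hypothetical helper: destination for the reencoded file, next to the source."""
    return src_path.with_name(f"video.{video_format}")


def should_reencode_before(src_path: PurePosixPath, video_format: str, low_quality: bool) -> bool:
    """Pre-commit rule: skip reencoding when not low-quality and the format matches."""
    return low_quality or src_path.suffix[1:] != video_format


# Post-commit rule: always reencode with the presets, so no decision is needed.


if __name__ == "__main__":
    src = PurePosixPath("/output/videos/abc123/video.mp4")  # hypothetical path
    print(reencode_target(src, "webm"))  # /output/videos/abc123/video.webm
    print(should_reencode_before(src, "mp4", low_quality=False))  # False
```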
1 change: 0 additions & 1 deletion scraper/src/youtube2zim/schemas.py
@@ -105,7 +105,6 @@ class Channel(CamelModel):
profile_path: str | None = None
banner_path: str | None = None
joined_date: str
collection_type: str
main_playlist: str | None = None
user_long_uploads_playlist: str | None = None
user_short_uploads_playlist: str | None = None