diff --git a/.rubocop.yml b/.rubocop.yml
index ed0d2c1..67af5e2 100644
--- a/.rubocop.yml
+++ b/.rubocop.yml
@@ -12,3 +12,6 @@ Metrics/BlockLength:
     - Rakefile
   ExcludedMethods:
     - route
+
+Naming/RescuedExceptionsVariableName:
+  PreferredName: error
diff --git a/README.md b/README.md
index bcd3e9b..b8a12d3 100644
--- a/README.md
+++ b/README.md
@@ -6,35 +6,34 @@ This web application scrapes websites to build and deliver RSS 2.0 feeds.
 
 **Features:**
 
-- [create your custom feeds](#how-to-build-your-rss-feeds)!
-- comes with plenty of [included configs](https://github.com/html2rss/html2rss-configs) out of the box.
-- handles request caching.
-- sets caching related HTTP headers.
+- [Create your custom feeds](#how-to-build-your-rss-feeds)!
+- Comes with plenty of [included configs](https://github.com/html2rss/html2rss-configs) out of the box.
+- Handles request caching.
+- Sets caching-related HTTP headers.
 
-The functionality of scraping websites and building the RSS is provided by the Ruby gem [`html2rss`](https://github.com/html2rss/html2rss).
+The functionality of scraping websites and building the RSS feeds is provided by the Ruby gem [`html2rss`](https://github.com/html2rss/html2rss).
 
 ## Get started
 
 This application should be used with Docker. It is designed to require as little maintenance as possible. See [Versioning and Releases](#versioning-and-releases) and [consider automatic updates](#docker-automatically-keep-the-html2rss-web-image-up-to-date).
 
-### with Docker
+### With Docker
 
 ```sh
 docker run -p 3000:3000 gilcreator/html2rss-web
 ```
 
-and open in your browser and click onto the example feed link.
+Then open in your browser and click the example feed link.
 
-This is the quickest way to get started. However, it's also the one with least flexibility: it doesn't allow to use custom feed configs and doesn't update automatically.
+This is the quickest way to get started. However, it's also the option with the least flexibility: it doesn't allow you to use custom feed configs and doesn't update automatically.
 
-If you wish more flexibility and automatic updates sounds good to you, read on how to get started _with docker-compose_...
+If you want more flexibility and automatic updates sound good to you, read on to get started _with docker compose_…
 
-### with docker-compose
+### With `docker compose`
 
-Create a `docker-compose.yml` and paste the following into it:
+Create a `docker-compose.yml` file and paste the following into it:
 
-```yml
-version: "3"
+```yaml
 services:
   html2rss-web:
     image: gilcreator/html2rss-web
@@ -57,61 +56,59 @@ services:
     command: --cleanup --interval 7200
 ```
 
-Start it up per: `docker-compose up`.
+Start it up with: `docker compose up`.
 
-If you did not create your `feeds.yml` yet, download [this `feeds.yml` as blueprint](https://raw.githubusercontent.com/html2rss/html2rss-web/master/config/feeds.yml) into the directory containing the `docker-compose.yml`.
+If you have not created your `feeds.yml` yet, download [this `feeds.yml` as a blueprint](https://raw.githubusercontent.com/html2rss/html2rss-web/master/config/feeds.yml) into the directory containing the `docker-compose.yml`.
 
-## Docker: automatically keep the html2rss-web image up-to-date
+## Docker: Automatically keep the html2rss-web image up-to-date
 
-The [watchtower](https://containrrr.dev/watchtower/) service automatically pulls running docker images and checks for updates. If an update is available, it will automatically start the updated image with the same configuration as the running one. Please read its manual.
+The [watchtower](https://containrrr.dev/watchtower/) service automatically pulls running Docker images and checks for updates. If an update is available, it will automatically start the updated image with the same configuration as the running one. Please read its manual.
 
 The `docker-compose.yml` above contains a service description for watchtower.
 
 ## How to use the included configs
 
-html2rss-web comes with many feed configs out of the box. [See file list of all configs.](https://github.com/html2rss/html2rss-configs/tree/master/lib/html2rss/configs)
+html2rss-web comes with many feed configs out of the box. [See the file list of all configs.](https://github.com/html2rss/html2rss-configs/tree/master/lib/html2rss/configs)
 
 To use a config from there, build the URL like this:
 
-Build the URL of the _feed config_ you'd like to use like this:
-
 | | |
-| -----------------------: | :---------------------------- |
-| `lib/html2rss/configs/` | `domainname.tld/whatever.yml` |
-| Would becomes this URL: | |
+| ------------------------ | ----------------------------- |
+| `lib/html2rss/configs/` | `domainname.tld/whatever.yml` |
+| Would become this URL: | |
 | `http://localhost:3000/` | `domainname.tld/whatever.rss` |
 | | `^^^^^^^^^^^^^^^^^^^^^^^^^^^` |
 
 ## How to build your RSS feeds
 
-To build your own RSS feed, you need to create a _feed config_.
-That _feed config_ goes into the file `feeds.yml`.
+To build your own RSS feed, you need to create a _feed config_.\
+That _feed config_ goes into the file `feeds.yml`.\
 Check out the [`example` feed config](https://github.com/html2rss/html2rss-web/blob/master/config/feeds.yml#L9).
 
-Please refer to [html2rss' README for a description of _the feed config and its options_](https://github.com/html2rss/html2rss#the-feed-config-and-its-options). html2rss-web is just a small web application which depends on html2rss.
+Please refer to [html2rss' README for a description of _the feed config and its options_](https://github.com/html2rss/html2rss#the-feed-config-and-its-options). html2rss-web is just a small web application that depends on html2rss.
 
 ## Versioning and releases
 
 This web application is distributed in a [rolling release](https://en.wikipedia.org/wiki/Rolling_release) fashion from the `master` branch.
 
-For the latest commit passing the Github CI/CD on the master branch, an updated Docker image will be pushed to [Docker Hub: `gilcreator/html2rss-web`](https://hub.docker.com/r/gilcreator/html2rss-web).
+For the latest commit passing GitHub CI/CD on the master branch, an updated Docker image will be pushed to [Docker Hub: `gilcreator/html2rss-web`](https://hub.docker.com/r/gilcreator/html2rss-web).
 
-Github's @dependabot is enabled for dependency updates and are automatically merged to the `master` branch when the CI gives the green light.
+GitHub's @dependabot is enabled for dependency updates and they are automatically merged to the `master` branch when the CI gives the green light.
 
-If you use Docker, you should update to the latest image automatically, by [setting up _watchtower_ as described](#get-started).
+If you use Docker, you should update to the latest image automatically by [setting up _watchtower_ as described](#get-started).
 
 ## Use in production
 
-This app is published on Docker Hub and therefore easy to use with Docker.
+This app is published on Docker Hub and therefore easy to use with Docker.\
 The above `docker-compose.yml` is a good starting point.
 
-If you're going to host a public instance, _please please please_:
+If you're going to host a public instance, _please, please, please_:
 
-- put the application behind a reverse proxy.
-- allow outside connections only via HTTPS.
-- have an auto update strategy (e.g. watchtower).
-- monitor your `/health_check.txt` endpoint.
-- [let world know and add your instance to the wiki](https://github.com/html2rss/html2rss-web/wiki/Instances) -- thank you!
+- Put the application behind a reverse proxy.
+- Allow outside connections only via HTTPS.
+- Have an auto-update strategy (e.g., watchtower).
+- Monitor your `/health_check.txt` endpoint.
+- [Let the world know and add your instance to the wiki](https://github.com/html2rss/html2rss-web/wiki/Instances) -- thank you!
 
 ### Supported ENV variables
 
@@ -122,34 +119,34 @@ If you're going to host a public instance, _please please please_:
 | `RACK_TIMEOUT_SERVICE_TIMEOUT` | default: 15 |
 | `WEB_CONCURRENCY` | default: 2 |
 | `WEB_MAX_THREADS` | default: 5 |
-| `HEALTH_CHECK_USERNAME` | default: auto generated on start |
-| `HEALTH_CHECK_PASSWORD` | default: auto generated on start |
+| `HEALTH_CHECK_USERNAME` | default: auto-generated on start |
+| `HEALTH_CHECK_PASSWORD` | default: auto-generated on start |
 
 ### Runtime monitoring via `GET /health_check.txt`
 
-It is recommended to setup a monitoring of the `/health_check.txt` endpoint. With that, you can find out when one of _your own_ configs break. The endpoint uses HTTP Basic authentication.
+It is recommended to set up monitoring of the `/health_check.txt` endpoint. With that, you can find out when one of _your own_ configs breaks. The endpoint uses HTTP Basic authentication.
 
-First, set username and password via these environment variables: `HEALTH_CHECK_USERNAME` and `HEALTH_CHECK_PASSWORD`. If these are not set, html2rss-web will generate a new random username and password on _each_ start.
+First, set the username and password via these environment variables: `HEALTH_CHECK_USERNAME` and `HEALTH_CHECK_PASSWORD`. If these are not set, html2rss-web will generate a new random username and password on _each_ start.
 
-An authenticated `GET /health_check.txt` request will be responded with:
+An authenticated `GET /health_check.txt` request will respond with:
 
-- if the feeds are generatable: `success`.
-- otherwise: the names of the broken configs.
+- If the feeds are generatable: `success`.
+- Otherwise: the names of the broken configs.
 
-To get notified when one of your configs breaks, setup a monitoring of this endpoint.
+To get notified when one of your configs breaks, set up monitoring of this endpoint.
 
-[UptimeRobot's free plan](https://uptimerobot.com/) is sufficent for basic monitoring (every 5 minutes).
+[UptimeRobot's free plan](https://uptimerobot.com/) is sufficient for basic monitoring (every 5 minutes).\
 Create a monitor of type _Keyword_ with this information and make it aware of your username and password:
 
-![A screenshot showing the Keyword Monitor: a name, the instance's URL to /health_check.txt and an interval.](docs/uptimerobot_monitor.jpg)
+![A screenshot showing the Keyword Monitor: a name, the instance's URL to /health_check.txt, and an interval.](docs/uptimerobot_monitor.jpg)
 
 ## Setup for development
 
-Checkout the git repository and ...
+Check out the git repository and…
 
-### using Docker
+### Using Docker
 
-This approach allows you to play around without installing Ruby on your machine.
+This approach allows you to experiment without installing Ruby on your machine.
 All you need to do is install and run Docker.
 
 ```sh
@@ -163,10 +160,10 @@ docker run \
   --name html2rss-web-dev \
   html2rss-web
 
-# Open a interactive TTY with the shell `sh`:
+# Open an interactive TTY with the shell `sh`:
 docker exec -ti html2rss-web-dev sh
 
-# Stop and cleanup container
+# Stop and clean up the container
 docker stop html2rss-web-dev
 docker rm html2rss-web-dev
 
@@ -174,7 +171,7 @@ docker rm html2rss-web-dev
 docker rmi html2rss-web
 ```
 
-### using installed Ruby
+### Using installed Ruby
 
 If you're comfortable with installing Ruby directly on your machine, follow these instructions:
 
@@ -183,12 +180,12 @@ If you're comfortable with installing Ruby directly on your machine, follow thes
 3. `bundle`
 4. `foreman start`
 
-_html2rss-web_ now listens on port **5**000 for requests.
+_html2rss-web_ now listens on port **5000** for requests.
 
 ## Contribute
 
 Contributions are welcome!
 
-Open a pull request with your changes,
-open an issue, or
+Open a pull request with your changes,\
+open an issue, or\
 [join discussions on html2rss](https://github.com/orgs/html2rss/discussions).
diff --git a/app.rb b/app.rb
index c1c2b61..df6ca52 100644
--- a/app.rb
+++ b/app.rb
@@ -45,23 +45,50 @@ class App < Roda
            'X-XSS-Protection' => '1; mode=block'
 
     plugin :error_handler do |error|
+      handle_error(error)
+    end
+
+    plugin :public
+    plugin :render, escape: true, layout: 'layout'
+    plugin :typecast_params
+    plugin :basic_auth
+
+    route do |r|
+      path = RequestPath.new(request)
+
+      r.root { view 'index' }
+
+      r.public
+
+      r.get 'health_check.txt' do
+        handle_health_check
+      end
+
+      r.on String, String do |folder_name, config_name_with_ext|
+        handle_html2rss_configs(path.full_config_name, folder_name, config_name_with_ext)
+      end
+
+      r.on String do |config_name_with_ext|
+        handle_local_config_feeds(path.full_config_name, config_name_with_ext)
+      end
+    end
+
+    private
+
+    def handle_error(error) # rubocop:disable Metrics/MethodLength
       case error
       when Html2rss::Config::ParamsMissing,
            Roda::RodaPlugins::TypecastParams::Error
-        @page_title = 'Parameters missing or invalid'
-        response.status = 422
+        set_error_response('Parameters missing or invalid', 422)
       when Html2rss::AttributePostProcessors::UnknownPostProcessorName,
            Html2rss::ItemExtractors::UnknownExtractorName,
            Html2rss::Config::ChannelMissing
-        @page_title = 'Invalid feed config'
-        response.status = 422
+        set_error_response('Invalid feed config', 422)
       when ::App::LocalConfig::NotFound,
            Html2rss::Configs::ConfigNotFound
-        @page_title = 'Feed config not found'
-        response.status = 404
+        set_error_response('Feed config not found', 404)
       else
-        @page_title = 'Internal Server Error'
-        response.status = 500
+        set_error_response('Internal Server Error', 500)
       end
 
       @show_backtrace = ENV.fetch('RACK_ENV', nil) == 'development'
@@ -69,46 +96,32 @@ class App < Roda
       view 'error'
     end
 
-    plugin :public
-    plugin :render, escape: true, layout: 'layout'
-    plugin :typecast_params
-    plugin :basic_auth
-
-    route do |r|
-      path = RequestPath.new(request)
-
-      r.root do
-        view 'index'
-      end
-
-      r.public
+    def set_error_response(page_title, status)
+      @page_title = page_title
+      response.status = status
+    end
 
-      r.get 'health_check.txt' do |_|
-        HttpCache.expires_now(response)
+    def handle_health_check
+      HttpCache.expires_now(response)
 
-        with_basic_auth(realm: HealthCheck,
-                        username: HealthCheck::Auth.username,
-                        password: HealthCheck::Auth.password) do
-          HealthCheck.run
-        end
+      with_basic_auth(realm: HealthCheck,
+                      username: HealthCheck::Auth.username,
+                      password: HealthCheck::Auth.password) do
+        HealthCheck.run
       end
+    end
 
-      # Route for feeds from the local feeds.yml
-      r.get String do |_config_name_with_ext|
-        Html2rssFacade.from_local_config(path.full_config_name, typecast_params) do |config|
-          response['Content-Type'] = 'text/xml'
-
-          HttpCache.expires(response, config.ttl * 60, cache_control: 'public')
-        end
+    def handle_local_config_feeds(full_config_name, _config_name_with_ext)
+      Html2rssFacade.from_local_config(full_config_name, typecast_params) do |config|
+        response['Content-Type'] = 'text/xml'
+        HttpCache.expires(response, config.ttl * 60, cache_control: 'public')
       end
+    end
 
-      # Route for feeds from html2rss-configs
-      r.get String, String do |_folder_name, _config_name_with_ext|
-        Html2rssFacade.from_config(path.full_config_name, typecast_params) do |config|
-          response['Content-Type'] = 'text/xml'
-
-          HttpCache.expires(response, config.ttl * 60, cache_control: 'public')
-        end
+    def handle_html2rss_configs(full_config_name, _folder_name, _config_name_with_ext)
+      Html2rssFacade.from_config(full_config_name, typecast_params) do |config|
+        response['Content-Type'] = 'text/xml'
+        HttpCache.expires(response, config.ttl * 60, cache_control: 'public')
       end
     end
   end
diff --git a/app/health_check.rb b/app/health_check.rb
index f0c5609..3ffc276 100644
--- a/app/health_check.rb
+++ b/app/health_check.rb
@@ -1,8 +1,8 @@
 # frozen_string_literal: true
 
 require 'parallel'
-
 require_relative 'local_config'
+require 'securerandom'
 
 module App
   ##
@@ -11,18 +11,22 @@ module HealthCheck
     ##
     # Contains logic to obtain username and password to be used with HealthCheck endpoint.
     class Auth
-      def self.username
-        @username ||= ENV.delete('HEALTH_CHECK_USERNAME') do
-          SecureRandom.base64(32).tap do |string|
-            puts "HEALTH_CHECK_USERNAME env var. missing! Please set it. Using generated value instead: #{string}"
-          end
+      class << self
+        def username
+          @username ||= fetch_credential('HEALTH_CHECK_USERNAME')
         end
-      end
 
-      def self.password
-        @password ||= ENV.delete('HEALTH_CHECK_PASSWORD') do
-          SecureRandom.base64(32).tap do |string|
-            puts "HEALTH_CHECK_PASSWORD env var. missing! Please set it. Using generated value instead: #{string}"
+        def password
+          @password ||= fetch_credential('HEALTH_CHECK_PASSWORD')
+        end
+
+        private
+
+        def fetch_credential(env_var)
+          ENV.delete(env_var) do
+            SecureRandom.base64(32).tap do |string|
+              warn "ENV var. #{env_var} missing! Using generated value instead: #{string}"
+            end
           end
         end
       end
@@ -34,12 +38,7 @@ def self.password
     # @return [String] "success" when all checks passed.
     def run
       broken_feeds = errors
-
-      if broken_feeds.any?
-        broken_feeds.join("\n")
-      else
-        'success'
-      end
+      broken_feeds.any? ? broken_feeds.join("\n") : 'success'
     end
 
     ##
@@ -48,10 +47,16 @@ def errors
       [].tap do |errors|
         Parallel.each(LocalConfig.feed_names, in_threads: 4) do |feed_name|
           Html2rss.feed_from_yaml_config(LocalConfig::CONFIG_FILE, feed_name.to_s).to_s
-        rescue StandardError => e
-          errors << "[#{feed_name}] #{e.class}: #{e.message}"
+        rescue StandardError => error
+          errors << "[#{feed_name}] #{error.class}: #{error.message}"
         end
       end
     end
+
+    def format_error(feed_name, error)
+      "[#{feed_name}] #{error.class}: #{error.message}"
+    end
+
+    private_class_method :format_error
   end
 end
diff --git a/app/html2rss_facade.rb b/app/html2rss_facade.rb
index 12fd4b2..da4164b 100644
--- a/app/html2rss_facade.rb
+++ b/app/html2rss_facade.rb
@@ -15,7 +15,7 @@ class Html2rssFacade
 
     ##
     # @param feed_config [Hash]
-    # @param typecast_params
+    # @param typecast_params [Object]
     def initialize(feed_config, typecast_params)
       @feed_config = feed_config
       @typecast_params = typecast_params
@@ -23,21 +23,19 @@ def initialize(feed_config, typecast_params)
 
     ##
     # @param name [String] the name of a html2rss-configs provided config.
-    # @param typecast_params
-    # @return [String] the serializied RSS feed
+    # @param typecast_params [Object]
+    # @return [String] the serialized RSS feed
     def self.from_config(name, typecast_params, &)
       feed_config = Html2rss::Configs.find_by_name(name)
-
       new(feed_config, typecast_params).feed(&)
     end
 
     ##
     # @param name [String] the name of a feed in the file `config/feeds.yml`
-    # @param typecast_params
-    # @return [String] the serializied RSS feed
+    # @param typecast_params [Object]
+    # @return [String] the serialized RSS feed
     def self.from_local_config(name, typecast_params, &)
-      feed_config = LocalConfig.find name
-
+      feed_config = LocalConfig.find(name)
       new(feed_config, typecast_params).feed(&)
     end
 
@@ -45,19 +43,19 @@ def self.from_local_config(name, typecast_params, &)
     # @return [String]
     def feed
       config = self.class.feed_config_to_config(feed_config, typecast_params)
-
       yield config if block_given?
-
       Html2rss.feed(config).to_s
     end
 
     ##
+    # @param feed_config [Hash]
+    # @param typecast_params [Object]
+    # @param global_config [Hash]
     # @return [Html2rss::Config]
     # @raise [Roda::RodaPlugins::TypecastParams::Error]
     def self.feed_config_to_config(feed_config, typecast_params, global_config: LocalConfig.global)
       dynamic_params = Html2rss::Config::Channel.required_params_for_config(feed_config[:channel])
                                                 .to_h { |name| [name, typecast_params.str!(name)] }
-
       Html2rss::Config.new(feed_config, global_config, dynamic_params)
     end
   end
diff --git a/app/http_cache.rb b/app/http_cache.rb
index cafa775..ceefc6f 100644
--- a/app/http_cache.rb
+++ b/app/http_cache.rb
@@ -10,23 +10,21 @@ module HttpCache
 
     ##
     # Sets Expires and Cache-Control headers to cache for `seconds`.
-    # @param response [#[]]
+    # @param response [Hash]
     # @param seconds [Integer]
-    # @param cache_control [String]
+    # @param cache_control [String, nil]
     def expires(response, seconds, cache_control: nil)
       response['Expires'] = (Time.now + seconds).httpdate
 
-      response['Cache-Control'] = if cache_control
-                                    "max-age=#{seconds},#{cache_control}"
-                                  else
-                                    "max-age=#{seconds}"
-                                  end
+      cache_value = "max-age=#{seconds}"
+      cache_value += ",#{cache_control}" if cache_control
+      response['Cache-Control'] = cache_value
     end
 
     ##
     # Sets Expires and Cache-Control headers to invalidate existing cache and
     # prevent caching.
-    # @param response [#[]]
+    # @param response [Hash]
     def expires_now(response)
       response['Expires'] = '0'
       response['Cache-Control'] = 'private,max-age=0,no-cache,no-store,must-revalidate'
diff --git a/app/local_config.rb b/app/local_config.rb
index 0296460..ac71c11 100644
--- a/app/local_config.rb
+++ b/app/local_config.rb
@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 
 require 'yaml'
+
 module App
   ##
   # Provides helper methods to deal with the local config file at `CONFIG_FILE`.
@@ -15,19 +16,19 @@ class NotFound < RuntimeError; end
 
     ##
     # @param name [String, Symbol, #to_sym]
-    # @return [Hash]
+    # @return [Hash]
     def find(name)
-      feeds&.fetch(name.to_sym, false) || raise(NotFound, "Did not find local feed config at '#{name}'")
+      feeds.fetch(name.to_sym) { raise NotFound, "Did not find local feed config at '#{name}'" }
     end
 
     ##
-    # @return [Hash]
+    # @return [Hash]
     def feeds
-      yaml[:feeds] || {}
+      yaml.fetch(:feeds, {})
     end
 
     ##
-    # @return [Hash]
+    # @return [Hash]
     def global
       yaml.reject { |key| key == :feeds }
     end
@@ -39,9 +40,11 @@ def feed_names
     end
 
     ##
-    # @return [Hash]
+    # @return [Hash]
     def yaml
-      YAML.safe_load(File.open(CONFIG_FILE), symbolize_names: true).freeze
+      YAML.safe_load_file(CONFIG_FILE, symbolize_names: true).freeze
+    rescue Errno::ENOENT => error
+      raise NotFound, "Configuration file not found: #{error.message}"
     end
   end
 end
diff --git a/app/request_path.rb b/app/request_path.rb
index 6d75047..db6fcf0 100644
--- a/app/request_path.rb
+++ b/app/request_path.rb
@@ -10,11 +10,11 @@ class RequestPath
     # @param request [Rack::Request, #path]
     def initialize(request)
       @full_path = request.path[1..]
+      parts = @full_path.split('/')
 
-      if @full_path.count('/').zero?
+      if parts.size == 1
         @name_with_ext = @full_path
       else
-        parts = @full_path.split('/')
         @folder_name = parts[0..-2]
         @name_with_ext = parts[-1]
       end
@@ -23,13 +23,13 @@ def initialize(request)
     ##
     # @return [String]
     def full_config_name
-      [folder_name, config_name].compact.join('/')
+      [@folder_name, config_name].compact.join('/')
     end
 
     ##
     # @return [String]
     def config_name
-      parts[...-1].join('.')
+      parts[..-2].join('.')
     end
 
     ##
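
Review note on the `find` change in `app/local_config.rb`: the two lookup styles are not strictly equivalent, and a small standalone sketch may make the difference easier to see. Everything below is a hypothetical stand-in for illustration (`FEEDS`, `find_before`, and `find_after` are invented names, and this `NotFound` is a local class, not the application's); it only mirrors the two expressions from the hunk.

```ruby
# Hypothetical stand-ins for illustration only; not the application's real module.
class NotFound < RuntimeError; end

# Assumed sample data: one populated entry and one key whose value is nil.
FEEDS = { example: { channel: { url: 'https://example.com' } }, empty: nil }.freeze

# Previous lookup style: a missing key and a falsy value both end in NotFound.
def find_before(feeds, name)
  feeds&.fetch(name.to_sym, false) || raise(NotFound, "Did not find local feed config at '#{name}'")
end

# New lookup style: only a missing key raises; a present-but-nil value is returned.
def find_after(feeds, name)
  feeds.fetch(name.to_sym) { raise NotFound, "Did not find local feed config at '#{name}'" }
end
```

Under these assumptions both methods return the same hash for `:example` and both raise for an unknown name. They differ for `:empty`: `find_before` raises `NotFound`, while `find_after` returns `nil`. Whether that matters depends on whether a feed entry in `feeds.yml` can be present but empty.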