Add feature: Ignore lists #62

Status: Open. Wants to merge 1 commit into base: master.

50 changes: 50 additions & 0 deletions doc/mpdscribble.1
@@ -126,6 +126,56 @@ Your Last.fm password, either cleartext or its MD5 sum.
The file where mpdscribble should store its journal in case you do not
have a connection to the scrobbler. This option used to be called
"cache". It is optional.
.TP
.B ignore = FILE
Member:

Do we really need an indirection with another file? This complicates the thing both for the code and for users.
Also this file format looks rather arbitrary and cannot be extended or changed later. One has to remember what those columns are.

Author:

I used a separate file for the following reasons:

  • Avoid complicating the config file parser itself.
  • Managing the lists is easier when they are separate files. Users may want to interact with these lists through other utilities, for example a shell script that gets the current song playing from mpd and writes the details into a list.
  • Makes sharing the lists across multiple scrobblers simply work.

I came up with the file format to make the parser implementation trivial and easily verifiable. I wanted to avoid having weird audio metadata end up breaking the parser. I considered two options: design a very simple, easy-to-parse format and write a small parser myself, or use a standardized machine-readable format and implement it using a library. I'm happy to discuss alternatives.
The format also allows easy construction of list entries from audio metadata. In most cases, a simple mpc current -f "%artist%;%album%;%title%" >> ignorelist should just work. For the rare cases the metadata itself contains a semicolon, the escaping rules are simple to apply via regex substitution.

Author:

Hi Max

I understand your concern about the inflexibility of the file format. To address this, I would propose an "ignore_file_tags" option (naming is not final) that is global for all scrobblers.
This option would be set to a space delimited string of tag names, which will then correspond to the columns of the ignore file. The current behavior would be represented by the option ignore_file_tags = artist album title.

At the moment, the Record structure only exposes artist, album, title and track tags, so the flexibility is quite limited. However, making the format configurable allows us to add more tags in the future without breaking users' existing configurations, while also allowing users to match only the tags they care about in their ignore file, avoiding useless extra columns.

Please let me know if you have concerns with this approach. I will implement this after your green light.

Member:

This idea sounds like it will grow into a complexity monster. And you're going to split the specification between the configuration file and the ignore file - the ignore file will no longer be self-contained, it depends on external data (the tag list specified in the configuration file).

Author:

Thanks for your feedback.
I think based on this we could opt for one of the following options:

  • Make the first line of the ignore file a mandatory format specification. E.g. artist;album;title as first line would set the format to what is hard coded now. I would avoid an optional format specification line as this would then require some marker to distinguish the specification line from an ignore line. Such a marker in turn requires an escaping mechanism to avoid clashing with metadata.
  • Drop the configurable columns altogether. New tags can always be supported with additional columns at the end. This would not break existing configurations, as trailing semicolons are already not required. The current parser treats Queen, Queen; and Queen;; identically. Downside is potentially worse user experience if at some point many more tags are supported. Users may have to deal with an annoying number of empty columns in their ignore files.
  • Scratch the custom format and parser, move to something standardized. I do like the minimal format we have at the moment, I think it's easy to deal with and doesn't pull in any dependencies. But if you prefer to use a standardized format, I can implement that (but will require a new dependency).
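
As an illustration of the second option, here is a minimal sketch of a column splitter under which Queen, Queen; and Queen;; come out identical. It is not the parser from this PR: the names `Columns` and `SplitColumns` are hypothetical, and the rule that a backslash makes the next character literal is an assumption about the escaping.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for one parsed line of the semicolon format;
// the columns are artist, album and title, in that order.
using Columns = std::array<std::string, 3>;

// Split "artist;album;title" on unescaped semicolons. A backslash makes
// the next character literal (assumed escaping rule). Missing trailing
// columns stay empty; anything after the third column is dropped.
Columns SplitColumns(std::string_view line)
{
	Columns columns;
	std::size_t index = 0;
	bool escaped = false;

	for (char c : line) {
		if (index >= columns.size())
			break;

		if (escaped) {
			columns[index] += c;
			escaped = false;
		} else if (c == '\\') {
			escaped = true;
		} else if (c == ';') {
			++index;
		} else {
			columns[index] += c;
		}
	}

	return columns;
}
```

Because every column starts out empty, supporting a new tag would only mean enlarging the array; existing lines with fewer columns would keep parsing to the same values.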

Member:

Why not do something like:

ignore artist="Queen" album="A Night at the Opera"
ignore artist="Queen" title="Bohemian Rhapsody"

... right in the main configuration file. Simpler and super flexible.

Author:

I think the dedicated file is useful for:

  • Automated management by tools/scripts (see my first comment).
  • Making it easy to use an ignore list on a subset of scrobblers, without duplicating the lists themselves.

The reason I did not use the format you proposed in the ignore file is simply parser complexity, having to handle quoting properly. But that is obviously not a blocker.
Considering the aforementioned benefits of the dedicated file, would it be an acceptable compromise to keep the dedicated ignore file, but use a tag="value" format as you suggested?

At the cost of additional complexity, we could also support both methods (in-place in the main config and dedicated ignore files). I'm not positive that it's worth it, though.

Member:

Quoting is difficult, of course - but your CSV-like syntax doesn't solve this - look at how you had to implement backslash-quoting. No difference here.

I think it's okay to have this external file. I don't quite like the idea of having two files, but it's a situation where all possible solutions are bad, and maybe you're right and it's the least-bad one.

Author:

Yeah, I agree. I will implement the tag="value" format for the ignore files then.
Thanks for taking the time to discuss this, it is appreciated.

Include an ignore file for this scrobbler to exclude tracks from scrobbling.

.SH IGNORE FILE FORMAT
Tracks can be ignored by listing them in an \fBignore file\fP.
Each line in the file specifies a pattern to match tracks you wish to exclude from scrobbling.
The format is simple and flexible, allowing you to match by artist, album, title, track number or a combination of these fields.
.SS File Format
A tag match is specified as tagname="value", where tagname is replaced with one of the supported tags (artist, album, title, track).
Values \fBmust\fP be quoted, and no spaces are allowed around the equals sign.

Each line consists of one or more tag matches separated by spaces:
.nf
.in +4
tag1="value1" tag2="value2" ...
.in
.fi

Tags can appear in any order and blank lines are ignored.

A backslash (\e) is interpreted as an escape character and may be used to escape literal double quotes within a value.
To write a literal backslash, use a double backslash (\e\e).

Each line is limited to 4096 characters including the newline character.
Superfluous characters are silently ignored.

.SS Examples
If a tag is omitted, any value for that field is matched:
.TP
.B title="Bohemian Rhapsody"
Matches any track titled \fIBohemian Rhapsody\fP.
.TP
.B artist="Queen"
Matches any track by \fIQueen\fP, regardless of album or title.
.TP
.B artist="Queen" album="A Night at the Opera"
Matches any track by \fIQueen\fP from \fIA Night at the Opera\fP, regardless of title.
.TP
.B artist="Queen" album="A Night at the Opera" title="Bohemian Rhapsody"
Matches a specific track by \fIQueen\fP, from \fIA Night at the Opera\fP, titled \fIBohemian Rhapsody\fP.
.TP
.B artist="Queen" album="A Night at the Opera" track="01"
Matches the first track on the album \fIA Night at the Opera\fP by \fIQueen\fP.
Note that track tags are interpreted as text and not numbers, meaning "01" is not the same as "1".
.TP
.B artist="Clark \e"Plazmataz\e" Powell"
Matches tracks by \fIClark "Plazmataz" Powell\fP.


.SH FILES
.I /etc/mpdscribble.conf
.RS
2 changes: 2 additions & 0 deletions doc/mpdscribble.conf
@@ -37,6 +37,8 @@ password =
# The file where mpdscribble should store its Last.fm journal in case
# you do not have a connection to the Last.fm server.
journal = /var/cache/mpdscribble/lastfm.journal
# Optional ignore file, see manpage for details!
#ignore = /etc/mpdscribble_lastfm.ignore

#[libre.fm]
#url = http://turtle.libre.fm/
1 change: 1 addition & 0 deletions meson.build
@@ -208,6 +208,7 @@ executable(
'src/MpdObserver.cxx',
'src/Log.cxx',
'src/XdgBaseDirectory.cxx',
'src/IgnoreList.cxx',

include_directories: inc,
dependencies: [
6 changes: 6 additions & 0 deletions src/Config.hxx
@@ -8,6 +8,7 @@

#include <forward_list>
#include <string>
#include <map>

enum file_location { file_etc, file_home, file_unknown, };

@@ -17,7 +18,10 @@ NullableString(const std::string &s) noexcept
return s.empty() ? nullptr : s.c_str();
}


struct Config {
using IgnoreListMap = std::map<std::string, IgnoreList>;

/** don't daemonize the mpdscribble process */
bool no_daemon = false;

@@ -41,6 +45,8 @@ struct Config {
int verbose = -1;
enum file_location loc = file_unknown;

// Key=file path, value=loaded ignore list
IgnoreListMap ignore_lists;
std::forward_list<ScrobblerConfig> scrobblers;
};

41 changes: 41 additions & 0 deletions src/IgnoreList.cxx
@@ -0,0 +1,41 @@
// SPDX-License-Identifier: GPL-2.0-or-later
// Copyright The Music Player Daemon Project

#include "IgnoreList.hxx"

#include <cassert>
#include <algorithm>

[[gnu::pure]]
static constexpr bool
MatchIgnoreIfSpecified(std::string_view ignore, std::string_view value)
{
return ignore.empty() || ignore == value;
}

bool
IgnoreListEntry::matches_record(const Record& record) const noexcept
{
/*
The logic below would always return true if the entry is empty.
This condition should never be true, as we don't push empty entries.
*/
assert(!artist.empty() || !album.empty() || !title.empty() || !track.empty());

/*
Note the mismatch of 'title' and 'track' field names with the Record structure.
This is not a bug - the Record structure does not use the expected field names.
*/
return MatchIgnoreIfSpecified(artist, record.artist) &&
MatchIgnoreIfSpecified(album, record.album) &&
MatchIgnoreIfSpecified(title, record.track) &&
MatchIgnoreIfSpecified(track, record.number);
}

bool
IgnoreList::matches_record(const Record& record) const noexcept
{
return std::any_of(entries.begin(),
entries.end(),
[&record](const auto& entry) { return entry.matches_record(record); });
}
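
The matching rule in this file (an empty field in the entry acts as a wildcard, a non-empty field must be equal as text) can be tried in isolation. The sketch below uses simplified stand-ins: `DemoRecord` and `DemoEntry` are hypothetical types, and unlike the real Record structure the stand-in names its fields title and track directly.

```cpp
#include <string>
#include <string_view>

// An empty pattern matches anything; otherwise require exact equality.
constexpr bool MatchIfSpecified(std::string_view ignore, std::string_view value)
{
	return ignore.empty() || ignore == value;
}

// Hypothetical stand-in for the song metadata being scrobbled.
struct DemoRecord {
	std::string artist, album, title, track;
};

// Hypothetical stand-in for one line of the ignore file.
struct DemoEntry {
	std::string artist, album, title, track;

	bool Matches(const DemoRecord &record) const noexcept
	{
		return MatchIfSpecified(artist, record.artist) &&
			MatchIfSpecified(album, record.album) &&
			MatchIfSpecified(title, record.title) &&
			MatchIfSpecified(track, record.track);
	}
};
```

With these definitions, an entry that sets only the artist matches every song by that artist, an entry with track "1" does not match a song whose track tag is "01", and an entry with no fields set matches every song, which is why empty entries must never reach the list.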
29 changes: 29 additions & 0 deletions src/IgnoreList.hxx
@@ -0,0 +1,29 @@
// SPDX-License-Identifier: GPL-2.0-or-later
// Copyright The Music Player Daemon Project

#ifndef IGNORE_LIST_HXX
#define IGNORE_LIST_HXX

#include <string>
#include <vector>

#include "Record.hxx"

struct IgnoreListEntry {

std::string artist;
std::string album;
std::string title;
std::string track;

[[nodiscard]] bool matches_record(const Record& record) const noexcept;
};

struct IgnoreList {
std::vector<IgnoreListEntry> entries;

[[nodiscard]] bool matches_record(const Record& record) const noexcept;
};


#endif
158 changes: 156 additions & 2 deletions src/ReadConfig.cxx
@@ -21,6 +21,7 @@

#include <stdlib.h>
#include <string.h>
#include <unordered_map>

#ifndef _WIN32
#include <sys/stat.h>
@@ -231,10 +232,151 @@ load_unsigned(const IniFile &file, const char *name, unsigned *value_r)
return true;
}

static std::unordered_map<std::string, std::string>
parse_ignore_list_line(std::string_view input)
{
/*
Format: tag1="value1" tag2="value2" ...
Backslash escaping is supported.
*/

enum class ParserState {
ExpectTagStart,
InTag,
ExpectQuote,
InValue,
InEscapeSequence
} state = ParserState::ExpectTagStart;

std::string current_tag;
std::string current_value;
std::unordered_map<std::string, std::string> result;

for (size_t i = 0; i < input.length(); ++i) {
char c = input[i];

switch (state) {
case ParserState::ExpectTagStart:
if (std::isspace(static_cast<unsigned char>(c))) continue;
if (std::isalpha(static_cast<unsigned char>(c))) {
current_tag = c;
state = ParserState::InTag;
} else {
throw FormatRuntimeError("Error at position %zu: expected tag start, got: '%c'", i, c);
}
break;

case ParserState::InTag:
if (std::isalpha(static_cast<unsigned char>(c))) {
current_tag += c;
} else if (c == '=') {
state = ParserState::ExpectQuote;
} else {
throw FormatRuntimeError("Error at position %zu: invalid tag character, got: '%c'", i, c);
}
break;

case ParserState::ExpectQuote:
if (c == '"') {
current_value.clear();
state = ParserState::InValue;
} else {
throw FormatRuntimeError("Error at position %zu: expected quote, got: '%c'", i, c);
}
break;

case ParserState::InValue:
if (c == '\\') {
state = ParserState::InEscapeSequence;
} else if (c == '"') {
if (result.contains(current_tag)) {
throw FormatRuntimeError("Error at position %zu: tag '%s' is duplicated", i, current_tag.c_str());
}
result.emplace(std::move(current_tag), std::move(current_value));
state = ParserState::ExpectTagStart;
} else {
current_value += c;
}
break;

case ParserState::InEscapeSequence:
current_value += c;
state = ParserState::InValue;
break;
}
}

if (state != ParserState::ExpectTagStart) {
throw FormatRuntimeError("Unexpected end of line");
}

return result;
}

static IgnoreList*
load_ignore_list(const std::string& path, Config::IgnoreListMap& ignore_lists)
{

FILE *file = fopen(path.c_str(), "r");
if (file == nullptr) {
throw FormatRuntimeError("Cannot load ignore file: cannot open '%s' for reading", path.c_str());
}

AtScopeExit(file) { fclose(file); };

IgnoreList ignore_list;

{
char line_buf[4096];
size_t line_num = 0;
while (fgets(line_buf, sizeof(line_buf), file)) {
std::string_view line(line_buf);
if (line.back() == '\n') {
line.remove_suffix(1);
}

line_num++;
if (line.empty()) {
continue;
}

try {
auto parsed_line = parse_ignore_list_line(line);

if (parsed_line.empty()) {
continue;
}

IgnoreListEntry entry{};

for (auto& [tag, value] : parsed_line) {
#define set_tag_entry(tagname) if (tag == #tagname) { entry.tagname = std::move(value); continue; }
set_tag_entry(artist)
set_tag_entry(album)
set_tag_entry(title)
set_tag_entry(track)
#undef set_tag_entry
throw FormatRuntimeError("Unsupported tag: '%s'", tag.c_str());
}

ignore_list.entries.emplace_back(std::move(entry));
} catch (const std::runtime_error& error) {
throw FormatRuntimeError("Error loading ignore list '%s': Error parsing line %zu: %s",
path.c_str(), line_num, error.what());
}
}
}

return &(ignore_lists[path] = std::move(ignore_list));
}

static ScrobblerConfig
load_scrobbler_config(const Config &config,
const std::string &section_name,
-const IniSection &section)
+const IniSection &section,
+Config::IgnoreListMap& ignore_lists)
{
ScrobblerConfig scrobbler;

@@ -270,6 +412,17 @@ load_scrobbler_config(const Config &config,
scrobbler.journal = get_default_cache_path(config);
}

std::string ignore_list = GetStdString(section, "ignore");
if (!ignore_list.empty()) {
if (auto existing_ignore_list = ignore_lists.find(ignore_list); existing_ignore_list != ignore_lists.end()) {
scrobbler.ignore_list = &existing_ignore_list->second;
} else {
scrobbler.ignore_list = load_ignore_list(ignore_list, ignore_lists);
}
} else {
scrobbler.ignore_list = nullptr;
}

return scrobbler;
}
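
The lookup above shares one loaded list between all scrobblers that name the same file and hands each of them a raw pointer into the map. A minimal sketch of that load-once pattern follows; `DemoList`, `LoadDemoList` and `GetOrLoad` are hypothetical names, and the loader is a placeholder that does not read any file.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded ignore list.
struct DemoList {
	std::vector<std::string> lines;
};

using DemoListMap = std::map<std::string, DemoList>;

// Placeholder loader; a real one would read and parse the file at 'path'.
DemoList LoadDemoList(const std::string &path)
{
	DemoList list;
	list.lines.push_back("loaded from " + path);
	return list;
}

// Return the cached list for 'path', loading it on first use.
const DemoList *GetOrLoad(DemoListMap &cache, const std::string &path)
{
	auto it = cache.find(path);
	if (it == cache.end())
		it = cache.emplace(path, LoadDemoList(path)).first;
	return &it->second;
}
```

Handing out pointers is safe here because inserting into a `std::map` does not invalidate pointers or references to existing elements; a container that relocates its elements on growth, such as `std::vector`, would not give that guarantee.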

@@ -300,7 +453,8 @@ load_config_file(Config &config, const char *path)

config.scrobblers.emplace_front(load_scrobbler_config(config,
section.first,
-section.second));
+section.second,
+config.ignore_lists));
}
}

8 changes: 8 additions & 0 deletions src/Scrobbler.cxx
@@ -436,6 +436,10 @@ Scrobbler::ScheduleNowPlaying(const Record &song) noexcept
/* there's no "now playing" support for files */
return;

if (config.ignore_list && config.ignore_list->matches_record(song)) {
return;
}

now_playing = song;

if (state == State::READY && !submit_timer.IsPending())
@@ -515,6 +519,10 @@ Scrobbler::Submit() noexcept
void
Scrobbler::Push(const Record &song) noexcept
{
if (config.ignore_list && config.ignore_list->matches_record(song)) {
return;
}

if (file != nullptr) {
fprintf(file, "%s %s - %s\n",
log_date(),