This repository has been archived by the owner on Oct 25, 2022. It is now read-only.

Reduce Spotify API calls #53

Open
Shpigford opened this issue Sep 1, 2018 · 10 comments
Labels
help wanted, optimization

Comments

@Shpigford
Collaborator

A major bottleneck in our data processing is Spotify.

Checking for new music is very resource intensive as the only way (that I'm aware of) to do it is to loop through every single artist, then loop through every one of that artist's albums to see if any are new.

This means a single artist can generate dozens if not hundreds of individual jobs and calls to the Spotify API.

Even if we can't figure out a workaround with the Spotify API itself, maybe there's a clever way to decide when an artist actually needs to be updated.

For example, an artist that hasn't released anything in 30 years has a relatively low chance of releasing something now...yet we still check them every. single. day.

@Shpigford added the help wanted and optimization labels Sep 1, 2018
@Shpigford
Collaborator Author

Currently what I've got in mind...

This would be part of the BuildArtistJob that gets run each day.

https://github.com/Shpigford/droptune/blob/master/app/jobs/build_artist_job.rb

# Figure out the date of the artist's most recent release (nil if we have no albums yet)
last_release_date = artist.albums.maximum(:release_date)

# The longer it's been since the last release, the longer the interval (in days) between Spotify pings
days =
  case
  when last_release_date.nil? then 1 # nothing stored yet, so check daily
  when last_release_date < 30.years.ago then 30
  when last_release_date < 20.years.ago then 20
  when last_release_date < 10.years.ago then 10
  when last_release_date < 5.years.ago then 5
  else 1
  end

# Ping Spotify only if `spotify_last_updated_at` is blank or it's been longer than the interval set above
if artist.spotify_last_updated_at.blank? || artist.spotify_last_updated_at < days.days.ago
  BuildArtistSpotifyJob.perform_async(artist_id)
end

@danielcompton

danielcompton commented Sep 2, 2018

Are you using Conditional Requests? That would cut down on the work each request needs to do, and presumably would give faster responses for the requests you do need to make.
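A rough sketch of what a conditional request could look like, assuming Spotify returns an `ETag` header on this endpoint and honors `If-None-Match` (worth verifying against the actual responses). The helper names `build_albums_request` and `unchanged?`, and the idea of storing the ETag per artist, are hypothetical and not part of droptune.

```ruby
require 'net/http'
require 'uri'

# Build a GET for an artist's albums. If we stored an ETag from the previous
# fetch (hypothetical per-artist column), send it back so the server can
# answer 304 Not Modified instead of the full payload.
def build_albums_request(artist_spotify_id, access_token, cached_etag = nil)
  uri = URI("https://api.spotify.com/v1/artists/#{artist_spotify_id}/albums")
  request = Net::HTTP::Get.new(uri)
  request['Authorization'] = "Bearer #{access_token}"
  request['If-None-Match'] = cached_etag if cached_etag
  request
end

# True when the server reports that nothing changed since the cached ETag,
# in which case the per-album work for this artist can be skipped.
def unchanged?(response)
  response.is_a?(Net::HTTPNotModified)
end

# Usage (not executed here, since it needs a real token and network access):
#   request  = build_albums_request(artist.spotify_id, token, artist.spotify_etag)
#   response = Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) do |http|
#     http.request(request)
#   end
#   return if unchanged?(response)
#   artist.update(spotify_etag: response['ETag'])
```

A 304 still counts as an API call, so this would mostly save bandwidth and downstream processing rather than reduce the number of requests.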

Also, https://developer.spotify.com/documentation/web-api/reference/artists/get-artists-albums/ looks like you can do one (or maybe multiple paginated) requests per artist. If the sort order was stable then you could do some tricks about requesting only the offset where you expect new albums to reside.

The header for that API says

Get Spotify catalog information about an artist’s albums. Optional parameters can be specified in the query string to filter and sort the response

But I didn’t see any sorting parameters (but I’m on mobile so might have missed something).
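If the ordering did turn out to be stable, with new albums always landing after the ones already seen, the offset trick might be sketched like this. That ordering is an unverified assumption, and `albums_page_uri` is a hypothetical helper; `limit` and `offset` are the endpoint's paging parameters.

```ruby
require 'uri'

# Build the URL for one page of an artist's albums, starting at `offset`.
# Passing the number of albums already stored as the offset would fetch only
# entries beyond that point, which is useful only if the order is stable.
def albums_page_uri(artist_spotify_id, offset:, limit: 50)
  uri = URI("https://api.spotify.com/v1/artists/#{artist_spotify_id}/albums")
  uri.query = URI.encode_www_form(limit: limit, offset: offset)
  uri
end

puts albums_page_uri('abc', offset: 120)
```

If the order is not stable (for example, newest first), the same helper with `offset: 0` and a small `limit` would be the variant to try instead.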

@danielcompton

> to loop through every single artist, then loop through every one of that artist's albums to see if any are new.

Are you saving which albums you have seen after each API call, or do you check if each album was released after the last time you updated? Saving seen albums in a database would save a lot of detail lookups for each album and would reduce down to a handful of queries per artist.
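As a minimal sketch of that idea, assuming the album IDs from the artist's album listing and the IDs already stored are both available as arrays (`unseen_album_ids` is a hypothetical helper name):

```ruby
require 'set'

# Given the album IDs returned by one listing call and the IDs we have
# already stored, return only the IDs that still need a detail lookup.
def unseen_album_ids(listed_ids, seen_ids)
  seen = seen_ids.to_set
  listed_ids.reject { |id| seen.include?(id) }
end

listed = %w[a1 a2 a3 a4]
stored = %w[a1 a3]
p unseen_album_ids(listed, stored) # => ["a2", "a4"]
```

With this, an artist with no new releases costs only the listing call and no per-album lookups.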

I’m not sure what scale of querying you’re doing but adding some jitter to the next check time would prevent thundering herds of rechecks every 24 hours (if you’re not doing that already).
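One way the jitter could be sketched, assuming each artist has a check interval in days; the plus or minus 10% spread and the `next_check_at` name are arbitrary choices for illustration:

```ruby
SECONDS_PER_DAY = 86_400

# Compute when an artist should next be checked: the base interval plus a
# random offset of up to +/-10% of that interval, so that rechecks spread
# out instead of all firing at the same moment.
def next_check_at(now, interval_days, rng: Random.new)
  base = interval_days * SECONDS_PER_DAY
  jitter = rng.rand(-0.1..0.1) * base
  now + base + jitter
end

# A 10-day interval lands somewhere between 9 and 11 days from now.
puts next_check_at(Time.now, 10)
```

Passing a seeded `Random` via `rng:` keeps the result reproducible in tests.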

@pnomolos

pnomolos commented Sep 2, 2018

What about using https://developer.spotify.com/documentation/web-api/reference/browse/get-list-new-releases/ (perhaps iterating over each of the available markets to make sure you get them all)?

Never mind: I see in Twitter comments that it’s manually curated.

Double-edit: you could start here as a way of not having to check a bunch of artists who have releases on this list.

P.P.P.S. Your less-than signs should be greater-than signs ;)

@Shpigford
Collaborator Author

Shpigford commented Sep 2, 2018

> P.P.P.S. Your less-than signs should be greater-than signs ;)

@pnomolos < & > signs mixed with times and "days ago" gets freaking insane and none of it makes sense. At the moment what's above seems to work. ¯_(ツ)_/¯

@Shpigford
Collaborator Author

> Are you using Conditional Requests? That would cut down on the work each request needs to do, and presumably would give faster responses for the requests you do need to make.

@danielcompton Oooo, I hadn't seen Conditional Requests! Looking into them now.

> Are you saving which albums you have seen after each API call, or do you check if each album was released after the last time you updated?

We permanently save all of the data we get from Spotify, but the problem is that Spotify has very few mechanisms for filtering the API calls...it's sort of a "get it all or get nothing" type of thing.

I don't currently paginate Spotify results at all, so there's not much to gain by reducing those types of calls. 😕

@pnomolos

pnomolos commented Sep 2, 2018

@Shpigford

> @pnomolos < & > signs mixed with times and "days ago" gets freaking insane and none of it makes sense. At the moment what's above seems to work. ¯_(ツ)_/¯

Ah yes, you’re right. I was working relative to today in my head, instead of “closest to zero”. Thoughts on using the new release list to remove artists from the list that needs to be checked for new releases?

@Shpigford
Collaborator Author

@pnomolos

> Thoughts on using new release list to remove artists from the list that needs to be checked for new releases?

Spotify's New Release list isn't thorough enough for it to make a dent. At best it'd remove a few dozen, or maybe 100, artists out of hundreds of thousands.

@pnomolos

pnomolos commented Sep 4, 2018

@Shpigford It's outside of the current ecosystem you're using, but perhaps https://www.allmusic.com/newreleases/all can help? Looks like there are 500-600 artists there. I'd imagine you could step back, say, 6 months and assume anyone who's released an album in that time frame doesn't need to be checked for a new one. That should cull at least several thousand artists once you have the initial check on them.

@Shpigford
Collaborator Author

@pnomolos The problem in that scenario is singles. There are artists who put out singles every few weeks (especially leading up to a full release). 😕

@Shpigford reopened this Sep 4, 2018

3 participants