Reduce Spotify API calls #53
Currently, here's what I have in mind. This would be part of the `BuildArtistJob` that runs each day: https://github.com/Shpigford/droptune/blob/master/app/jobs/build_artist_job.rb

```ruby
# Figure out the last date the artist released an album
# (maximum ignores NULL release dates and returns nil if the artist has no albums)
last_release_date = artist.albums.maximum(:release_date)

# The longer it's been since the last release, the longer the interval between Spotify pings
days = case
       when last_release_date.nil? then 1
       when last_release_date < 30.years.ago then 30
       when last_release_date < 20.years.ago then 20
       when last_release_date < 10.years.ago then 10
       when last_release_date < 5.years.ago then 5
       else 1
       end

# Ping Spotify only if `spotify_last_updated_at` is blank or if it's been longer than the interval set above
if artist.spotify_last_updated_at.blank? || artist.spotify_last_updated_at < days.days.ago
  BuildArtistSpotifyJob.perform_async(artist_id)
end
```
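The interval logic in the job above could also be written as a lookup table, which makes the tiers easier to tweak and to test outside Rails. A minimal sketch, assuming plain `Date` values instead of ActiveSupport times; `RECHECK_TIERS`, `recheck_interval_days`, and `spotify_check_due?` are hypothetical names, not existing droptune code.

```ruby
require "date"

# Hypothetical tier table: [years since last release, days between Spotify checks].
# Ordered oldest-first so the first matching tier wins, like the case statement.
RECHECK_TIERS = [[30, 30], [20, 20], [10, 10], [5, 5]].freeze

# Returns how many days to wait between Spotify checks for an artist.
# Artists with no known release date are checked daily.
def recheck_interval_days(last_release_date, today: Date.today)
  return 1 if last_release_date.nil?

  RECHECK_TIERS.each do |years, days|
    # Date#<< moves a date back by the given number of months.
    return days if last_release_date < (today << (years * 12))
  end
  1
end

# True if the artist has never been checked, or the interval has fully elapsed.
def spotify_check_due?(last_checked_on, interval_days, today: Date.today)
  last_checked_on.nil? || last_checked_on < today - interval_days
end
```

Keeping the tiers as data means adding, say, a 2-year tier is a one-line change rather than another `when` branch.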
Are you using Conditional Requests? They would cut down on the work each request needs to do, and would presumably give faster responses for the requests you do need to make.

Also, from https://developer.spotify.com/documentation/web-api/reference/artists/get-artists-albums/ it looks like you can make one request (or maybe several paginated ones) per artist. If the sort order were stable, you could do some tricks such as requesting only the offset where you expect new albums to appear. The header for that API says

But I didn’t see any sorting parameters (I’m on mobile, though, so I might have missed something).
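A minimal sketch of the Conditional Requests idea, assuming the endpoint sends an `ETag` header and replies `304 Not Modified` when `If-None-Match` matches the stored tag. `ConditionalFetcher` and its pluggable transport are hypothetical (the transport hook only exists so the logic can be exercised without a network), and the cache lives in memory; a real job would presumably persist the ETag next to the artist record. The request still happens, but an unchanged response comes back without a body to download and re-process.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: remembers each URL's ETag and body, sends If-None-Match
# on repeat requests, and reuses the cached body when the server answers 304.
class ConditionalFetcher
  # The transport receives (uri, request) and returns a Net::HTTPResponse-like
  # object; by default it performs a real request with Net::HTTP.
  def initialize(&transport)
    @cache = {}
    @transport = transport || method(:http_get)
  end

  def get(url, headers = {})
    uri = URI(url)
    request = Net::HTTP::Get.new(uri)
    headers.each { |name, value| request[name] = value }
    cached = @cache[url]
    request["If-None-Match"] = cached[:etag] if cached

    response = @transport.call(uri, request)
    case response.code
    when "304"
      raise "304 for #{url} but nothing cached" unless cached
      cached[:body] # unchanged since the last fetch
    when "200"
      etag = response["ETag"]
      @cache[url] = { etag: etag, body: response.body } if etag
      response.body
    else
      raise "GET #{url} failed with HTTP #{response.code}"
    end
  end

  private

  def http_get(uri, request)
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request(request)
    end
  end
end
```

The Spotify access token would go through `headers`, e.g. `fetcher.get(url, "Authorization" => "Bearer #{token}")`, where `token` is whatever the app already uses.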
Are you saving which albums you have seen after each API call, or do you check whether each album was released after the last time you updated? Saving seen albums in a database would avoid a lot of detail lookups for each album and would reduce the work to a handful of queries per artist. I’m not sure what scale of querying you’re doing, but adding some jitter to the next check time would prevent a thundering herd of rechecks every 24 hours (if you’re not doing that already).
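Both suggestions are small in code. A sketch, assuming the fetched album IDs come from the artist's album listing and the stored IDs come from your own database; `unseen_album_ids` and `next_check_at` are hypothetical helpers. Only unseen IDs would need a detail lookup, and each next check is shifted by up to ±10% of its interval so the whole catalog doesn't come due at the same moment.

```ruby
require "set"

# Given album IDs from the artist's album listing and the IDs already stored,
# return only the ones that still need a detail lookup (order preserved).
def unseen_album_ids(fetched_ids, stored_ids)
  stored = stored_ids.to_set
  fetched_ids.reject { |id| stored.include?(id) }
end

# Schedule the next check interval_days after `now`, shifted by a random
# amount within +/- jitter_fraction of the interval.
def next_check_at(now, interval_days, jitter_fraction: 0.1, rng: Random.new)
  interval_seconds = interval_days * 86_400
  jitter = (rng.rand * 2 - 1) * jitter_fraction * interval_seconds
  now + interval_seconds + jitter
end
```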
What about using https://developer.spotify.com/documentation/web-api/reference/browse/get-list-new-releases/ (perhaps iterating over each of the available markets to make sure you get them all)? Never mind: I see in the Twitter comments that it’s manually curated. Double edit: you could still start there, as a way of not having to check the artists who have releases on that list. P.P.P.S. Your less-than signs should be greater-than signs ;)
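If the new-release list were used as a skip list, one way to build it is to collect the artist IDs from each response page (for example, one request per market) and drop them from the day's queue. This is a sketch that assumes each response nests albums under `albums.items`, each album carrying an `artists` array with `id` fields; the helper names are hypothetical, and paging and auth are left out.

```ruby
require "json"
require "set"

# Collect artist IDs from one or more new-releases response bodies.
# Assumed shape: {"albums": {"items": [{"artists": [{"id": "..."}]}]}}
def artist_ids_in_new_releases(page_bodies)
  page_bodies.each_with_object(Set.new) do |body, ids|
    JSON.parse(body).fetch("albums").fetch("items").each do |album|
      album.fetch("artists").each { |artist| ids << artist.fetch("id") }
    end
  end
end

# Drop artists that just appeared on a new-release list from today's recheck queue.
def artists_needing_check(all_artist_ids, recently_released_ids)
  all_artist_ids.reject { |id| recently_released_ids.include?(id) }
end
```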
@pnomolos Mixing < and > with times and "days ago" gets freaking insane, and none of it makes sense. At the moment, what's above seems to work. ¯\\\_(ツ)\_/¯
@danielcompton Oooo, I hadn't seen Conditional Requests! Looking into them now.
We permanently save all of the data we get from Spotify, but the problem is that Spotify has very few mechanisms for filtering API calls; it's sort of a "get it all or get nothing" situation. As it is, I don't paginate Spotify results at all, so there's not much to be gained in reducing those types of calls. 😕
Ah yes, you’re right. I was working relative to today in my head, instead of “closest to zero”. Thoughts on using the new-release list to remove artists from the set that needs to be checked for new releases?
Spotify's New Releases list isn't thorough enough to make a dent. At best it would remove a few dozen, maybe 100, artists out of hundreds of thousands.
@Shpigford It's outside the ecosystem you're currently using, but perhaps https://www.allmusic.com/newreleases/all can help? It looks like there are 500-600 artists there. I'd imagine you could go back, say, 6 months and assume that anyone who has released an album in that window doesn't need to be checked for a new one. Once you've done the initial check on them, that should cull at least several thousand artists.
@pnomolos The problem in that scenario is singles. There are artists who put out singles every few weeks (especially leading up to a full release). 😕
A major bottleneck in our data processing is Spotify.
Checking for new music is very resource-intensive, as the only way (that I'm aware of) to do it is to loop through every single artist, then loop through every one of that artist's albums to see if any are new.
This means a single artist can generate dozens if not hundreds of individual jobs and calls to the Spotify API.
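One way to shrink the per-album fan-out, assuming album details come from Spotify's Get Several Albums endpoint (`GET /v1/albums?ids=...`), which as far as I know accepts up to 20 IDs per call: group the album IDs into batches and make one request per batch instead of one job per album. The sketch below only builds the request URLs; `several_albums_urls` and the batch-size constant are hypothetical, and the limit should be checked against the current API docs.

```ruby
require "uri"

# Assumed per-request limit for GET /v1/albums; verify against the current docs.
MAX_ALBUM_IDS_PER_REQUEST = 20

# Build one "Get Several Albums" URL per batch of album IDs instead of
# one request (and one job) per album.
def several_albums_urls(album_ids, batch_size: MAX_ALBUM_IDS_PER_REQUEST)
  album_ids.each_slice(batch_size).map do |batch|
    "https://api.spotify.com/v1/albums?" + URI.encode_www_form(ids: batch.join(","))
  end
end
```

For an artist with 45 albums, that is 3 requests rather than 45.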
Even if we can't figure out a workaround with the Spotify API itself, maybe there's a clever way to decide when an artist actually needs to be updated.
For example, an artist that hasn't released anything in 30 years has a relatively low chance of releasing something now...yet we still check them every. single. day.