Variables passed to the Search function

DirtyRacer1337 edited this page Aug 3, 2020 · 12 revisions

The search function passes in a lot of variables, and I wanted to get down on paper what each one is, and what it does. This is as much for my own deeper understanding of the code as it is to share with others:

def search(self, results, media, lang):

This is found in __init__.py. I don't see it called or referenced anywhere else, so I think this is the basic structure that Plex uses to call any metadata agent's search function. It is likely defined by Plex and cannot be altered.

results = PAsearchSites.??????????.search(results, encodedTitle, searchTitle, siteNum, lang, searchDate)

This function call is found a little further down in the __init__.search() function. It calls another function, also named search, that lives in a different file. This is how Phoenix created a dedicated search function for each supported site. A lot of things get passed to these individual search functions, which is the main reason for making this page.

Here is my understanding of each variable passed, in no particular order:

results = this is the main variable that is returned to Plex. It appears to be a list object whose entries each contain, in a particular format: [ID (made up of the result URL and site number), result name, score out of 100, language].
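Here is a rough sketch of what one entry in results might look like. SearchResult is only a stand-in for whatever result object Plex expects, so the sketch runs outside Plex. make_result and the "|" separator between URL and site number are my own assumptions for illustration, not something taken from the agent's code.

```python
from collections import namedtuple

# Stand-in for the result object Plex expects (hypothetical name).
SearchResult = namedtuple("SearchResult", ["id", "name", "score", "lang"])


def make_result(result_url, site_num, name, score, lang):
    """Build one search result entry (hypothetical helper)."""
    # Assumption: the ID joins the result URL and site number with "|".
    result_id = "{}|{}".format(result_url, site_num)
    # Keep the score inside the 0-100 range described above.
    score = max(0, min(100, score))
    return SearchResult(id=result_id, name=name, score=score, lang=lang)


results = []
results.append(make_result("example-scene-url", 12, "Example Scene", 93, "en"))
```

Packing the site number into the ID means whatever handles the chosen result later can work out which site it came from without any extra state.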

searchTitle = the title gets passed to the function PAsearchSites.getSearchSettings(), where it is processed further and returned as searchTitle. Periods and dashes are each replaced with a single space. It then tries to find the site name at the beginning of the title and remove it. There are a few special lines of code here specifically for sites in the Babes network. It then goes through each word one by one, and if a word contains an apostrophe it drops that whole word out of the title. Lastly, it tries to identify if there's a date at the beginning of the title (remember, this is after the site has been removed...), saves that date off to another variable and then strips it out. Theoretically all we're left with at this point is the actual scene name or actress name or whatever search term(s) we're going to use in the search...
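The cleanup steps above (minus the date handling and the Babes special case) can be sketched roughly like this. clean_title and its site_names argument are hypothetical, and the real getSearchSettings() compares word by word, so treat this as a simplified illustration and not the actual implementation.

```python
import re


def clean_title(title, site_names):
    """Simplified sketch of the title cleanup (hypothetical helper)."""
    # Periods and dashes become spaces; runs of whitespace are collapsed.
    cleaned = re.sub(r"[.\-]", " ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()

    # Remove a known site name from the start of the title, if present.
    for site in site_names:
        if cleaned.lower().startswith(site.lower() + " "):
            cleaned = cleaned[len(site):].strip()
            break

    # Drop every word that contains an apostrophe.
    words = [word for word in cleaned.split(" ") if "'" not in word]
    return " ".join(words)
```

For example, clean_title("BangBros.Jane's.Big-Day", ["BangBros"]) leaves just "Big Day" as the search term.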

encodedTitle = this is simply the searchTitle after being cleaned up and then run through urllib.quote(), basically making it URL-ready (replacing spaces with %20 and so forth).

searchSiteID = in the same function that processes the searchTitle above, it also matches the site name (when looking for site names to remove from the beginning) and when it finds a match, it returns the ID (that is, the number of the site in that big ol' list of searchSites[]) as searchSiteID. If no match is found, it returns 9999.
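A small sketch of both ideas, assuming Python 3 (where urllib.quote lives at urllib.parse.quote). find_site_id, NO_SITE_MATCH and the example site names are made up for illustration. Only the 9999 fallback and the "position in the list is the ID" idea come from the description above.

```python
from urllib.parse import quote  # Python 2 equivalent: urllib.quote

NO_SITE_MATCH = 9999  # value returned when no site name matches


def find_site_id(title, search_sites):
    """Return the index of the site named at the start of title (hypothetical helper)."""
    lowered = title.lower()
    for site_id, site_name in enumerate(search_sites):
        if lowered.startswith(site_name.lower() + " "):
            return site_id
    return NO_SITE_MATCH


# Placeholder site names, not the real searchSites[] contents.
search_sites = ["ExampleSiteA", "ExampleSiteB"]

search_site_id = find_site_id("ExampleSiteB some scene name", search_sites)
encoded_title = quote("some scene name")
```

Here search_site_id ends up as 1 and encoded_title as "some%20scene%20name".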

searchDate = in the same function as above, when it searches for a date at the beginning of the title, it saves the date (if found) as searchDate. If it finds a date with a 2-digit year, it will return it with a 4-digit year (e.g. 18-12-25 becomes 2018-12-25). If it finds spaces, it will return it with dashes (e.g. 2018 12 25 becomes 2018-12-25).
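The date handling could look something like the sketch below. split_leading_date and the regular expression are my own guesses at the behaviour described, and the sketch assumes every 2-digit year belongs to the 2000s, which the real code may or may not do.

```python
import re

# A 4- or 2-digit year, then month and day, separated by spaces or dashes.
DATE_AT_START = re.compile(r"^(\d{4}|\d{2})[ \-](\d{2})[ \-](\d{2})\b\s*")


def split_leading_date(title):
    """Return (searchDate or None, title with the date stripped) (hypothetical helper)."""
    match = DATE_AT_START.match(title)
    if not match:
        return None, title
    year, month, day = match.groups()
    if len(year) == 2:
        year = "20" + year  # assumption: 2-digit years are in the 2000s
    search_date = "{}-{}-{}".format(year, month, day)
    return search_date, title[match.end():]
```

Returning both the date and the remaining title in one call keeps the "save it off, then strip it out" step in a single place.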

lang = this is passed in to the top level search function along with results, and I never see it manipulated, only added to results and returned to Plex.

siteNum = I'm not sure I fully understand the need for this. In the PAsearchSites.getSearchSettings() function discussed above, the way it matches the site name at the beginning of the title is by looping through all the sites listed in the searchSites[] variable and comparing those words to the words in title. If it finds a match, it returns the ID as searchSiteID. This appears to do the same thing, looping through all the sites again and just checking if the searchSiteID goes with that search function. Couldn't we just take this second loop out, match everything by the searchSiteID that we've already found and save a bunch of CPU cycles? At best, I can see the siteNum loop being used when searchSiteID == 9999, but we shouldn't have to do this again if we already found a searchSiteID... Somebody help me out with the logic here.
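To make the question concrete, here is a sketch of the alternative I have in mind: look the search function up directly by searchSiteID, and only loop over every site when the ID is 9999. Everything here (SITE_SEARCHERS, run_search, the two dummy site functions) is hypothetical and is not how PAsearchSites is actually organised.

```python
NO_SITE_MATCH = 9999


def search_site_a(results, title):
    # Dummy per-site search: just records which site was searched.
    results.append(("site_a", title))
    return results


def search_site_b(results, title):
    results.append(("site_b", title))
    return results


# Hypothetical table: position in the list is the site ID.
SITE_SEARCHERS = [search_site_a, search_site_b]


def run_search(results, title, search_site_id):
    """Dispatch straight to one site, or to all sites when nothing matched."""
    if search_site_id == NO_SITE_MATCH:
        # No site name was found in the title, so try every site.
        for searcher in SITE_SEARCHERS:
            results = searcher(results, title)
        return results
    # A site was already identified, so no second loop is needed.
    return SITE_SEARCHERS[search_site_id](results, title)
```

With a table like this, the second pass over all the sites only happens in the searchAll case.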

Alright, so now some personal thoughts:

  1. I don't see any reason to pass boolean variables (like the True/False flags searchAll and searchByDateActor). When you're coding the search function inside your specific site file, instead of if searchAll: you can just as easily write if searchSiteID == 9999: and get the exact same result. One less variable to pass around. The same concept applies to searchByDateActor and using searchDate. I will probably work to eliminate those variables just to tidy up the code.

  2. I see now how searchByDateActor is being used, and I can probably increase the matching scores on a bunch of sites I've added recently. Basically, if that passes as True, the search function scores the releaseDate of the search result against the searchDate passed into the search function. I've been going back and making sure all my search results pull the releaseDate, because I wanted it displayed in the result, but I haven't been using it to adjust the score in any way. I'll rework my code to do that... In summary, there doesn't appear to be any difference between a date+actor search and a title search in the actual search functionality; the only difference is in the result scoring...

  3. I'll probably tinker with removing siteNum as well. I feel it just adds extra cycles to the code that aren't necessary. And as we add more and more sites to the supported list, having the code be as efficient as possible will mean quicker returns when searching for results. I'm already seeing long wait times when I try searching a site and misspell the site name (where it's basically doing a searchAll search). Side note, I've also noticed I never get results from searchAll searches, and I think it's because it hits LesbianX and dies due to the SSLv3 error.

  4. I found a neat piece of code in the BangBros search function that grabs 2 pages' worth of results. Some websites' search functions return 4 results per page, some return 20. If a website's per-page returns are low, we can grab results from a few extra pages to pad that out some.
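The multi-page idea from the last point can be sketched without touching any real site. collect_results and fetch_page are hypothetical names. In a real site file fetch_page would request and parse one page of search results, and this is not the BangBros code itself.

```python
def collect_results(fetch_page, pages=2):
    """Gather results from the first few result pages (hypothetical helper)."""
    collected = []
    for page in range(1, pages + 1):
        page_results = fetch_page(page)
        if not page_results:
            break  # stop early once a page comes back empty
        collected.extend(page_results)
    return collected


def fake_fetch_page(page):
    # Stand-in for a real page request, so the sketch runs offline.
    data = {1: ["scene 1", "scene 2"], 2: ["scene 3"]}
    return data.get(page, [])


all_results = collect_results(fake_fetch_page, pages=3)
```

Stopping at the first empty page avoids extra requests to sites that have fewer pages than we ask for.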