Refactor Muckrock Scraper #111

Open: wants to merge 15 commits into main
Empty file added source_collectors/__init__.py
Empty file.
2 changes: 0 additions & 2 deletions source_collectors/muckrock/README.md
@@ -56,8 +56,6 @@ pip install -r requirements.txt

### 2. Clone Muckrock database & search locally

~~- `download_muckrock_foia.py` `search_local_foia_json.py`~~ (deprecated)

- Scripts to clone the MuckRock FOIA request collection for fast local querying (total size currently under 2 GB).

- `create_foia_data_db.py` creates a SQLite database (`foia_data.db`) and populates it with all MuckRock FOIA requests. Because errors outside the scope of this script may interrupt a run, it records the most recent page fetched and inserted into the database in a counter file (`last_page_fetched.txt`). If the program exits prematurely, run `create_foia_data_db.py` again to continue where it left off. Errors are also written to a log file for later reference.
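The resume-on-restart behavior described above can be sketched with a plain counter file. `create_foia_data_db.py` itself is not part of this diff, so this is only an illustration under assumptions: `fetch_page` and `store` are hypothetical callables standing in for the MuckRock API call and the SQLite insert, and a `None` page is assumed to mean there is no more data.

```python
import os
from typing import Callable, Optional

COUNTER_FILE = "last_page_fetched.txt"


def read_last_page(counter_path: str) -> int:
    """Return the last page recorded in the counter file, or 0 if there is none."""
    if not os.path.exists(counter_path):
        return 0
    with open(counter_path) as f:
        content = f.read().strip()
    return int(content) if content else 0


def write_last_page(counter_path: str, page: int) -> None:
    """Record the most recent page that was fetched and stored."""
    with open(counter_path, "w") as f:
        f.write(str(page))


def fetch_all_pages(
    fetch_page: Callable[[int], Optional[list]],
    store: Callable[[list], None],
    counter_path: str = COUNTER_FILE,
) -> int:
    """Resume after the last recorded page; return the last page stored."""
    page = read_last_page(counter_path) + 1
    while True:
        results = fetch_page(page)  # hypothetical API call; None means no more pages
        if results is None:
            break
        store(results)  # hypothetical database insert
        # Advance the counter only after the page is stored, so a crash
        # before this point re-fetches the page instead of skipping it.
        write_last_page(counter_path, page)
        page += 1
    return read_last_page(counter_path)
```

Calling `fetch_all_pages` again after an exception picks up at the page after the last one recorded. Because the counter is written only after a page is stored, a crash can cause one page to be fetched twice but never skipped; whether that produces duplicates depends on how the insert treats existing rows.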
Empty file.
65 changes: 65 additions & 0 deletions source_collectors/muckrock/classes/FOIADBSearcher.py
@@ -0,0 +1,65 @@
import os  # flake8 warning D100 (1:1): Missing docstring in public module
import sqlite3

import pandas as pd

from source_collectors.muckrock.constants import FOIA_DATA_DB

check_results_table_query = """
SELECT name FROM sqlite_master
WHERE (type = 'table')
AND (name = 'results')
"""

search_foia_query = """
SELECT * FROM results
WHERE (title LIKE ? OR tags LIKE ?)
AND (status = 'done')
"""


class FOIADBSearcher:
    # flake8 warning D101 (21:1): Missing docstring in public class

    def __init__(self, db_path = FOIA_DATA_DB):
        # flake8 warning D107 (23:1): Missing docstring in __init__
        # flake8 failure E251 (23:31): unexpected spaces around keyword / parameter equals
        # flake8 failure E251 (23:33): unexpected spaces around keyword / parameter equals
        self.db_path = db_path
        if not os.path.exists(self.db_path):
            raise FileNotFoundError("foia_data.db does not exist.\nRun create_foia_data_db.py first to create and populate it.")


    def search(self, search_string: str) -> pd.DataFrame | None:
        # flake8 failure E303 (29:5): too many blank lines (2)
        """
        Searches the foia_data.db database for FOIA request entries matching the provided search string.

        Args:
            search_string (str): The string to search for in the `title` and `tags` of the `results` table.

        Returns:
            Union[pandas.DataFrame, None]:
                - pandas.DataFrame: A DataFrame containing the matching entries from the database.
                - None: If an error occurs during the database operation.

        Raises:
            sqlite3.Error: If any database operation fails, prints error and returns None.
            Exception: If any unexpected error occurs, prints error and returns None.
        """
        try:
            with sqlite3.connect(self.db_path) as conn:
                results_table = pd.read_sql_query(check_results_table_query, conn)
                if results_table.empty:
                    print("The `results` table does not exist in the database.")
                    return None

                df = pd.read_sql_query(
                    sql=search_foia_query,
                    con=conn,
                    params=[f"%{search_string}%", f"%{search_string}%"]
                )

        except sqlite3.Error as e:
            print(f"Sqlite error: {e}")
            return None
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return None

        return df  # flake8 warning W292 (65:18): no newline at end of file
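To see what `FOIADBSearcher.search` actually matches, its two queries can be run against a throwaway in-memory database. This sketch uses the standard-library `sqlite3` module instead of pandas and returns tuples instead of a DataFrame; the three-column `results` table is an assumption limited to the columns the queries reference (the real schema comes from `create_foia_data_db.py`, which is not in this diff). Two behaviors are worth knowing: SQLite's `LIKE` is case-insensitive for ASCII letters by default, and `%` or `_` inside `search_string` act as wildcards because the code does not escape them.

```python
import sqlite3
from typing import Optional

# The two queries from FOIADBSearcher.py, unchanged
check_results_table_query = """
SELECT name FROM sqlite_master
WHERE (type = 'table')
AND (name = 'results')
"""

search_foia_query = """
SELECT * FROM results
WHERE (title LIKE ? OR tags LIKE ?)
AND (status = 'done')
"""


def search(conn: sqlite3.Connection, search_string: str) -> Optional[list]:
    """Return matching rows, or None if the `results` table is missing."""
    if conn.execute(check_results_table_query).fetchone() is None:
        return None
    pattern = f"%{search_string}%"  # substring match, applied to both columns
    return conn.execute(search_foia_query, (pattern, pattern)).fetchall()


conn = sqlite3.connect(":memory:")
print(search(conn, "police"))  # None: the `results` table does not exist yet

# Hypothetical minimal schema: only the columns the queries reference
conn.execute("CREATE TABLE results (title TEXT, tags TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("Police use of force policy", "policing", "done"),
        ("Budget records", "police, finance", "done"),
        ("Police body camera footage", "video", "processed"),
    ],
)
for row in search(conn, "police"):
    print(row)
```

Only the first two rows come back: the second matches through `tags` alone, and the third is excluded by `status = 'done'`.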
58 changes: 58 additions & 0 deletions source_collectors/muckrock/classes/FOIASearcher.py
@@ -0,0 +1,58 @@
from typing import Optional  # flake8 warning D100 (1:1): Missing docstring in public module

from source_collectors.muckrock.classes.muckrock_fetchers import FOIAFetcher
from tqdm import tqdm

class FOIASearcher:
    """
    Used for searching FOIA data from MuckRock
    """

    def __init__(self, fetcher: FOIAFetcher, search_term: Optional[str] = None):
        # flake8 warning D107 (11:1): Missing docstring in __init__
        self.fetcher = fetcher
        self.search_term = search_term

    def fetch_page(self) -> dict | None:
        """
        Fetches the next page of results using the fetcher.
        """
        data = self.fetcher.fetch_next_page()
        if data is None or data.get("results") is None:
            return None
        return data

    def filter_results(self, results: list[dict]) -> list[dict]:
        """
        Filters the results based on the search term.
        Override or modify as needed for custom filtering logic.
        """
        if self.search_term:
            return [result for result in results if self.search_term.lower() in result["title"].lower()]
        return results

    def update_progress(self, pbar: tqdm, results: list[dict]) -> int:
        """
        Updates the progress bar and returns the count of results processed.
        """
        num_results = len(results)
        pbar.update(num_results)
        return num_results

    def search_to_count(self, max_count: int) -> list[dict]:
        """
        Fetches and processes results up to a maximum count.
        """
        count = max_count
        all_results = []
        with tqdm(total=max_count, desc="Fetching results", unit="result") as pbar:
            while count > 0:
                data = self.fetch_page()
                if not data:
                    break

                results = self.filter_results(data["results"])
                all_results.extend(results)
                count -= self.update_progress(pbar, results)

        return all_results  # flake8 warning W391 (58:1): blank line at end of file
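`search_to_count` can be checked without network access by driving the same control flow with canned pages. In this sketch `StubFetcher` is a hypothetical stand-in for `FOIAFetcher` (it implements only `fetch_next_page`), and the `tqdm` bar is dropped so the example needs no third-party packages.

```python
from typing import Optional


class StubFetcher:
    """Hypothetical stand-in for FOIAFetcher: serves canned pages, then None."""

    def __init__(self, pages: list[list[dict]]):
        self.pages = pages
        self.current_page = 0

    def fetch_next_page(self) -> Optional[dict]:
        if self.current_page >= len(self.pages):
            return None
        page = self.pages[self.current_page]
        self.current_page += 1
        return {"results": page}


def search_to_count(fetcher, max_count: int, search_term: Optional[str] = None) -> list[dict]:
    """Same control flow as FOIASearcher.search_to_count, minus the progress bar."""
    count = max_count
    all_results = []
    while count > 0:
        data = fetcher.fetch_next_page()
        if data is None or data.get("results") is None:
            break  # no more pages
        results = data["results"]
        if search_term:
            # Case-insensitive substring match on the title, as in filter_results
            results = [r for r in results if search_term.lower() in r["title"].lower()]
        all_results.extend(results)
        count -= len(results)  # only matching results count toward max_count
    return all_results


pages = [
    [{"title": "Jail records"}, {"title": "Parking tickets"}],
    [{"title": "County jail budget"}, {"title": "JAIL staffing"}],
    [{"title": "Jail contracts"}],
]
found = search_to_count(StubFetcher(pages), max_count=2, search_term="jail")
print([r["title"] for r in found])  # ['Jail records', 'County jail budget', 'JAIL staffing']
```

The output shows that `max_count` is not a hard cap: the loop appends every matching result on a page before re-checking the count, so it can return more than requested. Slicing `all_results[:max_count]` before returning would enforce a strict limit, if that is the intent.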
38 changes: 38 additions & 0 deletions source_collectors/muckrock/classes/SQLiteClient.py
@@ -0,0 +1,38 @@
import logging  # flake8 warning D100 (1:1): Missing docstring in public module
import sqlite3


class SQLClientError(Exception):
    # flake8 warning D101 (5:1): Missing docstring in public class
    pass


class SQLiteClient:
    # flake8 warning D101 (9:1): Missing docstring in public class

    def __init__(self, db_path: str) -> None:
        # flake8 warning D107 (11:1): Missing docstring in __init__
        self.conn = sqlite3.connect(db_path)

    def execute_query(self, query: str, many=None):
        # flake8 warning D102 (14:1): Missing docstring in public method

        try:
            if many is not None:
                self.conn.executemany(query, many)
            else:
                self.conn.execute(query)
            self.conn.commit()
        except sqlite3.Error as e:
            print(f"SQLite error: {e}")
            error_msg = f"Failed to execute query due to SQLite error: {e}"
            logging.error(error_msg)
            self.conn.rollback()
            raise SQLClientError(error_msg)

class SQLiteClientContextManager:
    # flake8 warning D101 (29:1): Missing docstring in public class

    def __init__(self, db_path: str) -> None:
        # flake8 warning D107 (31:1): Missing docstring in __init__
        self.client = SQLiteClient(db_path)

    def __enter__(self):
        # flake8 warning D105 (34:1): Missing docstring in magic method
        return self.client

    def __exit__(self, exc_type, exc_value, traceback):
        # flake8 warning D105 (37:1): Missing docstring in magic method
        # flake8 warning U100 (37:24): Unused argument 'exc_type'
        # flake8 warning U100 (37:34): Unused argument 'exc_value'
        # flake8 warning U100 (37:45): Unused argument 'traceback'
        self.client.conn.close()  # flake8 warning W292 (38:33): no newline at end of file
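The client and its context manager can be exercised against an in-memory database. The copy below keeps the PR's logic but adds docstrings, drops the duplicate `print`, and chains the original exception with `from e`; the table and rows are invented for the demo.

```python
import logging
import sqlite3


class SQLClientError(Exception):
    """Raised when a query fails; wraps the underlying sqlite3.Error."""


class SQLiteClient:
    """Thin wrapper that commits each successful query and rolls back failures."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)

    def execute_query(self, query: str, many=None) -> None:
        """Run `query` once, or once per parameter tuple in `many`, then commit."""
        try:
            if many is not None:
                self.conn.executemany(query, many)
            else:
                self.conn.execute(query)
            self.conn.commit()
        except sqlite3.Error as e:
            error_msg = f"Failed to execute query due to SQLite error: {e}"
            logging.error(error_msg)
            self.conn.rollback()
            raise SQLClientError(error_msg) from e


class SQLiteClientContextManager:
    """Opens a SQLiteClient on entry and closes its connection on exit."""

    def __init__(self, db_path: str) -> None:
        self.client = SQLiteClient(db_path)

    def __enter__(self) -> SQLiteClient:
        return self.client

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.client.conn.close()


caught = None
with SQLiteClientContextManager(":memory:") as client:
    client.execute_query("CREATE TABLE results (id INTEGER PRIMARY KEY, title TEXT)")
    client.execute_query(
        "INSERT INTO results (id, title) VALUES (?, ?)",
        many=[(1, "Jail records"), (2, "Parking tickets")],
    )
    try:
        # Row 3 is valid but row 1 repeats a primary key, so the whole batch is rolled back
        client.execute_query(
            "INSERT INTO results (id, title) VALUES (?, ?)",
            many=[(3, "Court filings"), (1, "Duplicate")],
        )
    except SQLClientError as err:
        caught = err
    # execute_query returns nothing, so reads go through the raw connection
    row_count = client.conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]

print(type(caught).__name__, row_count)  # SQLClientError 2
```

Because `execute_query` commits after every call, `__exit__` only has to close the connection; nothing is left uncommitted when the `with` block ends.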
Empty file.
14 changes: 14 additions & 0 deletions source_collectors/muckrock/classes/muckrock_fetchers/AgencyFetcher.py
@@ -0,0 +1,14 @@
from source_collectors.muckrock.constants import BASE_MUCKROCK_URL  # flake8 warning D100 (1:1): Missing docstring in public module
from source_collectors.muckrock.classes.muckrock_fetchers.MuckrockFetcher import FetchRequest, MuckrockFetcher


class AgencyFetchRequest(FetchRequest):
    # flake8 warning D101 (5:1): Missing docstring in public class
    agency_id: int

class AgencyFetcher(MuckrockFetcher):
    # flake8 warning D101 (8:1): Missing docstring in public class

    def build_url(self, request: AgencyFetchRequest) -> str:
        # flake8 warning D102 (10:1): Missing docstring in public method
        return f"{BASE_MUCKROCK_URL}/agency/{request.agency_id}/"

    def get_agency(self, agency_id: int):
        # flake8 warning D102 (13:1): Missing docstring in public method
        return self.fetch(AgencyFetchRequest(agency_id=agency_id))  # flake8 warning W292 (14:67): no newline at end of file
36 changes: 36 additions & 0 deletions source_collectors/muckrock/classes/muckrock_fetchers/FOIAFetcher.py
@@ -0,0 +1,36 @@
from source_collectors.muckrock.classes.muckrock_fetchers.MuckrockFetcher import MuckrockFetcher, FetchRequest  # flake8 warning D100 (1:1): Missing docstring in public module
from source_collectors.muckrock.constants import BASE_MUCKROCK_URL

FOIA_BASE_URL = f"{BASE_MUCKROCK_URL}/foia"


class FOIAFetchRequest(FetchRequest):
    # flake8 warning D101 (7:1): Missing docstring in public class
    page: int
    page_size: int


class FOIAFetcher(MuckrockFetcher):
    # flake8 warning D101 (12:1): Missing docstring in public class

    def __init__(self, start_page: int = 1, per_page: int = 100):
        """
        Constructor for the FOIAFetcher class.

        Args:
            start_page (int): The page number to start fetching from (default is 1).
            per_page (int): The number of results to fetch per page (default is 100).
        """
        self.current_page = start_page
        self.per_page = per_page

    def build_url(self, request: FOIAFetchRequest) -> str:
        # flake8 warning D102 (25:1): Missing docstring in public method
        return f"{FOIA_BASE_URL}?page={request.page}&page_size={request.page_size}&format=json"

    def fetch_next_page(self) -> dict | None:
        """
        Fetches data from a specific page of the MuckRock FOIA API.
        """
        page = self.current_page
        self.current_page += 1
        request = FOIAFetchRequest(page=page, page_size=self.per_page)
        return self.fetch(request)  # flake8 warning W391 (36:1): blank line at end of file
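Paging in `FOIAFetcher` is a counter that is read, advanced, and baked into the query string on every call. The standard-library sketch below builds the same URLs with `urllib.parse.urlencode`, which also takes care of escaping. The base URL is an assumption (the real `BASE_MUCKROCK_URL` lives in `constants.py`, which this diff does not show), and `PageCursor` is a hypothetical helper, not part of the PR.

```python
from urllib.parse import urlencode

# Assumed value; the PR imports BASE_MUCKROCK_URL from source_collectors/muckrock/constants.py
BASE_MUCKROCK_URL = "https://www.muckrock.com/api_v1"
FOIA_BASE_URL = f"{BASE_MUCKROCK_URL}/foia"


class PageCursor:
    """Hypothetical helper mirroring FOIAFetcher's page bookkeeping."""

    def __init__(self, start_page: int = 1, per_page: int = 100):
        self.current_page = start_page
        self.per_page = per_page

    def next_url(self) -> str:
        # Use the current page, then advance, in the same order as fetch_next_page
        page = self.current_page
        self.current_page += 1
        query = urlencode({"page": page, "page_size": self.per_page, "format": "json"})
        return f"{FOIA_BASE_URL}?{query}"


cursor = PageCursor(start_page=3, per_page=50)
print(cursor.next_url())  # https://www.muckrock.com/api_v1/foia?page=3&page_size=50&format=json
print(cursor.next_url())  # https://www.muckrock.com/api_v1/foia?page=4&page_size=50&format=json
```

Because `fetch_next_page` advances `current_page` before calling `fetch`, a request that raises is not retried on the next call; a caller that wants to retry has to decrement the counter itself.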
31 changes: 31 additions & 0 deletions source_collectors/muckrock/classes/muckrock_fetchers/FOIALoopFetcher.py
@@ -0,0 +1,31 @@
from datasets import tqdm  # flake8 warning D100 (1:1): Missing docstring in public module

from source_collectors.muckrock.constants import BASE_MUCKROCK_URL
from source_collectors.muckrock.classes.muckrock_fetchers.MuckrockFetcher import FetchRequest
from source_collectors.muckrock.classes.muckrock_fetchers.MuckrockLoopFetcher import MuckrockLoopFetcher

class FOIALoopFetchRequest(FetchRequest):
    # flake8 warning D101 (7:1): Missing docstring in public class
    jurisdiction: int

class FOIALoopFetcher(MuckrockLoopFetcher):
    # flake8 warning D101 (10:1): Missing docstring in public class

    def __init__(self, initial_request: FOIALoopFetchRequest):
        # flake8 warning D107 (12:1): Missing docstring in __init__
        super().__init__(initial_request)
        self.pbar_records = tqdm(
            desc="Fetching FOIA records",
            unit="record",
        )
        self.num_found = 0
        self.results = []

    def process_results(self, results: list[dict]):
        # flake8 warning D102 (21:1): Missing docstring in public method
        self.results.extend(results)

    def build_url(self, request: FOIALoopFetchRequest):
        # flake8 warning D102 (24:1): Missing docstring in public method
        return f"{BASE_MUCKROCK_URL}/foia/?status=done&jurisdiction={request.jurisdiction}"

    def report_progress(self):
        # flake8 warning D102 (27:1): Missing docstring in public method
        old_num_found = self.num_found
        self.num_found = len(self.results)
        difference = self.num_found - old_num_found
        self.pbar_records.update(difference)
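`report_progress` turns a running total into increments: a `tqdm` bar's `update(n)` adds `n` to its position, so the fetcher remembers the previous total and passes only the difference. The sketch below isolates that bookkeeping; `CountingBar` is a hypothetical stand-in for `tqdm` so the example runs without third-party packages.

```python
class CountingBar:
    """Hypothetical stand-in for tqdm: update(n) adds n to the position."""

    def __init__(self):
        self.n = 0

    def update(self, n: int = 1) -> None:
        self.n += n


class RecordProgress:
    """Delta-based progress reporting, as in FOIALoopFetcher.report_progress."""

    def __init__(self, bar: CountingBar):
        self.bar = bar
        self.num_found = 0
        self.results = []

    def process_results(self, results: list[dict]) -> None:
        self.results.extend(results)

    def report_progress(self) -> None:
        old_num_found = self.num_found
        self.num_found = len(self.results)
        # Only the records added since the last report advance the bar
        self.bar.update(self.num_found - old_num_found)


bar = CountingBar()
progress = RecordProgress(bar)
for page in ([{"id": 1}, {"id": 2}], [{"id": 3}]):
    progress.process_results(page)
    progress.report_progress()
print(bar.n)  # 3
```

The same pattern drives `pbar_jurisdictions` in `JurisdictionLoopFetcher`, where the running total is `len(self.jurisdictions)`.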
14 changes: 14 additions & 0 deletions source_collectors/muckrock/classes/muckrock_fetchers/JurisdictionByIDFetcher.py
@@ -0,0 +1,14 @@
from source_collectors.muckrock.constants import BASE_MUCKROCK_URL  # flake8 warning D100 (1:1): Missing docstring in public module
from source_collectors.muckrock.classes.muckrock_fetchers.MuckrockFetcher import FetchRequest, MuckrockFetcher


class JurisdictionByIDFetchRequest(FetchRequest):
    # flake8 warning D101 (5:1): Missing docstring in public class
    jurisdiction_id: int

class JurisdictionByIDFetcher(MuckrockFetcher):
    # flake8 warning D101 (8:1): Missing docstring in public class

    def build_url(self, request: JurisdictionByIDFetchRequest) -> str:
        # flake8 warning D102 (10:1): Missing docstring in public method
        return f"{BASE_MUCKROCK_URL}/jurisdiction/{request.jurisdiction_id}/"

    def get_jurisdiction(self, jurisdiction_id: int) -> dict:
        # flake8 warning D102 (13:1): Missing docstring in public method
        return self.fetch(request=JurisdictionByIDFetchRequest(jurisdiction_id=jurisdiction_id))
47 changes: 47 additions & 0 deletions source_collectors/muckrock/classes/muckrock_fetchers/JurisdictionLoopFetcher.py
@@ -0,0 +1,47 @@
from tqdm import tqdm  # flake8 warning D100 (1:1): Missing docstring in public module

from source_collectors.muckrock.constants import BASE_MUCKROCK_URL
from source_collectors.muckrock.classes.muckrock_fetchers.MuckrockFetcher import FetchRequest
from source_collectors.muckrock.classes.muckrock_fetchers.MuckrockLoopFetcher import MuckrockLoopFetcher


class JurisdictionLoopFetchRequest(FetchRequest):
    # flake8 warning D101 (8:1): Missing docstring in public class
    level: str
    parent: int
    town_names: list

class JurisdictionLoopFetcher(MuckrockLoopFetcher):
    # flake8 warning D101 (13:1): Missing docstring in public class

    def __init__(self, initial_request: JurisdictionLoopFetchRequest):
        # flake8 warning D107 (15:1): Missing docstring in __init__
        super().__init__(initial_request)
        self.town_names = initial_request.town_names
        self.pbar_jurisdictions = tqdm(
            total=len(self.town_names),
            desc="Fetching jurisdictions",
            unit="jurisdiction",
            position=0,
            leave=False
        )
        self.pbar_page = tqdm(
            desc="Processing pages",
            unit="page",
            position=1,
            leave=False
        )
        self.num_found = 0
        self.jurisdictions = {}

    def build_url(self, request: JurisdictionLoopFetchRequest) -> str:
        # flake8 warning D102 (34:1): Missing docstring in public method
        return f"{BASE_MUCKROCK_URL}/jurisdiction/?level={request.level}&parent={request.parent}"

    def process_results(self, results: list[dict]):
        # flake8 warning D102 (37:1): Missing docstring in public method
        for item in results:
            if item["name"] in self.town_names:
                self.jurisdictions[item["name"]] = item["id"]

    def report_progress(self):
        # flake8 warning D102 (42:1): Missing docstring in public method
        old_num_found = self.num_found
        self.num_found = len(self.jurisdictions)
        difference = self.num_found - old_num_found
        self.pbar_jurisdictions.update(difference)
        self.pbar_page.update(1)
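`process_results` keeps only the jurisdictions whose `name` exactly matches one of the requested `town_names` and maps each name to its `id`. The standalone sketch below replays that filter over two pages of invented sample data; converting `town_names` to a set is an optional tweak for faster lookups, not something the PR does.

```python
def collect_jurisdictions(pages: list[list[dict]], town_names: list[str]) -> dict[str, int]:
    """Map each requested town name to its jurisdiction ID, as process_results does."""
    wanted = set(town_names)  # O(1) membership checks; the PR tests against the list
    jurisdictions: dict[str, int] = {}
    for results in pages:
        for item in results:
            # Exact, case-sensitive match; a later duplicate name overwrites an earlier one
            if item["name"] in wanted:
                jurisdictions[item["name"]] = item["id"]
    return jurisdictions


# Invented sample data shaped like the fields the PR reads ("name", "id")
pages = [
    [{"name": "Springfield", "id": 101}, {"name": "Shelbyville", "id": 102}],
    [{"name": "springfield", "id": 103}, {"name": "Capital City", "id": 104}],
]
towns = ["Springfield", "Capital City", "Ogdenville"]
found = collect_jurisdictions(pages, towns)
print(found)  # {'Springfield': 101, 'Capital City': 104}
missing = [t for t in towns if t not in found]
print(missing)  # ['Ogdenville']
```

Towns that never match leave `pbar_jurisdictions` short of its total (`len(self.town_names)`), so listing the unmatched names is a quick way to spot spelling or capitalization mismatches.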
42 changes: 42 additions & 0 deletions source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py
@@ -0,0 +1,42 @@
import abc  # flake8 warning D100 (1:1): Missing docstring in public module
from abc import ABC
from dataclasses import dataclass  # flake8 warning F401 (3:1): 'dataclasses.dataclass' imported but unused

import requests
from pydantic import BaseModel

class MuckrockNoMoreDataError(Exception):
    # flake8 warning D101 (8:1): Missing docstring in public class
    pass

class MuckrockServerError(Exception):
    # flake8 warning D101 (11:1): Missing docstring in public class
    pass

class FetchRequest(BaseModel):

Check warning on line 14 in source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py

View workflow job for this annotation

GitHub Actions / flake8

[flake8] source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py#L14 <101>

Missing docstring in public class
Raw output
./source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py:14:1: D101 Missing docstring in public class
pass

class MuckrockFetcher(ABC):

Check warning on line 17 in source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py

View workflow job for this annotation

GitHub Actions / flake8

[flake8] source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py#L17 <101>

Missing docstring in public class
Raw output
./source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py:17:1: D101 Missing docstring in public class

def fetch(self, request: FetchRequest):

Check warning on line 19 in source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py

View workflow job for this annotation

GitHub Actions / flake8

[flake8] source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py#L19 <102>

Missing docstring in public method
Raw output
./source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py:19:1: D102 Missing docstring in public method
url = self.build_url(request)
response = requests.get(url)
try:
response.raise_for_status()
except requests.exceptions.HTTPError as e:
print(f"Failed to get records on request `{url}`: {e}")
# If code is 404, raise NoMoreData error
if e.response.status_code == 404:
raise MuckrockNoMoreDataError
if 500 <= e.response.status_code < 600:
raise MuckrockServerError




return None

Check failure on line 35 in source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py

View workflow job for this annotation

GitHub Actions / flake8

[flake8] source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py#L35 <303>

too many blank lines (4)
Raw output
./source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py:35:13: E303 too many blank lines (4)

return response.json()

@abc.abstractmethod
def build_url(self, request: FetchRequest) -> str:

Check warning on line 40 in source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py

View workflow job for this annotation

GitHub Actions / flake8

[flake8] source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py#L40 <102>

Missing docstring in public method
Raw output
./source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py:40:1: D102 Missing docstring in public method

Check warning on line 40 in source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py

View workflow job for this annotation

GitHub Actions / flake8

[flake8] source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py#L40 <100>

Unused argument 'request'
Raw output
./source_collectors/muckrock/classes/muckrock_fetchers/MuckrockFetcher.py:40:25: U100 Unused argument 'request'
pass

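MuckrockFetcher.fetch turns a 404 into MuckrockNoMoreDataError and a 5xx into MuckrockServerError, which suggests that callers page through results until the 404 arrives. The sketch below shows one way a caller might do that; it is not code from this PR. The two exceptions are redefined locally so the block runs on its own, HTTP is replaced by a simulated fetch_page callable, and check_status, fetch_all_pages, make_fake_fetch, the "results" key, and the retry cap are all hypothetical assumptions.

```python
class MuckrockNoMoreDataError(Exception):
    """Local mirror of the PR's exception for HTTP 404."""


class MuckrockServerError(Exception):
    """Local mirror of the PR's exception for HTTP 5xx."""


def check_status(status_code: int) -> None:
    """Same status-code mapping as the except branch of MuckrockFetcher.fetch."""
    if status_code == 404:
        raise MuckrockNoMoreDataError
    if 500 <= status_code < 600:
        raise MuckrockServerError


def fetch_all_pages(fetch_page, max_server_retries: int = 3) -> list:
    """Hypothetical driver: request pages 1, 2, ... until a 404 ends the loop.

    fetch_page(page) stands in for a MuckrockFetcher subclass's fetch();
    the retry cap for server errors is an assumption, not part of the PR.
    """
    results = []
    page = 1
    retries = 0
    while True:
        try:
            data = fetch_page(page)
        except MuckrockNoMoreDataError:
            break  # 404: no page at this index, so stop
        except MuckrockServerError:
            retries += 1
            if retries > max_server_retries:
                raise  # give up after repeated 5xx responses
            continue  # retry the same page
        retries = 0
        if data is not None:  # fetch() returns None for other HTTP errors
            results.extend(data["results"])
        page += 1
    return results


def make_fake_fetch(fail_once_on: int, last_page: int):
    """Simulated server: one 503 on page `fail_once_on`, then 404 after `last_page`."""
    failed = set()

    def fake_fetch(page: int):
        if page == fail_once_on and page not in failed:
            failed.add(page)
            check_status(503)
        if page > last_page:
            check_status(404)
        return {"results": [f"item-{page}"]}

    return fake_fetch


print(fetch_all_pages(make_fake_fetch(fail_once_on=2, last_page=2)))  # ['item-1', 'item-2']
```

Retrying only on MuckrockServerError and stopping on MuckrockNoMoreDataError keeps a transient outage from being mistaken for the end of the data, while a None page is skipped rather than retried, matching what fetch() returns for other HTTP errors.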