forked from benjaminvatterj/JobMarketTracker
custom_settings.py
import pandas as pd
from JMTracker.auxiliary import (
validate_unique_id, validator_generator, validate_extension
)
"""
An example customized settings file. Uncomment and modify lines according
to your preferences and needs.
"""
settings = {
    # Color theme for the GUI, from PySimpleGUI.
    # To see the full list of available themes, execute the following in a
    # Python console:
    #     import PySimpleGUI as sg
    #     sg.list_of_look_and_feel_values()
    # 'gui_theme': 'Material1',
    # Decide whether custom input settings override or extend the defaults.
    # False means that any inputs you set up below are added to the list of
    # input sources. True makes them replace the default list of inputs. If
    # you override and set no sources, the system will likely crash and papa
    # will be very mad at you.
    # 'custom_overrides_default': False,
}
# === Input Type Configuration === #
"""
The following options determine where we get information from and how we
process it.
Required Columns:
-----------------
The system requires the following columns in any input file:
- origin_id: an integer indexing the listings. This should be a unique
identifier of a posting within a system.
- title: the title of the posting
- location: the location of the posting
- institution: the name of the institution
- deadline: the application deadline in year-month-day
- url: the url to the full posting
Optional Columns:
-----------------
Besides these, the system also stores any column given a name in the renaming
rules, unless explicitly included in the "to_drop" list. The following columns
are also shown to users, if they exist:
- section, division, department, keywords, full_text
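As an illustration of the column requirements above, the following is a
minimal sketch (not part of JMTracker) that applies a set of renaming rules
with pandas and reports which required columns are still missing. The helper
name missing_required_columns and the sample data are hypothetical.

```python
import pandas as pd

# Required column names as listed in this file.
REQUIRED_COLUMNS = [
    'origin_id', 'title', 'location', 'institution', 'deadline', 'url'
]


def missing_required_columns(df, renaming_rules):
    # Hypothetical helper: rename the raw columns, then list the required
    # columns that are still absent from the result.
    renamed = df.rename(columns=renaming_rules)
    return [col for col in REQUIRED_COLUMNS if col not in renamed.columns]


# Made-up input resembling a downloaded CSV without a location column.
raw = pd.DataFrame({
    'ID': [1, 2],
    'Title': ['Assistant Professor', 'Postdoc'],
    'Institution': ['University A', 'University B'],
    'Deadline': ['2024-11-15', '2024-12-01'],
    'URL': ['https://example.org/1', 'https://example.org/2'],
})
rules = {
    'ID': 'origin_id',
    'Title': 'title',
    'Institution': 'institution',
    'Deadline': 'deadline',
    'URL': 'url',
}
missing = missing_required_columns(raw, rules)
print(missing)
```

Any column reported here would need either a renaming rule or a generator,
as described in the next section.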
Column Generators:
------------------
If, after renaming, some of the required or optional columns are missing, the
system checks whether a generator exists for each missing column. The generator
should have a key equal to {column name}_generator and should be a function
that takes a row of the data and the dataframe of postings already included
(or None if it is the first usage) and returns a value for the row.
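A generator can do more than return a constant. The sketch below is one
possible location generator, under the assumption that the row behaves like a
pandas Series and that stored postings carry 'institution' and 'location'
columns. It reuses a known location for the same institution and falls back
to 'unknown' otherwise. It would be registered under the key
'location_generator'.

```python
import pandas as pd


def location_generator(row, existing_postings):
    # row: one row of the newly loaded data (assumed to be a pandas Series).
    # existing_postings: dataframe of stored postings, or None on first use.
    if existing_postings is None:
        return 'unknown'
    needed = {'institution', 'location'}
    if not needed.issubset(existing_postings.columns):
        return 'unknown'
    # Look for earlier postings from the same institution.
    same = existing_postings[
        existing_postings['institution'] == row['institution']
    ]
    known = same['location'].dropna()
    known = known[known != 'unknown']
    return known.iloc[0] if len(known) > 0 else 'unknown'


# Made-up data to exercise the function.
stored = pd.DataFrame({
    'institution': ['University A', 'University B'],
    'location': ['Chicago, IL', 'unknown'],
})
new_row = pd.Series({'institution': 'University A', 'title': 'Postdoc'})
print(location_generator(new_row, None))
print(location_generator(new_row, stored))
```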
Important:
----------
There is a set of column names that are protected and will be overwritten
even if present in the data:
"origin", "status", "updated", 'date_received', 'reveiewed', 'update_notes',
'notes', 'origin_id'
Also, 'origin_id' must be unique within each source, such that the tuple
('origin', 'origin_id') uniquely identifies a posting.
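Before pointing the system at a new source, it can help to check the
uniqueness requirement by hand. The following is a minimal sketch using
pandas. It is not the implementation of validate_unique_id, and the helper
name duplicated_origin_ids is hypothetical.

```python
import pandas as pd


def duplicated_origin_ids(df, id_column='origin_id'):
    # Return the sorted list of identifiers that appear more than once.
    dupes = df.loc[df[id_column].duplicated(keep=False), id_column]
    return sorted(dupes.unique().tolist())


# Made-up postings from a single source, with one repeated identifier.
postings = pd.DataFrame({'origin_id': [101, 102, 102, 103]})
repeated = duplicated_origin_ids(postings)
print(repeated)
```

An empty list means every posting in the source has its own identifier.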
See the example input_option_settings below for more details, or look at the
original settings file within the project folder.
input_option_settings = [
    {
        # Name for the origin
        'origin': 'CustomDocs',
        # Download url
        'download_url': '..',
        # Expected extension for download hint
        'expected_extension': 'csv',
        # A validator for the path given for the file to load
        'url_validator': validator_generator(
            [validate_extension], 'CustomDocs', [('csv', 'CustomDocs')]
        ),
        # Input file name to move and store
        'input_file_name': 'latest_nu.csv',
        # A validator for the data
        'validator': validator_generator(
            [validate_unique_id], 'CustomDocs', [('ID', 'CustomDocs')]
        ),
        # Download instructions
        'download_instructions': 'Download as CSV!',
        # Custom loader
        'loader': pd.read_csv,
        # Renaming rules to match the requirements of the system
        'renaming_rules': {
            'ID': 'origin_id',
            'Title': 'title',
            'Institution': 'institution',
            'Department': 'department',
            'Deadline': 'deadline',
            'URL': 'url'
        },
        # Location generator if not provided.
        'location_generator': (lambda x, y: 'unknown')
    },
]
"""