The LearningHouse Service provides machine learning algorithms based on the scikit-learn Python library as a RESTful API. Its purpose is to offer smart home enthusiasts an easy way to teach their homes.
If you have any questions, please contact us on Discord.
Please share your ideas on what you want to teach your home, as well as suggestions or problems, by opening an issue. We look forward to your feedback.
Install and update using pip.
pip install -U learninghouse
Install and update using Docker.
docker pull ghcr.io/learninghouseservice/learninghouse:latest
mkdir -p brains
The brains directory holds the model configuration as JSON files. The models are the brains of your learning house.
There is one subdirectory per brain, where all files relevant to that brain are stored.
The brain subdirectory needs a config.json file holding the basic configuration. The service stores a training_data.csv file holding all data from your sensors, and an object dump of the trained model in a file called trained.pkl.
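As a rough illustration of this layout, the following sketch lists which of the three files exist in each brain subdirectory. The function name brain_status is hypothetical and not part of the service; only the file names come from the description above.

```python
from pathlib import Path

# File names taken from the description above.
BRAIN_FILES = ("config.json", "training_data.csv", "trained.pkl")


def brain_status(brains_dir: Path) -> dict[str, dict[str, bool]]:
    """Report, per brain subdirectory, which of the expected files exist.

    Hypothetical helper for illustration; the service does not provide it.
    """
    status = {}
    for brain_dir in sorted(p for p in brains_dir.iterdir() if p.is_dir()):
        status[brain_dir.name] = {
            name: (brain_dir / name).is_file() for name in BRAIN_FILES
        }
    return status
```

A brain that has a configuration but has not been trained yet would show config.json as present and trained.pkl as missing.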
The service is configured by environment variables. The following options can be set:
Environment variable | Default (production/development) | Description |
---|---|---|
LEARNINGHOUSE_ENVIRONMENT | production | Choose the default environment settings: production or development. |
LEARNINGHOUSE_HOST | 127.0.0.1 | Set the address that the service should bind to. (use 0.0.0.0 for all available) |
LEARNINGHOUSE_PORT | 5000 | Set the port on which the service should listen. |
LEARNINGHOUSE_BASE_URL | Not set | Set the base URL for external access, for example, the hostname of your Docker host. |
LEARNINGHOUSE_CONFIG_DIRECTORY | ./brains | Define the directory where all configuration data goes. |
LEARNINGHOUSE_OPENAPI_FILE | /learninghouse_api.json | Provide the file URL path to the OpenAPI JSON file. |
LEARNINGHOUSE_DOCS_URL | /docs | Define the URL path for the interactive API documentation. If you leave it empty, the documentation will be disabled. |
LEARNINGHOUSE_JWT_SECRET | Generated on startup | For administration authentication, a JWT is generated after login. This JWT is signed with a secret. By default, it is generated on startup, which will invalidate existing JWTs on each restart. |
LEARNINGHOUSE_JWT_EXPIRE_MINUTES | 10 | The refresh token of a JWT expires after the given number of minutes. |
LEARNINGHOUSE_LOGGING_LEVEL | INFO | Set the logging level to one of DEBUG, INFO, WARNING, ERROR, or CRITICAL. |
LEARNINGHOUSE_DEBUG | (False/True) | The debugger will be automatically activated in the development environment. For security reasons, it is recommended not to activate it in production. |
LEARNINGHOUSE_RELOAD | (False/True) | The source will be automatically reloaded in the development environment. For security reasons, it is recommended not to activate it in production. |
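To see how defaults and overrides interact, here is a minimal sketch that merges some of the documented production defaults with values from the environment. It is not the service's own configuration code; the names DEFAULTS and effective_settings are assumptions made for this example.

```python
import os

# Defaults copied from the table above (production column, subset only).
DEFAULTS = {
    "LEARNINGHOUSE_ENVIRONMENT": "production",
    "LEARNINGHOUSE_HOST": "127.0.0.1",
    "LEARNINGHOUSE_PORT": "5000",
    "LEARNINGHOUSE_CONFIG_DIRECTORY": "./brains",
    "LEARNINGHOUSE_DOCS_URL": "/docs",
    "LEARNINGHOUSE_JWT_EXPIRE_MINUTES": "10",
    "LEARNINGHOUSE_LOGGING_LEVEL": "INFO",
}


def effective_settings(environ=None) -> dict[str, str]:
    """Merge the documented defaults with overrides from the environment.

    Hypothetical helper; the real service may resolve settings differently.
    """
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}
```

For example, calling effective_settings({"LEARNINGHOUSE_PORT": "8080"}) keeps every default except the port.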
Download .env.example, rename it to .env, and modify the default configuration values in this file to meet your needs.
Then, simply run learninghouse to start the service. By default, the service listens on http://localhost:5000/.
docker run --name learninghouse --rm -v brains:/learninghouse/brains -p 5000:5000 -e "TZ=Europe/Berlin" ghcr.io/learninghouseservice/learninghouse:latest
For configuration purposes, there is a small user interface that can be found at http://localhost:5000/ui.
The service is protected by different authentication and authorization mechanisms. For administration, you can log in via the UI.
On the first run, the service is set to use the fallback password learninghouse for the administrator account. Until this is changed, all other endpoints will be deactivated.
You can change the password on the initial login screen of the UI.
Security notice: Unless the service runs behind a proxy that secures your connection with SSL, use a separate password for your learninghouse that you do not use anywhere else.
You can use your administration access for training and prediction endpoints, but we also recommend using an API key mechanism for application access. There are two roles for API key authorization: user for the prediction endpoint and trainer for the training and prediction endpoints.
You can add more API keys via the UI.
Your API key is displayed only once and cannot be requested again, so save it somewhere safe. If you lose it, you have to delete this API key and create a new one.
You have to provide this API key for all requests, either as a query parameter ?api_key=YOURSECRETKEY or as a header field X-LEARNINGHOUSE-API-KEY: YOURSECRETKEY.
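The two ways of passing the key can be sketched with Python's standard library. The helper names with_header and with_query are hypothetical; the header name and the query parameter are the ones given above. The requests are only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_KEY_HEADER = "X-LEARNINGHOUSE-API-KEY"


def with_header(url: str, api_key: str) -> Request:
    """Build a request that carries the API key as a header field."""
    return Request(url, headers={API_KEY_HEADER: api_key})


def with_query(url: str, api_key: str) -> Request:
    """Build a request that carries the API key as a query parameter."""
    return Request(f"{url}?{urlencode({'api_key': api_key})}")
```

Passing either object to urllib.request.urlopen would send the request. The header variant is usually preferable because query parameters tend to end up in access logs.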
You can also test the API key by logging in to the UI.
Send data from all sensors to the LearningHouse Service, especially when training your brains. The service will save all data fields, even if they are not currently used as a feature. The service will choose the best feature set each time you train a brain.
In general, sensor data can be divided into two different types. Numerical data can be processed directly by your models, while categorical data needs to be preprocessed by the service in order to be used as a feature. Categorical data can be identified using a simple rule:
- Non-numerical values, or
- Numerical values that can be described using terms.
Here are some examples of categorical data:
- pressure_trend: 'falling', 'rising', 'consistent'
- month_of_year: 1 ('January'), 2 ('February'), ...
- weather_condition: 'sunny', 'cloudy'
- switch: 'ON', 'OFF'
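The text does not say how the service preprocesses categorical data. One common technique is one-hot encoding, where each known category becomes its own 0/1 column. The sketch below assumes that technique purely for illustration; the function one_hot and the column naming scheme are hypothetical.

```python
def one_hot(name: str, value, categories: list) -> dict[str, int]:
    """Return one 0/1 column per known category.

    Assumption: one-hot encoding stands in for the service's preprocessing,
    which is not specified in this document. A value that matches no
    category (including a missing value) yields all zeros.
    """
    return {f"{name}_{category}": int(value == category) for category in categories}
```

With this scheme, month_of_year is treated as a set of twelve labels rather than as a quantity, which is why a numerical value that can be described using terms counts as categorical.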
To enable the service to use the data from your sensors as features for your brain, you need to provide the service with information about the data type. You can add each sensor you want to use via the UI.
For example, add the following sensors:
Name | Type |
---|---|
azimuth | numerical |
elevation | numerical |
rain_gauge | numerical |
pressure | numerical |
pressure_trend_1h | categorical |
temperature_outside | numerical |
temperature_trend_1h | categorical |
light_state | categorical |
The example brain darkness determines whether it is dark enough to switch on the light. It uses a machine learning algorithm called RandomForestClassifier.
To add a new brain via the UI, use your administration account and provide the following parameters.
Field | Value |
---|---|
Name | darkness |
Type | Classifier |
Dependent encode | True |
Test size | 0.2 |
Estimators | 100 |
Max depth | 5 |
The LearningHouse Service can predict values using an estimator. An estimator can be of type classifier, which is best suited for categorical outputs, such as true and false. If you want to predict a numerical value, such as the setpoint of a heating system, use the type regressor instead.
For both types, the LearningHouse Service uses a machine learning algorithm called random forest estimation. This algorithm builds a "forest" of decision trees with your features and combines the predictions of all of them (the mean for a regressor, averaged class probabilities for a classifier) to give you a more robust result than a single tree. For more details, see the API description of scikit-learn.
Estimator type | API Reference |
---|---|
RandomForestRegressor | https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor |
RandomForestClassifier | https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier |
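The idea of combining many trees can be shown without scikit-learn. The sketch below takes the individual tree predictions as given and only demonstrates the combination step. It is a simplification: scikit-learn's RandomForestClassifier averages the class probabilities of its trees, whereas this example counts hard votes. Both function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def combine_regression(tree_predictions: list[float]) -> float:
    """A regressor forest reports the mean of its trees' predictions."""
    return mean(tree_predictions)


def combine_classification(tree_predictions: list):
    """Simplified classifier forest: the most frequent class wins.

    Note: scikit-learn averages per-tree class probabilities instead of
    counting hard votes; this is an illustrative approximation.
    """
    return Counter(tree_predictions).most_common(1)[0][0]
```

Three trees that predict heating setpoints of 20.0, 21.0, and 22.0 combine to 21.0, and two votes for True against one for False yield True.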
You can adjust the number of decision trees by using the estimators (default: 100) option. You can also adjust the maximum depth of each tree by using the max depth (default: 5) option. Both options are optional. Try adjusting these values to optimize the accuracy of your model.
The dependent variable is the one that must be included in the training data and is predicted by the trained brain. It is the same as the name field of the brain.
The dependent variable must be a number. If it is not a number, but a string or boolean (true/false) as shown in the example, enable dependent encode.
The LearningHouse Service only uses a portion of your training data to train the brain. The remaining portion, specified by test size, is used to score the accuracy of your brain.
You can specify the test size as a percentage using floating point numbers between 0.01 and 0.99, or as an absolute number of data points using integer numbers.
For example, a test size of 20% (0.2) should be sufficient to start with.
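The two ways of specifying the test size can be sketched as a small calculation. This is an illustration, not the service's code: the function split_counts is hypothetical, and rounding fractional sizes up is an assumption modeled on scikit-learn's train_test_split.

```python
import math


def split_counts(n_samples: int, test_size) -> tuple[int, int]:
    """Return (training points, test points) for a given test size.

    A float is read as a fraction of all data points, an int as an
    absolute number of data points. Rounding up is an assumption.
    """
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError("fractional test size must be between 0 and 1")
        n_test = math.ceil(test_size * n_samples)
    else:
        n_test = int(test_size)
    if not 0 < n_test < n_samples:
        raise ValueError("test size leaves no data for training or testing")
    return n_samples - n_test, n_test
```

With 10 data points and a test size of 0.2, 8 points are used for training and 2 for scoring.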
As a rule of thumb, an accuracy score between 80% and 90% is considered good. Scores below 80% suggest that the brain is underfitted, while scores above 90% can indicate that the brain is overfitted. Both cases can result in poor predictions for new data points. You can try adjusting the estimator configuration to improve the score.
Training of the brain will start when there are at least 10 data points.
You can also change the configuration of sensors and brains using the API. Please refer to the interactive API documentation when the service is running.
When the service is running, you can access an interactive API documentation by calling the URL http://localhost:5000/docs.
To train, send a PUT request to the service:
You need an administrator JWT or an API key with the role trainer for this request (see Security).
# URL is http://<host>:5000/api/brain/:name/training
curl --location --request PUT 'http://localhost:5000/api/brain/darkness/training' \
    --header 'Content-Type: application/json' \
    --header 'X-LEARNINGHOUSE-API-KEY: YOURSECRETKEY' \
    --data-raw '{
        "dependent_value": true,
        "sensors_data": {
            "azimuth": 321.4441223144531,
            "elevation": -19.691608428955078,
            "rain_gauge": 0.0,
            "pressure": 971.0,
            "pressure_trend_1h": "falling",
            "temperature_outside": 23.0,
            "temperature_trend_1h": "rising",
            "light_state": false
        }
    }'
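If you prefer Python over curl, the same request can be built with the standard library. This is a minimal sketch: build_training_request is a hypothetical helper, while the URL pattern, header name, and payload fields are taken from the curl example above. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_training_request(
    base_url: str, brain: str, api_key: str, dependent_value, sensors_data: dict
) -> Request:
    """Build the PUT request that adds one data point to a brain."""
    body = json.dumps(
        {"dependent_value": dependent_value, "sensors_data": sensors_data}
    ).encode("utf-8")
    return Request(
        f"{base_url}/api/brain/{brain}/training",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-LEARNINGHOUSE-API-KEY": api_key,
        },
    )
```

To send it, pass the returned object to urllib.request.urlopen.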
You can either send a field timestamp containing a UNIX timestamp with your dataset, or the service will add this information using its current time. The service generates some further time-related fields inside the training dataset that you can also use as features. These are month_of_year, day_of_month, day_of_week, hour_of_day, and minute_of_hour.
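How such fields can be derived from a UNIX timestamp is sketched below. The field names come from the text above; everything else is an assumption for illustration, in particular the use of UTC and the numbering of day_of_week (Monday = 0), which the service may handle differently.

```python
from datetime import datetime, timezone


def time_fields(timestamp: float) -> dict[str, int]:
    """Derive the time-related fields from a UNIX timestamp.

    Assumptions: UTC is used, and day_of_week counts from Monday = 0.
    The service's actual conventions are not documented here.
    """
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {
        "month_of_year": moment.month,
        "day_of_month": moment.day,
        "day_of_week": moment.weekday(),
        "hour_of_day": moment.hour,
        "minute_of_hour": moment.minute,
    }
```

Because the service runs with a configured time zone (see the TZ variable in the Docker example), the real values of hour_of_day may differ from this UTC-based sketch.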
If one of your sensors is not working at the moment and therefore not sending a value, the service will add a value using the following rules. For categorical data, all categorical columns will be set to zero. For numerical data, the mean of all known training set values (see Test size) for this feature will be assumed.
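These two rules can be sketched in a few lines. The helpers below are hypothetical and only mirror the described behavior: a missing numerical value is replaced by the mean of the known training values, and the columns of a missing categorical value are all set to zero.

```python
from statistics import mean


def fill_missing_numerical(name: str, row: dict, training_rows: list[dict]) -> float:
    """Return the row's value, or the mean of known training values if missing."""
    if row.get(name) is not None:
        return row[name]
    known = [r[name] for r in training_rows if r.get(name) is not None]
    return mean(known)


def fill_missing_categorical(columns: list[str]) -> dict[str, int]:
    """Set every column that belongs to a missing categorical sensor to zero."""
    return {column: 0 for column in columns}
```

For example, if pressure is missing and the training data contains 970.0 and 972.0, the value 971.0 is assumed.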
To train the brain with existing data, for example after a service update, use a POST request without data:
You need an administrator JWT or API key with the role trainer for this request (see Security).
# URL is http://host:5000/api/brain/:name/training
curl --location \
--header 'X-LEARNINGHOUSE-API-KEY: YOURSECRETKEY' \
--request POST 'http://localhost:5000/api/brain/darkness/training'
To obtain information about a trained brain, use a GET request:
You need an administrator JWT or API key with the role trainer or user for this request (see Security).
# URL is http://host:5000/api/brain/:name/info
curl --location \
--header 'X-LEARNINGHOUSE-API-KEY: YOURSECRETKEY' \
--request GET 'http://localhost:5000/api/brain/darkness/info'
To predict a new data set with your brain, send a POST request:
You need an administrator JWT or API key with the role trainer or user for this request (see Security).
# URL is http://host:5000/api/brain/:name/prediction
curl --location --request POST 'http://localhost:5000/api/brain/darkness/prediction' \
    --header 'Content-Type: application/json' \
    --header 'X-LEARNINGHOUSE-API-KEY: YOURSECRETKEY' \
    --data-raw '{
        "azimuth": 321.4441223144531,
        "elevation": -19.691608428955078,
        "rain_gauge": 0.0,
        "pressure_trend_1h": "falling"
    }'
If one of your sensors used as a feature in the brain is not working at the moment and is not sending a value, the service will handle this by using the following rules. For categorical data, all categorical columns will be set to zero. For numerical data, the mean of all known training set values (see Test size) for this feature will be assumed.