Hive metastore is responsible for storing all the metadata about the database tables created in Presto and Hive. By default, the metastore stores this information in a local embedded Derby database in a PersistentVolume attached to the pod.
Generally, the default configuration of the Hive metastore works for small clusters, but users may wish to improve performance or move storage requirements out of the cluster by using a dedicated SQL database for storing the Hive metastore data.
Hive, by default, requires one persistent volume to operate. hive-metastore-db-data is the main PVC required by default. This PVC is used by the Hive metastore to store metadata about tables, such as table name, columns, and location. The Hive metastore is used by Presto and the Hive server to look up table metadata when processing queries. In practice, you can remove this requirement by using MySQL or PostgreSQL for the Hive metastore database.
To install, Hive metastore requires either that dynamic volume provisioning be enabled via a StorageClass, that a persistent volume of the correct size be manually pre-created, or that you use a pre-existing MySQL or PostgreSQL database.
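If dynamic volume provisioning is not available, a persistent volume of the correct size can be created manually before installation. The following is a minimal sketch of such a pre-created PersistentVolume; the volume name, the hostPath backing, and the path are illustrative assumptions, not values required by Metering:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: hive-metastore-db-data-pv   # illustrative name
spec:
  capacity:
    storage: 5Gi                    # must cover the hive-metastore-db-data PVC request
  accessModes:
    - ReadWriteOnce
  hostPath:                         # hostPath is only suitable for test clusters
    path: /mnt/hive-metastore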
To configure and specify a StorageClass for the hive-metastore-db-data PVC, specify the StorageClass in your MeteringConfig. An example StorageClass section is included in metastore-storage.yaml. Uncomment the spec.hive.spec.metastore.storage.class section and replace the null in class: null with the name of the StorageClass to use. Leaving the value null will cause Metering to use the default StorageClass for the cluster.
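For example, once uncommented, the relevant portion of metastore-storage.yaml could look like the following; the StorageClass name "fast-ssd" is a hypothetical placeholder for a class that exists in your cluster:

spec:
  hive:
    spec:
      metastore:
        storage:
          class: "fast-ssd"   # hypothetical StorageClass name; leave null to use the cluster default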
Use metastore-storage.yaml as a template and adjust the size: "5Gi" value to the desired capacity for the following section:
spec.hive.spec.metastore.storage.size
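For example, to request a larger metastore volume, the same storage section can be adjusted as shown below; the "20Gi" value is purely illustrative:

spec:
  hive:
    spec:
      metastore:
        storage:
          size: "20Gi"   # illustrative capacity; choose a size appropriate for your reports and metrics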
By default, to make installation easier, Metering configures Hive to use an embedded Java database called Derby. However, this is unsuited for larger environments or Metering installations where a lot of reports and metrics are collected. Currently two alternative options are available, MySQL and PostgreSQL, both of which have been tested with Operator Metering.
There are four configuration options you can use to control the database used by Hive metastore: url, driver, username, and password.
Using MySQL:
spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:mysql://mysql.example.com:3306/hive_metastore"
          driver: "com.mysql.jdbc.Driver"
          username: "REPLACEME"
          password: "REPLACEME"
You can pass additional JDBC parameters using the spec.hive.spec.config.db.url; for more details, see the MySQL Connector/J documentation.
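For instance, Connector/J options can be appended to the URL as a query string, as in the sketch below; the useSSL and characterEncoding parameters are illustrative examples of options documented for Connector/J:

spec:
  hive:
    spec:
      config:
        db:
          # additional parameters are appended after "?" and separated by "&"
          url: "jdbc:mysql://mysql.example.com:3306/hive_metastore?useSSL=true&characterEncoding=UTF-8"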
Using PostgreSQL:
spec:
  hive:
    spec:
      config:
        db:
          url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore"
          driver: "org.postgresql.Driver"
          username: "REPLACEME"
          password: "REPLACEME"
You can pass additional JDBC parameters using the url; for more details, see the PostgreSQL JDBC driver documentation.
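As with MySQL, driver options can be appended to the URL; the ssl and connectTimeout parameters below are illustrative examples of options supported by the PostgreSQL JDBC driver:

spec:
  hive:
    spec:
      config:
        db:
          # additional parameters are appended after "?" and separated by "&"
          url: "jdbc:postgresql://postgresql.example.com:5432/hive_metastore?ssl=true&connectTimeout=30"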