diff --git a/docs/modules/hdfs/pages/getting_started/installation.adoc b/docs/modules/hdfs/pages/getting_started/installation.adoc
index aa84de9d..b94e30fe 100644
--- a/docs/modules/hdfs/pages/getting_started/installation.adoc
+++ b/docs/modules/hdfs/pages/getting_started/installation.adoc
@@ -1,21 +1,22 @@
 = Installation
 
-On this page you will install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the commons and secret operators which are required by all Stackable operators.
+On this page you will install the Stackable HDFS operator and its dependency, the ZooKeeper operator, as well as the
+commons and secret operators, which are required by all Stackable operators.
 
 == Stackable Operators
 
-There are 2 ways to run Stackable Operators
+There are two ways to run Stackable Operators:
 
-1. Using xref:stackablectl::index.adoc[]
-
-2. Using Helm
+. Using xref:management:stackablectl:index.adoc[]
+. Using Helm
 
 === stackablectl
 
-stackablectl is the command line tool to interact with Stackable operators and our recommended way to install operators.
-Follow the xref:stackablectl::installation.adoc[installation steps] for your platform.
+`stackablectl` is the command line tool to interact with Stackable operators and our recommended way to install
+operators. Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.
 
-After you have installed stackablectl run the following command to install all operators necessary for the HDFS cluster:
+After you have installed `stackablectl`, run the following command to install all operators necessary for the HDFS
+cluster:
 
 [source,bash]
 ----
@@ -31,7 +32,8 @@ The tool will show
 [INFO ] Installing hdfs operator
 ----
 
-TIP: Consult the xref:stackablectl::quickstart.adoc[] to learn more about how to use stackablectl. For example, you can use the `-k` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
+TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`. For
+example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
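+
+For reference, the script included above amounts to a single `stackablectl` call along the lines of the sketch below.
+The exact operator list and any version pins are an assumption here; the included script is authoritative:
+
+[source,bash]
+----
+# Sketch: install the HDFS operator plus the operators this guide names as
+# required (commons, secret and ZooKeeper) in a single call.
+stackablectl operator install commons secret zookeeper hdfs
+----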
 
 === Helm
 
@@ -47,8 +49,10 @@ Then install the Stackable Operators:
 
 [source,bash]
 ----
 include::example$getting_started/getting_started.sh[tag=helm-install-operators]
 ----
 
-Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the HDFS cluster (as well as the CRDs for the required operators). You are now ready to deploy HDFS in Kubernetes.
+Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the HDFS cluster (as well as the CRDs
+for the required operators). You are now ready to deploy HDFS in Kubernetes.
 
 == What's next
 
-xref:getting_started/first_steps.adoc[Set up an HDFS cluster] and its dependencies and xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works].
\ No newline at end of file
+xref:getting_started/first_steps.adoc[Set up an HDFS cluster] and its dependencies and
+xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works].
\ No newline at end of file
diff --git a/docs/modules/hdfs/pages/index.adoc b/docs/modules/hdfs/pages/index.adoc
index a33eac4c..2d05280b 100644
--- a/docs/modules/hdfs/pages/index.adoc
+++ b/docs/modules/hdfs/pages/index.adoc
@@ -2,20 +2,28 @@
 :description: The Stackable Operator for Apache HDFS is a Kubernetes operator that can manage Apache HDFS clusters. Learn about its features, resources, dependencies and demos, and see the list of supported HDFS versions.
 :keywords: Stackable Operator, Hadoop, Apache HDFS, Kubernetes, k8s, operator, engineer, big data, metadata, storage, cluster, distributed storage
 
-The Stackable Operator for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[Apache HDFS] (Hadoop Distributed File System) is used to set up HFDS in high-availability mode. HDFS is a distributed file system designed to store and manage massive amounts of data across multiple machines in a fault-tolerant manner. The Operator depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper cluster to coordinate the active and standby NameNodes.
+The Stackable Operator for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[Apache HDFS]
+(Hadoop Distributed File System) is used to set up HDFS in high-availability mode. HDFS is a distributed file system
+designed to store and manage massive amounts of data across multiple machines in a fault-tolerant manner. The Operator
+depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper cluster to coordinate the active and standby NameNodes.
 
 == Getting started
 
-Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable HDFS and ZooKeeper Operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set up correctly.
+Follow the xref:getting_started/index.adoc[Getting started guide], which will guide you through installing the Stackable
+HDFS and ZooKeeper Operators, setting up ZooKeeper and HDFS, and writing a file to HDFS to verify that everything is set
+up correctly.
 
-Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to your needs, or have a look at the <<demos>> for some example setups.
+Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to
+your needs, or have a look at the <<demos>> for some example setups.
 
 == Operator model
 
-The Operator manages the _HdfsCluster_ custom resource. The cluster implements three xref:home:concepts:roles-and-role-groups.adoc[roles]:
+The Operator manages the _HdfsCluster_ custom resource. The cluster implements three
+xref:home:concepts:roles-and-role-groups.adoc[roles]:
 
 * DataNode - responsible for storing the actual data.
-* JournalNode - responsible for keeping track of HDFS blocks and used to perform failovers in case the active NameNode fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
+* JournalNode - responsible for keeping a log of changes to the file system state, which is used to perform failovers
+  in case the active NameNode fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
 * NameNode - responsible for keeping track of HDFS blocks and providing access to the data.
@@ -24,30 +32,38 @@ image::hdfs_overview.drawio.svg[A diagram depicting the Kubernetes resources cre
 
 The operator creates the following K8S objects per role group defined in the custom resource.
 
 * Service - ClusterIP used for intra-cluster communication.
-* ConfigMap - HDFS configuration files like `core-site.xml`, `hdfs-site.xml` and `log4j.properties` are defined here and mounted in the pods.
-* StatefulSet - where the replica count, volume mounts and more for each role group is defined.
+* ConfigMap - HDFS configuration files like `core-site.xml`, `hdfs-site.xml` and `log4j.properties` are defined here and
+  mounted in the pods.
+* StatefulSet - where the replica count, volume mounts and more for each role group are defined.
 
-In addition, a `NodePort` service is created for each pod labeled with `hdfs.stackable.tech/pod-service=true` that exposes all container ports to the outside world (from the perspective of K8S).
+In addition, a `NodePort` service is created for each pod labeled with `hdfs.stackable.tech/pod-service=true` that
+exposes all container ports to the outside world (from the perspective of K8S).
 
-In the custom resource you can specify the number of replicas per role group (NameNode, DataNode or JournalNode). A minimal working configuration requires:
+In the custom resource you can specify the number of replicas per role group (NameNode, DataNode or JournalNode). A
+minimal working configuration requires (a sketch of such a manifest follows the list):
 
 * 2 NameNodes (HA)
 * 1 JournalNode
 * 1 DataNode (should match at least the `clusterConfig.dfsReplication` factor)
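+
+A minimal `HdfsCluster` manifest satisfying these requirements might look like the sketch below. The resource name
+`simple-hdfs`, the ZooKeeper discovery ConfigMap name `simple-hdfs-znode` and the product version are illustrative
+assumptions, and the field layout follows the operator's CRD at the time of writing; see the
+xref:getting_started/first_steps.adoc[getting started guide] for a tested manifest:
+
+[source,bash]
+----
+# Sketch: apply a minimal highly available HdfsCluster.
+# All names and the version below are placeholders.
+kubectl apply -f - <<EOF
+apiVersion: hdfs.stackable.tech/v1alpha1
+kind: HdfsCluster
+metadata:
+  name: simple-hdfs
+spec:
+  image:
+    productVersion: 3.3.4  # pick a supported HDFS version
+  clusterConfig:
+    dfsReplication: 1  # matched by the single DataNode below
+    zookeeperConfigMapName: simple-hdfs-znode  # discovery ConfigMap of a ZookeeperZnode
+  nameNodes:
+    roleGroups:
+      default:
+        replicas: 2  # two NameNodes for high availability
+  journalNodes:
+    roleGroups:
+      default:
+        replicas: 1
+  dataNodes:
+    roleGroups:
+      default:
+        replicas: 1
+EOF
+----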
 
-The Operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the HDFS instance. The discovery ConfigMap contains the `core-site.xml` file and the `hdfs-site.xml` file.
+The Operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the HDFS instance. The
+discovery ConfigMap contains the `core-site.xml` file and the `hdfs-site.xml` file.
 
 == Dependencies
 
-HDFS depends on ZooKeeper for coordination between nodes. You can run a ZooKeeper cluster with the xref:zookeeper:index.adoc[]. Additionally, the xref:commons-operator:index.adoc[] and xref:secret-operator:index.adoc[] are needed.
+HDFS depends on ZooKeeper for coordination between nodes. You can run a ZooKeeper cluster with the
+xref:zookeeper:index.adoc[]. Additionally, the xref:commons-operator:index.adoc[] and
+xref:secret-operator:index.adoc[] are needed.
 
 == [[demos]]Demos
 
 Two demos that use HDFS are available.
 
-**xref:stackablectl::demos/hbase-hdfs-load-cycling-data.adoc[]** loads a dataset of cycling data from S3 into HDFS and then uses HBase to analyze the data.
+**xref:demos:hbase-hdfs-load-cycling-data.adoc[]** loads a dataset of cycling data from S3 into HDFS and then uses HBase
+to analyze the data.
 
-**xref:stackablectl::demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc[]** showcases the integration between HDFS and Jupyter. New York Taxi data is stored in HDFS and analyzed in a Jupyter notebook.
+**xref:demos:jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc[]** showcases the integration between HDFS and
+Jupyter. New York Taxi data is stored in HDFS and analyzed in a Jupyter notebook.
 
 == Supported Versions