From 4afbb6d120f913678b895bd069718d91c8b42d5d Mon Sep 17 00:00:00 2001 From: Seonmi-Lee Date: Sat, 24 Sep 2016 19:34:46 +0900 Subject: [PATCH 1/6] Test Seonmi Test --- docs/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/README.md b/docs/README.md index 4dc810edf18..a4e1c1e4b27 100644 --- a/docs/README.md +++ b/docs/README.md @@ -61,3 +61,5 @@ If you wish to help us and contribute to Zeppelin Documentation, please look at ``` 3. copy `zeppelin/docs/_site` to `asf-zeppelin/site/docs/[VERSION]` 4. ```svn commit``` + +LeeSeonmi \ No newline at end of file From 4a9b6263fdae18284bb55a6d7077da67b934ea38 Mon Sep 17 00:00:00 2001 From: Seonmi-Lee Date: Sat, 24 Sep 2016 19:42:29 +0900 Subject: [PATCH 2/6] test2 ssun test2 --- docs/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/README.md b/docs/README.md index a4e1c1e4b27..31d85c15c71 100644 --- a/docs/README.md +++ b/docs/README.md @@ -62,4 +62,4 @@ If you wish to help us and contribute to Zeppelin Documentation, please look at 3. copy `zeppelin/docs/_site` to `asf-zeppelin/site/docs/[VERSION]` 4. ```svn commit``` -LeeSeonmi \ No newline at end of file +Lee-Seonmi \ No newline at end of file From 2cb4e6f105b6a94b6ec806cb8ce27f8255f86179 Mon Sep 17 00:00:00 2001 From: "DESKTOP-4GVV4F1\\gkes" Date: Thu, 29 Sep 2016 20:37:46 +0900 Subject: [PATCH 3/6] translate Zeppelin Tutorial Document to Korean --- docs/quickstart/tutorial.md | 63 ++++++----- docs/quickstart/tutorial_en.md | 198 +++++++++++++++++++++++++++++++++ 2 files changed, 229 insertions(+), 32 deletions(-) create mode 100644 docs/quickstart/tutorial_en.md diff --git a/docs/quickstart/tutorial.md b/docs/quickstart/tutorial.md index 4947f3ce8a0..5b52ba413be 100644 --- a/docs/quickstart/tutorial.md +++ b/docs/quickstart/tutorial.md @@ -19,21 +19,21 @@ limitations under the License. --> {% include JB/setup %} -# Zeppelin Tutorial +# 제플린 튜토리얼
-This tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see [here](../install/install.html) first.
+이 튜토리얼은 제플린의 핵심 개념 몇 가지를 소개합니다. 시작하기 전에 제플린이 설치되어 있어야 합니다. 아직 설치하지 않았다면 [이 곳](../install/install.html)을 먼저 참조하세요.

-Current main backend processing engine of Zeppelin is [Apache Spark](https://spark.apache.org). If you're new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.
+현재 제플린의 주요 백엔드 처리 엔진은 [Apache Spark](https://spark.apache.org)입니다. 이 시스템이 처음이라면, 제플린을 최대한 활용하기 위해 스파크가 데이터를 어떻게 처리하는지 먼저 이해하고 시작하는 것이 좋습니다.

-## Tutorial with Local File
+## 로컬 파일을 이용한 튜토리얼

-### Data Refine
+### 데이터 정제

-Before you start Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip).
+튜토리얼을 시작하기 전에, [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip)을 먼저 다운로드 받습니다.

-First, to transform csv format data into RDD of `Bank` objects, run following script. This will also remove header using `filter` function.
+우선, csv 형식 데이터를 `Bank` 객체의 RDD로 변환하기 위해 아래 스크립트를 실행합니다. 또한 `filter` 함수를 사용해서 헤더를 제거합니다.

```scala

val bankText = sc.textFile("yourPath/bank/bank-full.csv")

case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer)

-// split each line, filter out header (starts with "age"), and map it into Bank case class
+// 각 라인을 분리하여 "age"로 시작하는 헤더를 걸러내고, `Bank` case class로 매핑합니다.
val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
    s=>Bank(s(0).toInt, 
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt
        )
)

-// convert to DataFrame and create temporal table
+// DataFrame으로 변환하고 임시 테이블을 생성합니다.
bank.toDF().registerTempTable("bank")
```

-### Data Retrieval
+### 데이터 검색

-Suppose we want to see age distribution from `bank`. To do this, run:
+`bank`의 나이 분포를 확인하려면, 아래를 실행합니다.

```sql
%sql select age, count(1) from bank where age < 30 group by age order by age
```

-You can make input box for setting age condition by replacing `30` with `${maxAge=30}`.
+`30`을 `${maxAge=30}`으로 대체해서 나이 조건을 설정하는 입력 상자를 만들 수 있습니다.

```sql
%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age
```

-Now we want to see age distribution with certain marital status and add combo box to select marital status. Run:
+혼인 여부에 따른 나이 분포를 확인하고, 혼인 여부를 선택할 선택 박스를 추가하려면, 아래를 실행합니다.

```sql
%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age
```

<br />
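참고로, 아래는 같은 집계를 `%sql` 대신 스칼라 코드에서 직접 실행해 보는 참고용 스케치입니다. 위 단락에서 등록한 `bank` 임시 테이블과, 제플린 스파크 인터프리터가 자동 주입하는 `sqlContext`가 있다고 가정합니다.

```scala
// (가정) 위에서 registerTempTable("bank")로 등록한 임시 테이블을 사용합니다.
// sqlContext는 제플린 스파크 인터프리터 단락에 자동으로 주입되어 있습니다.
val under30 = sqlContext.sql(
  "select age, count(1) as cnt from bank where age < 30 group by age order by age")

// 결과 행을 출력해서 %sql 단락과 같은 집계가 나오는지 확인합니다
under30.collect().foreach(println)
```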
-## Tutorial with Streaming Data
+## 스트리밍 데이터를 이용한 튜토리얼

-### Data Refine
+### 데이터 정제

-Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get API keys, you should fill out credential related values(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) with your API keys on following script.
-
-This will create a RDD of `Tweet` objects and register these stream data as a table:
+이 튜토리얼은 트위터의 샘플 트윗 스트림을 기반으로 하기 때문에, 트위터 계정으로 인증해야 합니다. 인증 방법은 [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup)을 참조하세요. API 키를 받은 후, 아래 스크립트의 자격 증명 관련 값(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`)을 발급받은 API 키로 채워야 합니다.
+아래 스크립트는 `Tweet` 객체의 RDD를 생성하고, 이 스트림 데이터를 테이블로 등록합니다.

```scala
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess

-/** Configures the Oauth Credentials for accessing Twitter */
+/** 트위터에 접근하기 위한 Oauth 자격 증명을 구성합니다 */
def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) {
  val configs = new HashMap[String, String] ++= Seq(
    "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret)
  println("Configuring Twitter OAuth")
  configs.foreach{ case(key, value) =>
    if (value.trim.isEmpty) {
      throw new Exception("Error setting authentication - value for " + key + " not set")
    }
    val fullKey = "twitter4j.oauth." + key.replace("api", "consumer")
    System.setProperty(fullKey, value.trim)
    println("\tProperty " + fullKey + " set as [" + value.trim + "]")
  }
  println()
}

-// Configure Twitter credentials
+// 트위터 자격 증명 구성
val apiKey = "xxxxxxxxxxxxxxxxxxxxxxxxx"
val apiSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
val accessToken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
val accessTokenSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)

import org.apache.spark.streaming.twitter._
val ssc = new StreamingContext(sc, Seconds(2))
val tweets = TwitterUtils.createStream(ssc, None)
val twt = tweets.window(Seconds(60))

case class Tweet(createdAt:Long, text:String)
twt.map(status=>
  Tweet(status.getCreatedAt().getTime()/1000, status.getText())
).foreachRDD(rdd=>
-  // Below line works only in spark 1.3.0.
-  // For spark 1.1.x and spark 1.2.x,
-  // use rdd.registerTempTable("tweets") instead.
+  // 아래 코드는 spark 1.3.0에서만 작동합니다.
+  // spark 1.1.x와 spark 1.2.x에서는
+  // rdd.registerTempTable("tweets")을 사용해야 합니다.
  rdd.toDF().registerAsTable("tweets")
)

twt.print

ssc.start()
```

-### Data Retrieval
+### 데이터 검색

-For each following script, every time you click run button you will see different result since it is based on real-time data.
+아래 각 스크립트는 실시간 데이터를 기반으로 하기 때문에 실행 버튼을 클릭할 때마다 다른 결과값을 출력합니다.

-Let's begin by extracting maximum 10 tweets which contain the word **girl**.
+단어 **girl**을 포함하는 최대 10개의 트윗을 추출해 봅시다.

```sql
%sql select * from tweets where text like '%girl%' limit 10
```

-This time suppose we want to see how many tweets have been created per sec during last 60 sec. To do this, run:
+이번에는 지난 60초 동안 초당 얼마나 많은 트윗이 생성되었는지 확인해 봅시다.

```sql
%sql select createdAt, count(1) from tweets group by createdAt order by createdAt
```

-You can make user-defined function and use it in Spark SQL. Let's try it by making function named `sentiment`. This function will return one of the three attitudes( positive, negative, neutral ) towards the parameter.
+또한, 사용자 정의 함수를 만들어서 스파크 SQL에서 사용할 수도 있습니다. `sentiment`라는 함수를 만들어서 연습해 봅시다. 이 함수는 파라미터에 대하여 세 가지 속성(긍정, 부정, 중립) 중 하나를 반환합니다. 
```scala
def sentiment(s:String) : String = {
    val positive = Array("like", "love", "good", "great", "happy", "cool", "the", "one", "that")
    val negative = Array("hate", "bad", "stupid", "is")

    var st = 0;

    val words = s.split(" ")
    positive.foreach(p =>
        words.foreach(w =>
            if(p==w) st = st+1
        )
    )

    negative.foreach(p=>
        words.foreach(w=>
            if(p==w) st = st-1
        )
    )
    if(st>0)
        "positive"
    else if(st<0)
        "negative"
    else
        "neutral"
}

-// Below line works only in spark 1.3.0.
-// For spark 1.1.x and spark 1.2.x,
-// use sqlc.registerFunction("sentiment", sentiment _) instead.
+// 아래 코드는 spark 1.3.0에서만 작동합니다.
+// spark 1.1.x와 spark 1.2.x에서는
+// sqlc.registerFunction("sentiment", sentiment _)을 사용해야 합니다.
sqlc.udf.register("sentiment", sentiment _)

```

-To check how people think about girls using `sentiment` function we've made above, run this:
+위에서 만든 `sentiment` 함수를 사용하여 사람들이 'girl'에 대해 어떻게 생각하는지 확인하기 위해 아래를 실행합니다.

```sql
%sql select sentiment(text), count(1) from tweets where text like '%girl%' group by sentiment(text)
```
diff --git a/docs/quickstart/tutorial_en.md b/docs/quickstart/tutorial_en.md
new file mode 100644
index 00000000000..4947f3ce8a0
--- /dev/null
+++ b/docs/quickstart/tutorial_en.md
@@ -0,0 +1,198 @@
+---
+layout: page
+title: "Apache Zeppelin Tutorial"
+description: "This tutorial page contains a short walk-through tutorial that uses Apache Spark backend. Please note that this tutorial is valid for Spark 1.3 and higher."
+group: quickstart
+---
+
+{% include JB/setup %}
+
+# Zeppelin Tutorial
+
+<div id="toc"></div>
+ +This tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see [here](../install/install.html) first. + +Current main backend processing engine of Zeppelin is [Apache Spark](https://spark.apache.org). If you're new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin. + +## Tutorial with Local File + +### Data Refine + +Before you start Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip). + +First, to transform csv format data into RDD of `Bank` objects, run following script. This will also remove header using `filter` function. + +```scala + +val bankText = sc.textFile("yourPath/bank/bank-full.csv") + +case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer) + +// split each line, filter out header (starts with "age"), and map it into Bank case class +val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map( + s=>Bank(s(0).toInt, + s(1).replaceAll("\"", ""), + s(2).replaceAll("\"", ""), + s(3).replaceAll("\"", ""), + s(5).replaceAll("\"", "").toInt + ) +) + +// convert to DataFrame and create temporal table +bank.toDF().registerTempTable("bank") +``` + +### Data Retrieval + +Suppose we want to see age distribution from `bank`. To do this, run: + +```sql +%sql select age, count(1) from bank where age < 30 group by age order by age +``` + +You can make input box for setting age condition by replacing `30` with `${maxAge=30}`. + +```sql +%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age +``` + +Now we want to see age distribution with certain marital status and add combo box to select marital status. Run: + +```sql +%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age +``` + +
+## Tutorial with Streaming Data + +### Data Refine + +Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get API keys, you should fill out credential related values(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) with your API keys on following script. + +This will create a RDD of `Tweet` objects and register these stream data as a table: + +```scala +import org.apache.spark.streaming._ +import org.apache.spark.streaming.twitter._ +import org.apache.spark.storage.StorageLevel +import scala.io.Source +import scala.collection.mutable.HashMap +import java.io.File +import org.apache.log4j.Logger +import org.apache.log4j.Level +import sys.process.stringSeqToProcess + +/** Configures the Oauth Credentials for accessing Twitter */ +def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) { + val configs = new HashMap[String, String] ++= Seq( + "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret) + println("Configuring Twitter OAuth") + configs.foreach{ case(key, value) => + if (value.trim.isEmpty) { + throw new Exception("Error setting authentication - value for " + key + " not set") + } + val fullKey = "twitter4j.oauth." + key.replace("api", "consumer") + System.setProperty(fullKey, value.trim) + println("\tProperty " + fullKey + " set as [" + value.trim + "]") + } + println() +} + +// Configure Twitter credentials +val apiKey = "xxxxxxxxxxxxxxxxxxxxxxxxx" +val apiSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +val accessToken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +val accessTokenSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret) + +import org.apache.spark.streaming.twitter._ +val ssc = new StreamingContext(sc, Seconds(2)) +val tweets = TwitterUtils.createStream(ssc, None) +val twt = tweets.window(Seconds(60)) + +case class Tweet(createdAt:Long, text:String) +twt.map(status=> + Tweet(status.getCreatedAt().getTime()/1000, status.getText()) +).foreachRDD(rdd=> + // Below line works only in spark 1.3.0. + // For spark 1.1.x and spark 1.2.x, + // use rdd.registerTempTable("tweets") instead. + rdd.toDF().registerAsTable("tweets") +) + +twt.print + +ssc.start() +``` + +### Data Retrieval + +For each following script, every time you click run button you will see different result since it is based on real-time data. + +Let's begin by extracting maximum 10 tweets which contain the word **girl**. + +```sql +%sql select * from tweets where text like '%girl%' limit 10 +``` + +This time suppose we want to see how many tweets have been created per sec during last 60 sec. To do this, run: + +```sql +%sql select createdAt, count(1) from tweets group by createdAt order by createdAt +``` + + +You can make user-defined function and use it in Spark SQL. Let's try it by making function named `sentiment`. This function will return one of the three attitudes( positive, negative, neutral ) towards the parameter. 
+ +```scala +def sentiment(s:String) : String = { + val positive = Array("like", "love", "good", "great", "happy", "cool", "the", "one", "that") + val negative = Array("hate", "bad", "stupid", "is") + + var st = 0; + + val words = s.split(" ") + positive.foreach(p => + words.foreach(w => + if(p==w) st = st+1 + ) + ) + + negative.foreach(p=> + words.foreach(w=> + if(p==w) st = st-1 + ) + ) + if(st>0) + "positivie" + else if(st<0) + "negative" + else + "neutral" +} + +// Below line works only in spark 1.3.0. +// For spark 1.1.x and spark 1.2.x, +// use sqlc.registerFunction("sentiment", sentiment _) instead. +sqlc.udf.register("sentiment", sentiment _) + +``` + +To check how people think about girls using `sentiment` function we've made above, run this: + +```sql +%sql select sentiment(text), count(1) from tweets where text like '%girl%' group by sentiment(text) +``` \ No newline at end of file From b39c496aa9f2134662bc7e36a7c0778ac01f68e7 Mon Sep 17 00:00:00 2001 From: "DESKTOP-4GVV4F1\\gkes" Date: Thu, 29 Sep 2016 20:37:46 +0900 Subject: [PATCH 4/6] translate Zeppelin Tutorial Document to Korean --- docs/quickstart/tutorial.md | 63 ++++++----- docs/quickstart/tutorial_en.md | 198 +++++++++++++++++++++++++++++++++ 2 files changed, 229 insertions(+), 32 deletions(-) create mode 100644 docs/quickstart/tutorial_en.md diff --git a/docs/quickstart/tutorial.md b/docs/quickstart/tutorial.md index 4947f3ce8a0..c25f1aaf76e 100644 --- a/docs/quickstart/tutorial.md +++ b/docs/quickstart/tutorial.md @@ -19,21 +19,21 @@ limitations under the License. --> {% include JB/setup %} -# Zeppelin Tutorial +# 제플린 튜토리얼
-This tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see [here](../install/install.html) first.
+이 튜토리얼은 제플린의 핵심 개념 몇 가지를 소개합니다. 시작하기 전에 제플린이 설치되어 있어야 합니다. 아직 설치하지 않았다면 [이 곳](../install/install.html)을 먼저 참조하세요.

-Current main backend processing engine of Zeppelin is [Apache Spark](https://spark.apache.org). If you're new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin.
+현재 제플린의 주요 백엔드 처리 엔진은 [Apache Spark](https://spark.apache.org)입니다. 이 시스템이 처음이라면, 제플린을 최대한 활용하기 위해 스파크가 데이터를 어떻게 처리하는지 먼저 이해하고 시작하는 것이 좋습니다.

-## Tutorial with Local File
+## 로컬 파일을 이용한 튜토리얼

-### Data Refine
+### 데이터 정제

-Before you start Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip).
+튜토리얼을 시작하기 전에, [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip)을 먼저 다운로드 받습니다.

-First, to transform csv format data into RDD of `Bank` objects, run following script. This will also remove header using `filter` function.
+우선, csv 형식 데이터를 `Bank` 객체의 RDD로 변환하기 위해 아래 스크립트를 실행합니다. 또한 `filter` 함수를 사용해서 헤더를 제거합니다.

```scala

val bankText = sc.textFile("yourPath/bank/bank-full.csv")

case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer)

-// split each line, filter out header (starts with "age"), and map it into Bank case class
+// 각 라인을 분리하여 "age"로 시작하는 헤더를 걸러내고, `Bank` case class로 매핑합니다.
val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
    s=>Bank(s(0).toInt, 
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt
        )
)

-// convert to DataFrame and create temporal table
+// DataFrame으로 변환하고 임시 테이블을 생성합니다.
bank.toDF().registerTempTable("bank")
```

-### Data Retrieval
+### 데이터 검색

-Suppose we want to see age distribution from `bank`. To do this, run:
+`bank`의 나이 분포를 확인하려면, 아래를 실행합니다.

```sql
%sql select age, count(1) from bank where age < 30 group by age order by age
```

-You can make input box for setting age condition by replacing `30` with `${maxAge=30}`.
+`30`을 `${maxAge=30}`으로 대체해서 나이 조건을 설정하는 입력 상자를 만들 수 있습니다.

```sql
%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age
```

-Now we want to see age distribution with certain marital status and add combo box to select marital status. Run:
+혼인 여부에 따른 나이 분포를 확인하고, 혼인 여부를 선택할 선택 박스를 추가하려면, 아래를 실행합니다.

```sql
%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age
```

<br />
-## Tutorial with Streaming Data
+## 스트리밍 데이터를 이용한 튜토리얼

-### Data Refine
+### 데이터 정제

-Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get API keys, you should fill out credential related values(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) with your API keys on following script.
-
-This will create a RDD of `Tweet` objects and register these stream data as a table:
+이 튜토리얼은 트위터의 샘플 트윗 스트림을 기반으로 하기 때문에, 트위터 계정으로 인증해야 합니다. 인증 방법은 [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup)을 참조하세요. API 키를 받은 후, 아래 스크립트의 자격 증명 관련 값(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`)을 발급받은 API 키로 채워야 합니다.
+아래 스크립트는 `Tweet` 객체의 RDD를 생성하고, 이 스트림 데이터를 테이블로 등록합니다.

```scala
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess

-/** Configures the Oauth Credentials for accessing Twitter */
+/** 트위터에 접근하기 위한 Oauth 자격 증명을 구성합니다 */
def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) {
  val configs = new HashMap[String, String] ++= Seq(
    "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret)
  println("Configuring Twitter OAuth")
  configs.foreach{ case(key, value) =>
    if (value.trim.isEmpty) {
      throw new Exception("Error setting authentication - value for " + key + " not set")
    }
    val fullKey = "twitter4j.oauth." + key.replace("api", "consumer")
    System.setProperty(fullKey, value.trim)
    println("\tProperty " + fullKey + " set as [" + value.trim + "]")
  }
  println()
}

-// Configure Twitter credentials
+// 트위터 자격 증명 구성
val apiKey = "xxxxxxxxxxxxxxxxxxxxxxxxx"
val apiSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
val accessToken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
val accessTokenSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)

import org.apache.spark.streaming.twitter._
val ssc = new StreamingContext(sc, Seconds(2))
val tweets = TwitterUtils.createStream(ssc, None)
val twt = tweets.window(Seconds(60))

case class Tweet(createdAt:Long, text:String)
twt.map(status=>
  Tweet(status.getCreatedAt().getTime()/1000, status.getText())
).foreachRDD(rdd=>
-  // Below line works only in spark 1.3.0.
-  // For spark 1.1.x and spark 1.2.x,
-  // use rdd.registerTempTable("tweets") instead.
+  // 아래 코드는 spark 1.3.0에서만 작동합니다.
+  // spark 1.1.x와 spark 1.2.x에서는
+  // rdd.registerTempTable("tweets")을 사용해야 합니다.
  rdd.toDF().registerAsTable("tweets")
)

twt.print

ssc.start()
```

-### Data Retrieval
+### 데이터 검색

-For each following script, every time you click run button you will see different result since it is based on real-time data.
+아래 각 스크립트는 실시간 데이터를 기반으로 하기 때문에 실행 버튼을 클릭할 때마다 다른 결과값을 출력합니다.

-Let's begin by extracting maximum 10 tweets which contain the word **girl**.
+단어 **girl**을 포함하는 최대 10개의 트윗을 추출해 봅시다.

```sql
%sql select * from tweets where text like '%girl%' limit 10
```

-This time suppose we want to see how many tweets have been created per sec during last 60 sec. To do this, run:
+이번에는 지난 60초 동안 초당 얼마나 많은 트윗이 생성되었는지 확인해 봅시다.

```sql
%sql select createdAt, count(1) from tweets group by createdAt order by createdAt
```

-You can make user-defined function and use it in Spark SQL. Let's try it by making function named `sentiment`. This function will return one of the three attitudes( positive, negative, neutral ) towards the parameter.
+또한, 사용자 정의 함수를 만들어서 스파크 SQL에서 사용할 수도 있습니다. `sentiment`라는 함수를 만들어서 연습해 봅시다. 이 함수는 파라미터에 대하여 세 가지 속성(긍정, 부정, 중립) 중 하나를 반환합니다. 
```scala
def sentiment(s:String) : String = {
    val positive = Array("like", "love", "good", "great", "happy", "cool", "the", "one", "that")
    val negative = Array("hate", "bad", "stupid", "is")

    var st = 0;

    val words = s.split(" ")
    positive.foreach(p =>
        words.foreach(w =>
            if(p==w) st = st+1
        )
    )

    negative.foreach(p=>
        words.foreach(w=>
            if(p==w) st = st-1
        )
    )
    if(st>0)
        "positive"
    else if(st<0)
        "negative"
    else
        "neutral"
}

-// Below line works only in spark 1.3.0.
-// For spark 1.1.x and spark 1.2.x,
-// use sqlc.registerFunction("sentiment", sentiment _) instead.
+// 아래 코드는 spark 1.3.0에서만 작동합니다.
+// spark 1.1.x와 spark 1.2.x에서는
+// sqlc.registerFunction("sentiment", sentiment _)을 사용해야 합니다.
sqlc.udf.register("sentiment", sentiment _)

```

-To check how people think about girls using `sentiment` function we've made above, run this:
+위에서 만든 `sentiment` 함수를 사용하여 사람들이 'girl'에 대해 어떻게 생각하는지 확인하기 위해 아래를 실행합니다.

```sql
%sql select sentiment(text), count(1) from tweets where text like '%girl%' group by sentiment(text)
```
diff --git a/docs/quickstart/tutorial_en.md b/docs/quickstart/tutorial_en.md
new file mode 100644
index 00000000000..4947f3ce8a0
--- /dev/null
+++ b/docs/quickstart/tutorial_en.md
@@ -0,0 +1,198 @@
+---
+layout: page
+title: "Apache Zeppelin Tutorial"
+description: "This tutorial page contains a short walk-through tutorial that uses Apache Spark backend. Please note that this tutorial is valid for Spark 1.3 and higher."
+group: quickstart
+---
+
+{% include JB/setup %}
+
+# Zeppelin Tutorial
+
+<div id="toc"></div>
+ +This tutorial walks you through some of the fundamental Zeppelin concepts. We will assume you have already installed Zeppelin. If not, please see [here](../install/install.html) first. + +Current main backend processing engine of Zeppelin is [Apache Spark](https://spark.apache.org). If you're new to this system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin. + +## Tutorial with Local File + +### Data Refine + +Before you start Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip). + +First, to transform csv format data into RDD of `Bank` objects, run following script. This will also remove header using `filter` function. + +```scala + +val bankText = sc.textFile("yourPath/bank/bank-full.csv") + +case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer) + +// split each line, filter out header (starts with "age"), and map it into Bank case class +val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map( + s=>Bank(s(0).toInt, + s(1).replaceAll("\"", ""), + s(2).replaceAll("\"", ""), + s(3).replaceAll("\"", ""), + s(5).replaceAll("\"", "").toInt + ) +) + +// convert to DataFrame and create temporal table +bank.toDF().registerTempTable("bank") +``` + +### Data Retrieval + +Suppose we want to see age distribution from `bank`. To do this, run: + +```sql +%sql select age, count(1) from bank where age < 30 group by age order by age +``` + +You can make input box for setting age condition by replacing `30` with `${maxAge=30}`. + +```sql +%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age +``` + +Now we want to see age distribution with certain marital status and add combo box to select marital status. Run: + +```sql +%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age +``` + +
+## Tutorial with Streaming Data + +### Data Refine + +Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get API keys, you should fill out credential related values(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) with your API keys on following script. + +This will create a RDD of `Tweet` objects and register these stream data as a table: + +```scala +import org.apache.spark.streaming._ +import org.apache.spark.streaming.twitter._ +import org.apache.spark.storage.StorageLevel +import scala.io.Source +import scala.collection.mutable.HashMap +import java.io.File +import org.apache.log4j.Logger +import org.apache.log4j.Level +import sys.process.stringSeqToProcess + +/** Configures the Oauth Credentials for accessing Twitter */ +def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) { + val configs = new HashMap[String, String] ++= Seq( + "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret) + println("Configuring Twitter OAuth") + configs.foreach{ case(key, value) => + if (value.trim.isEmpty) { + throw new Exception("Error setting authentication - value for " + key + " not set") + } + val fullKey = "twitter4j.oauth." + key.replace("api", "consumer") + System.setProperty(fullKey, value.trim) + println("\tProperty " + fullKey + " set as [" + value.trim + "]") + } + println() +} + +// Configure Twitter credentials +val apiKey = "xxxxxxxxxxxxxxxxxxxxxxxxx" +val apiSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +val accessToken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +val accessTokenSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret) + +import org.apache.spark.streaming.twitter._ +val ssc = new StreamingContext(sc, Seconds(2)) +val tweets = TwitterUtils.createStream(ssc, None) +val twt = tweets.window(Seconds(60)) + +case class Tweet(createdAt:Long, text:String) +twt.map(status=> + Tweet(status.getCreatedAt().getTime()/1000, status.getText()) +).foreachRDD(rdd=> + // Below line works only in spark 1.3.0. + // For spark 1.1.x and spark 1.2.x, + // use rdd.registerTempTable("tweets") instead. + rdd.toDF().registerAsTable("tweets") +) + +twt.print + +ssc.start() +``` + +### Data Retrieval + +For each following script, every time you click run button you will see different result since it is based on real-time data. + +Let's begin by extracting maximum 10 tweets which contain the word **girl**. + +```sql +%sql select * from tweets where text like '%girl%' limit 10 +``` + +This time suppose we want to see how many tweets have been created per sec during last 60 sec. To do this, run: + +```sql +%sql select createdAt, count(1) from tweets group by createdAt order by createdAt +``` + + +You can make user-defined function and use it in Spark SQL. Let's try it by making function named `sentiment`. This function will return one of the three attitudes( positive, negative, neutral ) towards the parameter. 
+
+```scala
+def sentiment(s:String) : String = {
+    val positive = Array("like", "love", "good", "great", "happy", "cool", "the", "one", "that")
+    val negative = Array("hate", "bad", "stupid", "is")
+
+    var st = 0;
+
+    val words = s.split(" ")
+    positive.foreach(p =>
+        words.foreach(w =>
+            if(p==w) st = st+1
+        )
+    )
+
+    negative.foreach(p=>
+        words.foreach(w=>
+            if(p==w) st = st-1
+        )
+    )
+    if(st>0)
+        "positive"
+    else if(st<0)
+        "negative"
+    else
+        "neutral"
+}
+
+// Below line works only in spark 1.3.0.
+// For spark 1.1.x and spark 1.2.x,
+// use sqlc.registerFunction("sentiment", sentiment _) instead.
+sqlc.udf.register("sentiment", sentiment _)
+
+```
+
+To check how people think about girls using `sentiment` function we've made above, run this:
+
+```sql
+%sql select sentiment(text), count(1) from tweets where text like '%girl%' group by sentiment(text)
+```
\ No newline at end of file

From 22f2e4fca8208cbd789884ef2896187a16c6227b Mon Sep 17 00:00:00 2001
From: "DESKTOP-4GVV4F1\\gkes"
Date: Fri, 30 Sep 2016 00:56:05 +0900
Subject: [PATCH 5/6] translate 'What is Apache Zeppelin?' document to Korean

---
 docs/index.md    | 174 ++++++++++++++++++++++----------------
 docs/index_en.md | 184 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 272 insertions(+), 86 deletions(-)
 create mode 100644 docs/index_en.md

diff --git a/docs/index.md b/docs/index.md
index 8c2ce95cc95..8845da2c0ee 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -21,15 +21,15 @@ limitations under the License.
-->
-

Multi-purpose Notebook

+

다목적 Notebook

- The Notebook is the place for all your needs + 이 Notebook은 당신이 원하는 모든 것을 위한 장소입니다.

    -
  • Data Ingestion
  • -
  • Data Discovery
  • -
  • Data Analytics
  • -
  • Data Visualization & Collaboration
  • +
  • 데이터 처리
  • +
  • 데이터 검색
  • +
  • 데이터 분석
  • +
  • 데이터 시각화 & 협업
@@ -38,31 +38,32 @@ limitations under the License.

-## Multiple Language Backend -[Apache Zeppelin interpreter](./manual/interpreters.html) concept allows any language/data-processing-backend to be plugged into Zeppelin. -Currently Apache Zeppelin supports many interpreters such as Apache Spark, Python, JDBC, Markdown and Shell. +## 다양한 언어의 백엔드 +[아파치 제플린 인터프리터](./manual/interpreters.html) 구성은 어떤 백엔드 데이터 처리 언어도 제플린에 연결될 수 있도록 합니다. +현재 아파치 제플린은 아파치 스파크, 파이썬, JDBC, 마크다운, 쉘과 같은 많은 인터프리터를 지원합니다. -Adding new language-backend is really simple. Learn [how to create your own interpreter](./development/writingzeppelininterpreter.html#make-your-own-interpreter). +새로운 백엔드 언어를 추가하는 것은 아주 쉽습니다. [당신만의 인터프리터를 만드는 방법](./development/writingzeppelininterpreter.html#make-your-own-interpreter)을 알아봅시다. -#### Apache Spark integration -Especially, Apache Zeppelin provides built-in [Apache Spark](http://spark.apache.org/) integration. You don't need to build a separate module, plugin or library for it. +#### 아파치 스파크 통합 +아파치 제플린은 특별히 내장 [아파치 스파크](http://spark.apache.org/) 통합을 제공합니다. 그래서 별도의 모듈이나 플러그인을 구성할 필요가 없습니다. -Apache Zeppelin with Spark integration provides +스파크와 통합된 아파치 제플린은 아래 항목을 제공합니다. -- Automatic SparkContext and SQLContext injection -- Runtime jar dependency loading from local filesystem or maven repository. Learn more about [dependency loader](./interpreter/spark.html#dependencyloading). -- Canceling job and displaying its progress +- SparkContext와 SQLContext 자동 주입 +- maven 저장소나 로컬 파일 시스템으로부터 runtime jar dependency를 로딩. [dependency loader](./interpreter/spark.html#dependencyloading)에 대해 배워봅시다. +- job 취소 및 진행 상황 출력 -For the further information about Apache Spark in Apache Zeppelin, please see [Spark interpreter for Apache Zeppelin](./interpreter/spark.html). +아파치 제플린의 아파치 스파크에 대한 더 많은 정보는 [아파치 제플린에 대한 스파크 인터프리터](./interpreter/spark.html)를 참조할 수 있습니다.
-## Data visualization +## 데이터 시각화 + +아파치 제플린에는 기본적인 몇 가지 차트가 포함되어 있습니다. 시각화는 스파크 SQL 쿼리에 제한되어 있지 않으며, 모든 백엔드 언어로부터의 모든 결과물은 시각회될 수 있습니다. -Some basic charts are already included in Apache Zeppelin. Visualizations are not limited to Spark SQL query, any output from any language backend can be recognized and visualized.
@@ -73,9 +74,9 @@ Some basic charts are already included in Apache Zeppelin. Visualizations are no
-### Pivot chart +### 피벗 차트 -Apache Zeppelin aggregates values and displays them in pivot chart with simple drag and drop. You can easily create chart with multiple aggregated values including sum, count, average, min, max. +아파치 제플린은 간단하게 드래그 앤 드롭으로 값을 종합하고 피벗 차트에 출력합니다. 총합, 카운트, 평균, 최소, 최대를 포함한 다양한 집계 값을 쉽게 차트로 만들 수 있습니다.
@@ -83,22 +84,23 @@ Apache Zeppelin aggregates values and displays them in pivot chart with simple d
-Learn more about [display systems](#display-system) in Apache Zeppelin. +아파치 제플린의 [출력 시스템](#display-system)을 알아봅시다.
-## Dynamic forms +## 동적 양식 + +아파치 제플린으로 notebook에서 몇 가지 입력 양식을 동적으로 생성할 수 있습니다. -Apache Zeppelin can dynamically create some input forms in your notebook.
-Learn more about [Dynamic Forms](./manual/dynamicform.html). +[동적 양식](./manual/dynamicform.html)에 대해 더 알아봅시다.
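참고용 스케치: 스칼라 단락에서는 ZeppelinContext(`z`)의 `z.input`으로도 입력 양식을 만들 수 있습니다. 양식 이름과 기본값은 설명을 위해 가정한 예시 값입니다.

```scala
// ZeppelinContext(z)는 스파크 인터프리터 단락에서 기본으로 제공됩니다.
// 기본값이 30인 "maxAge" 입력 양식을 만들고, 사용자가 입력한 값을 읽어 옵니다.
val maxAge = z.input("maxAge", "30").toString.toInt
println(s"maxAge = $maxAge")
```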
-## Collaborate by sharing your Notebook & Paragraph -Your notebook URL can be shared among collaborators. Then Apache Zeppelin will broadcast any changes in realtime, just like the collaboration in Google docs. +## Notebook & Paragraph 공유를 통한 협업 +공동 협력자들 사이에 notebook URL을 공유할 수 있습니다. 그러면 아파치 제플린은 구글 문서 도구로 협업하는 것처럼 실시간으로 변경 사항을 방송합니다.
@@ -106,79 +108,79 @@ Your notebook URL can be shared among collaborators. Then Apache Zeppelin will b
-Apache Zeppelin provides an URL to display the result only, that page does not include any menus and buttons inside of notebooks. -You can easily embed it as an iframe inside of your website in this way. -If you want to learn more about this feature, please visit [this page](./manual/publish.html). +아파치 제플린은 notebooks 안의 버튼이나 메뉴를 포함하지 않는 결과 페이지만 출력하는 URL을 제공합니다. +이 방법으로 웹 사이트의 iframe으로 쉽게 삽입할 수 있습니다. +이 기능에 대해 더 알고싶으면 [이 페이지](./manual/publish.html)를 참조할 수 있습니다.
-## 100% Opensource +## 100% 오픈소스 -Apache Zeppelin is Apache2 Licensed software. Please check out the [source repository](http://git.apache.org/zeppelin.git) and [how to contribute](https://zeppelin.apache.org/contribution/contributions.html). -Apache Zeppelin has a very active development community. -Join to our [Mailing list](https://zeppelin.apache.org/community.html) and report issues on [Jira Issue tracker](https://issues.apache.org/jira/browse/ZEPPELIN). +아파치 제플린은 아파치2 라이센스 소프트웨어입니다. [소스 저장소](http://git.apache.org/zeppelin.git)와 [기여하는 방법](https://zeppelin.apache.org/contribution/contributions.html)을 확인할 수 있습니다. +아파치 제플린 개발자 커뮤니티는 매우 활동적입니다. +[메일링 리스트](https://zeppelin.apache.org/community.html)에 가입하고 [Jira Issue tracker](https://issues.apache.org/jira/browse/ZEPPELIN)에 이슈를 보고합니다. -## What is the next ? +## 다음은 무엇입니까? -####Quick Start +####빠른 시작 -* Getting Started - * [Quick Start](./install/install.html) for basic instructions on installing Apache Zeppelin - * [Configuration](./install/install.html#apache-zeppelin-configuration) lists for Apache Zeppelin - * [Explore Apache Zeppelin UI](./quickstart/explorezeppelinui.html): basic components of Apache Zeppelin home - * [Tutorial](./quickstart/tutorial.html): a short walk-through tutorial that uses Apache Spark backend -* Basic Feature Guide - * [Dynamic Form](./manual/dynamicform.html): a step by step guide for creating dynamic forms - * [Publish your Paragraph](./manual/publish.html) results into your external website - * [Customize Zeppelin Homepage](./manual/notebookashomepage.html) with one of your notebooks -* More - * [Upgrade Apache Zeppelin Version](./install/upgrade.html): a manual procedure of upgrading Apache Zeppelin version +* 시작하기 + * 아파치 제플린 [설치](./install/install.html) 기본 지침 + * 아파치 제플린을 위한 [구성](./install/install.html#apache-zeppelin-configuration) 목록 + * [아파치 제플린 UI 경험](./quickstart/explorezeppelinui.html): 아파치 제플린 홈의 기본 컴포넌트 + * [튜토리얼](./quickstart/tutorial.html): 아파치 스파크 백엔드를 사용한 간단한 튜토리얼 +* 기본 기능 가이드 + * [동적 양식](./manual/dynamicform.html): 동적 양식을 만드는 단계별 가이드 + * 외부 웹 사이트에 [Paragraph 결과 공개](./manual/publish.html) + * 너의 notebooks 중 하나로 [제플린 홈페이지 꾸미기](./manual/notebookashomepage.html) +* 더 많은 정보 + * [제플린 버전 업그레이드](./install/upgrade.html): 아파치 제플린을 수동으로 업그레이드하는 방법 -####Interpreter +####인터프리터 -* [Interpreters in Apache Zeppelin](./manual/interpreters.html): what is interpreter group? how can you set interpreters in Apache Zeppelin? -* Usage - * [Interpreter Installation](./manual/interpreterinstallation.html): Install not only community managed interpreters but also 3rd party interpreters - * [Interpreter Dependency Management](./manual/dependencymanagement.html) when you include external libraries to interpreter -* Available Interpreters: currently, about 20 interpreters are available in Apache Zeppelin. +* [아파치 제플린의 인터프리터](./manual/interpreters.html): 인터프리터 그룹은 무엇인가? 아파치 제플린에 어떻게 인터프리터를 설정할 수 있는가? +* 사용법 + * [인터프리터 설치](./manual/interpreterinstallation.html): 커뮤니티에서 관리하는 인터프리터를 비롯하여 제3 인터프리터 설치 + * 인터프리터에 외부 라이브러리를 포함시킬 때 [인터프리터 종속성 관리](./manual/dependencymanagement.html) +* 사용가능한 인터프리터: 현재 아파치 제플린에서는 약 20 개의 인터프리터를 사용할 수 있습니다. 
-####Display System +####출력 시스템 -* Basic Display System: [Text](./displaysystem/basicdisplaysystem.html#text), [HTML](./displaysystem/basicdisplaysystem.html#html), [Table](./displaysystem/basicdisplaysystem.html#table) is available -* Angular API: a description about avilable backend and frontend AngularJS API with examples +* 기본 출력 시스템: [텍스트](./displaysystem/basicdisplaysystem.html#text), [HTML](./displaysystem/basicdisplaysystem.html#html), [테이블](./displaysystem/basicdisplaysystem.html#table)을 사용할 수 있습니다. +* Angular API: 백엔드와 프론트엔드 AngularJS API에 대한 설명과 예제 * [Angular (backend API)](./displaysystem/back-end-angular.html) * [Angular (frontend API)](./displaysystem/front-end-angular.html) -####More +####더 많은 정보 -* Notebook Storage: a guide about saving notebooks to external storage - * [Git Storage](./storage/storage.html#notebook-storage-in-local-git-repository) - * [S3 Storage](./storage/storage.html#notebook-storage-in-s3) - * [Azure Storage](./storage/storage.html#notebook-storage-in-azure) - * [ZeppelinHub Storage](./storage/storage.html#storage-in-zeppelinhub) -* REST API: available REST API list in Apache Zeppelin - * [Interpreter API](./rest-api/rest-interpreter.html) +* Notebook 저장소: 외부 저장소에 notebooks을 저장하는 방법 + * [Git 저장소](./storage/storage.html#notebook-storage-in-local-git-repository) + * [S3 저장소](./storage/storage.html#notebook-storage-in-s3) + * [Azure 저장소](./storage/storage.html#notebook-storage-in-azure) + * [ZeppelinHub 저장소](./storage/storage.html#storage-in-zeppelinhub) +* REST API: 아파치 제플린에서 사용할 수 있는 REST API 목록 + * [인터프리터 API](./rest-api/rest-interpreter.html) * [Notebook API](./rest-api/rest-notebook.html) - * [Configuration API](./rest-api/rest-configuration.html) - * [Credential API](./rest-api/rest-credential.html) -* Security: available security support in Apache Zeppelin - * [Authentication for NGINX](./security/authentication.html) - * [Shiro Authentication](./security/shiroauthentication.html) - * [Notebook Authorization](./security/notebook_authorization.html) - * [Data Source Authorization](./security/datasource_authorization.html) -* Advanced - * [Apache Zeppelin on Vagrant VM](./install/virtual_machine.html) - * [Zeppelin on Spark Cluster Mode (Standalone via Docker)](./install/spark_cluster_mode.html#spark-standalone-mode) - * [Zeppelin on Spark Cluster Mode (YARN via Docker)](./install/spark_cluster_mode.html#spark-on-yarn-mode) - * [Zeppelin on Spark Cluster Mode (Mesos via Docker)](./install/spark_cluster_mode.html#spark-on-mesos-mode) -* Contribute - * [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html) - * [Writing Zeppelin Application (Experimental)](./development/writingzeppelinapplication.html) - * [How to contribute (code)](./development/howtocontribute.html) - * [How to contribute (documentation website)](./development/howtocontributewebsite.html) - -#### External Resources - * [Mailing List](https://zeppelin.apache.org/community.html) - * [Apache Zeppelin Wiki](https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Home) - * [StackOverflow tag `apache-zeppelin`](http://stackoverflow.com/questions/tagged/apache-zeppelin) + * [구성 API](./rest-api/rest-configuration.html) + * [자격 증명 API](./rest-api/rest-credential.html) +* 보안: 아파치 제플린에서 사용할 수 있는 보안 지원 + * [NGINX를 사용한 인증](./security/authentication.html) + * [Shiro 인증](./security/shiroauthentication.html) + * [Notebook 인증](./security/notebook_authorization.html) + * [데이터 소스 인증](./security/datasource_authorization.html) +* 고급 + * [Vagrant VM에서 아파치 
  * [스파크 클러스터 모드에서 제플린 (Docker를 통한 Standalone)](./install/spark_cluster_mode.html#spark-standalone-mode)
  * [스파크 클러스터 모드에서 제플린 (Docker를 통한 YARN)](./install/spark_cluster_mode.html#spark-on-yarn-mode)
  * [스파크 클러스터 모드에서 제플린 (Docker를 통한 Mesos)](./install/spark_cluster_mode.html#spark-on-mesos-mode)
* 기여하기
  * [제플린 인터프리터 작성](./development/writingzeppelininterpreter.html)
  * [제플린 어플리케이션 작성 (실험 단계)](./development/writingzeppelinapplication.html)
  * [기여하는 방법 (코드)](./development/howtocontribute.html)
  * [기여하는 방법 (웹 사이트 문서)](./development/howtocontributewebsite.html)

#### 외부 리소스
 * [메일링 리스트](https://zeppelin.apache.org/community.html)
 * [아파치 제플린 위키](https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Home)
 * [StackOverflow 태그 `apache-zeppelin`](http://stackoverflow.com/questions/tagged/apache-zeppelin)

diff --git a/docs/index_en.md b/docs/index_en.md
new file mode 100644
index 00000000000..8c2ce95cc95
--- /dev/null
+++ b/docs/index_en.md
@@ -0,0 +1,184 @@
+---
+layout: page
+title:
+description:
+group:
+---
+
+{% include JB/setup %}
+
+
+

Multi-purpose Notebook

+

+ The Notebook is the place for all your needs +

+
    +
  • Data Ingestion
  • +
  • Data Discovery
  • +
  • Data Analytics
  • +
  • Data Visualization & Collaboration
  • +
+
+
+ +
+
+ +
+## Multiple Language Backend +[Apache Zeppelin interpreter](./manual/interpreters.html) concept allows any language/data-processing-backend to be plugged into Zeppelin. +Currently Apache Zeppelin supports many interpreters such as Apache Spark, Python, JDBC, Markdown and Shell. + + + +Adding new language-backend is really simple. Learn [how to create your own interpreter](./development/writingzeppelininterpreter.html#make-your-own-interpreter). + +#### Apache Spark integration +Especially, Apache Zeppelin provides built-in [Apache Spark](http://spark.apache.org/) integration. You don't need to build a separate module, plugin or library for it. + + + +Apache Zeppelin with Spark integration provides + +- Automatic SparkContext and SQLContext injection +- Runtime jar dependency loading from local filesystem or maven repository. Learn more about [dependency loader](./interpreter/spark.html#dependencyloading). +- Canceling job and displaying its progress + +For the further information about Apache Spark in Apache Zeppelin, please see [Spark interpreter for Apache Zeppelin](./interpreter/spark.html). + +
+## Data visualization + +Some basic charts are already included in Apache Zeppelin. Visualizations are not limited to Spark SQL query, any output from any language backend can be recognized and visualized. + +
+
+ +
+
+ +
+
+ +### Pivot chart + +Apache Zeppelin aggregates values and displays them in pivot chart with simple drag and drop. You can easily create chart with multiple aggregated values including sum, count, average, min, max. + +
+
+ +
+
+ +Learn more about [display systems](#display-system) in Apache Zeppelin. + +
+## Dynamic forms + +Apache Zeppelin can dynamically create some input forms in your notebook. +
+
+ +
+
+Learn more about [Dynamic Forms](./manual/dynamicform.html). + +
+## Collaborate by sharing your Notebook & Paragraph +Your notebook URL can be shared among collaborators. Then Apache Zeppelin will broadcast any changes in realtime, just like the collaboration in Google docs. + +
+
+ +
+
+ +Apache Zeppelin provides an URL to display the result only, that page does not include any menus and buttons inside of notebooks. +You can easily embed it as an iframe inside of your website in this way. +If you want to learn more about this feature, please visit [this page](./manual/publish.html). + +
+## 100% Opensource + + + +Apache Zeppelin is Apache2 Licensed software. Please check out the [source repository](http://git.apache.org/zeppelin.git) and [how to contribute](https://zeppelin.apache.org/contribution/contributions.html). +Apache Zeppelin has a very active development community. +Join to our [Mailing list](https://zeppelin.apache.org/community.html) and report issues on [Jira Issue tracker](https://issues.apache.org/jira/browse/ZEPPELIN). + +## What is the next ? + +####Quick Start + +* Getting Started + * [Quick Start](./install/install.html) for basic instructions on installing Apache Zeppelin + * [Configuration](./install/install.html#apache-zeppelin-configuration) lists for Apache Zeppelin + * [Explore Apache Zeppelin UI](./quickstart/explorezeppelinui.html): basic components of Apache Zeppelin home + * [Tutorial](./quickstart/tutorial.html): a short walk-through tutorial that uses Apache Spark backend +* Basic Feature Guide + * [Dynamic Form](./manual/dynamicform.html): a step by step guide for creating dynamic forms + * [Publish your Paragraph](./manual/publish.html) results into your external website + * [Customize Zeppelin Homepage](./manual/notebookashomepage.html) with one of your notebooks +* More + * [Upgrade Apache Zeppelin Version](./install/upgrade.html): a manual procedure of upgrading Apache Zeppelin version + +####Interpreter + +* [Interpreters in Apache Zeppelin](./manual/interpreters.html): what is interpreter group? how can you set interpreters in Apache Zeppelin? +* Usage + * [Interpreter Installation](./manual/interpreterinstallation.html): Install not only community managed interpreters but also 3rd party interpreters + * [Interpreter Dependency Management](./manual/dependencymanagement.html) when you include external libraries to interpreter +* Available Interpreters: currently, about 20 interpreters are available in Apache Zeppelin. 
+
+####Display System
+
+* Basic Display System: [Text](./displaysystem/basicdisplaysystem.html#text), [HTML](./displaysystem/basicdisplaysystem.html#html), [Table](./displaysystem/basicdisplaysystem.html#table) is available
+* Angular API: a description about available backend and frontend AngularJS API with examples
+  * [Angular (backend API)](./displaysystem/back-end-angular.html)
+  * [Angular (frontend API)](./displaysystem/front-end-angular.html)
+
+####More
+
+* Notebook Storage: a guide about saving notebooks to external storage
+  * [Git Storage](./storage/storage.html#notebook-storage-in-local-git-repository)
+  * [S3 Storage](./storage/storage.html#notebook-storage-in-s3)
+  * [Azure Storage](./storage/storage.html#notebook-storage-in-azure)
+  * [ZeppelinHub Storage](./storage/storage.html#storage-in-zeppelinhub)
+* REST API: available REST API list in Apache Zeppelin
+  * [Interpreter API](./rest-api/rest-interpreter.html)
+  * [Notebook API](./rest-api/rest-notebook.html)
+  * [Configuration API](./rest-api/rest-configuration.html)
+  * [Credential API](./rest-api/rest-credential.html)
+* Security: available security support in Apache Zeppelin
+  * [Authentication for NGINX](./security/authentication.html)
+  * [Shiro Authentication](./security/shiroauthentication.html)
+  * [Notebook Authorization](./security/notebook_authorization.html)
+  * [Data Source Authorization](./security/datasource_authorization.html)
+* Advanced
+  * [Apache Zeppelin on Vagrant VM](./install/virtual_machine.html)
+  * [Zeppelin on Spark Cluster Mode (Standalone via Docker)](./install/spark_cluster_mode.html#spark-standalone-mode)
+  * [Zeppelin on Spark Cluster Mode (YARN via Docker)](./install/spark_cluster_mode.html#spark-on-yarn-mode)
+  * [Zeppelin on Spark Cluster Mode (Mesos via Docker)](./install/spark_cluster_mode.html#spark-on-mesos-mode)
+* Contribute
+  * [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html)
+  * [Writing Zeppelin Application (Experimental)](./development/writingzeppelinapplication.html)
+  * [How to contribute (code)](./development/howtocontribute.html)
+  * [How to contribute (documentation website)](./development/howtocontributewebsite.html)
+
+#### External Resources
+ * [Mailing List](https://zeppelin.apache.org/community.html)
+ * [Apache Zeppelin Wiki](https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Home)
+ * [StackOverflow tag `apache-zeppelin`](http://stackoverflow.com/questions/tagged/apache-zeppelin)

From 19c85cdfc3082a6cb8087f7045dfe29add08c80e Mon Sep 17 00:00:00 2001
From: "DESKTOP-4GVV4F1\\gkes"
Date: Sat, 1 Oct 2016 00:58:03 +0900
Subject: [PATCH 6/6] translate Install document to Korean

---
 docs/install/install.md | 150 ++++++++++++++++++++--------------------
 1 file changed, 75 insertions(+), 75 deletions(-)

diff --git a/docs/install/install.md b/docs/install/install.md
index 4d81fe5ce7f..a886a9be514 100644
--- a/docs/install/install.md
+++ b/docs/install/install.md
@@ -19,20 +19,20 @@ limitations under the License.
-->
{% include JB/setup %}

# 빠른 시작
아파치 제플린을 탐험하는 첫 번째 관문에 오신 것을 환영합니다!
이 페이지는 제플린을 시작하는 데 도움을 주며, 아래는 여기서 다룰 주제의 목록입니다.

<div id="toc"></div>
-## Installation +## 설치 -Apache Zeppelin officially supports and is tested on next environments. +아파치 제플린은 공식적으로 지원하고, 다음 환경에서 테스트 되었습니다. - - + + @@ -44,26 +44,26 @@ Apache Zeppelin officially supports and is tested on next environments.
NameValue이름
Oracle JDK
-There are two options to install Apache Zeppelin on your machine. One is [downloading pre-built binary package](#downloading-binary-package) from the archive. -You can download not only the latest stable version but also the older one if you need. -The other option is [building from the source](#building-from-source). -Although it can be unstable somehow since it is on development status, you can explore newly added feature and change it as you want. +아파치 제플린을 설치하기 위한 두 가지 옵션이 있습니다. 첫번째는 아카이브에서 [사전 구축된 바이너리 패키지 다운로드](#downloading-binary-package)해야합니다. +최신 안정화 버전과 필요하다면 더 이전 버전을 다운로드 받을 수 있습니다. +두번째는 [소스 빌드](#building-from-source)입니다. +개발 상태이기떄문에 불안정할 수 있지만, 새로 추가된 기능을 체험할 수 있고, 원한다면 바꿀 수도 있습니다. -### Downloading Binary Package +### 바이너리 패키지 다운로드 -If you want to install Apache Zeppelin with a stable binary package, please visit [Apache Zeppelin download Page](http://zeppelin.apache.org/download.html). +안정화된 바이너리 패키지를 설치하고 싶으면 [아파치 제플린 다운로드 페이지](http://zeppelin.apache.org/download.html)에서 다운받을 수 있습니다. -If you have downloaded `netinst` binary, [install additional interpreters](../manual/interpreterinstallation.html) before you start Zeppelin. Or simply run `./bin/install-interpreter.sh --all`. +`netinst` binary를 다운로드 받으면, 제플린을 시작하기 전에 [추가 인터프리터 설치](../manual/interpreterinstallation.html)가 필요합니다. 혹은 간단히 `./bin/install-interpreter.sh --all`로 설치할 수 있습니다. -After unpacking, jump to [Starting Apache Zeppelin with Command Line](#starting-apache-zeppelin-with-command-line) section. +압축 해제 후, [커맨드 라인으로 아파치 제플린 시작하기](#starting-apache-zeppelin-with-command-line) 섹션으로 이동합니다. -### Building from Source -If you want to build from the source, the software below needs to be installed on your system. +### 소스 빌드 +소스로 빌드하고싶다면, 시스템에 아래 요구사항이 만족되어야합니다. - - + + @@ -75,22 +75,22 @@ If you want to build from the source, the software below needs to be installed o
NameValue이름
Git
-If you don't have it installed yet, please check [Before Build](https://github.com/apache/zeppelin/blob/master/README.md#before-build) section and follow step by step instructions from there. +아직 설치되지 않았다면, [빌드 사전 작업](https://github.com/apache/zeppelin/blob/master/README.md#before-build) 섹션의 단계별 지침을 따라야합니다. -####1. Clone Apache Zeppelin repository +####1. 아파치 제플린 저장소 복제 ``` git clone https://github.com/apache/zeppelin.git ``` -####2. Build source with options -Each interpreters requires different build options. For the further information about options, please see [Build](https://github.com/apache/zeppelin#build) section. +####2. 옵션으로 소스 빌드 +각 인터프리터는 다른 빌드 환경이 필요하다. 옵션에 대한 더 자세한 정보는 [빌드](https://github.com/apache/zeppelin#build) 섹션을 참고할 수 있다. ``` mvn clean package -DskipTests [Options] ``` -Here are some examples with several options +몇 가지 옵션을 이용한 몇 가지 예제입니다. ``` # build with spark-2.0, scala-2.11 @@ -110,37 +110,37 @@ mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pven mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests ``` -For the further information about building with source, please see [README.md](https://github.com/apache/zeppelin/blob/master/README.md) in Zeppelin repository. +소스로 빌드하는 더 자세한 정보는 제플린 저장소의 [README.md](https://github.com/apache/zeppelin/blob/master/README.md)를 확인하세요. -## Starting Apache Zeppelin with Command Line -#### Start Zeppelin +## 커맨드 라인으로 아파치 제플린 시작하기 +#### 제플린 시작 ``` bin/zeppelin-daemon.sh start ``` -If you are using Windows +Windows를 사용한다면 ``` bin\zeppelin.cmd ``` -After successful start, visit [http://localhost:8080](http://localhost:8080) with your web browser. +성공적으로 시작됐으면, 웹 브라우저에서 [http://localhost:8080](http://localhost:8080)으로 접속합니다. -#### Stop Zeppelin +#### 제플린 정지하 ``` bin/zeppelin-daemon.sh stop ``` -#### (Optional) Start Apache Zeppelin with a service manager +#### (선택 사항) 서비스 매니저로 아파치 제플린 시작 -> **Note :** The below description was written based on Ubuntu Linux. +> **참고 :** 아래 설명은 Ubuntu Linx를 기반으로 작성되었습니다. -Apache Zeppelin can be auto started as a service with an init script, such as services managed by **upstart**. +아파치 제플린은 **upstart**에 의해 관리되는 서비스같은 초기화 스크립트를 통해 서비스로 자동 시작될 수 있습니다. -The following is an example of upstart script to be saved as `/etc/init/zeppelin.conf` -This also allows the service to be managed with commands such as +아래는 `/etc/init/zeppelin.conf`로 저장될 upstart 스크립트의 예제입니다. +이것은 서비스가 아래와 같은 명령어로 관리될 수 있도록합니다. ``` sudo service zeppelin start @@ -148,7 +148,7 @@ sudo service zeppelin stop sudo service zeppelin restart ``` -Other service managers could use a similar approach with the `upstart` argument passed to the `zeppelin-daemon.sh` script. +다른 서비스 매니저는 `upstart` 인수를 `zeppelin-daemon.sh` 스크립트에 전달해서 사용할 수 있습니다. ``` bin/zeppelin-daemon.sh upstart @@ -174,73 +174,73 @@ chdir /usr/share/zeppelin exec bin/zeppelin-daemon.sh upstart ``` -## What is the next? -Congratulation on your successful Apache Zeppelin installation! Here are two next steps you might need. +## 다음은 무엇입니까? +아파치 제플린 설치 성공을 축하합니다! 당신에게 필요할 두 가지 다음 단계가 있습니다. -#### If you are new to Apache Zeppelin - * For an in-depth overview of Apache Zeppelin UI, head to [Explore Apache Zeppelin UI](../quickstart/explorezeppelinui.html). - * After getting familiar with Apache Zeppelin UI, have fun with a short walk-through [Tutorial](../quickstart/tutorial.html) that uses Apache Spark backend. - * If you need more configuration setting for Apache Zeppelin, jump to the next section: [Apache Zeppelin Configuration](#apache-zeppelin-configuration). 
+#### 제플린을 처음 사용한다면 + * [아파치 제플린 UI 탐험](../quickstart/explorezeppelinui.html)에서 아파치 제플린 UI의 면밀한 개요를 볼 수 있습니다. + * 아파치 제플린 UI에 익숙해진 후에, 아파치 스파크 백엔드를 사용하는 [Tutorial](../quickstart/tutorial.html)에서 간단한 연습을 즐기세요. + * 아파치 제플린 구성 설정을 더 하고 싶으면, [아파치 제플린 구성](#apache-zeppelin-configuration)을 참조하세요. -#### If you need more information about Spark or JDBC interpreter setting - * Apache Zeppelin provides deep integration with [Apache Spark](http://spark.apache.org/). For the further informtation, see [Spark Interpreter for Apache Zeppelin](../interpreter/spark.html). - * Also, you can use generic JDBC connections in Apache Zeppelin. Go to [Generic JDBC Interpreter for Apache Zeppelin](../interpreter/jdbc.html). +#### 스파크와 JDBC 인터프리터 설정에 대해 더 많은 정보가 필요하다면 + * 아파치 제플린은 [아파치 스파크](http://spark.apache.org/)와 깊은 통합을 제공합니다. 더 많은 정보가 필요하면 [아파치 제플린을 위한 스파크 인터프리터](../interpreter/spark.html)를 참조하세요. + * 또한, 아파치 제플린에서 일반 JDBC 연결을 사용할 수 있습니다. [아파치 제플린을 위한 일반 JDBC 연결](../interpreter/jdbc.html)을 참조하세요. -#### If you are in multi-user environment - * You can set permissions for your notebooks and secure data resource in multi-user environment. Go to **More** -> **Security** section. +#### 다중 사용자 환경이라면 + * 다중 사용자 환경에서 당신 notebooks을 위한 권한과 데이터 리소스에 대한 보안을 설정할 수 있습니다. **More** -> **Security** 섹션을 참고하세요. -## Apache Zeppelin Configuration +## 아파치 제플린 구성 -You can configure Apache Zeppelin with both **environment variables** in `conf/zeppelin-env.sh` (`conf\zeppelin-env.cmd` for Windows) and **Java properties** in `conf/zeppelin-site.xml`. If both are defined, then the **environment variables** will take priority. +`conf/zeppelin-env.sh` (`conf\zeppelin-env.cmd` for Windows)의 **환경 변수** 와 `conf/zeppelin-site.xml`의 **자바 프로퍼티** 로 아파치 제플린을 구성할 수 있습니다. 둘 다 정의됐으면 환경 변수가 우선 순위가 높습니다. - - + + - + - + - + - + - + - + - + @@ -300,85 +300,85 @@ You can configure Apache Zeppelin with both **environment variables** in `conf/z - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -389,19 +389,19 @@ You can configure Apache Zeppelin with both **environment variables** in `conf/z - + - +
zeppelin-env.sh zeppelin-site.xmlDefault valueDescription기본값설명
ZEPPELIN_PORT zeppelin.server.port 8080Zeppelin server port제플린 서버 포트
ZEPPELIN_MEM N/A -Xmx1024m -XX:MaxPermSize=512mJVM mem optionsJVM mem 옵션
ZEPPELIN_INTP_MEM
    N/A
    ZEPPELIN_MEM
    JVM mem options for interpreter process
    인터프리터 프로세스에 대한 JVM mem 옵션
ZEPPELIN_JAVA_OPTS N/A JVM optionsJVM 옵션
ZEPPELIN_ALLOWED_ORIGINS zeppelin.server.allowed.origins *Enables a way to specify a ',' separated list of allowed origins for rest and websockets.
i.e. http://localhost:8080
REST와 웹 소켓에 허용할 origin을 ','로 구분된 목록으로 지정할 수 있게 합니다.
i.e. http://localhost:8080
N/A zeppelin.anonymous.allowed trueAnonymous user is allowed by default.익명의 사용자를 기본적으로 허용
ZEPPELIN_SERVER_CONTEXT_PATH zeppelin.server.context.path /A context path of the web application웹 어플리케이션의 컨텍스트 경로
ZEPPELIN_SSLZEPPELIN_NOTEBOOK_HOMESCREEN zeppelin.notebook.homescreen A notebook id displayed in Apache Zeppelin homescreen
i.e. 2A94M5J1Z
아파치 제플린 홈화면에 출력될 notebook ID.
i.e. 2A94M5J1Z
ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE zeppelin.notebook.homescreen.hide falseThis value can be "true" when to hide the notebook id set by ZEPPELIN_NOTEBOOK_HOMESCREEN on the Apache Zeppelin homescreen.
For the further information, please read Customize your Zeppelin homepage.
아파치 제플린 홈화면에서 ZEPPELIN_NOTEBOOK_HOMESCREEN으로 설정한 notebook ID를 숨기고 싶을 때 "true"로 설정합니다.
상세한 정보는 제플린 홈페이지 꾸미기를 참조하세요.
ZEPPELIN_WAR_TEMPDIR zeppelin.war.tempdir webappsA location of jetty temporary directoryjetty 임시 디렉토리 경로
ZEPPELIN_NOTEBOOK_DIR zeppelin.notebook.dir notebookThe root directory where notebook directories are savednotebook 디렉토리가 저장된 루트 디렉토리
ZEPPELIN_NOTEBOOK_S3_BUCKET
    zeppelin.notebook.s3.bucket
    zeppelin
    S3 Bucket where notebook files will be saved
    notebook 파일이 저장될 S3 Bucket
ZEPPELIN_NOTEBOOK_S3_USER zeppelin.notebook.s3.user userA user name of S3 bucket
i.e. bucket/user/notebook/2A94M5J1Z/note.json
S3 bucket의 사용자 이름
i.e. bucket/user/notebook/2A94M5J1Z/note.json
ZEPPELIN_NOTEBOOK_S3_ENDPOINT zeppelin.notebook.s3.endpoint s3.amazonaws.comEndpoint for the bucketbucket에 대한 엔드 포인트
ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID zeppelin.notebook.s3.kmsKeyID AWS KMS Key ID to use for encrypting data in S3 (optional)S3의 데이터를 암호화할 때 사용할 AWS KMS Key ID (선택 사항)
ZEPPELIN_NOTEBOOK_S3_EMP
    zeppelin.notebook.s3.encryptionMaterialsProvider
    
    Class name of a custom S3 encryption materials provider implementation to use for encrypting data in S3 (optional)
    S3에서 데이터를 암호화할 때 사용할 사용자 정의 S3 encryption materials provider 구현체의 클래스 이름 (선택 사항)
ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING zeppelin.notebook.azure.connectionString The Azure storage account connection string
i.e.
DefaultEndpointsProtocol=https;
AccountName=<accountName>;
AccountKey=<accountKey>
Azure 스토리지 계정 연결 문자열
i.e.
DefaultEndpointsProtocol=https;
AccountName=<accountName>;
AccountKey=<accountKey>
ZEPPELIN_NOTEBOOK_AZURE_SHARE zeppelin.notebook.azure.share zeppelinShare where the notebook files will be savednotebook 파일이 저장될 Share
ZEPPELIN_NOTEBOOK_AZURE_USER zeppelin.notebook.azure.user userAn optional user name of Azure file share
i.e. share/user/notebook/2A94M5J1Z/note.json
Azure file share의 선택적인 사용자 이름
i.e. share/user/notebook/2A94M5J1Z/note.json
ZEPPELIN_NOTEBOOK_STORAGE zeppelin.notebook.storage org.apache.zeppelin.notebook.repo.VFSNotebookRepoComma separated list of notebook storage콤마로 구분된 notebook 저장소 목록
ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC zeppelin.notebook.one.way.sync falseIf there are multiple notebook storages, should we treat the first one as the only source of truth?여러 notebook 스토리지가 있다면, 첫번째 것을 유일한 소스로 다뤄야 하는가?
ZEPPELIN_INTERPRETERS Comma separated interpreter configurations [Class]
- NOTE: This property is deprecated since Zeppelin-0.6.0 and will not be supported from Zeppelin-0.7.0 + 참고: 이 속성은 제플린-0.6.0 이후 사용되지 않으며, 제플린 0.7.0에서는 지원되지 않습니다.
ZEPPELIN_INTERPRETER_DIR zeppelin.interpreter.dir interpreterInterpreter directory인터프리터 디렉토리
ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE zeppelin.websocket.max.text.message.size 1024000Size in characters of the maximum text message to be received by websocket.웹소켓에 의해 수신받을 최대 텍스트 메시지의 크기