This small Spark project provides the sample code discussed in the Spark On
blog post series on the 47D Blog.
- Twitter Credentials to connect to the Twitter API. Read more about it here.
- This README.md file uses the IP address `192.168.99.100`. If you are using docker-machine, the `docker-machine ip <machine-name>` command should return your host's IP address. You must replace `192.168.99.100` with the IP address in your case.
- The whole infrastructure has been tested on an Apple MacBook Pro (2.7 GHz Intel Core i5, 16 GB 1867 MHz DDR3).
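
As a convenience, the host IP can be resolved once and stored in a variable instead of being edited by hand in every command. The following is only a sketch: the machine name `default` and the variable names `MACHINE_NAME` and `DOCKER_HOST_IP` are assumptions, not part of this project.

```shell
# Hypothetical machine name; replace it with the name of your own docker-machine.
MACHINE_NAME="${MACHINE_NAME:-default}"

docker_host_ip() {
  # Ask docker-machine for the IP. If the tool is missing or the lookup
  # fails, fall back to the address used throughout this README.
  if command -v docker-machine >/dev/null 2>&1; then
    docker-machine ip "$1" 2>/dev/null || echo "192.168.99.100"
  else
    echo "192.168.99.100"
  fi
}

DOCKER_HOST_IP="$(docker_host_ip "$MACHINE_NAME")"
echo "Using Docker host IP: $DOCKER_HOST_IP"
```

The later `curl` calls could then use `http://$DOCKER_HOST_IP:9090/...` in place of the hard-coded address.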
To start off, we need to define a few environment variables in this config file.
We've defined a bash script that deploys all of the cluster dependencies, including the Spark Streaming Application, so we can run it as follows:
scripts/deploy.sh
By default, the infrastructure deployed will be:
- Spark Cluster:
  - 1 Spark Master
  - 2 Spark Worker nodes
- Cassandra Cluster:
  - 2 Cassandra Docker containers
  - 1 Docker container with DataStax OpsCenter
- Kafka Cluster:
  - 1 Docker node running ZooKeeper
  - 3 Docker containers running as Kafka brokers
- Hadoop HDFS Cluster:
  - 1 Docker container running as namenode
  - 1 Docker container running as datanode
- 1 Docker container for our Streaming App
The cluster can be scaled with docker-compose. For instance, to increase the number of available Spark Workers to five:
docker-compose scale spark_worker=5
If everything is functioning correctly, we can start the Twitter Streaming as follows:
curl -X "POST" "http://192.168.99.100:9090/twitter-streaming" \
-H "Content-Type: application/json" \
-d $'{
"recreateDatabaseSchema": true,
"filters": [
"lambda",
"scala",
"akka",
"spray",
"play2",
"playframework",
"spark",
"java",
"python",
"cassandra",
"bigdata",
"47 Degrees",
"47Degrees",
"47Deg",
"programming",
"chicharrones",
"cat",
"dog"
]
}'
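
If the filter list changes often, the request body can be generated rather than typed out. This is a minimal sketch, not part of the project: `build_payload` is a hypothetical helper, and it assumes the track words contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# Build the JSON request body from a flag and a list of track words.
# Usage: build_payload <true|false> <word>...
build_payload() {
  recreate="$1"; shift
  filters=""
  for word in "$@"; do
    # Prefix a comma separator for every entry after the first one.
    filters="${filters:+$filters, }\"$word\""
  done
  printf '{"recreateDatabaseSchema": %s, "filters": [%s]}\n' "$recreate" "$filters"
}

# Print a sample body to check the result before sending it.
build_payload true scala spark "47 Degrees"
```

The output can be passed to the same endpoint with `-d "$(build_payload true scala spark)"`, keeping the `Content-Type: application/json` header shown above.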
To watch the filtered track words as they arrive, you could use, for instance, Simple WebSocket Client for Google Chrome and open a connection to the URL ws://192.168.99.100:9090/trending-topics .
We can stop the streaming gracefully, before stopping the cluster:
curl -X "DELETE" "http://192.168.99.100:9090/twitter-streaming"
And then, from the shell:
cd scripts
docker-compose stop
docker-compose rm
Start, stop, and fetch the status of the Spark Streaming Context in the application. Note: once you have stopped the context, you cannot start it again.

- Response 200 (application/json)

      { "message": "The streaming has been created, but not been started yet" }

This action (`POST`) allows you to start the Spark Streaming Context.

- Response 200 (application/json)

      { "message": "Started" }

- Response 400

This action (`DELETE`) allows you to stop the Spark Streaming Context.

- Response 200 (application/json)

      { "message": "The streaming has been stopped" }

- Response 400
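
The three actions on `/twitter-streaming` can be wrapped in a small helper. This is a sketch under stated assumptions: `POST` and `DELETE` follow the `curl` calls earlier in this README, while `GET` for the status action is an assumption, and the function names and `BASE_URL` variable are hypothetical.

```shell
BASE_URL="${BASE_URL:-http://192.168.99.100:9090}"

# Map an action name to an HTTP method.
# GET for "status" is an assumption; POST and DELETE match the curl calls above.
method_for() {
  case "$1" in
    status) echo "GET" ;;
    start)  echo "POST" ;;
    stop)   echo "DELETE" ;;
    *)      echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# Describe the two response codes documented in this section.
describe_code() {
  case "$1" in
    200) echo "ok" ;;
    400) echo "rejected" ;;
    *)   echo "unexpected ($1)" ;;
  esac
}

# Usage: streaming <status|start|stop> [json-body]
# Sends the request, discards the response body, and prints the outcome.
streaming() {
  method="$(method_for "$1")" || return 1
  if [ -n "${2:-}" ]; then
    code="$(curl -s -o /dev/null -w '%{http_code}' -X "$method" \
      -H "Content-Type: application/json" -d "$2" "$BASE_URL/twitter-streaming")"
  else
    code="$(curl -s -o /dev/null -w '%{http_code}' -X "$method" "$BASE_URL/twitter-streaming")"
  fi
  describe_code "$code"
}
```

For example, `streaming stop` would issue the `DELETE` request and print `ok` on a 200 response. Starting the stream needs the JSON body with the filters as the second argument.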
## WS Filtered Twitter Word Tracks [WS /trending-topics]
Open a websocket to receive each new filtered track word as it is found.
# License
Copyright (C) 2015 47 Degrees, LLC http://47deg.com
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.