Class of Data Management Layer - Cassandra

Download Cassandra - 1 node

wget http://ftp.cixug.es/apache/cassandra/3.5/apache-cassandra-3.5-bin.tar.gz
tar -xf apache-cassandra-3.5-bin.tar.gz
cd apache-cassandra-3.5
bin/cassandra -f  # The -f stands for foreground

Open a new terminal

cd apache-cassandra-3.5
bin/nodetool ring # lists the nodes that have joined the ring
bin/cqlsh

Now we can create our keyspace

CREATE KEYSPACE tensorsparkkandra WITH replication = {'class': 'SimpleStrategy' , 'replication_factor': 1};
-- We use a replication factor of 1 because we have only one node.

Now it is time to create the schema that stores the interesting information from the Twitter statuses and images.

CREATE TABLE images(
...

Querying Cassandra from Python

Prerequisites: Python, pip, virtualenv

We will install the Python Cassandra driver and IPython so that we can query Cassandra interactively.

virtualenv may25
source may25/bin/activate
pip install cassandra-driver
pip install ipython

First, we import the driver and connect to the cluster.

# ipython
from cassandra.cluster import Cluster
cluster = Cluster(['localhost'])
session = cluster.connect("tensorsparkkandra")

The database is empty for now: let's make 1000 inserts! Use session.prepare so that the body of the query is sent to the server only once; each execution then sends only the bound values, which saves some bytes on the network.

preparedStatement = session.prepare("INSERT INTO images (imgid, confidence , category ) VALUES (?,?,?)")

categories = ['cat','boat','dog','dress','human']

import random
for i in range(1000):
  category = random.choice(categories)  # pick one category at random
  confidence = random.random()          # random score in [0, 1)
  session.execute(preparedStatement.bind([i,confidence,category]))
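To make the prepare-once, execute-many pattern explicit, here is a minimal sketch that wraps the loop in a function. The names insert_random_images and CountingSession are hypothetical: CountingSession is only a stand-in that counts calls, so the sketch runs without a cluster. With a real session from cluster.connect, the same function should work, assuming the images table has the imgid, confidence and category columns used in the INSERT above.

```python
import random

CATEGORIES = ['cat', 'boat', 'dog', 'dress', 'human']


def insert_random_images(session, count, rng=random):
    """Prepare the INSERT once, then execute it `count` times with new values."""
    prepared = session.prepare(
        "INSERT INTO images (imgid, confidence, category) VALUES (?, ?, ?)")
    for imgid in range(count):
        # Only the bound values change from one execution to the next.
        session.execute(prepared, (imgid, rng.random(), rng.choice(CATEGORIES)))
    return count


class CountingSession(object):
    """Hypothetical stand-in for a driver session; it only counts calls."""

    def __init__(self):
        self.prepare_calls = 0
        self.execute_calls = 0

    def prepare(self, query):
        self.prepare_calls += 1
        return query

    def execute(self, statement, parameters=None):
        self.execute_calls += 1
        return []


fake = CountingSession()
insert_random_images(fake, 1000)
print(fake.prepare_calls, fake.execute_calls)
```

The query text reaches prepare a single time, however many rows are inserted.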

Now we can read all the inserted elements.

resultSet = session.execute("SELECT * FROM images")
for row in resultSet:
  print(row)

What if we want to filter the results on their confidence score?

  
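resultSet = session.execute("SELECT * FROM images WHERE confidence > 0.1") # FAILS: the WHERE clause does not restrict the primary key!

A workaround is to fetch all the image ids first and then query each image by its primary key.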

imgids = [ i.imgid for i in session.execute("SELECT imgid FROM images")]

for imgid in imgids:
  res = session.execute("SELECT * FROM images WHERE imgid = %s AND confidence < %s",(imgid,0.1))
  for row in res:
    print(row)
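The loop above sends one query per image. As an alternative (not the approach used in this class), a small table can be scanned once and filtered on the client. The sketch below assumes rows expose their columns as attributes, which is how the driver returns them by default; images_above, Row and StaticSession are hypothetical names, and StaticSession only replays a fixed list of rows so the example runs without a cluster.

```python
from collections import namedtuple


def images_above(session, threshold):
    """Scan the whole table once and keep rows with confidence > threshold."""
    rows = session.execute("SELECT * FROM images")
    return [row for row in rows if row.confidence > threshold]


# Hypothetical stand-ins so the sketch runs without a cluster.
Row = namedtuple('Row', ['imgid', 'confidence', 'category'])


class StaticSession(object):
    """Returns the same rows for every query."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, query, parameters=None):
        return list(self.rows)


fake = StaticSession([Row(0, 0.05, 'cat'), Row(1, 0.5, 'dog'), Row(2, 0.9, 'boat')])
for row in images_above(fake, 0.1):
    print(row)
```

This trades the 1000 round trips for one full scan that ships every row to the client, so it is only reasonable while the table stays small.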

Extra - Cassandra multinode

In this exercise, we will use ccm (Cassandra Cluster Manager), a script that creates and manages Cassandra clusters on localhost.

pip install ccm
wget https://raw.githubusercontent.com/pcmanus/ccm/master/misc/ccm-completion.bash
source ccm-completion.bash  # this is for bash completion - pretty handy

Let's create and start a 3-node cluster

ccm create test -v 3.5.0 -n 3 -s
ccm node1 status # runs nodetool status on node1

Now we can create the keyspace again, but this time with a replication factor of 2

ccm node1 cqlsh

CREATE KEYSPACE tensorsparkkandra WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
USE tensorsparkkandra;
CREATE TABLE ..
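With SimpleStrategy and a replication factor of 2, each row is stored on the node that owns its token and on the next node clockwise on the ring. The toy model below sketches that placement under simplifying assumptions: one evenly spaced token per node, and md5 standing in for Cassandra's default Murmur3Partitioner. The helpers build_ring, token_for and replicas_for are hypothetical and are not part of Cassandra or ccm.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def build_ring(nodes):
    """Give each node one evenly spaced token on the ring."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def token_for(key):
    """Toy partitioner: md5 stands in for Cassandra's real partitioner."""
    digest = hashlib.md5(str(key).encode('utf-8')).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for(key, ring, replication_factor):
    """Pick the node owning the key's token, then walk the ring clockwise."""
    tokens = [token for token, _ in ring]
    # The owner is the first node whose token is >= the key's token (wrapping).
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


ring = build_ring(['node1', 'node2', 'node3'])
down = {'node1'}  # as if one node had been stopped
for imgid in range(5):
    replicas = replicas_for(imgid, ring, 2)
    alive = [node for node in replicas if node not in down]
    print(imgid, replicas, alive)
```

In this model every key keeps at least one live replica when a single node is down, because its two replicas always sit on two different nodes.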

By modifying the previous Python code, you can generate a random workload on the nodes. While it runs, you can use jconsole to observe the cluster's behavior. For instance, in the MBeans tab, look at the metric org.apache.cassandra.metrics.ClientRequest.Read to see how the load is distributed among the nodes. One tip: double-clicking on a metric opens a graph of its value over time.

ccm jconsole # Starts a jconsole for each node. 
ccm node1 stop # You can stop a node and see what happens to the cluster.
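One possible way to generate the random workload mentioned above is sketched here. It is only a starting point: run_workload and connect_to_ccm_cluster are hypothetical names, the images table is assumed to have the imgid, confidence and category columns from the earlier INSERT, and the addresses 127.0.0.1 to 127.0.0.3 assume ccm's default layout of one loopback address per node.

```python
import random

CATEGORIES = ['cat', 'boat', 'dog', 'dress', 'human']


def run_workload(session, operations, read_ratio=0.5, max_id=1000, rng=random):
    """Issue a random mix of single-row reads and writes; return the counts."""
    insert = session.prepare(
        "INSERT INTO images (imgid, confidence, category) VALUES (?, ?, ?)")
    select = session.prepare("SELECT * FROM images WHERE imgid = ?")
    counts = {'reads': 0, 'writes': 0}
    for _ in range(operations):
        imgid = rng.randint(0, max_id - 1)
        if rng.random() < read_ratio:
            session.execute(select, (imgid,))
            counts['reads'] += 1
        else:
            session.execute(insert, (imgid, rng.random(), rng.choice(CATEGORIES)))
            counts['writes'] += 1
    return counts


def connect_to_ccm_cluster():
    """Connect to the local ccm nodes (needs cassandra-driver and a running cluster)."""
    from cassandra.cluster import Cluster
    # Assumption: ccm bound the three nodes to these loopback addresses.
    cluster = Cluster(['127.0.0.1', '127.0.0.2', '127.0.0.3'])
    return cluster.connect('tensorsparkkandra')
```

With the cluster running, calling run_workload(connect_to_ccm_cluster(), 10000) from IPython should keep the nodes busy long enough to watch the metrics move in jconsole; raising read_ratio shifts the mix toward reads.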
