Note: Servers may be down because I'm really poor.
Q: What's this?
A: Hashtagsbattle is a web app that displays real-time analytics based on Twitter, such as hourly trending hashtags, daily hashtags, and worldwide activity. It is inspired by the awesome One Million Tweet Map. (NB: this is a demo.)
Q: How does it work?
A:
- First off, there's a tweet listener built with Tweepy that retrieves tweets from the Twitter API. It does some basic cleaning and filtering before publishing them to a Pub/Sub topic, which is basically a global-scale messaging buffer/bus. The listener runs on a Google Compute Engine instance, as it is fairly cheap and doesn't require auto-scaling.
- Then, there's a small Express server using Socket.IO, running on App Engine. It exposes an endpoint that receives Pub/Sub push messages and emits events through a WebSocket. It uses the Supercluster library to cluster points server-side, which reduces network and client-side rendering delay.
- The heart of the project is the Apache Beam stream-processing pipeline running on the Cloud Dataflow runner. This pipeline consumes events from the source Pub/Sub topic and applies some data transformations (grouping, counting, filtering, batching...) before sending the pre-aggregated output to another Pub/Sub topic. I'm playing with windows and triggers to achieve fairly low latency.
- Finally, the output Pub/Sub topic triggers Cloud Functions instances that do some computation on the data before saving it to Firestore.
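
To make the message path concrete, here is a minimal sketch (not the project's actual code) of the listener's cleaning step and of how a Pub/Sub push delivery wraps an event as base64-encoded data inside a `message` object. The function names, the tweet fields kept (`id`, `coordinates`, `timestamp_ms`), the "drop tweets without hashtags" rule, and the subscription name are all assumptions made for illustration. It uses only the standard library, so Tweepy and the Pub/Sub client are deliberately left out.

```python
import base64
import json
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def clean_tweet(raw):
    """Keep only the fields later stages might need; return None to drop the tweet."""
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(raw.get("text", ""))]
    if not hashtags:
        return None  # assumed filter: tweets without hashtags are dropped
    return {
        "id": raw["id"],
        "hashtags": hashtags,
        "coordinates": raw.get("coordinates"),  # assumed field, may be None
        "timestamp_ms": raw["timestamp_ms"],    # assumed field
    }


def to_push_envelope(event, subscription="projects/demo/subscriptions/demo-sub"):
    """Wrap an event the way a push delivery does: base64 data inside 'message'."""
    data = base64.b64encode(json.dumps(event).encode("utf-8")).decode("ascii")
    return {"message": {"data": data, "messageId": "1"}, "subscription": subscription}


def from_push_envelope(envelope):
    """What a push endpoint would do first: decode the base64 payload back to JSON."""
    payload = base64.b64decode(envelope["message"]["data"])
    return json.loads(payload.decode("utf-8"))
```

In this sketch, `from_push_envelope(to_push_envelope(event))` returns the original event, which is the round trip between the listener's topic and the endpoint that forwards events to the browser.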
The web app is built with Stencil and deployed to Firebase Hosting.
As you can see, this is fully managed by Google Cloud Platform.
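
The grouping and counting done by the streaming pipeline can be pictured with a small batch sketch in plain Python. This is not Apache Beam code and not the project's pipeline: it only illustrates fixed (tumbling) windows, and it ignores triggers, watermarks, and late data. The one-hour window size, the event shape (`timestamp_ms`, `hashtags`), and the function names are assumptions.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 3600  # assumed window size: one hour


def window_start(timestamp_ms, size=WINDOW_SECONDS):
    """Return the start (in seconds) of the fixed window containing the timestamp."""
    timestamp_s = timestamp_ms // 1000
    return timestamp_s - (timestamp_s % size)


def count_hashtags(events, size=WINDOW_SECONDS):
    """Group events by window start and count hashtag occurrences per window."""
    counts = defaultdict(Counter)
    for event in events:
        counts[window_start(event["timestamp_ms"], size)].update(event["hashtags"])
    return counts


def top_hashtags(counts, n=3):
    """Pre-aggregated output: the n most frequent hashtags for each window."""
    return {start: counter.most_common(n) for start, counter in counts.items()}
```

A real streaming runner emits results incrementally as windows fire, whereas this sketch needs the whole list of events up front. The per-window output has the same general shape, though: a window start mapped to its most frequent hashtags.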
To do:
- Use Pub/Sub push method instead of pull (lower latency)
- UI
- Implement the ML layer
- A lot of things ...
Work in progress.
The application is made up of four components. Almost every component is Dockerized and has its own CI/CD pipeline using Cloud Build.