Fix benchmark documentation #931

Merged: 5 commits, Aug 14, 2023
26 changes: 12 additions & 14 deletions docs/source/benchmarks/text_summarization.rst
@@ -1,11 +1,11 @@
Text summarization benchmark
====
In this benchmark, we compare the performance of text summarization between EvaDB and MindsDB on `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`.
In this benchmark, we compare the performance of text summarization between EvaDB and MindsDB on `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`_.

1. Prepare dataset
----

.. code-block: bash
.. code-block:: bash
cd benchmark/text_summarization
bash download_dataset.sh
@@ -17,7 +17,7 @@ In this benchmark, we compare the performance of text summarization between EvaD

Install ray in your EvaDB virtual environment. ``pip install "ray>=1.13.0,<2.5.0"``

.. code-block: bash
.. code-block:: bash
cd benchmark/text_summarization
python text_summarization_with_evadb.py
@@ -26,12 +26,10 @@ In this benchmark, we compare the performance of text summarization between EvaD
3. Using MindsDB to summarize the CNN DailyMail News
----

.. _sqlite database:

Prepare sqlite database for MindsDB
****

.. code-block: bash
.. code-block:: bash
sqlite3 cnn_news_test.db
> .mode csv
@@ -41,24 +39,24 @@ Prepare sqlite database for MindsDB
Install MindsDB
****
Follow the `Setup for Source Code via pip <https://docs.mindsdb.com/setup/self-hosted/pip/source>` to install mindsdb.
Follow the `Setup for Source Code via pip <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install mindsdb.

.. note::

At the time of writing, we need to manually ``pip install evaluate`` for the Hugging Face model to work in MindsDB.

After the installation, we use mysql cli to connect to MindsDB. Replace the port number as needed.

.. code-block: bash
.. code-block:: bash
mysql -h 127.0.0.1 --port 47335 -u mindsdb -p
Run Experiment
****

Connect the sqlite database we created before: :ref:`sqlite database`.
Connect the sqlite database we created before.

.. code-block: sql
.. code-block:: sql
CREATE DATABASE sqlite_datasource
WITH ENGINE = 'sqlite',
@@ -68,7 +66,7 @@ Connect the sqlite database we created before: :ref:`sqlite database`.
Create text summarization model and wait for its readiness.

.. code-block: sql
.. code-block:: sql
CREATE MODEL mindsdb.hf_bart_sum_20
PREDICT PRED
@@ -82,9 +80,9 @@ Create text summarization model and wait for its readiness.
DESCRIBE mindsdb.hf_bart_sum_20;
Use the model to summarize the CNN DailyMail news
Use the model to summarize the CNN DailyMail news.

.. code-block: sql
.. code-block:: sql
CREATE OR REPLACE TABLE sqlite_datasource.cnn_news_summary (
SELECT PRED
@@ -95,7 +93,7 @@ Use the model to summarize the CNN DailyMail news
4. Experiment results
----
Below are numbers from a server with 56 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz and two Quadro P6000 GPUs
Below are numbers from a server with 56 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz and two Quadro P6000 GPUs.

.. list-table:: Text summarization with ``sshleifer/distilbart-cnn-12-6`` on CNN-DailyMail News

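The "Prepare sqlite database for MindsDB" step in the diff uses an interactive `sqlite3 cnn_news_test.db` session in `.mode csv`. For repeatable benchmark runs, the same kind of load could be scripted with Python's standard library. The following is a minimal sketch, not part of this pull request: the function name `load_csv_into_sqlite` is hypothetical, and it assumes the CSV has a header row, that every row has as many fields as the header, and that storing every column as `TEXT` is acceptable.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table):
    """Load a headered CSV into a SQLite table and return the table's row count.

    Hypothetical helper for illustration only; all columns are stored as TEXT.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # the first row supplies the column names

        # Quote identifiers so headers containing spaces or keywords still work.
        columns = ", ".join('"{}" TEXT'.format(c.replace('"', '""')) for c in header)
        placeholders = ", ".join("?" for _ in header)
        quoted_table = '"{}"'.format(table.replace('"', '""'))

        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS {} ({})".format(quoted_table, columns)
                )
                # Values go through placeholders, so article text is never
                # spliced into the SQL string.
                conn.executemany(
                    "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders),
                    reader,
                )
            count = conn.execute(
                "SELECT COUNT(*) FROM {}".format(quoted_table)
            ).fetchone()[0]
        finally:
            conn.close()
    return count
```

A call such as `load_csv_into_sqlite("cnn_news_test.csv", "cnn_news_test.db", "cnn_news_test")` would then produce the database file, where the CSV file name and table name are placeholders to be replaced with whatever `download_dataset.sh` actually produces and whatever table the MindsDB queries expect.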