This repository contains the ATIS dataset in Python pickle format and in Rasa NLU JSON format (https://rasa.com/docs/nlu/dataformat/#json-format). It also provides code showing how to extract the data from the pickle files.
0: flight: BOS i want to fly from boston at 838 am and arrive in denver at 1110 in the morning EOS
BOS O
i O
want O
to O
fly O
from O
boston B-fromloc.city_name
at O
838 B-depart_time.time
am I-depart_time.time
and O
arrive O
in O
denver B-toloc.city_name
at O
1110 B-arrive_time.time
in O
the O
morning B-arrive_time.period_of_day
EOS O
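
The tags in the sample above follow the BIO scheme: `B-` opens a slot, `I-` continues it, and `O` marks a token outside any slot. As a minimal sketch (the helper name `bio_to_slots` is hypothetical and not part of this repository), the token/tag pairs can be grouped into slot values like this:

```python
def bio_to_slots(tokens, tags):
    """Group BIO-tagged tokens into (slot_name, slot_text) pairs."""
    slots = []
    name, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and name == tag[2:]:
            # Continuation of the slot that is currently open.
            words.append(token)
            continue
        # Any other tag closes the open slot, if there is one.
        if name is not None:
            slots.append((name, " ".join(words)))
            name, words = None, []
        if tag.startswith(("B-", "I-")):
            # A stray I- tag without a matching B- is treated as a new slot.
            name, words = tag[2:], [token]
    if name is not None:
        slots.append((name, " ".join(words)))
    return slots


# The sample sentence shown above, one "token tag" pair per line.
SAMPLE = """BOS O
i O
want O
to O
fly O
from O
boston B-fromloc.city_name
at O
838 B-depart_time.time
am I-depart_time.time
and O
arrive O
in O
denver B-toloc.city_name
at O
1110 B-arrive_time.time
in O
the O
morning B-arrive_time.period_of_day
EOS O"""

pairs = [line.split() for line in SAMPLE.splitlines()]
tokens = [token for token, _ in pairs]
tags = [tag for _, tag in pairs]
slots = bio_to_slots(tokens, tags)
for slot_name, slot_text in slots:
    print(slot_name, "=", slot_text)
```

Note that `838 am` is joined into one `depart_time.time` value because its second token carries an `I-` tag.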
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "i would like to find a flight from charlotte to las vegas that makes a stop in st. louis",
        "intent": "flight",
        "entities": [
          {
            "start": 35,
            "end": 44,
            "value": "charlotte",
            "entity": "fromloc.city_name"
          },
          {
            "start": 48,
            "end": 57,
            "value": "las vegas",
            "entity": "toloc.city_name"
          },
          {
            "start": 79,
            "end": 88,
            "value": "st. louis",
            "entity": "stoploc.city_name"
          }
        ]
      },
      ...
    ]
  }
}
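
In the JSON format, `start` and `end` are character offsets into `text`, with `end` exclusive, so `text[start:end]` should equal `value`. The sketch below checks this for the example above; `check_entity_offsets` is a hypothetical helper written for illustration, not part of this repository.

```python
import json


def check_entity_offsets(rasa_data):
    """Return (text, entity, span, value) for every entity whose span differs from its value."""
    mismatches = []
    for example in rasa_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            span = text[entity["start"]:entity["end"]]
            if span != entity["value"]:
                mismatches.append((text, entity["entity"], span, entity["value"]))
    return mismatches


# The single example shown above, without the elided entries.
sample = json.loads("""
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "i would like to find a flight from charlotte to las vegas that makes a stop in st. louis",
        "intent": "flight",
        "entities": [
          {"start": 35, "end": 44, "value": "charlotte", "entity": "fromloc.city_name"},
          {"start": 48, "end": 57, "value": "las vegas", "entity": "toloc.city_name"},
          {"start": 79, "end": 88, "value": "st. louis", "entity": "stoploc.city_name"}
        ]
      }
    ]
  }
}
""")

mismatches = check_entity_offsets(sample)
print("mismatched entities:", len(mismatches))
```

The same function can be applied to a full file by loading it with `json.load` first.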
Number of Samples | Vocabulary Size | Number of Slots | Number of Intents |
---|---|---|---|
4978 (training set) + 893 (testing set) | 943 | 129 | 26 |
`summary_data.py` contains code that reads the data from the raw data files; use it as a reference for how to load the dataset.
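
As a rough sketch of what reading the pickle format involves, the code below assumes each pickle file holds a `(ds, dicts)` tuple, where `dicts` maps names to integer ids under the keys `token_ids`, `slot_ids`, and `intent_ids`, and `ds` holds id sequences under `query`, `slot_labels`, and `intent_labels`. That layout and those key names are assumptions not confirmed by this README, so check `summary_data.py` for the actual structure. `decode_example` is a hypothetical helper, and the data here is a tiny stand-in so the sketch runs without the real files.

```python
import pickle


def decode_example(ds, dicts, index):
    """Map one example's integer ids back to its intent, words, and slot tags."""
    # Invert the assumed name -> id dictionaries into id -> name lookups.
    id_to_token = {i: token for token, i in dicts["token_ids"].items()}
    id_to_slot = {i: slot for slot, i in dicts["slot_ids"].items()}
    id_to_intent = {i: intent for intent, i in dicts["intent_ids"].items()}

    words = [id_to_token[i] for i in ds["query"][index]]
    tags = [id_to_slot[i] for i in ds["slot_labels"][index]]
    # Assumption: each intent label is a sequence holding a single intent id.
    intent = id_to_intent[ds["intent_labels"][index][0]]
    return intent, words, tags


# Stand-in data in the assumed layout (NOT taken from the real dataset).
dicts = {
    "token_ids": {"BOS": 0, "fly": 1, "to": 2, "denver": 3, "EOS": 4},
    "slot_ids": {"O": 0, "B-toloc.city_name": 1},
    "intent_ids": {"flight": 0},
}
ds = {
    "query": [[0, 1, 2, 3, 4]],
    "slot_labels": [[0, 0, 0, 1, 0]],
    "intent_labels": [[0]],
}

# Round-trip through pickle to mimic reading a file from disk.
loaded_ds, loaded_dicts = pickle.loads(pickle.dumps((ds, dicts)))

intent, words, tags = decode_example(loaded_ds, loaded_dicts, 0)
print(intent, " ".join(words))
for word, tag in zip(words, tags):
    print(word, tag)
```

With the real data, the round-trip line would be replaced by opening the pickle file in binary mode and calling `pickle.load` on it, provided the layout assumption holds.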
Data Format | Training Set | Testing Set |
---|---|---|
Python 3 Pickle Format | atis.train.pkl | atis.test.pkl |
Rasa NLU JSON Format | train.json | test.json |
- The original dataset comes from ATIS DataSet by siddhadev; some of the code is also copied from there.
- NOTE: ATIS DataSet by siddhadev comes from the Microsoft CNTK Examples.
- NOTE: https://github.com/mesnilgr/is13 also provides the ATIS dataset, but it contains only the slot data, without intents.