Techfest-Datathon : IIT-Bombay

Problem Statement:

Given the business potential of 3,915 regions over a period of 6 years (72 months), forecast the business potential of all these regions for the next 15 months.

Link to the detailed problem statement

Our team ranked 6th nationally (Competition Leaderboard)

Exploring the data:

Variation of business potential data for different regions

regional variations

Observing trends, seasonality, and random noise

trends and seasonalities

Looking for regions with highly similar business potential curves

High correlation

Regions with dissimilar business potential curves

low correlation

Looking for Outliers

outlier detection with inter-quartile range

The green region marks values between the 25th and 75th percentiles (the inter-quartile range). All points outside the green region are treated as outliers and replaced with the previous value. The lower figure shows the series with outliers removed (in blue).
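The replacement rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the competition: the function name `replace_outliers` and the widening factor `k` are assumptions. With `k=0` the accepted band is exactly the 25th to 75th percentile range; the default `k=1.5` is the conventional Tukey fence and is only a common choice, not something this project states.

```python
from statistics import quantiles


def replace_outliers(series, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the previous kept value.

    `k` is a hypothetical knob: k=0 accepts only the inter-quartile band,
    k=1.5 is the conventional Tukey fence.
    """
    q1, _, q3 = quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    cleaned = []
    for value in series:
        # The first point has no predecessor, so it is always kept as-is.
        if not cleaned or low <= value <= high:
            cleaned.append(value)
        else:
            cleaned.append(cleaned[-1])
    return cleaned


monthly = [10, 11, 12, 11, 100, 12, 13, 11, 10, 12]
cleaned = replace_outliers(monthly)
```

Because each outlier is replaced by the last cleaned value rather than the last raw value, a run of consecutive outliers collapses to the most recent in-range point.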

Pearson Correlation

We used the Pearson correlation coefficient to measure how closely each region's series tracks every other region's. The coefficient lies between -1 and 1: 1 indicates a perfect positive linear correlation, -1 a perfect negative linear correlation, and 0 no linear correlation between the data for two regions. Using these coefficients, we divided the regions into groups according to their degree of correlation, with a separate group for regions that were poorly correlated. We then selected regions from each group to train the forecaster, 1,313 regions in total.
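As an illustration of the grouping idea, here is a minimal sketch in plain Python. The exact grouping procedure is not described in this README, so the greedy strategy, the function names `pearson` and `group_regions`, and the `threshold=0.9` cutoff are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; treated as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def group_regions(series_by_region, threshold=0.9):
    """Greedy grouping (an assumed strategy): a region joins the first group
    whose first member it correlates with at or above `threshold`."""
    groups = []
    for region, series in series_by_region.items():
        for group in groups:
            if pearson(series, series_by_region[group[0]]) >= threshold:
                group.append(region)
                break
        else:
            groups.append([region])
    correlated = [g for g in groups if len(g) > 1]
    poorly_correlated = [g[0] for g in groups if len(g) == 1]
    return correlated, poorly_correlated


# Toy data with hypothetical region ids
data = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 1, 4, 2, 3]}
correlated, poorly_correlated = group_regions(data)
```

With 3,915 regions, the full pairwise matrix has about 7.7 million entries, so a vectorized routine such as `numpy.corrcoef` would be the more practical choice in a real run; the loop above is only meant to make the logic explicit.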

Models Tried

- ARIMA (Autoregressive Integrated Moving Average)

- AutoARIMA

- LSTM (Long Short-Term Memory) Models