Best Free Datasets for Data Science and Machine Learning Projects

Are you new to Data Science and Machine Learning and looking for more datasets to practice on?

This post lists websites where you can easily get free datasets to practice on and to build Data Science and Machine Learning projects.

Kaggle

Kaggle is a great resource for machine learning datasets. Its main advantages are that it hosts datasets from almost every domain and that each dataset usually comes with a number of kernels (notebooks) shared by other users.

UCI

The UCI Machine Learning Repository hosts publicly available datasets specifically for machine learning and data analysis. The datasets are tagged with categories such as Classification, Regression and Recommender-Systems, so you can easily search for a dataset to practice a particular machine learning technique.

Quandl

Quandl (now part of Nasdaq Data Link) provides free datasets from the finance domain. It is available as a Python library, so after installing and importing it you can fetch datasets directly from your code.
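As a rough sketch of that install-and-import workflow, the helper below wraps the `quandl.get` call from the `quandl` package. It assumes the package is installed (`pip install quandl`) and that you have an API key. The function name `fetch_quandl_series` and the dataset code you pass in are placeholders, not something taken from this post, and some dataset codes may no longer be served.

```python
def fetch_quandl_series(dataset_code, api_key=None, start_date=None, end_date=None):
    """Fetch one dataset as a pandas DataFrame (hypothetical helper)."""
    # Time-series codes have the form "DATABASE/DATASET".
    if not dataset_code or "/" not in dataset_code:
        raise ValueError("dataset_code should look like 'DATABASE/DATASET'")

    # Imported lazily so the check above works even without the package installed.
    import quandl

    if api_key:
        quandl.ApiConfig.api_key = api_key

    # Only pass the date filters that were actually provided.
    params = {}
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date

    return quandl.get(dataset_code, **params)


# Example call (placeholder code and key, replace with your own):
# df = fetch_quandl_series("DATABASE/DATASET", api_key="YOUR_KEY", start_date="2020-01-01")
# print(df.head())
```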

US Government Open Dataset — DATA.GOV

DATA.GOV is the US government's open data website, which provides free datasets. Here you can find datasets in categories such as Agriculture, Climate, Health and many more.

Indian Government OpenDataset

The Indian Government's open data website also provides free datasets. It is very similar to the US government's DATA.GOV site.

World Bank Dataset

The World Bank publishes its datasets as open data. Here you can find resources such as the Open Data Catalog, DataBank, Microdata Library and many more.

GroupLens datasets

GroupLens is a great place to find datasets for recommendation systems. It is a research lab headed by faculty from the Department of Computer Science and Engineering at the University of Minnesota. They have built popular systems such as MovieLens, LensKit and many more.
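To give a feel for what recommendation data looks like, here is a small sketch that parses rows in the `userId,movieId,rating,timestamp` layout used by the MovieLens `ratings.csv` files and groups them per user. The rows in `SAMPLE` are made up for illustration, and `ratings_by_user` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the MovieLens ratings.csv column layout (not real data).
SAMPLE = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,3.5,964981247
2,10,5.0,847434962
"""


def ratings_by_user(csv_file):
    """Return {user_id: {movie_id: rating}} from an open ratings CSV file."""
    ratings = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        ratings[int(row["userId"])][int(row["movieId"])] = float(row["rating"])
    return dict(ratings)


print(ratings_by_user(io.StringIO(SAMPLE)))
```

With a downloaded file, the same function can be called as `ratings_by_user(open("ratings.csv", newline=""))`, assuming the file sits in the working directory.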

Google Cloud BigQuery public datasets

Google Cloud BigQuery public datasets are a collection of public datasets offered through the Google Cloud Marketplace. They are not completely free: the first 1 TB of query data processed each month is free, and usage beyond that is billed. To access the datasets you have to create a project in Google Cloud Platform.
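The free-tier arithmetic can be sketched as follows. The 1 TB monthly allowance is the figure quoted in this section; the price per TB is deliberately a required parameter because it is not given here and changes over time, so check the current BigQuery pricing before relying on any number. `billable_tb` and `estimate_cost` are hypothetical helper names.

```python
FREE_TB_PER_MONTH = 1.0  # free monthly allowance quoted in this section


def billable_tb(tb_processed, free_tb=FREE_TB_PER_MONTH):
    """Return how many TB of a month's processed query data are billed."""
    if tb_processed < 0:
        raise ValueError("tb_processed must be non-negative")
    return max(0.0, tb_processed - free_tb)


def estimate_cost(tb_processed, price_per_tb, free_tb=FREE_TB_PER_MONTH):
    """Estimate the monthly cost; price_per_tb must come from current pricing."""
    return billable_tb(tb_processed, free_tb) * price_per_tb


print(billable_tb(0.4))   # 0.0, still inside the free allowance
print(billable_tb(3.5))   # 2.5
```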

Awesome Public Datasets

Awesome Public Datasets is a GitHub repository that lists topic-centered datasets on a wide range of topics such as Agriculture, Biology, Climate+Weather, Education and many more. Most of the datasets listed there are free, but some are not.

Scikit-learn

Scikit-learn ships with a variety of toy datasets and can also download several real-world ones. Both kinds are available through the `sklearn.datasets` module. They are easy to load and get into a state ready to train a model, which makes them a great way to evaluate different algorithms before moving on to your own real-world data.

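To load a toy dataset from Scikit-learn, call one of the `load_*` functions in `sklearn.datasets`, such as `load_iris`. The Boston house prices dataset, which used to be a common example, was removed in scikit-learn 1.2.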
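A minimal sketch of loading a toy dataset, assuming scikit-learn is installed. It uses the iris dataset rather than Boston house prices, since `load_boston` was removed in scikit-learn 1.2; the other `load_*` functions follow the same pattern.

```python
from sklearn.datasets import load_iris

# Toy datasets ship with scikit-learn, so no download is needed.
iris = load_iris()

# Feature matrix and class labels as NumPy arrays.
X, y = iris.data, iris.target

print(X.shape, y.shape)      # (150, 4) (150,)
print(iris.feature_names)    # names of the four measurement columns
print(iris.target_names)     # names of the three classes
```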

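To fetch a real-world dataset from Scikit-learn, use one of the `fetch_*` functions in `sklearn.datasets`, such as `fetch_california_housing`. These download the data the first time they are called and cache it locally.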
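A minimal sketch of fetching a real-world dataset, assuming scikit-learn and pandas are installed and that network access is available on the first call. California housing is used here only as one example of the `fetch_*` functions, and `load_california_housing_frame` is a hypothetical wrapper name.

```python
from sklearn.datasets import fetch_california_housing


def load_california_housing_frame(data_home=None):
    """Download (on first call) or read from cache the California housing data."""
    # as_frame=True returns pandas objects; data_home=None uses the default cache folder.
    bunch = fetch_california_housing(data_home=data_home, as_frame=True)
    # bunch.frame holds the feature columns plus the target column in one DataFrame.
    return bunch.frame


# Example usage (triggers a download the first time it runs):
# df = load_california_housing_frame()
# print(df.shape)
# print(df.head())
```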

Hope this helps you find the right dataset for your project.

Thank you for reading! 😊