Getting started with Orange Tool

Muskan Jindal
Aug 27, 2021


Data mining is used to build prediction models from historical data. These models can support decision-making and help predict future trends. Orange is a framework for data visualization, machine learning, and data mining with a visual programming front end, which makes it a helpful tool for analyzing large datasets.

In this article, I briefly describe the basic functionality of the Orange tool.

Orange is an open-source, scriptable environment for quickly prototyping new algorithms and testing patterns. Its core library is a collection of Python-based modules.

The analysis is done by connecting widgets that perform various functions, such as reading files, displaying feature statistics, building models, and evaluating them. Orange is a great software package for machine learning and data mining.

Orange widgets are the graphical user interface to Orange’s data mining and machine learning techniques. They are the building blocks of an Orange workflow and are grouped into categories such as Data, Visualize, Model, and Evaluate.

Widgets offer essential functionality, like:

  • Displaying a data table and allowing feature selection
  • Reading data
  • Training predictors and comparing learning algorithms
  • Visualizing data elements

How to use workflows in Orange?

Orange workflows consist of components that read, process, and visualize data. Widgets communicate by sending information along a communication channel. The output of one widget is used as the input to another, and this chain of connected widgets forms a workflow.
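The idea of widgets passing data along links can be sketched in a few lines of plain Python. This is only a toy model to show the data flow, not how Orange is implemented: the `Widget` class, its `link` and `send` methods, and the sample rows are all made up for illustration.

```python
# A toy model of an Orange-style workflow: each "widget" receives an input,
# does one job, and forwards its output to every widget linked to it.
# Widget, link, and send are hypothetical names, not Orange's API.

class Widget:
    def __init__(self, name, func):
        self.name = name
        self.func = func      # what this widget does with its input
        self.outputs = []     # downstream widgets linked to this one
        self.result = None    # last value this widget produced

    def link(self, other):
        """Connect this widget's output channel to another widget's input."""
        self.outputs.append(other)
        return other

    def send(self, data=None):
        """Process the input, then pass the result along every link."""
        self.result = self.func(data)
        for widget in self.outputs:
            widget.send(self.result)


# Made-up rows standing in for a dataset: (age, cholesterol).
rows = [(63, 233), (37, 250), (41, 204), (56, 236)]

file_widget = Widget("File", lambda _: rows)
table_widget = Widget("Data Table", lambda data: [list(r) for r in data])
info_widget = Widget(
    "Data Info", lambda data: {"rows": len(data), "columns": len(data[0])}
)

# One source feeding two downstream widgets, as in the canvas.
file_widget.link(table_widget)
file_widget.link(info_widget)
file_widget.send()

print(info_widget.result)  # {'rows': 4, 'columns': 2}
```

Linking one source to several widgets mirrors what happens on the canvas, where a single File widget can feed a Data Table, Data Info, and others at the same time.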

Let’s start by creating a simple workflow for a dataset. Orange provides a few built-in datasets; you can use one of those or import one of your own. I have used the Heart Disease dataset, which records the presence of heart disease in patients.

The workflow sends data from the dataset to a Data Table to view it in tabular form, to Distributions to plot the distribution of a feature, and to a Scatter Plot to plot pairs of features.

To create this simple workflow in Orange:

Step 1: Load the dataset using the File widget.

Step 2: Create links from File to Data Info, Data Table, Distributions, and Scatter Plot.

This is what an Orange workflow looks like, with each widget serving its own purpose. On the left-hand side, you can see a panel containing all the widgets that you can drag and drop onto the workflow canvas. The links between the widgets connect them and let the data ‘flow’ through the workspace.

To load data, drag the File widget from the left pane and drop it onto the canvas. Double-click the File widget and select the desired dataset.

How to do basic data exploration (like data distribution, data information)

To get information about the loaded data, use the Data Info widget. It shows the dataset’s name, size, description, row and column counts, features, targets, and data attributes.
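Since Orange is scriptable, a similar summary can be put together from Python. The following is a minimal sketch, assuming the `Orange` package is installed and that the built-in dataset can be loaded under the name `heart_disease`; the `describe` and `describe_builtin` helpers are hypothetical names and only approximate what the Data Info widget reports.

```python
def describe(table):
    """Return a small summary, similar in spirit to the Data Info widget.

    `table` is expected to be an Orange.data.Table (or anything with the
    same `domain` attribute and a length).
    """
    domain = table.domain
    target = domain.class_var
    return {
        "rows": len(table),
        "features": len(domain.attributes),
        "numeric_features": sum(1 for v in domain.attributes if v.is_continuous),
        "categorical_features": sum(1 for v in domain.attributes if v.is_discrete),
        "target": target.name if target is not None else None,
    }


def describe_builtin(name="heart_disease"):
    """Load a dataset by name and summarize it (dataset name is an assumption)."""
    # Imported here so that `describe` can be read and reused without Orange.
    from Orange.data import Table

    return describe(Table(name))
```

With Orange installed, calling `describe_builtin()` should return a dictionary of counts for the Heart Disease data; the exact numbers depend on the dataset version shipped with your installation.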

To view your data in tabular form, use the Data Table widget: drag it onto the canvas and create a link from the File widget to it.

The Distributions widget shows a graphical representation of the values in the dataset, so you can easily view the distribution of individual features. In the snapshot below, you can observe the distribution by age. Similarly, you can view distributions for gender, cholesterol, and other features.
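What a distribution plot computes can be approximated in plain Python: group the values of one feature into bins and count how many fall into each. The sketch below is not Orange’s implementation, and the ages are made-up values used only for illustration, not rows from the Heart Disease dataset.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Returns a dict that maps the lower edge of each bin to its count.
    """
    counts = Counter((int(v) // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Made-up ages, for illustration only.
ages = [29, 34, 41, 45, 52, 54, 57, 58, 61, 63, 67]

# Print a simple text histogram with ten-year bins.
for start, count in histogram(ages, 10).items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Changing `bin_width` plays the same role as adjusting the bin size of a histogram: smaller bins show more detail, larger bins give a smoother picture.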

Another useful widget is the Scatter Plot, which plots pairs of features against each other. Here, the scatter plot shows cholesterol against age.

How to load your data in Orange and how to load external data from an API in Orange?

As mentioned earlier, Orange lets you use either one of the built-in datasets or a dataset of your choice. The File widget loads the data. To use a local file, select the File option as the source and choose the dataset. To load external data, select the URL option as the source and paste the link to the dataset.
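The same choice between a local file and a URL can be made from a script. This is a minimal sketch, assuming the `Orange` package is installed; `is_url` and `load` are hypothetical helper names, and any path or link you pass in is your own.

```python
from urllib.parse import urlparse


def is_url(source):
    """Return True if `source` looks like an http(s) link rather than a file path."""
    return urlparse(source).scheme in ("http", "https")


def load(source):
    """Load a dataset from a local file or from a URL, like the File widget's two options."""
    # Imported here so that `is_url` can be used without Orange installed.
    from Orange.data import Table

    if is_url(source):
        return Table.from_url(source)
    return Table.from_file(source)
```

For example, `load("heart_disease.tab")` would read a local file, while passing a link that starts with `https://` would download the dataset first.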

That’s it! You now have a basic understanding of the Orange tool.
