by Andrew Oliver

Columnist

How classification and clustering work: the easy way

how-to

Feb 01, 2018, 3 mins

Analytics, Machine Learning, Software Development

People are often confused about what classification and clustering are and how they differ. So here is an explanation done the old-fashioned way: in an Excel spreadsheet.

Machine learning gets a lot of buzz. The two most talked-about classes of algorithms are classification and clustering. Classification assigns things a label. Clustering groups things that look like they go together. Yet people are often confused about what these are and how they differ.

That confusion is partly because many explanations quickly go into a bunch of formulas. Instead, here is an explanation of clustering and classifying things the old-fashioned way: in an Excel spreadsheet.

How classification works

Let’s say that you want to predict which students will likely graduate and which students will likely drop out. Perhaps you want to flag the ones likely to drop out so you can assign them a counselor. So, you have two labels: at-risk and low-risk. To do this using classification, you need a training set of students whose outcome is already known: those who graduated and those who dropped out.

(Please note that I acquired this data the same way a stable genius does: I made it up. Don’t use it for anything but understanding what classification means.)

Forget the algorithm for now. Let’s use this spreadsheet:

In the sheet’s data are some patterns among GPA, number of suspensions, and whether the student has been expelled. Mentally, you can make some correlations and note some exceptions.

Classification example in Excel

The source data for the classification example.

So, based on this data, can you decide who is likely to graduate? If so, congratulations! You’re a classification algorithm.
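To make the idea concrete, here is a minimal Python sketch of one very simple classifier, a 1-nearest-neighbor lookup: it labels a new student with the label of the most similar student in the training set. This is only one of many possible approaches, not the method behind the spreadsheet. The training rows, the at-risk/low-risk labels attached to them, and the names `TRAINING`, `distance`, and `classify` are all invented for illustration, in the same spirit as the made-up data above.

```python
from math import sqrt

# Hypothetical training rows, invented for illustration only.
# Each row is ((gpa, suspensions, expelled), label).
TRAINING = [
    ((3.8, 0, False), "low-risk"),
    ((3.2, 0, False), "low-risk"),
    ((2.9, 1, False), "low-risk"),
    ((1.5, 1, True), "at-risk"),
    ((1.9, 3, False), "at-risk"),
    ((2.1, 2, True), "at-risk"),
]


def distance(a, b):
    """Euclidean distance between two students; expelled counts as 0 or 1."""
    return sqrt(sum((float(x) - float(y)) ** 2 for x, y in zip(a, b)))


def classify(student):
    """Return the label of the closest training row (1-nearest-neighbor)."""
    _, label = min(TRAINING, key=lambda row: distance(row[0], student))
    return label


print(classify((3.5, 0, False)))  # a strong student with a clean record
print(classify((1.7, 2, True)))   # a struggling student who was expelled
```

The sketch treats all three columns as equally important, which is a simplifying assumption. A real model would usually scale the columns and learn how much each one matters.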

How clustering works

Now let’s look at clustering. I have no labels for this data set. I just want the computer to find the rows that resemble each other and group them.

This data also has some patterns in it that you can see: The first and last columns are probably meaningless for grouping purposes. However, there are several rows that start with 1 1 1. In fact, there are some that have 1 1 1 and then 0 0 0 and then 1 1 1. Now group those rows into a cluster.

You can probably find the opposite pattern as well. That is another cluster.

You may also find some smaller matches, like 1 1 1 0 0 0 1 1 (it’s not in the sample data here, so you’re not missing anything). Group that one; it can also be a cluster.

Clustering example in Excel

The source data for the clustering example
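The grouping described above can be sketched in a few lines of Python. This is a simple greedy grouping by Hamming distance (the number of positions where two rows differ), not k-means or any particular library algorithm. The rows, the choice to ignore the first and last columns, the `max_diff` threshold, and the names `ROWS`, `hamming`, and `cluster` are all assumptions made up for this illustration.

```python
# Hypothetical rows, invented for illustration only. The first and last
# columns are treated as noise; the nine middle columns carry the pattern.
ROWS = [
    [7, 1, 1, 1, 0, 0, 0, 1, 1, 1, 3],
    [2, 1, 1, 1, 0, 0, 0, 1, 1, 1, 9],
    [5, 0, 0, 0, 1, 1, 1, 0, 0, 0, 4],
    [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 8],  # one position off the first pattern
    [9, 0, 0, 0, 1, 1, 1, 0, 0, 0, 6],
]


def hamming(a, b):
    """Count the positions where two equal-length rows differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster(rows, max_diff=1):
    """Greedily group row indices whose middle columns nearly match.

    Each cluster is a list of row indices, and its first member acts as the
    representative that later rows are compared against.
    """
    clusters = []
    for i, row in enumerate(rows):
        features = row[1:-1]  # drop the first and last columns
        for members in clusters:
            representative = rows[members[0]][1:-1]
            if hamming(features, representative) <= max_diff:
                members.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


print(cluster(ROWS))  # [[0, 1, 3], [2, 4]]
```

With `max_diff=0`, only exact matches are grouped, so the near-match row ends up in a cluster of its own. That mirrors the judgment call you make by eye about whether a smaller match belongs with a bigger pattern.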

There are various algorithms that do this computationally. Some even do different forms of classification and clustering. However, the basic idea is that this is something you can do in Excel.

Andrew C. Oliver is a columnist and software developer with a long history in open source, databases, and cloud computing. He founded Apache POI and served on the board of the Open Source Initiative. Oliver has helped with marketing in startups including JBoss, Lucidworks, and Couchbase. He currently leads marketing for CelerData.

Student	Suspensions	Expelled	GPA
Student AA	1	no	4
Student BB	1	yes	1.5
Student CC	0	no	3.2
