Friday, March 18, 2016

Topological Data Analysis - Start Here

Posted by Michael on 2:34 PM

About these posts on TDA

Topological Data Analysis (TDA) is exactly what it sounds like: using tools from topology to study data. I plan on writing a series of posts that takes us from the basics of topology to the current state of affairs in TDA. This is the first in that series.

What is Topology

From a million miles away, topology is the study of shapes. You may also remember that geometry is the study of shapes. So what is the difference between geometry and topology? Well, geometry cares about every little detail of a shape: does it have corners, what is the curvature at a point, what is the distance between two points, what are the angles, etc. Topology, however, only cares about the global properties of a shape: what is the basic shape of the object, even if we smush it around a little bit? For example, geometry cares that a square and a circle are different (one has corners, etc.), but topology only cares that both basically form a loop.

To make this more formal, we can say that topology is the study of the properties of shapes that are preserved under continuous deformations. So two shapes are considered the same if we can stretch or bend one to look like the other, without tearing or gluing. It is exactly this flexibility that makes topology such a useful tool.

Motivation for TDA

Topology is a mature mathematical subject with many tools and techniques. The basic idea behind TDA is to use these techniques to learn something about a data set. Data sets may be very high dimensional, which makes them impossible to visualize and hard to extract qualitative information from. This is where topology comes in: the dimension of the data is in many ways irrelevant, and the tools of topology give new types of "statistics" for data sets.

Basic outline for use of TDA

1. Start with a data set
2. Build a topological space (shape) out of the data
3. Compute the (persistent) homology of the space (homology is a computable topological invariant of the space; we'll have a lot more to say about homology later)
4. Use the information from step 3 to further investigate your data
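As a rough illustration of the first three steps, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not something from this series: the toy data set (points on a circle), the choice of a Vietoris-Rips complex as the "shape", the single scale `eps`, the use of coefficients mod 2, and all function names. It only computes ordinary (not persistent) homology in dimensions 0 and 1.

```python
import math
from itertools import combinations

def circle_points(n):
    """Step 1: a toy data set, n points evenly spaced on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def rips_complex(points, eps):
    """Step 2: vertices, edges, and triangles of a Vietoris-Rips complex.

    Two points are joined by an edge when they are within distance eps;
    a triangle is filled in when all three of its edges are present.
    """
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    edge_set = set(edges)
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_set and (i, k) in edge_set and (j, k) in edge_set]
    return list(range(n)), edges, triangles

def rank_gf2(columns):
    """Rank over GF(2) of a matrix whose columns are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced column with that leading bit
    for col in columns:
        while col:
            high = col.bit_length() - 1
            if high not in pivots:
                pivots[high] = col
                break
            col ^= pivots[high]  # cancel the leading bit and keep reducing
    return len(pivots)

def betti_numbers(vertices, edges, triangles):
    """Step 3: Betti numbers b0 (components) and b1 (loops), mod 2."""
    edge_index = {e: idx for idx, e in enumerate(edges)}
    # The boundary of an edge is its two endpoints.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # The boundary of a triangle is its three edges.
    d2 = [(1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)]) | (1 << edge_index[(j, k)])
          for i, j, k in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    return len(vertices) - r1, len(edges) - r1 - r2

points = circle_points(12)
print(betti_numbers(*rips_complex(points, eps=0.6)))  # (1, 1): one component, one loop
```

With `eps=0.6` only neighboring points are connected, so the complex is a 12-sided loop and the output matches what we expect from a circle. Picking `eps` too small leaves 12 isolated points, and picking it too large fills the loop in, which is exactly the kind of choice discussed at the end of this post.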

This series of posts will essentially flesh out steps 2 through 4 as we discuss the challenges and developments in completing these steps. Note that step 4 is the most important piece of this process, and it is not something that TDA can help with. As an analogy, just computing the average of a set of numbers does not really mean much unless that average is put into context. Here, the statistics coming from (persistent) homology don't mean much on their own, and it is up to the investigator to determine their meaning.

A few final words

When we try to implement the steps above, we run into all sorts of computational and theoretical problems. For example, when forming a topological space (shape) out of our data we need to make a choice, and this choice could drastically affect our results. The idea of persistent homology mitigates this problem by "making all choices at once", in other words, by investigating the results of using EVERY choice, which essentially takes choice out of the equation. In this series of posts we will encounter the problems and their solutions as they come, as opposed to just presenting the status quo solution without motivating the problem. I hope this will be an enjoyable and illuminating way to learn TDA.
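To give a feel for "making all choices at once", here is a small Python sketch under some loud assumptions: the "choice" is taken to be a distance scale `eps` (connect two points when they are within `eps` of each other), only connected components are tracked (the dimension 0 part of persistent homology), and the function names and the toy data are hypothetical. Instead of fixing one `eps`, it records every scale at which two components merge.

```python
import math
from itertools import combinations

def h0_merge_scales(points):
    """Scales at which two connected components merge as eps grows from 0."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Consider every pair of points, shortest distance first.
    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    merges = []
    for length, i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:       # this edge joins two separate components
            parent[root_i] = root_j
            merges.append(length)  # one component "dies" at this scale
    return merges

def components_at(merges, eps):
    """Number of connected components for one particular choice of eps."""
    return len(merges) + 1 - sum(1 for m in merges if m <= eps)

# Toy data: a cluster of three points and a far-away cluster of two points.
data = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1)]
merges = h0_merge_scales(data)
print(merges)                      # [1.0, 1.0, 1.0, 9.0]
print(components_at(merges, 2.0))  # 2
```

The long gap between the merges at scale 1 and the final merge at scale 9 is the interesting part: "two clusters" is the answer for a wide range of choices of `eps`, so it persists, while the other answers only hold for a narrow range.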

