Network Minded

Network Theory, Neuroscience, Topological Data Analysis

Friday, March 18, 2016

TDA - Topological Spaces

Setup

In our last post we saw the following 4 basic steps of using TDA
  1. Start with a Data Set
  2. Make a topological space out of the data
  3. Compute the persistent homology of the space
  4. Investigate the results
In this post we start talking about step 2. The best place to start is understanding what we mean when we say "topological space". We'll start with some intuition and examples and then we'll give the full definition, although the intuition and examples should be enough and the definition is really unnecessary for or purposes.

Intuition

To get some intuition you can think of a topological space as some sort of shape made out of Playdoh or silly putty. If you push or pull the silly putty around without tearing it then you can consider this the same "shape". We can't tear the Playdoh because we want all of our pushing and pulling to be continuous, which means points that are close stay close when we push and pull. If we tear the Playdoh points that were right next to each other are torn apart which is not continuous. So from a topological point of view a square and a circle are the same since we can just push and pull the square (without ripping it) to form a circle. This intuition, along with the quick examples below, are probably all you really need out of this post but if you're curious read on.

The general definition below is so general that it leads to all sorts of exotic topological spaces, and even with some extra assumptions there are still very many mind bending examples of spaces with interesting properties. In fact there is a whole book Counter Examples in Topology which shows that there are many strange examples of spaces with exotic properties, such as the warsaw circle. We won't really be concerned with such exotic examples and we won't be concerned with the general definition either, but it is included for completeness

Quick Examples

1. Torus and Double Torus

2. Circle

3. Sphere

4. Mobius Band

All images taken from Wikipedia

First Naive Definition

Intuitively a topological space $\mathcal{X}$ is a collection of points with some extra information telling us how "close" points are to each other. We need this extra information because recall we want to consider properties that a preserved under continuous transformations and the idea of continuous requires a notion of closeness. So what we use is the idea of a neighborhood of a point, or an open set, around a point (generally the word neighborhood is reserved for a metric space). If two points are close there should be a small open set that contains both of them.

Second Less Naive Definition

We can refine our previous definition to say a topological space $\mathcal{X}$ is a collection of points together with a collection of open sets (recall open sets are like neighborhoods telling us "closeness" information about points). This is essentially the abstract definition except we need to make sure these open sets actually behave the way we want them to. In other words, not any old set can be an open set, we have to define what it means for a set to be open. Recall that open sets are a more general way of declaring a neighborhood of a point or in other words, open sets are a way of defining "closeness" in a topological space. This is a more abstract way of thinking as we don't need a metric to know what "close" means, we only need the open sets. With this in mind mathematicians came to the following definition that abstractly describes the properties such sets must have. This definition won't be important to us but its included for completeness

Full definition

A topology, $\mathcal{T}$ on a set $\mathcal{X}$ is a collection of subsets $U\subset \mathcal{X}$ called open sets, such that
  1. $\mathcal{X}\in \mathcal{T}$
  2. The union of any collection of open sets is open. That is if $U_i\in\mathcal{T}$ then $\bigcup U_i\in \mathcal{T}$
  3. The intersection of any finite collection of open sets is open. That is if $A,B\in \mathcal{T}$ then $A\cap B\in \mathcal{T}$
A topological space then is the pair $(\mathcal{X}, \mathcal{T}).$

Examples

  1. $\mathbb{R}$ with open sets any open interval $(a,b)$ or union of open intervals $(a,b)\cup(c,d)$ ect.
  2. $\mathbb{R}^n$ with open sets being the union and finite intersection of any open balls in space
  3. $\mathbf{S}^1$ the circle. The open sets are just the union of open intervals of the circle

Topological Data Analysis - Start Here

About these posts on TDA

Topological Data Analysis (TDA) is exactly what it sounds like, using tools from topology to study data. I plan on writing a series of posts that takes us from the basics of topology to the current state of affairs in TDA, this is the first in that series.

What is Topology

From a million miles away topology is the study of shapes, you may also remember that geometry is the study of shapes. So what is the difference between geometry and topology? Well, geometry cares about every little detail of a shape like, does it have corners, what is the curvature at a point, what are the distances between two points, angles, ect. Topology however only cares about the global properties of a shape, what is the basic shape of the object even if we smush it around a little bit? Below is an example, you can see that geometry cares that a square and a circle are different (one has corners ect) however topology only cares that both basically form a loop.
To make this more formal we can say that topology is the study of the properties of shapes that are preserved under continuous deformations. So two shapes are considered the same if we can stretch or bend one to look like the other. It is exactly this flexibility that makes topology such a useful tool.

Motivation for TDA

Topology is a mature mathematical subject with many tools and techniques. The basic idea behind TDA is to use these techniques to learn something about a data set. Data sets may be very high dimensional making them impossible to visualize and hard to find qualitative information about. This is where topology comes in, the dimension of the data is in many ways irrelevant and the tools of topology give new types of "statics" for data sets.

Basic outline for use of TDA

  1. Start with a data set
  2. Build a topological space (shape) out of the data
  3. Compute the (persistent) homology of the space (homology is a computable topological invariant of the space, we'll have alot more to say about homology later)
  4. Use the information from 3 to further investigate your data
This series of posts will essentially flesh out steps 2 through 4 as we discuss the challenges and developments in completing these steps. Note that step 4 is the most important piece of this process and it is not something that TDA can help with. As an analogy just computing the average of a set of numbers does not really mean much unless that average is put into context. Here, the statics coming from using (persistent) homology don't mean much on their own and it is up to the investigator to determine their meaning.

A few final words

When we try to implement the steps above we run into all sorts of computational and theoretical problems. For example, when forming a topological space (shape) out of our data we need to make a choice, an this choice could drastically affect our results. The idea of persistent homology then mitigates this problem by "making all choices at once" in other words investigating the results of using EVERY choice which essentially takes choice out of the equation. In this series of posts we will encounter the problems and their solutions as they come as opposed to just presenting the status quo solution without motivating the problem. I hope this will be an enjoying and illuminating way to learn TDA.

Read the next post on TDA