mot.la

Calculating Euclidean Distance in Ruby

Euclidean distance is often used with the k-means clustering algorithm, which relies on it to measure how far each point is from a cluster center. Remember the Pythagorean theorem from high school geometry? Euclidean distance generalizes that theorem to N dimensions.

For example, you can calculate the distance between two points in 2D, 3D, or 4D space, and beyond. Let's start with a 2D example.

2D Example

In the line chart above, you can see we have two data points. The first has a value of 1 for X and 3 for Y. The second has a value of 4 for X and 1 for Y.

If we were in high school geometry, we would draw a vertical line and a horizontal line from the two points to form a right triangle. Then we'd label the two legs A and B, and the line connecting the points C.

With the legs labeled, we can read off their lengths. In the chart above, A has a length of 2 (the difference in Y, 3 - 1) and B has a length of 3 (the difference in X, 4 - 1).

With that information we can calculate the length of line C (the connecting line) with the Pythagorean formula sqrt(A^2 + B^2) = C. Plugging in the numbers we get sqrt(2^2 + 3^2) = C, which becomes sqrt(4 + 9) = C, so C = sqrt(13), or approximately 3.606.
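As a quick sanity check on that arithmetic, here is a minimal sketch using Ruby's built-in Math.hypot, which returns the hypotenuse of a right triangle given its two legs. The variable names a, b, and c are just illustrative stand-ins for lines A, B, and C.

```ruby
# Leg lengths of the right triangle from the 2D example.
a = 2.0
b = 3.0

# Math.hypot(a, b) computes sqrt(a**2 + b**2).
c = Math.hypot(a, b)

puts c.round(3) # prints 3.606
```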

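You could also write the formula in Ruby, working directly from the coordinates of the two points.

  point1 = [1, 3]
  point2 = [4, 1]
  # Ruby's ^ is bitwise XOR, so exponentiation is written with **.
  distance = Math.sqrt((1 - 4)**2 + (3 - 1)**2) # approximately 3.606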

Making sense? You just used Euclidean distance to calculate the distance between two points in 2D space. Let's try it in 3 dimensions next.

3D Example

In 3 dimensions, we can model our points as arrays of 3 numbers rather than 2. [X,Y] becomes [X,Y,Z]. Let's build on our last example by adding a third value, Z, to each point.

  point1 = [1, 3, 2]
  point2 = [4, 1, 4]

  # Same formula as before, plus one more squared difference for Z.
  distance = Math.sqrt((1 - 4)**2 + (3 - 1)**2 + (2 - 4)**2)
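To see the number this works out to, here is a small sketch that reads the coordinates from the arrays by index instead of hard-coding them. Note that Ruby writes exponentiation as ** (its ^ operator is bitwise XOR) and exposes the square root as Math.sqrt. The names dx2, dy2, and dz2 are hypothetical, chosen only for this illustration.

```ruby
point1 = [1, 3, 2]
point2 = [4, 1, 4]

# Squared difference along each axis, read from the arrays by index.
dx2 = (point1[0] - point2[0])**2 # (1 - 4)**2 = 9
dy2 = (point1[1] - point2[1])**2 # (3 - 1)**2 = 4
dz2 = (point1[2] - point2[2])**2 # (2 - 4)**2 = 4

distance = Math.sqrt(dx2 + dy2 + dz2) # sqrt(17)
puts distance.round(3) # prints 4.123
```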

Conclusion

That's it! All we had to do was add one more term to the distance calculation for the 3rd dimension: the squared difference of the two Z values, 2 and 4. Adding a 4th dimension works the same way.

  point1 = [1, 3, 2, 8]
  point2 = [4, 1, 4, 9]

  # One squared difference per dimension, summed, then square-rooted.
  distance = Math.sqrt((1 - 4)**2 + (3 - 1)**2 + (2 - 4)**2 + (8 - 9)**2)

From here, you can follow the same pattern to add as many dimensions as your k-means algorithm needs.
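The pattern can be wrapped up in a single method that handles any number of dimensions. The following is a minimal sketch, not a definitive implementation: the method name euclidean_distance and its argument check are assumptions made for this example, and it relies on Array#sum, which is available in Ruby 2.4 and later.

```ruby
# Hypothetical helper: Euclidean distance between two points given as
# arrays of numbers with the same length (one entry per dimension).
def euclidean_distance(a, b)
  unless a.length == b.length
    raise ArgumentError, "points must have the same number of dimensions"
  end

  # Pair up matching coordinates, square each difference, and sum them.
  sum_of_squares = a.zip(b).sum { |x, y| (x - y)**2 }
  Math.sqrt(sum_of_squares)
end

puts euclidean_distance([1, 3], [4, 1]).round(3)             # 2D: prints 3.606
puts euclidean_distance([1, 3, 2], [4, 1, 4]).round(3)       # 3D: prints 4.123
puts euclidean_distance([1, 3, 2, 8], [4, 1, 4, 9]).round(3) # 4D: prints 4.243
```

Because the method zips the two arrays together, the same code covers the 2D, 3D, and 4D examples above without any changes.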