GeoMapping and the Travelling Salesman Problem
In this post, we will be exploring an extremely interesting implementation of the Travelling Salesman Problem. Let’s start with a short background. We at LeanAgri, work towards digitising farmlands by tagging coordinates of a farm. This is helpful for various purposes: Getting an exact area of the farm, Remote monitoring farm area using satellites, Obtaining weather predictions for the specific region etc.
So what’s so hard about it?
It isn’t hard if you are walking and marking the coordinates in order from one end of the farm to other in a cycle.
In a lot of cases, the farms are too big to walk around, or we can see them on a map very clearly. It isn’t needed to walk along, but we can simply mark these corners.
From our data, we have realized that the users are unable to mark these points in order and they need to do it over and over again. A very simple example where there are just 4 points, but it takes 2 attempts for a user to get it right.
So, isn’t the solution to train users to mark them in order always? We at LeanAgri believe that software should be adaptive of the user rather than the other way round. We build solutions which allow users to improve their efficiency of working rather than training them to efficiently use the software we build.
What do we need?
- A solution which intuitively helps users tag all coordinates and mark the farm. The solution should be able to correct order of points which are marked by the user.
- This solution should be extremely fast (nearly realtime) allowing us to correct the polygon if needed.
- We can expect number of points for a farm to go as high as 70 for extremely complex cases.
- This solution should run on a memory limited smartphone.
Cyclic order implementation
The first attempt was to try and arrange all vertices in a cyclic order to create a Polygon. The algorithm is very easy, we take a reference point and arrange all points in an increasing order of angles relative to that point.
Intuitively, it makes sense, and it does work for 60% of the cases we encountered. Alas, it doesn’t work in the cases below (which I must mention are very much real farms and not a figment of my imagination 🙂 )
Looking deeper, we have a Travelling Salesman Problem!
Trying to re-analyze our problem, we are trying to find a best bounding polygon for a set of points, i.e. none of the edges must intersect. Any intersecting edges would produce an invalid polygon (with respect to physical land entities) since it would lead us to having a negative area. (Theoretically, there can be negative areas, eg: a farm around a house, but realistically, we don’t handle any such cases)
Using Triangle inequality, the shortest path when walking along all of these points is guaranteed to have no intersecting lines. Which means we are looking at a Euclidean Travelling Salesman Problem.
Here’s an explanation which I will just quote from this source:
An interesting special case of the TSP is to consider the optimal route passing through a collection of n points (sites) in the Euclidean plane (or more generally, n-dimensional Euclidean space). The weights between the points (sites) involved are taken to be the Euclidean distance between the points. If in an optimal tour some pair of edges AC and BD cross as shown below (Figure 4, top), then this can not be an optimal tour. Why? Consider the tour where the edges A’B’ and D’C’ replace the edges AC and BD in our tour. Let us compare the cost of the tour ACBDA with the tour A’B’C’D’A’ (Figure 4, bottom). Note that the edges AC and BD meet at a point Q which is not one of the original sites involved in the TSP. We know that AQ + QB is larger than A’B’ (because in the triangle AQB, the sum of any two sides has Euclidean length greater than the third side). Similarly DQ + QC is longer than D’C’. Thus, AQ + QB + DQ + QC is longer than A’B’ + D’C’. However AQ + QC = AC and DQ + QB = DB and hence, if we replace AC and DB by A’B’ and D’C’, we get a shorter tour.
Is this the only solution No. A lot of other solutions can also exist for this problem as well(for example the clockwise order example above). Intuitively for a user, the travelling salesman problem solution is going to make the most sense. Please note that humans are extremely good at solving the Travelling Salesman Problem.
Isn’t this a NP hard problem? Yes it is. TSP is solvable in n!
time which for 15 points is going to be ~10^15 iterations! Fortunately, a lot of research is done on this problem to generate approximate heuristic solutions much faster than the brute force solution.
Implementing a 2-opt heuristic
We started with a 2-opt heuristic since it is extremely easy to understand and implement. The running time is good too, n2 * Number of iterations
The worst case time is going to be again n!
, but we can limit the number of iterations to limit the worst case time.
A very simple and intuitive understanding of the 2-opt algorithm is:
- Take a random tour where all vertices are traversed exactly once.
- Interchange 2 edges if we get a better solution. Keep doing this till we keep getting a better path.
A visualization on Youtube might be interesting for getting a better understanding.
Initial path
We start with a Nearest neighbour path, where we pick the closest point from each point in the path as the next point. This is intuitively easy to understand and also likely going to give us better results as compared to a random path.
Distance function
Note that when using Latitudes and Longitudes (Geographic Coordinate system), the distance function needs to take into account earth’s curvature. It is documented in TSPLIB’s (a popular TSP Solver library) documentation.
Result
The result is great, we are able to make geomapping extremely intuitive and unrestrictive for the user. See in the below example how the user is marking points and the final polygon matches our intuition of the bounding polygon.
The entire process is realtime and happening in an Android smartphone.
Testing our heuristic
We want to analyse how good our heuristic works on all the data collected till date by LeanAgri. We have mapped 1000+ farmers all with farms of different shapes and sizes.
We run our TSP algorithm against all the farm mappings collected till date, and we see that around 15% of the results with our TSP setup fails to match.
Changing the initial path
The 2-opt algorithm is highly dependent on the initialization that we pick for our data. I started with using Nearest neighbour based initial path (ie Choose the nearest point as the next point in path), since theoretically, it gives the best result.
You know what gives better results than Nearest neighbour? Remember Humans! We must note that humans have very strong intuition for solving the Travelling salesman problem. Which is why, we expected that the order of points marked would be very close to our solution. And changing the initial path to the one marked by the users, we were able to reduce the number of failures to 10%!
Exploring remaining failures
The remaining failures can be easily classified in 3 cases(in decreasing order of occurrence):
- Manual errors: Majority of the cases. Multiple points marked where a single point should have been marked. In these cases, our TSP algorithm might return a different result but it doesn’t matter since most likely the manual marking is incorrect.
- Extremely high number of points Visually, the polygon is almost the same, but we are just reordering 1 or 2 points in the order. The error introduced in area in such cases is extremely low, meaning we can ignore them.
On the bright-side, see how well our implementation works for an extremely high number of points too :).
- Real errors are less than 1% of the total data. These are possibly due to the initialization we have taken, or in some cases how the points were marked by the user.
Re-iterating that humans like to work with software which works intuitively and they are able to visualize what points to add and where to make the polygon fit correctly, there are a couple of examples below showing with how a human could have fixed these errors in no time.
Future improvements
Lin-Kernighan heuristic is considered to be the most effective implementation for the TSP. It solves some of the problems with 2-opt such as problems with initialization, local optimum etc. It is a lot more complex as compared to 2-opt, but definitely something to go to if the number of wrong mappings increase.
Conclusion
There is no reason with just 1% of failures to not incorporate this into our system which can save time for majority of our farm mappings and provide a great experience to our ground team!
On a lighter note, this is what can possibly happen in future: 😀
Earlier
Ground team: Hi, I am unable to get the correct polygon for this farm.
Technical support: You see, you just need to mark points in order and the polygon would be what you expect.
Ground team: #$@&%*!
Now
Ground team: Hi, I am unable to get the correct polygon for this farm.
Technical support: You see, you need to mark the points such that they are a solution for Travelling Salesman Problem. At the same time, since we are using a heuristic solution, it is possible that you might not get the correct solution even after you choose the optimum points. Your best bet is to change the points marked by you or the order in which they are marked and try to run a 2-opt simulation in your head.
Ground team: #$@&%!#$@&%!#$@&%*!
Technical support: Just kidding, there is a manual override. Just mark them in order.
Ground team: #$@&%!#$@&%!#$@&%*!!#$@&%*!!#$@&%*!
Interested in solving similar challenging problems and impact agriculture with technology? Reach out to me and explore careers at LeanAgri.