A Tale of Algorithms: Edit Distance in Web Development

A story about how I implemented a Dynamic Programming algorithm in a Django project...

ยท

7 min read

Two years ago, I was working on a delivery app. Our client wanted to create a system that could manage users ordering products, and service providers would register on the app to sell their products. Sounds easy, right?

In this article, I will tell you the story of how I got to implement a cool Dynamic Programming algorithm to satisfy the requirements of a not so experienced Project Manager. Keep reading until the end; I assure you it will be educational and fun.

Let's go!

Some Context

My role on the project was to create the API that would serve the web and mobile apps. We decided to go with Python, using Django Rest Framework for the backend side.

We were a fairly new team with little experience in the Software Development industry. But somehow we managed to understand each other well and to collaborate to keep the project moving forward.

As you can imagine, there are several parts involved in the software development process. In our case, these parts were:

  • A Back-end Team (where I was)

  • A Front-end Team (responsible for the web apps mainly)

  • A Mobile Team (responsible for the mobile apps)

  • A DevOps team (responsible for... well, you know, DevOps stuff ๐Ÿ˜…)

Whenever you are part of a team involving that many members, there is usually a person (or persons) that should keep track of the progress the team is making, should coordinate all the parts so that everyone knows what it's doing, and should decide on the next steps of the development process. That way, the developers can focus on what they do best: coding.

In our case, we had a so-called Project Manager in charge of all the previous and more. But, as you will see in a minute, it was not always so accurate in making the right decisions.

Enter the Browser

One day we had a request from our Project Manager to add a browser feature to the mobile apps. This seems like a good idea, right? The user would log in to the app and could search for something on the Home screen to speed up the process of ordering products.

My initial thought was: "We probably want a browser feature that can match somehow the name of the service providers". This seemed like the most obvious initial approach because then you could provide the user with a list of the most probable providers it was looking for, giving it the option to select the one that suits its necessities the most.

Still, I had to ask for some specifications so I could have a better understanding of how this new feature was conceived to be used. Then, the moment I asked the obvious question: "How will this be used? Should it match the names of the service providers?", I got the most unexpected answer.

"Yes! But not only that. It should also match the names of the brands, products, product categories, etc... Everything that we can match, it should!

Also, of course, the match doesn't have to be exact. Maybe the user can have some typos and miss a letter or two... We should be able to take that into account."

So, in my mind, it was clear that our Project Manager wanted to have a very specific browser feature (we only cared about its use in the app) that behaved the most similarly to the Google browser. Nice!

Dynamic Programming to the Rescue

When I was tasked with this browser feature I knew that I had to come up with a solution that solved the problem but that was not too overkilled also. Fortunately, I have some background in Data Structures and Algorithms and I could quickly find a solution that was good enough for us.

I have always struggled to find good applications of the algorithms I know in real-life scenarios. That is why I try to think of different approaches to the problems I face every day, hoping that someday it will allow me to implement and deploy some cool algorithms to production.

In this specific case, the algorithm that solved our problem was Edit Distance. This algorithm receives two strings as arguments and returns an integer value that represents the minimum amount of "operations" that you should perform on one of the strings to make both of them equal. The operations allowed are:

  • Insertion of a single symbol.

  • Deletion of a single symbol.

  • Substitution of a single symbol x for a symbol y.

As I see it, this algorithm returns a number that shows you how "close" two words are to each other. Which seems like a good ad-hoc approach to our brand-new browser.

Imagine that we want to know how many operations we need to convert the string kitten to sitting. The algorithm will return 3:

  • kitten โ†’ sitten (substitute s for k)

  • sitten โ†’ sittin (substitute i for e)

  • sittin โ†’ sitting (insert g at the end)

Do you see how to use this algorithm in our specific case? Keep reading so you find out how we put all this together.

Levenshtein goes to Production

With all the theoretical knowledge sorted out, the only thing missing was how to include the algorithm inside the application in a way that solved our initial requirements. Let's analyze a simplified version of the problem we were facing:

Suppose the user is only allowed to enter one word in the browser. How to match products, categories, brands, service providers, etc... with that word? Remember that the match doesn't have to be exact.

First of all, the easiest part is to get the input word from the user. Let's assume we already have that and that we called it query.

After that, we needed a candidate list of words to match our query. Fortunately, the corresponding entities in our database all had a name attribute that we could use for that purpose. So, we just had to collect every name from our Product, ProductCategory, ServiceProvider, and Brand entities.

With all these names collected, we could run the Edit Distance algorithm with our query word and the names. Then, we could sort the results from lowest to highest. This means that we would be recommending to the user a list of results starting from the ones that were more similar to the word that was entered in the browser.

A sample code would be something like this:

from utils import edit_distance

results = []
for entity in entities_collected:
    results.append((edit_distance(query, entity.name), entity))
results.sort(key=lambda x: x[0])

This worked pretty well for being an ad-hoc solution that came out of thin air. The only thing we had to adjust was the fact that, sometimes, the results we were showing differed too much from the original word entered as the query.

For example, usually, after the 10th word the value resulting from the Edit Distance algorithm was too high that even if it wasn't the 10th but the 1st result it would have made no sense to show it as a "match". To deal with this issue we introduced the concept of threshold to our algorithm.

This threshold would allow us to discard results whose edit distance to the query word was higher than a specified value. After a few tries tunning the threshold value and analyzing how it behaved with the data in our database, we decided on one and we were ready to test this feature in our staging cluster.

After that, we shipped it for production. Succesfully!

As a note, the process of tunning this threshold parameter was surprisingly educational for me. We started doing it completely wrong by setting actual numeric values in the code. But we ended up with a solution that used advanced Object-Oriented Programming concepts such as the Singleton Pattern.

I plan to create an article talking about this, so stay tuned!

Conclusions

I hope this story was inspiring enough for people that, like me, find beauty in algorithms.

I showed you how I and my team managed to quickly develop a solution that completely suited all our needs. And it all started with some very generic requirements that we had to satisfy somehow.

Sometimes, we get the most creative when faced with the most difficult tasks. So, my advice to you is: don't miss the chance to step up in front of a challenging situation. It will build up your trust, it will allow you to take out the best of yourself.

See you soon!


๐Ÿ‘‹ Hello, I'm Alberto, Software Developer at doWhile, Competitive Programmer, Teacher, and Fitness Enthusiast.

๐Ÿงก If you liked this article, consider sharing it.

๐Ÿ”— All links | Twitter | LinkedIn

Did you find this article valuable?

Support Alberto Gonzalez by becoming a sponsor. Any amount is appreciated!

ย