The Complexity Series P1: P vs. NP

Series Introduction

In this series of articles I would like to share with you a set of ideas that are meant to help non-technical people build a better intuition as to what the current software and A.I. trends are all about. My intention is to keep this Complexity Series as readable and digestible by anyone regardless of their technical background.

In order for us to understand how the current trends came about and why the computer science community is so focused on A.I. and complex systems in general, we first need to understand the underlying problem(s) that the community is trying to address. That’s why I’ll kick this series off by introducing the topic of Computational Complexity and specifically a problem known as P vs. NP. From there, we’ll dive into the specifics of A.I. in the next chapter and what type of advancements we can expect moving forward. So let’s start this series with a fun little thought experiment.

Introducing the problem: The Ultimate Offer

Imagine the following scenario: The powers that be offer you endless amounts of money and fame if you could solve 4 simple math problems that any 12 year old gets to work on in school. There are only two caveats:

  1. You must do the calculations by hand. No calculators or external aid of any kind.
  2. You have a time limit of 20 minutes to solve all 4 problems.

You think to yourself: “Heck yea, why not? That sounds like a great deal. I’ll do it!!”

Problem #1:

Question: What is the product of 111 x 133?

Your answer: 111 x 133 = 14763

This took you 15 seconds to solve and check for errors. The test administrator verifies your answer and congratulates you on your success!

Problem #2:

Question: Which two integers (factors) multiply to produce 121? 
You cannot use 1 x 121 as it is considered cheating!

Your answer: 121 = 11 x 11

It took you 15 seconds to solve and check for errors. The test administrator verifies your answer and congratulates you on your success!

Problem #3:

Question: What is the product of 654151 x 943343?

Your answer: 654151 x 943343 = 617088766793

This took you ~3 minutes to solve and double check for errors. The test administrator once again verifies the answer and congratulates you. On to the fourth and final problem.

Problem #4:

Question: Which two integers (factors) multiply to produce 201863? 
You cannot use 1 x 201863 as it is considered cheating!

You take a few minutes to think things through and come to the realisation that you don’t know of any easy way to solve this problem. The only thing that comes to mind is the “guess and check” method. So you grab your pen and paper and get to work:

2 is not a viable factor because 201863 / 2 = 100931.5 which is not an integer.
3 is not a viable factor because 201863 / 3 = 67287.66 which is not an integer.
.
.
67 is not a viable factor because 201863 / 67 = 3012.88 which is not an integer.
.

Before you know it, time is up. 20 minutes passed and you’ve unfortunately failed to generate the right answer in time. So what went wrong?

For the #1st problem, multiplying two 3-digit numbers took you 15 seconds, meanwhile multiplying two 6-digit numbers in the #3rd problem took you 3 minutes. It is obviously the case that as the numbers scale and grow larger, more time and work/effort is required to solve them. But even then, 3 minutes was a pretty decent time.

For problems #2 and #4, going from 3-digit number (121) to a 6-digit number (201863) resulted in a much bigger increase in the amount of work/effort required. This is because while these two problems, multiplication and factorization, may seem similar, they are actually very very different in nature.

All problems are not born equal: P vs. NP

In the early days of computing, scientists were busy grouping problems by their scaling difficulty. Some problems, like multiplication, seemed to have algorithms (like the ones you learn in school) that we’ve discovered to solve for them in a relatively short amount of time. As the number of digits grows and the problem gets harder, our algorithms seem to keep up. Other problems, like factorization, seem to elude us. In spite of our best efforts, we can’t figure out good algorithms for solving them quickly.

So, in order to formalise these observations and get a better handle on the situation, computer scientists and mathematicians created classes (buckets) of problems. When we encounter a computational problem, we look at how difficult it is to solve at scale, and based on that difficulty we place it into one of the classes/buckets. Initially the following classes were created:

zYIkC6O[1]

Verifiability, the act of being able to prove the correctness of an answer, turned out to be just as important as being able to generate the answer itself. If I were to ask you what the product of 111 x 133 is, you could compute that in a short period of time as you did in Problem #1. The test administrator verified your answer by doing that same multiplication and compared the results in order to check for mistakes. That’s exactly what all techers do for their students;  They compute and compare in order to verify. The fact that the multiplication problem is so easily solvable is what also made it possible for the test administrator to check your solution for mistakes.

In the #4th problem however (factoring 201863), it took you a long time and you still did not manage to produce a solution. Yet, if I were to tell you that 201863 has the factors 337 and 599, you could easily check my work by multiplying the two factors to see if they actually produce the right number. In other words, for the factorization problem, checking an answer for errors is easier than finding the two numbers. This is the difference between P and NP.

P is a class of problems that are easy to solve and easy to verify. For the sake of simplicity we can define “easiness” as: being solvable in a reasonable amount of time.

NP, on the other hand, is a class of problems that are hard to solve yet easy to verify.

EXP is a class of problems that are hard to solve and hard to verify. A classic example of an EXP problem would be a generic chess game. Think of a randomly generated chess board configuration with pieces all over the place, what’s the best move to make there? What algorithm do you even use to find out what the best move is? How can you even verify the correctness of any given suggestion?

It all boils down to scale. As you scale things up (in our math test we scaled the number of digits), certain problems grow much faster in difficulty than others and quickly become out of reach. And while we’re currently discussing algorithms and math questions, the same actually applies for any given problem we might face. Cooking dinner for two people is not the same as running a restaurant – even though the act of cooking might be the same. Scale is, and will always be, the fundamental challenge we have to overcome when solving important generalised problems of any kind.

Every decade or so we come across an algorithm that takes a problem from the NP domain to the P domain. This happens because some clever person makes a breakthrough that solves that problem much faster than we previously thought was possible. So the question now remains: is every NP-problem a P-problem in hiding? or are these classes fundamentally different and NP problems will forever be hard to solve? 

Why is this so important? 

There is a whole host of problems in NP that are extremely important for us to solve. They range from problems in encryption/security and cryptocurrency mining to computational biology (ex. cancer research) and language. A subset of these problems are known as NP-Complete. That means that finding a solution that lets us solve any one of those problems will immediately help us solve all of them.

An example of this would be if you can find a fast algorithm for playing an N x N size sudoku (yes sudoku the game an NP problem), you’ve essentially rendered most online encryption systems completely useless because those are based on NP problems as well. You can use that sudoku algorithm to break into peoples’ bank accounts. You can apply that solution to the protein folding problem which could help us find cures for diseases and thereby extending the average human lifespan. Heck, you could even use that algorithm to cause a lot of global political turmoil if you wanted to.

This is why the P vs NP question is so important. Hard to solve problems (NP/EXP/etc.) are the next frontier for us to conquer. Artificial Intelligence systems are actually nothing more than a way for us to try to approximate solutions for these hard to solve problems. And while A.I. may at first glance seem unrelated to the P vs NP problem, nothing could be further from the truth.

This first article was meant to set the stage. In part 2 of this series I’ll be diving into A.I. specifically and why what you’re seeing should both terrify as well as excite you. So stay tuned for the next one!

In the meantime, here is a phenomenal video for those of you that would like to learn more about P vs NP.

6 comments

Leave a Reply