Hands-Off Driving: Computer Vision on Our Roads

By Vivian Liu

It may be hard to believe, but the ride-share giant Uber is operating in the red. After subtracting driver costs and overhead from revenue, Uber reported a net loss of $3 billion last year. The obvious question is: why? How can such a popular company be losing money?

In the modern world, artificial intelligence (AI) is hard to ignore: it’s in our phones, in our healthcare system, and on our roads, and it is helping us explore new frontiers in space. Big Silicon Valley technology companies such as Google, Uber, and Facebook are racing to funnel resources into AI research and development.

The term “artificial intelligence” refers to intelligence displayed by a machine, including vision, speech recognition, and translation, among others. Whereas humans develop and store knowledge organically in neurons, machines have to “learn how to learn” through carefully written code. The field is young: the term “artificial intelligence” was coined only in 1956. Since then, the field has grown rapidly. From the chatbots that automate the customer service experience, to Google Translate’s natural language processing, to AlphaGo’s victory over a world champion Go player, AI has accomplished some amazing feats. The potential of AI has also taken pop culture by storm. For example, the eerily realistic robots in Ex Machina and the lovable Baymax from Big Hero 6 show our divided perceptions of AI in the media.

One field of AI that has gained a lot of traction in recent years is computer vision (CV). Computer vision is the broad field of using machine learning to process and interpret images so that they yield useful information for humans. The classic computer vision application is “training” a program to identify a cat in an image. When you see a picture of a cat, your brain doesn’t have to work very hard to make the association between pixels on a screen and the physical object. However, it is much harder for a computer system to establish this connection. There are two main problems to tackle. First, the system must identify the location of the cat in the image. Second, the system must be able to differentiate between a cat and other objects.

This is where big data comes in. In this context, big data means an extremely large set of images, ranging in size from hundreds of thousands to millions, that humans have hand-marked as positive (they contain the object in question) or negative (they do not). The programmer then divides the images into two sets, training data and test data, and writes code that outputs whether or not the computer thinks the object is in a given image.

After setting up the data and the program’s parameters, the programmer adjusts those parameters to maximize the program’s accuracy on the training data. By trying different values, the programmer can observe which ones bring about the highest rate of identification success. The goal is to maximize the number of images in the set that are marked correctly. Once the programmer is satisfied with the accuracy, the program is evaluated on the test data to see how well the model holds up on images it was not tuned on. After a final round of adjustments, the program is ready to classify images that have not been pre-marked by a human.
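The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: real systems learn from pixels, whereas here each “image” is reduced to a single made-up score, the data is randomly generated, and the only parameter being adjusted is a cutoff threshold. All of the function names are invented for this example.

```python
import random

def make_labeled_images(n, seed=0):
    """Stand-in for hand-marked images. Each 'image' is one invented
    feature score paired with a human label (True = contains a cat)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        has_cat = rng.random() < 0.5
        # Assumption: cat images tend to score higher on this made-up feature.
        score = rng.gauss(0.7 if has_cat else 0.3, 0.15)
        data.append((score, has_cat))
    return data

def split(data, train_fraction=0.8):
    """Divide the labeled images into training data and test data."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

def accuracy(data, threshold):
    """Fraction of images where 'score >= threshold' matches the human label."""
    correct = sum((score >= threshold) == label for score, label in data)
    return correct / len(data)

def tune_threshold(train):
    """Try many threshold values and keep the one most accurate on training data."""
    candidates = [i / 100 for i in range(101)]
    return max(candidates, key=lambda t: accuracy(train, t))

images = make_labeled_images(1000)
train, test = split(images)
best = tune_threshold(train)
print(f"threshold={best:.2f}  "
      f"train accuracy={accuracy(train, best):.3f}  "
      f"test accuracy={accuracy(test, best):.3f}")
```

The test accuracy is the number to watch: it estimates how the program will do on images nobody has marked in advance.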

One of the most popular applications of computer vision is the development of self-driving cars. Designing cars that can drive without human intervention depends on computer vision. For example, below is an image of a busy street that a computer program has marked up using computer vision:

Image that the program “sees” in a self-driving car design

The program has even been trained to tell the difference between a “car” and a “truck.” Once these markers have been laid over the image, programmers write code that decides the machine’s course of action given what is marked on screen. For example, once the program marks a “traffic light” as “red,” the car is programmed to apply the brakes by a certain amount. Now imagine doing this image analysis continuously, with the scene changing constantly as you drive down the street. This is what companies such as Uber and Zoox have to deal with in order to deliver a product that can handle the many perils of the open road.
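To make the “red light means brake” step concrete, here is a minimal Python sketch. It is not how Uber or Zoox actually do it. The `Detection` class, the distance estimate, the 50-meter stopping range, and the rule that braking grows as the light gets closer are all assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One marker laid over the image by the vision program (hypothetical format)."""
    label: str               # e.g. "car", "truck", "traffic light"
    state: str = ""          # e.g. "red" or "green" for a traffic light
    distance_m: float = 0.0  # assumed distance estimate, in meters

def brake_level(detections, stopping_range_m=50.0):
    """Return how hard to brake for one frame, from 0.0 (none) to 1.0 (full)."""
    level = 0.0
    for d in detections:
        is_red_light = d.label == "traffic light" and d.state == "red"
        if is_red_light and d.distance_m <= stopping_range_m:
            # Invented rule: brake harder the closer the red light is.
            closeness = 1.0 - d.distance_m / stopping_range_m
            level = max(level, min(1.0, closeness))
    return level

# A moving car repeats this decision for every new frame from the camera.
frames = [
    [Detection("car", distance_m=12.0), Detection("traffic light", "green", 40.0)],
    [Detection("truck", distance_m=30.0), Detection("traffic light", "red", 25.0)],
]
for frame in frames:
    print(brake_level(frame))
```

In this sketch the first frame produces no braking, because the light is green, and the second produces half braking, because a red light sits halfway inside the assumed stopping range.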

Back to Uber: its ultimate goal is to deliver a self-driving product that eliminates the need for a human driver and instead “employs” much cheaper autonomous ones. This way, instead of continuously paying for human labor, the company can pay a fixed cost for the self-driving cars and then a significantly lower ongoing cost to fuel them.

Currently, computer vision for self-driving cars is not yet mature: much remains to be done to improve the safety of the software, the accuracy of the image analysis, and the integration of these cars into modern roads. But lawmakers and city planners have started to plan for computer vision technologies in our daily lives. The Silicon Valley-based company Zoox already has a fully autonomous vehicle that has been cleared for testing in a limited capacity.

AI is an extremely exciting field: its implications for our daily lives are far-reaching and will only grow in the next decade. So next time you are waiting in traffic, look around and see whether there is a driver in the car next to you.
