How to teach a machine? #Image Annotation

Annotation is the process of labeling or tagging data. It can be applied to any type of data, for example text, image, video, or lidar. When the data is an image, the process is called Image Annotation.

As automation enters our day-to-day life, it is exciting to understand the process behind it. Take autonomous (self-driving) vehicles or home automation, for example. How do self-driving cars drive? How does home automation work?

The machine has to be intelligent enough to make real-time decisions. To reach its destination, the vehicle has to choose the right path, and the path details have to come from some source, for example Google Maps. It also has to stop when pedestrians cross the road, and that decision should be based on the images captured by the vehicle's camera. The car, or any machine in general, should analyze the image and perceive the details in it. This is where the term Computer Vision comes in.

Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos. (Source: Wikipedia)

To understand an image, or to give a computer a vision system, we need to train it on all types of patterns that may exist in an image or video. This process can be compared with how a child starts identifying a cat, a dog, or the moon. How do we train the computer? We need training data, and this training data has to be labeled.

Source : https://github.com/saksham789

Example: If you want a computer to identify whether there is a dog in a given photograph, the computer should know what a dog looks like. So we need to feed it enough images of dogs (a data set) and train an algorithm so that it can identify a dog after the training process. During training, the computer saves the features of a dog. When a new image is given, the computer looks for those features and outputs the result.
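The idea of "saving features" from labeled examples and then matching a new image against them can be sketched with a toy nearest-centroid classifier. This is only an illustration under strong assumptions: the two-number feature vectors, the labels "dog" and "not_dog", and the function names `train` and `predict` are all made up for this sketch. Real systems extract features from pixels, usually with a neural network.

```python
from collections import defaultdict
from math import dist


def train(labeled_examples):
    """Average the feature vectors seen for each label (the 'saved features')."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Return the label whose saved average is closest to the new features."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical, hand-made feature vectors standing in for real image features.
training_data = [
    ([0.9, 0.8], "dog"),
    ([0.8, 0.9], "dog"),
    ([0.1, 0.2], "not_dog"),
    ([0.2, 0.1], "not_dog"),
]

model = train(training_data)
print(predict(model, [0.85, 0.75]))  # prints "dog"
```

Note that the labels in `training_data` are exactly what annotation produces: without them, `train` would have nothing to group the features by.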

Similarly, in the case of an autonomous vehicle, we need to train the algorithm to identify the different objects in its view. Hence the training data set should contain all the objects that may come into the vehicle’s view.

There are different ways to annotate an image.

  • Bounding box,
  • Cuboid (3D Bounding box),
  • Polygon,
  • Landmark,
  • Point Group,
  • Polyline,
  • Polyline Group
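As a rough preview, two of these annotation types could be stored as simple records like the ones below. This is a minimal sketch, not a standard format: the class names, field names, the file name `street.jpg`, and the pixel coordinates are all hypothetical, and real annotation tools each define their own schema.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox:
    """Axis-aligned rectangle given by its top-left and bottom-right corners."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


@dataclass
class Polygon:
    """Closed shape given by an ordered list of (x, y) vertices."""
    label: str
    points: list


@dataclass
class ImageAnnotation:
    """All labeled shapes that belong to one image."""
    image_file: str
    shapes: list = field(default_factory=list)


# Hypothetical image and pixel coordinates, for illustration only.
annotation = ImageAnnotation(image_file="street.jpg")
annotation.shapes.append(BoundingBox("pedestrian", 120, 80, 180, 260))
annotation.shapes.append(
    Polygon("road", [(0, 300), (640, 300), (640, 480), (0, 480)])
)

labels = [shape.label for shape in annotation.shapes]
print(labels)  # prints ['pedestrian', 'road']
```

A bounding box is cheap to draw but includes background pixels, while a polygon follows the object outline more closely at the cost of more annotation effort.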

Details of these individual types are explained in the next article.