Encoding of RLE

Why is the bounding box used to decode the RLE? Shouldn’t it be the shape of the original image in question rather than the bounding box of the respective mask? Is that how Hasty AI encodes the RLE?

Hi John!

To answer your first question: we store only the RLE of the mask, cropped to its bounding box, to minimize redundancy and keep the encoding small. We have millions of labels in the Hasty AI service, so storing information about where an object is NOT would be wasteful. And since RLE stores run lengths, we don’t encode where the object is not; by design, the encoding covers only the actual object.
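
To make the “run lengths of the actual object” idea concrete, here is a minimal encoder sketch. It assumes a COCO-style scheme: alternating background/foreground run lengths over the row-major flattened mask, starting with a background run. Hasty’s actual layout (scan direction, starting run) may differ, so treat this purely as an illustration.

    import numpy as np

    def rle_encode(mask):
        # Encode a binary mask as alternating background/foreground run
        # lengths over the row-major flattened pixels, starting with a
        # background run (COCO-style; Hasty's real layout may differ).
        flat = mask.flatten()
        # Indices where the pixel value changes, plus both array ends.
        changes = np.flatnonzero(flat[1:] != flat[:-1]) + 1
        boundaries = np.concatenate(([0], changes, [flat.size]))
        runs = np.diff(boundaries).tolist()
        # Prepend a zero-length background run if the mask starts with 1.
        if flat[0] == 1:
            runs = [0] + runs
        return runs

    # A 3x4 object mask already cropped to its bounding box:
    mask = np.array([[0, 1, 1, 0],
                     [1, 1, 1, 1],
                     [0, 1, 1, 0]], dtype=np.uint8)
    print(rle_encode(mask))  # [1, 2, 1, 4, 1, 2, 1]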

To answer your second question: the bounding box tells you where to place the object mask within the full image. To visualize the masks for an image, here is a pseudo-code algorithm (this assumes you’re using Python and NumPy; a runnable sketch follows the steps):

  1. Create a blank image with the same height and width as your image (image = np.zeros(shape=(height, width)))
  2. For each label in the image:
    Decode RLE mask using bbox (decoded_mask = rle_decode(mask, bbox))
    Assign the label to the blank image from step 1 at the coordinates specified by the bbox (image[bbox[1]:bbox[3], bbox[0]:bbox[2]][decoded_mask == 1] = label_value, where label_value is the class or instance id you want to paint)
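
Concretely, here is a runnable NumPy sketch of these two steps. The bbox format [x_min, y_min, x_max, y_max], the alternating-run RLE scheme from the encoder sketch above, and the label fields bbox, mask, and class_id are all assumptions; adjust them to match your actual Hasty export.

    import numpy as np

    def rle_decode(counts, bbox):
        # Inverse of the encoder sketch above: rebuild the binary mask of
        # shape (bbox_height, bbox_width) from alternating run lengths.
        # Assumes bbox = [x_min, y_min, x_max, y_max], as the slicing in
        # step 2 implies.
        height = bbox[3] - bbox[1]
        width = bbox[2] - bbox[0]
        values = np.zeros(len(counts), dtype=np.uint8)
        values[1::2] = 1                    # odd-indexed runs are foreground
        flat = np.repeat(values, counts)    # expand each run back into pixels
        return flat.reshape(height, width)

    def render_labels(labels, height, width):
        # Steps 1 and 2 above: paint each label mask onto a blank canvas at
        # the position given by its bounding box.
        image = np.zeros(shape=(height, width), dtype=np.int32)
        for label in labels:
            bbox = label["bbox"]
            decoded_mask = rle_decode(label["mask"], bbox)
            region = image[bbox[1]:bbox[3], bbox[0]:bbox[2]]  # view into image
            region[decoded_mask == 1] = label["class_id"]     # hypothetical field
        return image

    # Example with the 3x4 mask from the encoder sketch, placed at x=1, y=0:
    labels = [{"bbox": [1, 0, 5, 3], "mask": [1, 2, 1, 4, 1, 2, 1], "class_id": 7}]
    canvas = render_labels(labels, height=10, width=10)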

Please take care that the order in which you assign the labels to the blank image matters, since later labels can draw over earlier annotations. You may want to sort the labels by the ‘z-index’ found in the Hasty JSON to avoid this.
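
Assuming each label is a plain dict exposing that ‘z-index’ key, the sort is a one-liner:

    labels = sorted(labels, key=lambda label: label["z-index"])
    # Ascending order: labels with a higher z-index are painted last, on top.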

I hope this helps.
