
Formatting Met images for GAN work

Published on Feb 04, 2019

Mount a local copy of the images

  1. Update the Met image set on the cluster (for bulk tasks applied to all images, API latency is the bottleneck, so we want a local copy). Currently the cluster copy has only ~150K of the ~420K total images and is missing many of the prints/artworks that we could use for GAN experiments.

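  2. Mount the Met images on our Azure VM.

Register Microsoft's package repository (packages-microsoft-prod.deb is downloaded from packages.microsoft.com for your Linux distribution and release), then refresh the package index: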

sudo dpkg -i packages-microsoft-prod.deb
sudo apt-get update

Install blobfuse:

sudo apt-get install blobfuse

Create a temporary path that blobfuse uses to cache open files locally:

sudo mkdir /mnt/blobfusetmp
sudo chown <youruser> /mnt/blobfusetmp

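Create a config file (for example /path/to/fuse_connection.cfg) for the storage container you want to mount:

accountName myaccount
accountKey storageaccesskey
containerName mycontainer

Here myaccount is the storage account name (genartdiag505), storageaccesskey is that account's access key (listed in the Azure portal under the storage account's Access keys), and mycontainer is the blob container in your Azure storage that you want to mount (metartcopy).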

Create an empty directory to serve as the mount point:

mkdir ~/mycontainer

Mount the blob container:

blobfuse ~/mycontainer --tmp-path=/mnt/blobfusetmp --config-file=/path/to/fuse_connection.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120

You should now be able to access the files in that blob container under ~/mycontainer on your VM.
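A quick way to confirm that the mount works, and to see how many of the ~420K images the container actually holds, is to walk the mount point and count image files. This is a minimal sketch: the mount path mirrors the `mkdir ~/mycontainer` step, and the set of file extensions is an assumption about how the Met images are stored. Walking a large container through blobfuse may be slow, since directory listings go to Blob storage.

```python
import os
from collections import Counter
from pathlib import Path

# Mount point from the mkdir step above
MOUNT_POINT = Path(os.path.expanduser("~/mycontainer"))

# Assumed extensions; adjust to how the images are actually stored
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def count_images(root: Path, exts=IMAGE_EXTS) -> Counter:
    """Walk root recursively and count image files by lower-cased extension."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in exts:
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    if MOUNT_POINT.is_dir():
        counts = count_images(MOUNT_POINT)
        print(dict(counts), "total:", sum(counts.values()))
    else:
        print("Not mounted:", MOUNT_POINT)
```

Comparing the total against the ~420K images in the full collection shows how much of the set the cluster copy is still missing.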

Generate square thumbnails

  1. Convert each image to a 512x512 square and save it in a new directory
    as <ObjectID>.png.

  2. Convert the set of images to a single large array of shape 512x512x3x<# of images>:
    for each ObjectID, store the R, G, B values of every pixel in its 512x512 thumbnail.
    Annoyance: grayscale/B&W images may have only one channel to start, so they may need converting to 3-channel RGB to match.

  3. Load this array into a tensor-friendly format (TFRecords, below).
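Steps 1 and 2 can be sketched with NumPy alone. This is a minimal sketch under assumptions: each image is already loaded as an array (for example with Pillow, `np.asarray(Image.open(path))`), "square" is taken to mean a center crop (padding to a square would instead keep the whole artwork), and nearest-neighbour index sampling stands in for a proper resampling filter. The helper names `to_rgb`, `to_square` and `stack_images` are hypothetical.

```python
import numpy as np

SIZE = 512  # target edge length from step 1


def to_rgb(img: np.ndarray) -> np.ndarray:
    """Return an H x W x 3 array. Grayscale (H x W, H x W x 1) or gray+alpha
    (H x W x 2) is copied into all three channels; RGBA loses its alpha."""
    if img.ndim == 2:
        img = img[:, :, np.newaxis]
    if img.shape[2] in (1, 2):
        img = np.repeat(img[:, :, :1], 3, axis=2)
    return img[:, :, :3]


def to_square(img: np.ndarray, size: int = SIZE) -> np.ndarray:
    """Center-crop to a square, then resize to size x size by nearest-neighbour
    index sampling (a stand-in for a proper resampling filter)."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]
    # Output pixel i is taken from source pixel floor(i * side / size)
    idx = (np.arange(size) * side) // size
    return crop[idx][:, idx]


def stack_images(images) -> np.ndarray:
    """Stack thumbnails into a size x size x 3 x N array (images on the last axis)."""
    return np.stack([to_square(to_rgb(img)) for img in images], axis=-1)
```

Stacking along `axis=0` instead would give the N x 512 x 512 x 3 layout that TensorFlow models usually expect. Holding everything in one array is only practical for a subset: each 512x512x3 thumbnail is 786,432 bytes as uint8, so the ~150K images currently on the cluster would need roughly 118 GB (about 472 GB as float32). That is one reason to stream the data from TFRecords rather than keep it as a single array.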

TensorFlow records

TFRecords, TensorFlow's binary storage format, are an efficient input format for TensorFlow networks: each file holds a sequence of serialized examples that tf.data can stream from disk. Here's template code for writing image data to a TFRecord and for loading that TFRecord into a tf.data.Dataset that a neural network can use. This code assumes that the images stored at image_path are already clipped/scaled as necessary.

  1. Load the list of files to put in our dataset and create our TFRecordWriter.

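import tensorflow as tf
from glob import glob
#will also need some image-loading library

home_directory = '/path/to/home/'        # placeholder: fill in (keep the trailing slash)
image_path = 'path/to/thumbnails/'       # placeholder: fill in, relative to home_directory

##Load our data (the <ObjectID>.png thumbnails made above)
images = glob(home_directory + image_path + '*.png')

##Make our tfRecord writer (GZIP-compressed; the reader must use the same option)
record_filename = '{}MET.tfrecords'.format(home_directory)
options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
writer = tf.python_io.TFRecordWriter(record_filename, options=options)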
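  2. A tf.train.Example stores each feature as a list of bytes, int64s, or floats, so we keep some wrapper functions on hand (the _b variants take a list of values instead of a single value).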

    def _int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    def _int64_feature_b(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
    def _float_feature(value):
        return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    def _float_feature_b(value):
        return tf.train.Feature(float_list=tf.train.FloatList(value=value))
    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
  3. Load a datapoint. Because we're converting the image to bytes, we also need to save the shape of the array so we can reconstruct it later. Put everything into a feature dictionary, wrap it in a tf.train.Example, and write that example to the TFRecord. Here the class label is saved as a string just for variety, but it could (should?) be stored as a one-hot vector, an integer, etc.

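import os
import numpy as np
from PIL import Image  # Pillow, used here as the image-loading library

for image in images:
    # Load in image_data (thumbnails are already 512x512; convert('RGB') guards against grayscale)
    image_data = np.asarray(Image.open(image).convert('RGB'))
    image_data = image_data.astype(np.float32)  # must match the dtype passed to tf.decode_raw when parsing
    image_shape = np.shape(image_data)  # (512, 512, 3)
    # Load in class_label, a string; 'labels' is a hypothetical dict (ObjectID -> label)
    # that you would build from the Met metadata
    class_label = labels[os.path.splitext(os.path.basename(image))[0]]

    feature = {}
    feature['image_data'] = _bytes_feature(image_data.tobytes())
    feature['image_shape'] = _int64_feature_b(image_shape)
    feature['class_label'] = _bytes_feature(class_label.encode('ascii'))
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(example.SerializeToString())

writer.close()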

  4. Check that the TFRecord is not corrupted.

def check_record(record_filename):
    """ Check if TF Record became corrupted in saving

    record_filename (string): TFRecord to check

    Returns:
    (bool): whether the TF Record is uncorrupted
    i (int): number of records read before stopping
    """
    i = 0
    reader_opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
    record_iterator = tf.python_io.tf_record_iterator(path=record_filename, options=reader_opts)
    #check for corrupted records: iterating raises an error at a bad record
    try:
        for _ in record_iterator:
            i += 1
    except Exception as e:
        print('Error in {} at record {}: {}'.format(record_filename, i, e))
        return False, i
    return True, i
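The bytes written above can only be turned back into an image if the reader knows both the dtype and the shape. The reading side (tf.decode_raw followed by tf.reshape) can be mimicked in plain NumPy, without TensorFlow, to see why. This is an illustrative sketch: the array is synthetic, a stand-in for a real thumbnail, and `np.frombuffer` only plays the role of `tf.decode_raw`.

```python
import numpy as np

# Synthetic stand-in for one 512x512x3 thumbnail, cast to float32 as in the writer loop
image_data = np.arange(512 * 512 * 3, dtype=np.float32).reshape(512, 512, 3)

raw = image_data.tobytes()    # what the 'image_data' bytes feature holds
shape = image_data.shape      # what the 'image_shape' int64 feature holds

# Reading side: decode the raw bytes with the SAME dtype, then restore the shape
flat = np.frombuffer(raw, dtype=np.float32)   # like tf.decode_raw(..., tf.float32)
restored = flat.reshape(shape)                # like tf.reshape(..., image_shape)

# Decoding with the wrong dtype silently yields the wrong number of values
wrong = np.frombuffer(raw, dtype=np.uint8)

print(flat.shape, restored.shape, wrong.shape)
```

Without the stored shape, the reader only has a flat vector of 786,432 values; decoding as uint8 instead of float32 gives 3,145,728 values that no longer line up with the pixels. Storing float32 also makes each thumbnail 3,145,728 bytes before compression, four times its uint8 size.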

Now we need to load our TFRecord. Here's the overall logic for loading TFRecords into a tf.data.Dataset:

def combine_records(record_files, parser, group_key, group_reduce, set_type, feed_batch_size):
    """ create dataset object: parse tfRecord, batch, shuffle, repeat for several epochs """
    # dataset_params is assumed to be a dict defined elsewhere with
    # 'buffer_size', 'shuffle_seed' and 'n_epochs' entries

    # Read in tfRecord; the compression type must match the writer's (GZIP)
    dataset = tf.data.TFRecordDataset(record_files, compression_type="GZIP")

    # Parse bytes from each record into tensors
    dataset = dataset.map(parser)
    # Group tensors into batches for more efficient training
    # (tf.contrib.data.group_by_window in older TF 1.x releases)
    dataset = dataset.apply(tf.data.experimental.group_by_window(
        key_func=group_key, reduce_func=group_reduce, window_size=feed_batch_size))
    # Shuffle and repeat dataset for training
    if set_type == 'train':
        dataset = dataset.shuffle(dataset_params['buffer_size'], seed=dataset_params['shuffle_seed']).repeat(dataset_params['n_epochs'])
    # Prefetch the next batch while the current one is being used
    if set_type == 'train' or set_type == 'test':
        dataset = dataset.prefetch(buffer_size=1)

    return dataset
  1. First, write a parser that turns the bytes in each TFRecord example back into tensors. Its feature keys and dtypes must match the ones used when writing.

def parser(record):
    # Keys and dtypes must match those used by the writer
    parsed = tf.parse_single_example(record, {
        'image_data': tf.FixedLenFeature((), tf.string),
        'image_shape': tf.FixedLenFeature([3], tf.int64),
        'class_label': tf.FixedLenFeature([], dtype=tf.string)})
    # decode_raw undoes tobytes(), reshape restores the saved shape; the label comes back as uint8 byte values
    return {'image_data': tf.reshape(tf.decode_raw(parsed['image_data'], tf.float32), parsed['image_shape']),
            'class_label': tf.decode_raw(parsed['class_label'], tf.uint8)}
  2. Next, write group_key, which assigns an integer key to each datapoint. Datapoints with the same key may be grouped into the same batch; datapoints with different keys never share a batch. Here we give every datapoint the same key.

def group_key(datapoint):
   return tf.constant(1, dtype=tf.int64)

  3. Finally, write group_reduce, which forms the batches. It must take exactly two arguments: the key and a window (a Dataset of datapoints that share that key).

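batch_size = 256  # keep equal to feed_batch_size so each window yields one full batch
def group_reduce(key, window):
    # class_label is zero-padded to the longest label in each batch
    return window.padded_batch(batch_size=batch_size, padded_shapes={'image_data': tf.TensorShape([512, 512, 3]), 'class_label': tf.TensorShape([None])})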
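To make the group_key / group_reduce contract concrete, here is a simplified pure-Python model of tf.data's group_by_window, under stated assumptions: datasets are plain lists, the labels are made-up byte strings, and reduce_func returns an iterable of batches instead of a Dataset. The function names are hypothetical. It is an illustration, not TensorFlow's implementation; for example, the order in which leftover partial windows are flushed may differ.

```python
from collections import defaultdict


def group_by_window(elements, key_func, reduce_func, window_size):
    """Buffer elements per key; hand each full window (and, at the end, each
    leftover partial window) to reduce_func and yield the batches it returns."""
    buffers = defaultdict(list)
    for element in elements:
        key = key_func(element)
        buffers[key].append(element)
        if len(buffers[key]) == window_size:
            yield from reduce_func(key, buffers.pop(key))
    # End of input: flush any partially filled windows
    for key, window in buffers.items():
        yield from reduce_func(key, window)


def padded_batches(key, window, batch_size=2):
    """Mimic window.padded_batch for byte-string labels: split the window into
    batches of batch_size and zero-pad each label to the longest in its batch."""
    for start in range(0, len(window), batch_size):
        batch = window[start:start + batch_size]
        width = max(len(label) for label in batch)
        yield [list(label) + [0] * (width - len(label)) for label in batch]


# Made-up labels; a constant key, like group_key above
labels = [b"Prints", b"Arms", b"Drawings", b"Photos", b"Ceramic"]
batches = list(group_by_window(labels, key_func=lambda _: 1,
                               reduce_func=padded_batches, window_size=4))
print([len(b) for b in batches])  # → [2, 2, 1]
```

With a constant key this reduces to plain padded batching, except that batches never cross a window boundary: a window of 4 with a batch size of 2 gives two batches per window, and the leftover label forms a batch of one. In combine_records the window size is feed_batch_size while group_reduce batches by batch_size, so keeping the two equal yields full-size batches (apart from the last).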

Now you can use this dataset in your neural network:

## Define the computation graph, including the use of your dataset
dataset = combine_records(...)  # the dataset function defined above
iterator = dataset.make_one_shot_iterator() 
x = iterator.get_next()
# this is where you would define the neuralnet portion of your computation graph

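## Run the computation graph
sess = tf.Session()
while True:
    try:
        z = sess.run(x)  # next batch, as a dict of numpy arrays
        print(z['image_data'])
        # Rebuild the first label in the batch from its bytes, skipping the zero padding
        print(''.join(chr(j) for j in z['class_label'][0, :] if j != 0))
    except tf.errors.OutOfRangeError:
        print('Out of range of dataset.')
        break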

The above code simply prints each batch of images (as a numpy array whose first dimension equals batch_size; the final batch may be smaller) and the class label of the first image in the batch (as a string).
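The loop only decodes the first label in each batch. A small helper can turn the whole zero-padded uint8 label array from a fetched batch, `z['class_label']`, back into strings. This is a sketch: `decode_labels` is a hypothetical name, and it assumes ASCII labels (as encoded by the writer) that contain no NUL characters, since trailing zero bytes are treated as padding.

```python
import numpy as np


def decode_labels(label_batch: np.ndarray) -> list:
    """Convert a (batch_size, max_len) uint8 array of zero-padded label bytes
    into a list of Python strings, one per row."""
    return [row.tobytes().rstrip(b"\x00").decode("ascii") for row in label_batch]
```

Inside the loop, `print(decode_labels(z['class_label']))` would print every label in the batch.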

