kirthevasank / nasbot

Neural Architecture Search with Bayesian Optimisation and Optimal Transport


Support for big datasets

htw2012 opened this issue

Hi,

If we want to find architectures on big datasets such as ImageNet, how should we change the code to support that?

Thanks

You should start here; this script can take an architecture and evaluate it on your problem: https://github.com/kirthevasank/nasbot/blob/master/demos/cnn_function_caller.py

For specifics on converting our nn representation into TF, look at this directory: https://github.com/kirthevasank/nasbot/tree/master/cg

Thank you. I am trying to use a generator to load the original data. I do it as follows:

    FEATURES_KEY = 'x'

    # Define input layer (hook in data)
    def generator(inputfile):

        def _gen():
            with open(inputfile) as fr:
                for line in fr:
                    feature, label = line_processor(embedding, line)
                    yield feature, label
        return _gen

    def preprocess_text(image, label):
        features = {FEATURES_KEY: image}
        return features, label

    def input_fn(partition, training, batch_size):
        """Generate an input_fn for the Estimator."""

        def _input_fn():
            if partition == "train":
                dataset = tf.data.Dataset.from_generator(
                    generator(trfile), (tf.float32, tf.int32),
                    ((feature_dim,), ()))
            else:
                dataset = tf.data.Dataset.from_generator(
                    generator(vafile), (tf.float32, tf.int32),
                    ((feature_dim,), ()))

            # We call repeat after shuffling, rather than before, to prevent
            # separate epochs from blending together.
            if training:
                dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()

            dataset = dataset.batch(batch_size)
            dataset = dataset.map(preprocess_text)
            iterator = dataset.make_one_shot_iterator()
            features, labels = iterator.get_next()
            return features, labels

        return _input_fn

    model.train(input_fn=lambda: input_fn("train", training=True,
                                          batch_size=params['trainBatchSize']),
                steps=params['trainNumStepsPerLoop'])
    results = model.evaluate(input_fn=lambda: input_fn("valid", training=False,
                                                       batch_size=params['valiBatchSize']),
                             steps=params['valiNumStepsPerLoop'])

But when I debug the code at this line:
model_fn = get_model_fn(mlp, params['learningRate'], num_classes)

execution eventually reaches this function:

def mlp_definition(features, nn, num_classes):
    """ Defines layers in tensorflow neural network, using info from nn python structure. """
    # Define input layer, cast data as tensor
    features = features['x']
    layers = [tf.reshape(tf.cast(features, tf.float32), features.shape)]  ### NEED TO VERIFY FLOAT32

I get the error TypeError: 'function' object has no attribute '__getitem__'.
Here, features is a function rather than an IteratorGetNext tensor, and I don't know where I went wrong. Could you help me?
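For what it's worth, the error looks consistent with the wrapping in the snippet above: input_fn("train", ...) already returns the inner _input_fn without calling it, so wrapping it again in lambda: input_fn(...) hands the Estimator a function where it expects (features, labels). A minimal pure-Python sketch of the mismatch, with hypothetical names standing in for the TF pieces:

```python
def make_input_fn(partition):
    """Factory mirroring the pattern above: it returns an input_fn,
    it does not return data itself."""
    def _input_fn():
        # Stand-in for the real (features, labels) tensors.
        return ("features_for_" + partition, "labels")
    return _input_fn

# Buggy pattern: the lambda returns the inner function itself, so the
# caller receives a function where it expected (features, labels) --
# subscripting it then fails, just like features['x'] in mlp_definition.
buggy = lambda: make_input_fn("train")
print(type(buggy()))  # a function, not a tuple

# Likely fix: pass the returned _input_fn directly (or call it inside
# the lambda), so the Estimator invokes it and gets (features, labels).
fixed = make_input_fn("train")
features, labels = fixed()
print(features)
```

Under this reading, changing the call to model.train(input_fn=input_fn("train", ...), ...) (dropping the extra lambda) would be the fix, but I have not run it against the repo.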