guxd / deep-code-search

DeepCS: Deep Code Search

About the .h5 data

BGMpool2014 opened this issue

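I have preprocessed my text data into txt files, where each line is either a method name or a sequence of tokens. How do I convert this data to .h5 format?
I'm not very familiar with the tables (PyTables) API.
The data seems to be stored in this layout:
/phrases (EArray(102966,), shuffle, blosc(5)) ''
/indices (Table(10000,)) 'a table of indices and lengths'
Each sample is then read from phrases using the pos and length fields in indices.
Is there code for the txt-to-.h5 conversion step?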
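For readers unfamiliar with this layout, here is a minimal plain-Python sketch (no PyTables involved) of what the two nodes appear to represent: one flat array holding every sequence back to back, plus one (pos, length) record per sequence. The helper names `pack` and `unpack` are made up for illustration and do not come from the repository.

```python
def pack(seqs):
    """Flatten sequences into one list plus one (pos, length) record each."""
    flat, index = [], []
    for seq in seqs:
        # pos is the offset at which this sequence starts in the flat list
        index.append((len(flat), len(seq)))
        flat.extend(seq)
    return flat, index


def unpack(flat, index, i):
    """Recover the i-th sequence from the flat list via its index record."""
    pos, length = index[i]
    return flat[pos:pos + length]


flat, index = pack([[3, 2, 4], [5, 2]])
second = unpack(flat, index, 1)
```

Under this reading, `/phrases` plays the role of `flat` and `/indices` the role of `index`, so a variable-length sample can be fetched with a single slice.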

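import numpy
import tables
from tqdm import tqdm


class Index(tables.IsDescription):
    pos = tables.UInt32Col()     # offset of a sequence within /phrases
    length = tables.UInt32Col()  # number of ids in that sequence


def save_hdf5(vecs, filename):
    '''Save the processed data (sequences of integer ids) into an HDF5 file.'''
    filters = tables.Filters(complib='blosc', complevel=5)
    with tables.open_file(filename, 'w') as f:
        # All sequences are concatenated into one extendable 1-D array.
        earrays = f.create_earray(f.root, 'phrases', tables.Int16Atom(), shape=(0,), filters=filters)
        indices = f.create_table('/', 'indices', Index, 'a table of indices and lengths')
        pos = 0
        for x in tqdm(vecs):
            # Ids must fit in a signed 16-bit integer to match Int16Atom.
            earrays.append(numpy.array(x, dtype=numpy.int16))
            ind = indices.row
            ind['pos'] = pos
            ind['length'] = len(x)
            ind.append()
            pos += len(x)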
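
`save_hdf5` appears to expect `vecs` to be sequences of integers, so the txt lines still have to be mapped to ids before saving. Below is a minimal sketch of that step, assuming whitespace-separated tokens and a frequency-based vocabulary. The reserved ids, the `max_size` default and the function names are assumptions for illustration, not taken from the repository.

```python
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed reserved ids; check what your model expects


def build_vocab(lines, max_size=10000):
    """Map the most frequent tokens to integer ids after the reserved ones."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {'<pad>': PAD_ID, '<unk>': UNK_ID}
    for tok, _ in counts.most_common():
        if len(vocab) >= max_size:
            break
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def lines_to_vecs(lines, vocab):
    """Turn each text line into a list of ids; unseen tokens become UNK_ID."""
    return [[vocab.get(tok, UNK_ID) for tok in line.split()] for line in lines]


lines = ["get file name", "read file"]
vocab = build_vocab(lines)
vecs = lines_to_vecs(lines, vocab)
```

The resulting `vecs` can then be handed to `save_hdf5(vecs, filename)`. Since the array uses `Int16Atom`, ids have to stay below 32768, which is one reason to cap the vocabulary size. Blank lines are kept as empty lists, so line numbers stay aligned across files that are meant to be parallel.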