[Python] Split a DataFrame into a List of Chunks

In many cases, you need to split a huge DataFrame into smaller chunks so that you can insert, store, or persist it to a database.

You may say, hey, it is a piece of cake: I can simply use numpy.split to make it happen, like this:

import numpy as np
chunk_num = 10
chunks = np.split(my_df, chunk_num)

Done? Yes, if the DataFrame has 100 rows. But what if it has 101? np.split raises a ValueError when the rows cannot be divided into equal sections, so it no longer fits the need. Here is a split function of my own that takes a chunk size and works in all cases.

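import math

def split_df_chunks(data_df, chunk_size):
    """Split data_df into a list of chunks of at most chunk_size rows."""
    total_length = len(data_df)
    total_chunk_num = math.ceil(total_length / chunk_size)
    normal_chunk_num = math.floor(total_length / chunk_size)
    chunks = []
    # Full-size chunks first; .iloc slices rows by position.
    for i in range(normal_chunk_num):
        chunk = data_df.iloc[(i * chunk_size):((i + 1) * chunk_size)]
        chunks.append(chunk)
    # A leftover chunk exists only when the length is not a multiple of chunk_size.
    if total_chunk_num > normal_chunk_num:
        chunk = data_df.iloc[(normal_chunk_num * chunk_size):total_length]
        chunks.append(chunk)
    return chunks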
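
To see the 101-row case in action, here is a small self-contained sketch. The sample DataFrame and the name split_df_by_size are made up for illustration; the helper is just a more compact way of writing the same chunking idea as split_df_chunks, not a replacement for it.

```python
import numpy as np
import pandas as pd


def split_df_by_size(data_df, chunk_size):
    # Hypothetical compact variant of split_df_chunks: step through row
    # offsets chunk_size at a time; the final slice is simply shorter.
    return [data_df.iloc[start:start + chunk_size]
            for start in range(0, len(data_df), chunk_size)]


# Made-up sample data: 101 rows, which 10 does not divide evenly.
my_df = pd.DataFrame({"value": range(101)})

# np.split insists on an equal division, so it raises ValueError here.
try:
    np.split(my_df, 10)
    split_failed = False
except ValueError:
    split_failed = True

chunks = split_df_by_size(my_df, 10)
chunk_lengths = [len(chunk) for chunk in chunks]
print(split_failed, len(chunks), chunk_lengths[-1])
```

With 101 rows and a chunk size of 10, this gives ten chunks of 10 rows plus one final chunk holding the single leftover row. Each chunk can then be written to the database on its own.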

Done.
