I’m extracting XML data from 465 web pages, parsing it, and storing it in .csv files using a Python DataFrame. After running for about 30 minutes, the program saves 200 .csv files and then kills itself; the command line just says “Killed”. However, when I run the program separately for the first 200 pages and then for the remaining 265 pages, it works fine. I have searched the internet thoroughly and found no proper answer for this issue. Could you please tell me what the reason could be?
for i in list:
    addr = str(url + i + '?&$format=json')
    response = requests.get(addr, auth=(self.user_, self.pass_))
    # print (response.content)
    json_data = response.json()
    if ('d' in json_data):
        df = json_normalize(json_data['d']['results'])
        paginate = 'true'
        while paginate == 'true':
            if '__next' in json_data['d']:
                addr_next = json_data['d']['__next']
                response = requests.get(addr_next, auth=(self.user_, self.pass_))
                json_data = response.json()
                df = df.append(json_normalize(json_data['d']['results']))
            else:
                paginate = 'false'
        try:
            if(not df.empty):
                storage = '/usr/share/airflow/documents/output/' + i + '_output.csv'
                df.to_csv(storage, sep=',', encoding='utf-8-sig')
            else:
                pass
        except:
            pass
Thanks in advance!
Answer
It looks like you are running out of memory: on Linux, a bare “Killed” message usually means the kernel’s OOM killer terminated the process.
You can either increase the memory available to the process (quick fix), or optimize your code so it consumes less memory (the better solution).
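For the second option, the main memory cost in your loop is the repeated df.append() call, which copies the whole DataFrame on every page. A minimal sketch of a lighter-weight version, assuming every response has the same 'd' / '__next' structure as in your snippet (fetch_all_pages is just a hypothetical helper name; on pandas older than 1.0 the import would be from pandas.io.json import json_normalize):

import requests
import pandas as pd
from pandas import json_normalize

def fetch_all_pages(addr, user, password):
    # Collect each page in a plain list and concatenate once at the end,
    # instead of copying a growing DataFrame on every df.append() call.
    frames = []
    while addr:
        response = requests.get(addr, auth=(user, password))
        json_data = response.json()
        frames.append(json_normalize(json_data['d']['results']))
        # Follow the '__next' link until the service stops returning one.
        addr = json_data['d'].get('__next')
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

You would then call it once per item, e.g. df = fetch_all_pages(url + i + '?&$format=json', self.user_, self.pass_), and write df to CSV exactly as you do now.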
If speed is not critical, you can try saving data to temporary files and reading it back when needed, but I suspect that the for loop can be optimised to use less memory without going through the file system at all.
After all, memory is where the loop’s working data should live.
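That said, if you do go the file route, a minimal sketch of the idea is to append each page of results straight to the output CSV, so only one page is ever held in memory at a time (stream_pages_to_csv is a hypothetical helper name, and it assumes every page returns the same columns):

import os
import requests
from pandas import json_normalize

def stream_pages_to_csv(addr, user, password, out_path):
    # Write each page of results straight to disk instead of
    # accumulating the whole result set in one DataFrame.
    write_header = not os.path.exists(out_path)
    while addr:
        response = requests.get(addr, auth=(user, password))
        json_data = response.json()
        chunk = json_normalize(json_data['d']['results'])
        # Append this page to the CSV; write the header only once.
        chunk.to_csv(out_path, mode='a', header=write_header,
                     index=False, encoding='utf-8-sig')
        write_header = False
        # Follow the '__next' link until the service stops returning one.
        addr = json_data['d'].get('__next')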
Try to run your code without the try/except block, so you can see the actual error instead of silently swallowing it.
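For example, the end of your loop could be reduced to the following, so that any real exception (bad response, missing key, disk error) produces a visible traceback:

# With the blanket try/except removed, failures are no longer silenced.
if not df.empty:
    storage = '/usr/share/airflow/documents/output/' + i + '_output.csv'
    df.to_csv(storage, sep=',', encoding='utf-8-sig')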