I am new to programming and have written a script to extract text from a VCF file. I am using a Linux virtual machine running Ubuntu. I run the script from the command line by changing to the directory that contains the VCF file and then entering `python script.py`.
My script knows which file to process because the beginning of my script is:
    my_file = open("inputfile1.vcf", "r+")
    outputfile = open("outputfile.txt", "w")
The script puts the information I need into a list, which I then write to `outputfile`. However, I have many input files (all `.vcf`) and want to write each one to a separate output file with a name similar to its input (such as `input_processed.txt`).
Do I need to run a shell script to iterate over the files in the folder? If so, how would I change the Python script to accommodate this, i.e. writing the list to a different output file for each input?
Answer
I would do the iteration within the Python script itself. That lets you run it easily on other platforms too, and it doesn't add much code.
    import glob
    import os

    # Find all files ending in '.vcf' in the current directory
    for vcf_filename in glob.glob('*.vcf'):
        # Similar name with a different extension, e.g. input1.vcf -> input1.txt
        output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
        # 'with' closes both files when the block ends; 'r' is enough for reading
        with open(vcf_filename, 'r') as vcf_file, open(output_filename, 'w') as outputfile:
            # Process the data
            ...
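To get names like `input_processed.txt`, as asked in the question, one option is to move the per-file work into a function and call it from the loop. This is a minimal sketch that assumes the extraction simply keeps every data line (lines not starting with `#`); `extract_records` and `process_all` are hypothetical names, so swap in whatever your script currently collects into its list.

```python
import glob
import os


def extract_records(vcf_file):
    """Hypothetical extraction step: keep data lines, skip '#' header lines.

    Replace the body with whatever your script currently puts into its list.
    """
    records = []
    for line in vcf_file:
        if not line.startswith('#'):
            records.append(line.rstrip('\n'))
    return records


def process_all(pattern='*.vcf'):
    """Process every file matching `pattern`; return the output filenames written."""
    written = []
    for vcf_filename in sorted(glob.glob(pattern)):
        # input1.vcf -> input1_processed.txt
        output_filename = os.path.splitext(vcf_filename)[0] + '_processed.txt'
        with open(vcf_filename, 'r') as vcf_file:
            records = extract_records(vcf_file)
        # Write the list, one record per line
        with open(output_filename, 'w') as outputfile:
            for record in records:
                outputfile.write(record + '\n')
        written.append(output_filename)
    return written


if __name__ == '__main__':
    for name in process_all():
        print('wrote', name)
```

Because the outputs end in `_processed.txt`, running the script a second time will not pick them up as inputs.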
To write the output files to a separate directory instead, I would do:
    import glob
    import os

    output_dir = 'processed'
    # Create the output directory if it does not already exist
    os.makedirs(output_dir, exist_ok=True)

    # Find all files ending in '.vcf' in the current directory
    for vcf_filename in glob.glob('*.vcf'):
        # Similar name with a different extension
        output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
        output_path = os.path.join(output_dir, output_filename)
        with open(vcf_filename, 'r') as vcf_file, open(output_path, 'w') as outputfile:
            # Process the data
            ...
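On the question of whether a shell script is needed: an alternative to globbing inside Python is to let the shell expand the pattern and read the filenames from `sys.argv`. This sketch combines that with a separate output directory and the `_processed.txt` naming from the question. The `main` and `output_path_for` names, and the placeholder processing (copying lines that don't start with `#`), are assumptions for illustration.

```python
import sys
from pathlib import Path


def output_path_for(vcf_filename, output_dir):
    """Map e.g. data/input1.vcf -> <output_dir>/input1_processed.txt."""
    return Path(output_dir) / (Path(vcf_filename).stem + '_processed.txt')


def main(vcf_filenames, output_dir='processed'):
    """Process each named file; return the output paths that were written."""
    # Create the output directory if it does not already exist
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for vcf_filename in vcf_filenames:
        out_path = output_path_for(vcf_filename, output_dir)
        with open(vcf_filename, 'r') as vcf_file, open(out_path, 'w') as outputfile:
            # Placeholder processing: copy data lines, skip '#' header lines
            for line in vcf_file:
                if not line.startswith('#'):
                    outputfile.write(line)
        written.append(out_path)
    return written


if __name__ == '__main__':
    # The shell has already expanded *.vcf into separate filenames here
    if len(sys.argv) < 2:
        print('usage: python script.py FILE.vcf [FILE.vcf ...]')
    else:
        main(sys.argv[1:])
```

Run it as `python script.py *.vcf`. Bash expands `*.vcf` before Python starts, so no separate shell loop is needed. Windows `cmd.exe` does not expand wildcards, though, which is one reason the `glob` version is more portable.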