I have a data file that looks like this:
```
TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
```
and so on, up to TOPIC:topic_2000. The first line of each block is the topic and its weight; the lines after it are the words in that topic and their respective weights.
Now I want to sum the second column (the word weights) of each topic and see what value it gives. That is, I want output like this:
```
TOPIC:topic_0 20
TOPIC:topic_1 10
```
That is, the topic name followed by the sum of its word weights (for topic_0, the word weights are 2, 5, 3 and 10, which sum to 20). I tried using:
```python
with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split(' ')
        value = columns[0]
        if value[:6] == 'TOPIC:':
            total_value = columns[1]
            total_value = total_value[:-1]
            total_values = float(total_value)
            #print('\n')
            print(columns[0])
```
But I am not sure how to proceed from here: this only prints the topic names. Please help!
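A minimal sketch of one way to extend the loop in the question, assuming each `TOPIC:` header is immediately followed by its word lines as in the sample: keep a running total and report it when the next header (or the end of the file) is reached. The generator name `topic_totals` is hypothetical, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io


def topic_totals(lines):
    """Yield (topic, total) pairs, one per TOPIC: block, in file order.

    Assumes each TOPIC: header line is followed by word lines of the
    form "word weight". Word lines before the first header are ignored.
    """
    topic = None
    total = 0.0
    for line in lines:
        columns = line.split()  # split on any whitespace; also drops '\n'
        if not columns:
            continue  # skip blank lines
        if columns[0].startswith('TOPIC:'):
            if topic is not None:
                yield topic, total  # the previous topic is complete
            topic = columns[0]
            total = 0.0  # start a fresh total for the new topic
        else:
            total += float(columns[1])  # second column is the word weight
    if topic is not None:
        yield topic, total  # the last topic has no header after it


sample = io.StringIO(
    "TOPIC:topic_0 2056\nab 2.0\ncd 5.0\nef 3.0\ngh 10.0\n"
    "TOPIC:topic_1 1000\naa 3.0\nbd 5.0\ngh 2.0\n"
)
for topic, total in topic_totals(sample):
    print(topic, total)
```

With the real file, pass the open file object instead, e.g. `with open('Input.txt') as in_file:` and then loop over `topic_totals(in_file)`. Because it only tracks the current topic, this approach does not need to keep all 2000 topics in memory.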
Answer
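```python
import re

# Sample data; in practice, read the lines from Input.txt instead
data = """
TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
"""

result = {}
for line in data.splitlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines
    columns = re.split(r"\s+", line)
    value = columns[0]
    if value.startswith('TOPIC:'):
        # Start a new list of word weights for this topic
        result[value] = []
        points = result[value]
        continue
    # Word line: collect the weight from the second column
    points.append(float(columns[1]))

for k, v in result.items():
    print(k, sum(v))
```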
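The answer above parses an inline string. A sketch of the same logic reading the asker's `Input.txt` might look like this; `sum_topics_in_file` is a hypothetical helper name, and it assumes Python 3.7+ so that the returned dict keeps topics in file order.

```python
import re


def sum_topics_in_file(path):
    """Return {topic: summed word weight} for a file in the question's format.

    Word lines that appear before any TOPIC: header are skipped. If a topic
    header repeats, its earlier weights are discarded (as in the answer).
    """
    result = {}
    points = None  # weight list of the topic currently being read
    with open(path) as in_file:
        for line in in_file:
            columns = re.split(r"\s+", line.strip())
            if columns == ['']:
                continue  # blank line
            if columns[0].startswith('TOPIC:'):
                result[columns[0]] = []
                points = result[columns[0]]
            elif points is not None:
                points.append(float(columns[1]))
    # Collapse each list of weights into its sum
    return {topic: sum(weights) for topic, weights in result.items()}
```

Call it as `for topic, total in sum_topics_in_file('Input.txt').items(): print(topic, total)`. The sums are floats, so this prints `20.0`; formatting with `'{:g}'.format(total)` prints `20` instead, matching the output asked for in the question.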