Summing columns in a text file

I have a data file that looks like this:

 TOPIC:topic_0 2056
 ab  2.0
 cd  5.0
 ef  3.0
 gh  10.0

 TOPIC:topic_1 1000
 aa  3.0
 bd  5.0
 gh  2.0

and so on, up to TOPIC:topic_2000. The first line of each block is the topic and its weight; the lines below it are the words in that topic and their respective weights.

Now, I want to sum up the second column of each topic and check what value it gives. That is, I want to get the output as:

 Topic:topic_0  20
 Topic:topic_1  10

That is, the topic name and the sum of its second column (for example, in topic_0 the weights of the words are 2, 5, 3 and 10, which sum to 20). I tried using:

with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split(' ')
        value = columns[0]

        if value[:6] == 'TOPIC:':
            total_value = columns[1]
            total_value = total_value[:-1]
            total_values = float(total_value)
            #print '\n'
            print columns[0]

But I am not sure how to proceed from here. This only prints the topic names. Please help!


Answer

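import re
from collections import OrderedDict

# Sample data; the real file can be read with open('Input.txt').read()
data = """
TOPIC:topic_0 2056
 ab  2.0
 cd  5.0
 ef  3.0
 gh  10.0

 TOPIC:topic_1 1000
 aa  3.0
 bd  5.0
 gh  2.0
"""

# Maps each topic header to the list of its word weights, in file order
result = OrderedDict()
for line in data.splitlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines between topics

    # Split on runs of whitespace
    columns = re.split(r"\s+", line)
    value = columns[0]
    if value[:6] == 'TOPIC:':
        # New topic: start a fresh list of weights
        result[value] = []
        points = result[value]
        continue

    # Word line: the weight is the second column
    points.append(float(columns[1]))

for k, v in result.items():
    print("%s %s" % (k, sum(v)))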
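
If the file is large (the question mentions topics up to topic_2000), it may be simpler to keep a running total per topic instead of storing every weight. The following is only a sketch under a few assumptions: the helper name sum_topic_weights is made up for illustration, every topic header starts with 'TOPIC:', and the weight is always the second whitespace-separated column.

```python
from collections import OrderedDict


def sum_topic_weights(lines):
    """Return an OrderedDict mapping each TOPIC header to the sum of its weights.

    `lines` can be any iterable of strings, such as an open file object.
    (Hypothetical helper, not part of the original answer.)
    """
    totals = OrderedDict()
    current = None  # header of the topic block currently being read
    for line in lines:
        columns = line.split()  # split() with no argument splits on any whitespace
        if not columns:
            continue  # blank line between topic blocks
        if columns[0].startswith('TOPIC:'):
            current = columns[0]
            totals[current] = 0.0
        elif current is not None and len(columns) >= 2:
            totals[current] += float(columns[1])
    return totals


# In-memory sample copied from the question, used here in place of the real file
sample = """
 TOPIC:topic_0 2056
 ab  2.0
 cd  5.0
 ef  3.0
 gh  10.0

 TOPIC:topic_1 1000
 aa  3.0
 bd  5.0
 gh  2.0
""".splitlines()

for topic, total in sum_topic_weights(sample).items():
    print('%s %g' % (topic, total))
# prints:
# TOPIC:topic_0 20
# TOPIC:topic_1 10
```

To run it on the actual file, pass the file object instead of the sample: open 'Input.txt' in a with block and call sum_topic_weights(in_file). The file is then read line by line, so it does not have to fit in memory at once.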
User contributions licensed under: CC BY-SA