
How can I take the average of the second column for degenerate (repeated) values of the first column in C++/Python or with any other Linux command?

I have a collection of data in a text file arranged in two columns. I want to calculate the average of the second column over each run of repeated values in the first column. For example, the first three rows below share the same first-column value, so they should produce one average of their second-column values, and so on for each distinct value. I will be grateful for any help you can provide.

0.628319 0.123401
0.628319 0.23044
0.628319 4.57734
0.888577 0.390783
1.40496 0.110672
1.40496 0.239377
1.40496 0.248376
1.40496 0.751108
1.40496 0.971678
1.40496 1.36865


Answer

Put the data in an Excel file and read it into a pandas DataFrame, then compute the mean of the second column grouped by the first column.

import pandas as pd


# header=None because there are no column headers in my XLSX file
# Column names will be integers: 0 and 1
data = pd.read_excel("physics.xlsx", header=None, engine="openpyxl")

# Group the rows by their column-0 value, then take the mean of column 1 within each group
grp_means = data.groupby(0).mean()
print(grp_means)
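
Since the question starts from a plain text file, the same grouping can also be done without converting to Excel. The following is only a minimal sketch using the Python standard library instead of pandas. The function name `group_means` is made up for illustration, and it assumes every data line holds exactly two whitespace-separated numbers and that repeated first-column values are written identically.

```python
from collections import defaultdict


def group_means(lines):
    """Average the second column for each distinct first-column value."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        key, value = float(fields[0]), float(fields[1])
        groups[key].append(value)
    # One mean per distinct first-column value, in order of first appearance
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Sample data copied from the question
sample = """\
0.628319 0.123401
0.628319 0.23044
0.628319 4.57734
0.888577 0.390783
1.40496 0.110672
1.40496 0.239377
1.40496 0.248376
1.40496 0.751108
1.40496 0.971678
1.40496 1.36865
"""

means = group_means(sample.splitlines())
for key, mean in means.items():
    print(key, mean)
```

To run it on a file rather than an inline string, pass the open file object, for example `with open("data.txt") as f: means = group_means(f)`, where `data.txt` is a placeholder name.
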
User contributions licensed under: CC BY-SA