
I have a list of values in Python that alternate between positive and negative, and I want to work with averages of those values. I need to start counting when the values are positive and stop counting on the last negative value, just before they become positive again. Then I want the average over each counted stretch.

Here is my sample data:

   0;  2.3360; 0.4675
   1;  1.7439; 0.4174
   2;  1.3766; 0.3673
   3;  1.3766; 0.1719
   4;  1.4002; 0.1719
   5;  1.5687; 0.1719
   6;  2.2238; -0.6552
   7;  1.6181; -0.6552
   8;  2.2797; -0.6552
   9;  2.9562; -0.6552
  10;  3.4301; -0.6552
  11;  3.7597; -0.6552
  12;  4.0999; -0.6552
  13;  4.6294; -0.6552
  14;  4.4860; -0.6552
  15;  4.4504; 0.0356
  16;  4.3090; 0.1414
  17;  3.9967; 0.1556
  18;  3.8269; 0.1698
  19;  3.4952; 0.1978
  20;  3.2694; 0.1307
  21;  3.2059; 0.0635
  22;  3.1428; 0.0631
  23;  3.0802; 0.0626
  24;  2.9562; -0.0619
  25;  2.8950; -0.0612
  25;  2.8950; -0.0612
  26;  2.4214; -0.1155
  27;  2.2517; -0.1697
  28;  2.0055; -0.1900
  29;  1.7952; -0.1835
  30;  1.7952; 0.1835

In this case, I need the average of the values from position 0 to position 14, then from 15 to 29, and from 30 to the end.

The average is taken over the values in the second column, vazmed[1], but the range of each average (the number of values, n) varies with the sign changes in the third column, vazmed[2].
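The grouping rule can be stated compactly: a group runs from its first value up to the last negative value before the third column turns non-negative again. A minimal sketch, assuming the third-column values are already in a list; `group_ranges` is a hypothetical helper name, and the input list is shortened and rounded from the sample data:

```python
def group_ranges(signs):
    """Return inclusive (start, end) index pairs, one per group.

    A new group starts at the first non-negative value that follows
    a negative value; the last group runs to the end of the list.
    """
    ranges = []
    start = 0
    for i in range(1, len(signs)):
        # boundary: previous value negative, current value non-negative
        if signs[i - 1] < 0 and signs[i] >= 0:
            ranges.append((start, i - 1))
            start = i
    if signs:
        ranges.append((start, len(signs) - 1))
    return ranges


print(group_ranges([0.47, 0.17, -0.66, -0.66, 0.04, 0.20, -0.06, -0.19, 0.18]))
# [(0, 3), (4, 7), (8, 8)]
```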

I'm trying to do this with an auxiliary variable, but without success.

Here is what I have so far:

arq = open('vazdif.out', 'rt')

vazmed1 = []
vazmed2 = []

i = 0

for row in arq:
    field = row.split(';')
    vaz1 = float(field[1])
    vaz2 = float(field[2])
    vazmed1.append(vaz1)
    vazmed2.append(vaz2)
    i = i+1
n = len(vazmed1)
m = sum(vazmed1)
aux = 0
for actual in vazmed2:
    if actual < 0:
        aux = 1
    if actual >= 0 and aux == 1:
        aux = 0
    if aux == 1:
        average = m/n
        print(average)

If anybody could help me and also explain where I am going wrong, I would be grateful.

3 Answers

The problem is that you calculate m and n over the entire list (after iterating over all the lines).

Then you run another for loop that prints the average using m and n, which never change, so it prints the average of the entire list every time.
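In other words, m and n have to be reset for every group instead of being computed once for the whole file. A minimal sketch of that idea with a running sum and count, assuming the rows are already parsed into (vaz1, vaz2) float pairs; `group_means` and the sample rows are hypothetical:

```python
def group_means(rows):
    """Average vaz1 over each group; a group ends on the last negative vaz2."""
    means = []
    total, count = 0.0, 0          # per-group m and n, reset at each boundary
    prev_negative = False
    for vaz1, vaz2 in rows:
        if vaz2 >= 0 and prev_negative:
            # a negative run just ended: close the current group
            means.append(total / count)
            total, count = 0.0, 0
        total += vaz1
        count += 1
        prev_negative = vaz2 < 0
    if count:                      # the last group has no closing boundary
        means.append(total / count)
    return means


rows = [(2.0, 0.4), (4.0, -0.6), (3.0, 0.1), (5.0, -0.2), (1.5, 0.2)]
print(group_means(rows))  # [3.0, 4.0, 1.5]
```

Keeping only a sum and a count avoids storing the values of each group at all.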

Here is one way to proceed:

vazmed1 = []
vazmed2 = []

aux = 0  # becomes 1 once the current group contains a negative value

with open('vazdif.out', 'rt') as arq:  # closes the file when done
    for row in arq:
        field = row.split(';')

        # float() already ignores surrounding whitespace and the trailing
        # newline, so strip() is optional here.
        vaz1 = float(field[1].strip())
        vaz2 = float(field[2].strip())

        # Negative value: add it to the current group and remember that
        # the group now contains negatives.
        if vaz2 < 0:
            vazmed1.append(vaz1)
            vazmed2.append(vaz2)
            aux = 1

        # First positive value after the negatives: the current group is
        # complete, so print its average and proceed with the next group.
        elif vaz2 >= 0 and aux == 1:
            n = len(vazmed1)
            m = sum(vazmed1)

            # calculate the average
            print(m / n)

            # start the new group with the positive number just encountered
            vazmed1 = [vaz1]
            vazmed2 = [vaz2]

            aux = 0

        # Positive value before any negatives: just add it to the group.
        else:
            vazmed1.append(vaz1)
            vazmed2.append(vaz2)

# When you reach EOF, the last group is still in vazmed1; depending on the
# use case, you may want its average too.
if vazmed1:
    print(sum(vazmed1) / len(vazmed1))

PS: You don't need the vazmed2 list at all, since you only average vazmed1.

Also, if you need to keep the full lists, use separate temporary lists for the averages and keep vazmed1 and vazmed2 for the complete data.
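A minimal sketch of that variant: the full columns stay intact in vazmed1 and vazmed2, while a temporary list `current` (a hypothetical name) holds only the group being built. It parses with the standard csv module instead of `split`, and an `io.StringIO` with a few made-up rows in the sample's format stands in for the real file:

```python
import csv
import io

# stand-in for open('vazdif.out'); a few rows in the same format as the sample
sample = io.StringIO(
    "   0;  2.3360; 0.4675\n"
    "   1;  1.7439; -0.6552\n"
    "   2;  2.2797; 0.0356\n"
    "   3;  2.8950; -0.0612\n"
    "   4;  1.7952; 0.1835\n"
)

vazmed1, vazmed2 = [], []   # full columns, kept intact
averages = []
current = []                # temporary list for the group being built
prev_negative = False

for row in csv.reader(sample, delimiter=';', skipinitialspace=True):
    if not row:
        continue            # skip blank lines
    vaz1, vaz2 = float(row[1]), float(row[2])
    vazmed1.append(vaz1)
    vazmed2.append(vaz2)
    if vaz2 >= 0 and prev_negative:
        # negative run ended: average the finished group, start a new one
        averages.append(sum(current) / len(current))
        current = []
    current.append(vaz1)
    prev_negative = vaz2 < 0

if current:                 # average of the last, unterminated group
    averages.append(sum(current) / len(current))

print(averages)
```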


You can also use pandas to compute the means:

import pandas as pd

# get values into a DataFrame; the first column becomes the index
df = pd.read_csv('vazdif.out', sep=';', header=None, names=['ID', 'A', 'B'],
                 index_col=0, skipinitialspace=True)

# compose start-end pairs (each group ends at the last negative value
# before column B turns positive again)
start, end = df.index[0], None
iters = []
for r in df.itertuples():
    if r.B < 0:
        end = r.Index  # latest negative value seen so far
    elif end is not None:
        iters.append([start, end])
        start, end = r.Index, None

iters.append([start, r.Index])
# iters now contains start:end pairs - [[0, 14], [15, 29], [30, 30]]

# label-based .loc slicing includes both ends, so each slice is one full group
avg = [df.loc[v[0]:v[1], 'A'].mean() for v in iters]

print(avg)

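Output as posted (with the sample data shown in the question, the first two means come out to about 2.6190 and 3.1248; the third matches):
[2.6581625, 3.0465705882352943, 1.7952]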
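The same grouping can be done without pandas by labelling each row with a group number that increases whenever a negative run ends, then averaging per label. A sketch using `itertools.groupby` and `statistics.mean`, assuming the two columns are already in lists; `group_labels` is a hypothetical helper and the short lists are made up for illustration:

```python
from itertools import groupby
from statistics import mean


def group_labels(signs):
    """Label each position with a group id; the id increments at the first
    non-negative value after a negative one."""
    labels = []
    group = 0
    prev_negative = False
    for value in signs:
        if value >= 0 and prev_negative:
            group += 1
        labels.append(group)
        prev_negative = value < 0
    return labels


values = [2.0, 4.0, 3.0, 5.0, 1.5]   # second column (vazmed1)
signs = [0.4, -0.6, 0.1, -0.2, 0.2]  # third column (vazmed2)

labels = group_labels(signs)
# groupby only merges consecutive equal keys, which is enough here because
# each group occupies a contiguous run of rows
averages = [mean(v for _, v in run)
            for _, run in groupby(zip(labels, values), key=lambda p: p[0])]
print(labels)    # [0, 0, 1, 1, 2]
print(averages)  # [3.0, 4.0, 1.5]
```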


Breaking the problem into discrete steps might be simpler (assuming performance isn't a major concern):

from statistics import mean

vazmed1 = []
val_range = []  # flat list of inclusive indices: start, end, start, end, ...

sign = 0

with open('vazdif.out', 'rt') as arq:
    for i, row in enumerate(arq):
        # convert the text fields to numbers before comparing them
        vaz1, vaz2 = (float(x) for x in row.split(';')[1:3])
        vazmed1.append(vaz1)

        # the first row opens the first range
        if sign == 0:
            val_range.append(i)

        # a negative run has just ended: close the range on the previous
        # row and open a new one on this row
        elif sign < 0 and vaz2 >= 0:
            val_range.extend([i - 1, i])

        sign = 1 if vaz2 >= 0 else -1

# close the last range on the final row
if vazmed1:
    val_range.append(len(vazmed1) - 1)

# use the pairs of indices to get the average values
avg_vals = []
for n, v in zip(val_range[::2], val_range[1::2]):
    avg_val = mean(vazmed1[n:v+1])
    avg_vals.append(avg_val)

# You can print these values
print(*avg_vals, sep='\n')
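The last step relies on a small idiom: with a flat list of boundaries [start0, end0, start1, end1, ...], `zip(val_range[::2], val_range[1::2])` yields (start, end) pairs, and because the ends are inclusive the slice needs `end + 1`. A minimal sketch with made-up values; `range_means` is a hypothetical helper:

```python
from statistics import mean


def range_means(values, val_range):
    """Average values over inclusive (start, end) pairs stored flat."""
    pairs = zip(val_range[::2], val_range[1::2])
    return [mean(values[start:end + 1]) for start, end in pairs]


values = [2.0, 4.0, 3.0, 5.0, 1.5]
val_range = [0, 1, 2, 3, 4, 4]  # (0, 1), (2, 3), (4, 4)
print(list(zip(val_range[::2], val_range[1::2])))  # [(0, 1), (2, 3), (4, 4)]
print(range_means(values, val_range))              # [3.0, 4.0, 1.5]
```

If the flat list has an odd length, `zip` silently drops the last unpaired start, which is why the last range has to be closed explicitly after the loop.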
