
Linear Regression in scikit-learn

I have two datasets (one for training, the other for testing) containing weather data: temperature and humidity. My program should process the training dataset to find a relationship between the two, then predict the humidity values in the test dataset from its temperature values.
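Because the linked CSVs are not included here, the workflow just described can be sketched on hypothetical synthetic data. The column names `Temperature (C)` and `Humidity` match the ones the code uses; the data generator `make_weather` and its numbers (a humidity that falls linearly with temperature, plus noise) are made up purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data standing in for weather_train.csv / weather_test.csv:
# humidity (0-1) falls roughly linearly as temperature rises, plus Gaussian noise.
rng = np.random.default_rng(0)

def make_weather(n):
    temp = rng.uniform(-10, 35, size=n)
    humidity = np.clip(0.9 - 0.012 * temp + rng.normal(0, 0.1, size=n), 0, 1)
    return pd.DataFrame({"Temperature (C)": temp, "Humidity": humidity})

df_train = make_weather(1000)
df_test = make_weather(200)

# Fit humidity as a linear function of temperature on the training set.
# Double brackets keep X 2-D, which scikit-learn requires.
model = LinearRegression()
model.fit(df_train[["Temperature (C)"]], df_train["Humidity"])

# The learned "relation": Humidity ~= slope * Temperature + intercept
slope, intercept = model.coef_[0], model.intercept_
print(f"Humidity ~= {slope:.4f} * Temperature + {intercept:.4f}")

# Predict humidity for the test set from its temperatures only
predicted = model.predict(df_test[["Temperature (C)"]])
print(predicted[:5])
```

With a single feature, the whole "relation" the model can learn is one slope and one intercept, which is worth keeping in mind when judging how good its predictions can be.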

My training dataset has almost 97,000 rows, but `algorithm.score()` on the test set only returns 0.42 (an R² of 0.42, not a 42% accuracy). Maybe that's because weather is so complex, or maybe the problem is in my program. Any tips for improvement are very welcome.
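For a regressor, `LinearRegression.score` returns the coefficient of determination R² = 1 - SS_res / SS_tot rather than a classification accuracy: 1.0 is a perfect fit, and 0.0 is no better than always predicting the mean. A score of 0.42 therefore means the predictions explain roughly 42% of the variance in the test humidity values. The check below, on hypothetical synthetic data (not the linked datasets), computes R² by hand and compares it with `score()`. It also shows that, with one feature and an intercept, the training-set R² equals the squared Pearson correlation between temperature and humidity, so a single-feature linear model can never score higher than that correlation allows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: one feature (temperature) and a noisy linear target (humidity)
rng = np.random.default_rng(42)
x = rng.uniform(-10, 35, size=(500, 1))
y = 0.9 - 0.012 * x[:, 0] + rng.normal(0, 0.1, size=500)

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# score() computes the same quantity for regressors
r2_sklearn = model.score(x, y)

# With one feature and an intercept, training-set R^2 equals the squared correlation
r = np.corrcoef(x[:, 0], y)[0, 1]

print(f"manual R^2:   {r2_manual:.4f}")
print(f"score():      {r2_sklearn:.4f}")
print(f"corr(x, y)^2: {r ** 2:.4f}")
```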

Training Dataset: https://drive.google.com/file/d/1d-jGkFlM6_Wf01UUZGH1mDbDAEyypgnL/view?usp=sharing

Testing Dataset: https://drive.google.com/file/d/1wRb-rufT046q7hR83l2IKcCB-raZYhLW/view?usp=sharing

import pandas as pd
from sklearn import linear_model


# Load the training and testing data ("path" is a placeholder for the real folder)
df_train = pd.read_csv("path\\weather_train.csv")
df_test = pd.read_csv("path\\weather_test.csv")


# scikit-learn expects X as a 2-D array of shape (n_samples, n_features);
# selecting with a list of column names keeps it 2-D. y can stay 1-D.
x_train = df_train[['Temperature (C)']].to_numpy()
y_train = df_train['Humidity'].to_numpy()

x_test = df_test[['Temperature (C)']].to_numpy()
y_test = df_test['Humidity'].to_numpy()

# The model: ordinary least squares, Humidity ~= a * Temperature + b
algorithm = linear_model.LinearRegression()
algorithm.fit(x_train, y_train)

# Predict humidity (y values) from the temperature (x values) of the test dataset
print(algorithm.predict(x_test))

# For regressors, score() returns the coefficient of determination R^2,
# not a classification accuracy
r2 = algorithm.score(x_test, y_test)

print("==============================")
print(f"R^2 score: {r2:.3f}")
print("==============================")