I'm having trouble solving for x and y using multiple equations. For each data point I have frame counts for two groups, plus the total hours, as follows:
| Group 1 Frames | Group 2 Frames | Total Hours |
|---|---|---|
| 1003 | 602 | 1999 |
| 145 | 140 | 341 |
| 1344 | 390 | 1151 |
| 66 | 1955 | 2605 |
| 171 | 289 | 568 |
| 962 | 90 | 864 |
I set up one equation per data point, like this:

$$
\begin{aligned}
1003x + 602y &= 1999 \\
145x + 140y &= 341
\end{aligned}
$$

and so on for the remaining rows.
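Stacked together, the six equations (one per row of data) form an overdetermined linear system: six equations in only two unknowns. Such a system generally has no exact solution, so "solving" it means choosing the (x, y) that fits all rows best under some error measure:

$$
\begin{bmatrix}
1003 & 602 \\
145 & 140 \\
1344 & 390 \\
66 & 1955 \\
171 & 289 \\
962 & 90
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\approx
\begin{bmatrix} 1999 \\ 341 \\ 1151 \\ 2605 \\ 568 \\ 864 \end{bmatrix}
$$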
I want to find the values of x and y that make all of the equations as close to true as possible.
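One common way to make "as close to true as possible" precise is least squares: choose x and y to minimize $\sum_i (g_{1,i}x + g_{2,i}y - h_i)^2$, where $g_{1,i}$ and $g_{2,i}$ are the frame counts and $h_i$ the total hours of row $i$ (these symbol names are introduced here for illustration). Note that this model has no constant term, matching equations like 1003x + 602y = 1999. A minimal sketch with NumPy's `np.linalg.lstsq`, assuming the six rows in the table are the whole dataset:

```python
import numpy as np

# Frame counts per data point: column 0 = Group 1, column 1 = Group 2.
A = np.array([
    [1003, 602],
    [145, 140],
    [1344, 390],
    [66, 1955],
    [171, 289],
    [962, 90],
], dtype=float)

# Total hours for each data point (right-hand side of each equation).
h = np.array([1999, 341, 1151, 2605, 568, 864], dtype=float)

# Solve A @ [x, y] ≈ h by minimizing the sum of squared mismatches (no intercept).
coef, residual_ss, rank, _ = np.linalg.lstsq(A, h, rcond=None)
x, y = coef

# How far each equation is from being satisfied, in hours.
residuals = A @ coef - h

print(f"x = {x:.4f}, y = {y:.4f}")
print("residuals:", np.round(residuals, 1))
```

At the least-squares solution the residual vector is orthogonal to both columns of `A`, which is a quick sanity check that the fit converged to the optimum.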
I tried using linear regression in Python to estimate x and y, but I'm not sure whether I'm on the right track.
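Here is my code:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dataset = pd.read_csv(r"C:\Users\path\to\.csv")
X = dataset[['Group 1 Frames', 'Group 2 Frames']]
y = dataset['Total Hours']
# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# One coefficient per input column (displayed as a table in a notebook)
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
coeff_df
```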
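Two details in this code may matter with so little data. First, `train_test_split` with `test_size=.25` on six rows holds out two of them, so the coefficients are estimated from only four points. Second, `LinearRegression` fits an intercept by default, so it estimates `hours = x*g1 + y*g2 + c` rather than the intercept-free equations above. A sketch of the same scikit-learn call fitted on all six rows with `fit_intercept=False`; the DataFrame is built inline instead of read from the CSV, assuming the column names match the ones used in the question:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same columns as the CSV in the question, built inline for reproducibility.
dataset = pd.DataFrame({
    "Group 1 Frames": [1003, 145, 1344, 66, 171, 962],
    "Group 2 Frames": [602, 140, 390, 1955, 289, 90],
    "Total Hours": [1999, 341, 1151, 2605, 568, 864],
})

X = dataset[["Group 1 Frames", "Group 2 Frames"]]
y = dataset["Total Hours"]

# No train/test split: with six rows, every point is used for the fit.
# fit_intercept=False matches equations of the form g1*x + g2*y = hours.
regressor = LinearRegression(fit_intercept=False)
regressor.fit(X, y)

coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=["Coefficient"])
print(coeff_df)
```

Without the intercept, this is the same least-squares problem as solving the stacked equations directly, so the two coefficients can be read as x and y.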
This gives me two coefficients, 1.3007 and 1.2314. However, when I calculate the mean absolute error and the mean squared error, the errors are large enough that these values seem inaccurate and unusable.
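Whether an MAE or MSE is "too large" depends on the scale of Total Hours, so it can help to express the error relative to the typical hours value and to see which rows contribute most. A minimal NumPy sketch, assuming a no-intercept least-squares fit on all six rows (the fit is repeated inline so the snippet runs on its own):

```python
import numpy as np

A = np.array([[1003, 602], [145, 140], [1344, 390],
              [66, 1955], [171, 289], [962, 90]], dtype=float)
h = np.array([1999, 341, 1151, 2605, 568, 864], dtype=float)

# No-intercept least-squares fit, matching 1003x + 602y = 1999, etc.
coef, *_ = np.linalg.lstsq(A, h, rcond=None)
predicted = A @ coef
errors = predicted - h

mae = np.mean(np.abs(errors))  # mean absolute error, in hours
mse = np.mean(errors ** 2)     # mean squared error, in hours squared
rmse = np.sqrt(mse)            # root of MSE, back in hours

# Express the error relative to the average total hours.
print(f"MAE  = {mae:.1f} h ({mae / h.mean():.0%} of mean hours)")
print(f"RMSE = {rmse:.1f} h")

# Rows sorted by absolute error, largest first.
for i in np.argsort(-np.abs(errors)):
    print(f"row {i}: predicted {predicted[i]:.0f}, actual {h[i]:.0f}, error {errors[i]:+.0f}")
```

Since MSE is in squared hours, RMSE is usually easier to compare with MAE; a few rows with much larger errors than the rest would point at those data points rather than at the method.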
Is there a way to more accurately calculate the desired x and y values?
My guesses about the source of the error:
- My method (I'm very new to Python and to this kind of data analysis, so I'd bet heavily on this one)
- Too few data points (I can collect more)
- x and y don't have a strong relationship with Total Hours, hence the high error
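With only six rows, a single train/test split gives a very noisy error estimate. Leave-one-out cross-validation uses each row as the test point exactly once, which may give a steadier estimate, and comparing a model with and without an intercept is one way to probe whether the linear relationship itself is the problem. A minimal NumPy sketch under those assumptions; `loo_mae` is a hypothetical helper name:

```python
import numpy as np

A = np.array([[1003, 602], [145, 140], [1344, 390],
              [66, 1955], [171, 289], [962, 90]], dtype=float)
h = np.array([1999, 341, 1151, 2605, 568, 864], dtype=float)


def loo_mae(features: np.ndarray, target: np.ndarray, intercept: bool) -> float:
    """Leave-one-out mean absolute error of a least-squares linear fit."""
    if intercept:
        # Add a constant column so the model becomes x*g1 + y*g2 + c.
        features = np.column_stack([features, np.ones(len(features))])
    n = len(target)
    abs_errors = []
    for i in range(n):
        train = np.arange(n) != i  # boolean mask: every row except row i
        coef, *_ = np.linalg.lstsq(features[train], target[train], rcond=None)
        prediction = features[i] @ coef  # predict the held-out row
        abs_errors.append(abs(prediction - target[i]))
    return float(np.mean(abs_errors))


print(f"LOO MAE, no intercept:   {loo_mae(A, h, intercept=False):.1f} h")
print(f"LOO MAE, with intercept: {loo_mae(A, h, intercept=True):.1f} h")
```

If the cross-validated error stays large after more data points are collected, that would suggest frame counts alone do not determine the hours linearly, rather than a problem with the fitting code.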

