Using Machine Learning to explain and predict the life expectancy of different countries

The project builds a model, based on data provided by the World Health Organization (WHO), to estimate the life expectancy in years for different countries. The data covers the period from 2000 to 2015 and originates from here: https://www.kaggle.com/kumarajarshi/life-expectancy-who/data The fitted models are then tested to see whether they maintain their accuracy when predicting life expectancy for data they have not been trained on. Seven algorithms have been used:

Linear Regression
Ridge Regression
Lasso Regression
ElasticNet Regression
Linear Regression with Polynomial features
Decision Tree Regression
Random Forest Regression
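
As a rough sketch of how the seven algorithms listed above could be collected for a side-by-side comparison, the snippet below builds one scikit-learn estimator per entry. The dictionary name `models` and every hyperparameter value (alpha, l1_ratio, degree, n_estimators, random_state) are assumptions made for illustration, not settings taken from the project.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Hyperparameters below are illustrative assumptions, not the project's values.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.1),
    "ElasticNet Regression": ElasticNet(alpha=0.1, l1_ratio=0.5),
    # Polynomial features are generated first, then fed to a linear model.
    "Linear Regression with Polynomial features": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=100, random_state=0
    ),
}

for name, model in models.items():
    print(f"{name}: {type(model).__name__}")
```

Because every entry exposes the same fit and predict methods, the same training and scoring loop can be reused for all of them.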

Project

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from nose.tools import *
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.metrics import make_scorer
from scipy import stats
import seaborn as sns

Part 1. Loading packages

The following packages have been imported: NumPy, Pandas, Matplotlib, SciPy and Seaborn. Sklearn (scikit-learn) is the package used most for the machine learning process. The following of its subpackages have been used:

  1. linear_model
  2. model_selection (which provides train_test_split)
  3. metrics
  4. tree
  5. ensemble
  6. preprocessing
In [59]:
life_data = pd.read_csv('data/lifeExpectancy.csv', sep = ',')
life_data = life_data.drop('Year', axis = 1)

Part 2. Reading the data

The data is saved as a CSV file, lifeExpectancy.csv, and is read and stored in the life_data variable. The Year column is dropped, as it will not be used in the analysis. Below, the first 5 rows are shown. After dropping Year, the data contains 21 columns and 2938 rows, not counting the header row. The table contains data about:

  1. Country
  2. Status
  3. Life Expectancy
  4. Adult Mortality
  5. infant deaths
  6. Alcohol
  7. percentage expenditure
  8. Hepatitis B
  9. Measles
  10. BMI
  11. under-five deaths
  12. Polio
  13. Total expenditure
  14. Diphtheria
  15. HIV/AIDS
  16. GDP
  17. Population
  18. thinness 1-19 years
  19. thinness 5-9 years
  20. Income composition of resources
  21. Schooling

With the exception of Country name and Status (either developed or developing), all of the data is numeric. The values are in years, percentages, millions, or dollars in the case of Gross Domestic Product (GDP).

In [7]:
life_data.head()
Out[7]:
Country Status Life expectancy Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles BMI Polio Total expenditure Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling
0 Afghanistan Developing 65.0 263.0 62 0.01 71.279624 65.0 1154 19.1 6.0 8.16 65.0 0.1 584.259210 33736494.0 17.2 17.3 0.479 10.1
1 Afghanistan Developing 59.9 271.0 64 0.01 73.523582 62.0 492 18.6 58.0 8.18 62.0 0.1 612.696514 327582.0 17.5 17.5 0.476 10.0
2 Afghanistan Developing 59.9 268.0 66 0.01 73.219243 64.0 430 18.1 62.0 8.13 64.0 0.1 631.744976 31731688.0 17.7 17.7 0.470 9.9
3 Afghanistan Developing 59.5 272.0 69 0.01 78.184215 67.0 2787 17.6 67.0 8.52 67.0 0.1 669.959000 3696958.0 17.9 18.0 0.463 9.8
4 Afghanistan Developing 59.2 275.0 71 0.01 7.097109 68.0 3013 17.2 68.0 7.87 68.0 0.1 63.537231 2978599.0 18.2 18.2 0.454 9.5

5 rows × 21 columns

In [60]:
status = pd.get_dummies(life_data.Status)
life_data = pd.concat([life_data, status], axis = 1)
life_data = life_data.drop(['Status'], axis=1)
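# rename() returns a renamed copy that is only displayed here; it is not
# assigned back, so life_data keeps the 'Developed' and 'Developing' columns
# used in the following cells. The key 'Deloping' matches no column, so only
# 'Developed' is relabelled in the displayed output.
life_data.rename(columns = {'Deloping' : '0', 'Developed' : 1})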
Out[60]:
Country Life expectancy Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles BMI under-five deaths Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling 1 Developing
0 Afghanistan 65.0 263.0 62 0.01 71.279624 65.0 1154 19.1 83 65.0 0.1 584.259210 33736494.0 17.2 17.3 0.479 10.1 0 1
1 Afghanistan 59.9 271.0 64 0.01 73.523582 62.0 492 18.6 86 62.0 0.1 612.696514 327582.0 17.5 17.5 0.476 10.0 0 1
2 Afghanistan 59.9 268.0 66 0.01 73.219243 64.0 430 18.1 89 64.0 0.1 631.744976 31731688.0 17.7 17.7 0.470 9.9 0 1
3 Afghanistan 59.5 272.0 69 0.01 78.184215 67.0 2787 17.6 93 67.0 0.1 669.959000 3696958.0 17.9 18.0 0.463 9.8 0 1
4 Afghanistan 59.2 275.0 71 0.01 7.097109 68.0 3013 17.2 97 68.0 0.1 63.537231 2978599.0 18.2 18.2 0.454 9.5 0 1
5 Afghanistan 58.8 279.0 74 0.01 79.679367 66.0 1989 16.7 102 66.0 0.1 553.328940 2883167.0 18.4 18.4 0.448 9.2 0 1
6 Afghanistan 58.6 281.0 77 0.01 56.762217 63.0 2861 16.2 106 63.0 0.1 445.893298 284331.0 18.6 18.7 0.434 8.9 0 1
7 Afghanistan 58.1 287.0 80 0.03 25.873925 64.0 1599 15.7 110 64.0 0.1 373.361116 2729431.0 18.8 18.9 0.433 8.7 0 1
8 Afghanistan 57.5 295.0 82 0.02 10.910156 63.0 1141 15.2 113 63.0 0.1 369.835796 26616792.0 19.0 19.1 0.415 8.4 0 1
9 Afghanistan 57.3 295.0 84 0.03 17.171518 64.0 1990 14.7 116 58.0 0.1 272.563770 2589345.0 19.2 19.3 0.405 8.1 0 1
10 Afghanistan 57.3 291.0 85 0.02 1.388648 66.0 1296 14.2 118 58.0 0.1 25.294130 257798.0 19.3 19.5 0.396 7.9 0 1
11 Afghanistan 57.0 293.0 87 0.02 15.296066 67.0 466 13.8 120 5.0 0.1 219.141353 24118979.0 19.5 19.7 0.381 6.8 0 1
12 Afghanistan 56.7 295.0 87 0.01 11.089053 65.0 798 13.4 122 41.0 0.1 198.728544 2364851.0 19.7 19.9 0.373 6.5 0 1
13 Afghanistan 56.2 3.0 88 0.01 16.887351 64.0 2486 13.0 122 36.0 0.1 187.845950 21979923.0 19.9 2.2 0.341 6.2 0 1
14 Afghanistan 55.3 316.0 88 0.01 10.574728 63.0 8762 12.6 122 33.0 0.1 117.496980 2966463.0 2.1 2.4 0.340 5.9 0 1
15 Afghanistan 54.8 321.0 88 0.01 10.424960 62.0 6532 12.2 122 24.0 0.1 114.560000 293756.0 2.3 2.5 0.338 5.5 0 1
16 Albania 77.8 74.0 0 4.60 364.975229 99.0 0 58.0 0 99.0 0.1 3954.227830 28873.0 1.2 1.3 0.762 14.2 0 1
17 Albania 77.5 8.0 0 4.51 428.749067 98.0 0 57.2 1 98.0 0.1 4575.763787 288914.0 1.2 1.3 0.761 14.2 0 1
18 Albania 77.2 84.0 0 4.76 430.876979 99.0 0 56.5 1 99.0 0.1 4414.723140 289592.0 1.3 1.4 0.759 14.2 0 1
19 Albania 76.9 86.0 0 5.14 412.443356 99.0 9 55.8 1 99.0 0.1 4247.614380 2941.0 1.3 1.4 0.752 14.2 0 1
20 Albania 76.6 88.0 0 5.37 437.062100 99.0 28 55.1 1 99.0 0.1 4437.178680 295195.0 1.4 1.5 0.738 13.3 0 1
21 Albania 76.2 91.0 1 5.28 41.822757 99.0 10 54.3 1 99.0 0.1 494.358832 291321.0 1.4 1.5 0.725 12.5 0 1
22 Albania 76.1 91.0 1 5.79 348.055952 98.0 0 53.5 1 98.0 0.1 4114.136545 2927519.0 1.5 1.6 0.721 12.2 0 1
23 Albania 75.3 1.0 1 5.61 36.622068 99.0 0 52.6 1 99.0 0.1 437.539647 2947314.0 1.6 1.6 0.713 12.0 0 1
24 Albania 75.9 9.0 1 5.58 32.246552 98.0 22 51.7 1 98.0 0.1 363.136850 29717.0 1.6 1.7 0.703 11.6 0 1
25 Albania 74.2 99.0 1 5.31 3.302154 98.0 68 5.8 1 97.0 0.1 35.129300 2992547.0 1.7 1.8 0.696 11.4 0 1
26 Albania 73.5 15.0 1 5.16 26.993121 98.0 6 49.9 1 98.0 0.1 279.142931 311487.0 1.8 1.8 0.685 10.8 0 1
27 Albania 73.0 17.0 1 4.54 221.842800 99.0 7 48.9 1 97.0 0.1 2416.588235 326939.0 1.8 1.9 0.681 10.9 0 1
28 Albania 72.8 18.0 1 4.29 14.719289 97.0 8 47.9 1 97.0 0.1 189.681557 339616.0 1.9 2.0 0.674 10.7 0 1
29 Albania 73.3 15.0 1 3.73 104.516916 96.0 16 46.9 1 98.0 0.1 1453.642777 3511.0 2.0 2.1 0.670 10.7 0 1
...
2908 Zambia 63.0 328.0 29 2.41 20.623063 79.0 35 22.3 42 79.0 4.8 185.793359 1515321.0 6.4 6.2 0.565 12.5 0 1
2909 Zambia 59.2 349.0 29 2.59 196.915250 78.0 896 21.7 43 78.0 5.6 1734.936120 14699937.0 6.5 6.3 0.554 12.3 0 1
2910 Zambia 58.2 366.0 29 2.57 183.046170 81.0 13234 21.2 44 81.0 6.3 1644.619672 14264756.0 6.6 6.4 0.543 12.0 0 1
2911 Zambia 58.0 363.0 30 2.47 184.364910 83.0 15754 2.7 45 83.0 6.8 1463.213573 138533.0 6.7 6.5 0.533 11.8 0 1
2912 Zambia 57.4 368.0 30 2.30 143.869887 94.0 26 2.2 47 94.0 9.1 1139.112330 13456417.0 6.7 6.6 0.518 11.6 0 1
2913 Zambia 55.7 45.0 31 2.12 153.678375 87.0 140 19.7 49 87.0 11.9 1369.682490 1382517.0 6.8 6.7 0.504 11.4 0 1
2914 Zambia 52.6 487.0 32 2.08 10.851482 8.0 535 19.2 51 8.0 13.6 114.587985 12725974.0 6.9 6.8 0.492 11.1 0 1
2915 Zambia 58.0 526.0 33 2.25 1.860004 81.0 459 18.8 52 81.0 15.9 13.154199 12383446.0 7.0 6.9 0.479 10.9 0 1
2916 Zambia 49.3 554.0 34 2.33 121.879331 82.0 45 18.4 55 82.0 17.0 691.317816 1252156.0 7.1 7.0 0.467 10.7 0 1
2917 Zambia 47.9 578.0 36 2.46 8.369852 NaN 35 18.0 59 83.0 17.6 53.277222 11731746.0 7.2 7.1 0.456 10.5 0 1
2918 Zambia 46.4 64.0 39 2.33 65.789974 NaN 881 17.6 62 83.0 18.2 429.158343 11421984.0 7.3 7.2 0.443 10.2 0 1
2919 Zambia 45.5 69.0 41 2.44 54.043480 NaN 25036 17.3 66 84.0 18.4 377.135244 111249.0 7.4 7.3 0.433 10.0 0 1
2920 Zambia 44.6 611.0 43 2.61 46.830275 NaN 16997 17.1 70 85.0 18.6 378.273624 1824125.0 7.4 7.4 0.424 9.8 0 1
2921 Zambia 43.8 614.0 44 2.62 45.616880 NaN 30930 16.8 72 85.0 18.7 341.955625 1531221.0 7.5 7.5 0.418 9.6 0 1
2922 Zimbabwe 67.0 336.0 22 NaN 0.000000 87.0 0 31.8 32 87.0 6.2 118.693830 15777451.0 5.6 5.5 0.507 10.3 0 1
2923 Zimbabwe 59.2 371.0 23 6.50 10.822595 91.0 0 31.3 34 91.0 6.3 127.474620 15411675.0 5.9 5.7 0.498 10.3 0 1
2924 Zimbabwe 58.0 399.0 25 6.39 10.666707 95.0 0 3.8 36 95.0 6.8 111.227396 155456.0 6.2 6.0 0.488 10.4 0 1
2925 Zimbabwe 56.6 429.0 26 6.09 92.602336 97.0 0 3.3 39 95.0 8.8 955.648466 1471826.0 6.5 6.4 0.464 9.8 0 1
2926 Zimbabwe 54.9 464.0 28 6.00 63.750530 94.0 0 29.9 42 93.0 13.3 839.927936 14386649.0 6.8 6.7 0.452 10.1 0 1
2927 Zimbabwe 52.4 527.0 29 5.21 53.308581 9.0 9696 29.4 44 89.0 15.7 713.635620 1486317.0 7.1 7.0 0.436 10.0 0 1
2928 Zimbabwe 50.0 587.0 30 4.64 1.040021 73.0 853 29.0 45 73.0 18.1 65.824121 1381599.0 7.5 7.4 0.419 9.9 0 1
2929 Zimbabwe 48.2 632.0 30 3.56 20.843429 75.0 0 28.6 46 75.0 20.5 325.678573 13558469.0 7.8 7.8 0.421 9.7 0 1
2930 Zimbabwe 46.6 67.0 29 3.88 29.814566 72.0 242 28.2 46 73.0 23.7 396.998217 1332999.0 8.2 8.2 0.414 9.6 0 1
2931 Zimbabwe 45.4 7.0 28 4.57 34.262169 68.0 212 27.9 45 7.0 26.8 414.796232 13124267.0 8.6 8.6 0.408 9.5 0 1
2932 Zimbabwe 44.6 717.0 28 4.14 8.717409 65.0 420 27.5 43 68.0 30.3 444.765750 129432.0 9.0 9.0 0.406 9.3 0 1
2933 Zimbabwe 44.3 723.0 27 4.36 0.000000 68.0 31 27.1 42 65.0 33.6 454.366654 12777511.0 9.4 9.4 0.407 9.2 0 1
2934 Zimbabwe 44.5 715.0 26 4.06 0.000000 7.0 998 26.7 41 68.0 36.7 453.351155 12633897.0 9.8 9.9 0.418 9.5 0 1
2935 Zimbabwe 44.8 73.0 25 4.43 0.000000 73.0 304 26.3 40 71.0 39.8 57.348340 125525.0 1.2 1.3 0.427 10.0 0 1
2936 Zimbabwe 45.3 686.0 25 1.72 0.000000 76.0 529 25.9 39 75.0 42.1 548.587312 12366165.0 1.6 1.7 0.427 9.8 0 1
2937 Zimbabwe 46.0 665.0 24 1.68 0.000000 79.0 1483 25.5 39 78.0 43.5 547.358879 12222251.0 11.0 11.2 0.434 9.8 0 1

2938 rows × 22 columns
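
The two indicator columns produced by get_dummies always sum to 1, so either one carries the full information. The sketch below, on a made-up three-row frame named `toy`, shows the full encoding next to a single-column variant obtained with `drop_first=True`. This variant is an optional alternative shown for illustration; the notebook itself keeps both columns.

```python
import pandas as pd

# Toy data; the real 'Status' column has the same two categories.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Status": ["Developing", "Developed", "Developing"],
})

# Full one-hot encoding: two columns that always sum to 1 per row.
both = pd.get_dummies(toy["Status"], dtype=int)

# drop_first=True drops the first category in sorted order ('Developed'),
# leaving a single 0/1 indicator named 'Developing'.
single = pd.get_dummies(toy["Status"], drop_first=True, dtype=int)

print(both)
print(single)
```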

In [61]:
life_data = life_data.groupby('Country').mean()
life_data.head()
Out[61]:
Life expectancy Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles BMI under-five deaths Polio Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling Developed Developing
Country
Afghanistan 58.19375 269.0625 78.2500 0.014375 34.960110 64.562500 2362.2500 15.51875 107.5625 48.3750 52.3125 0.10000 340.015425 9.972260e+06 16.58125 15.58125 0.415375 8.21250 0 1
Albania 75.15625 45.0625 0.6875 4.848750 193.259091 98.000000 53.3750 49.06875 0.9375 98.1250 98.0625 0.10000 2119.726679 6.969116e+05 1.61875 1.70000 0.709875 12.13750 0 1
Algeria 73.61875 108.1875 20.3125 0.406667 236.185241 78.000000 1943.8750 48.74375 23.5000 91.7500 91.8750 0.10000 2847.853392 2.164983e+07 6.09375 5.97500 0.694875 12.71250 0 1
Angola 49.01875 328.5625 83.7500 5.740667 102.100268 70.222222 3561.3125 18.01875 132.6250 46.1250 47.6875 2.36875 1975.143045 1.014710e+07 6.19375 6.66875 0.458375 8.04375 0 1
Antigua and Barbuda 75.05625 127.5000 0.0000 7.949333 1001.585226 98.266667 0.0000 38.42500 0.0000 96.9375 98.3125 0.12500 9759.305728 NaN 3.42500 3.37500 0.488625 8.84375 0 1

5 rows × 21 columns

Part 3. EDA

In [67]:
life_data.columns
Out[67]:
Index(['Life expectancy ', 'Adult Mortality', 'infant deaths', 'Alcohol',
       'percentage expenditure', 'Hepatitis B', 'Measles ', ' BMI ',
       'under-five deaths ', 'Polio', 'Total expenditure', 'Diphtheria ',
       ' HIV/AIDS', 'GDP', 'Population', ' thinness  1-19 years',
       ' thinness 5-9 years', 'Income composition of resources', 'Schooling',
       'Developed', 'Developing'],
      dtype='object')
In [94]:
plt.scatter(life_data[' HIV/AIDS'], life_data['Life expectancy '])
plt.xlabel('HIV/AIDS')
plt.ylabel('Life expectancy')
Out[94]:
Text(0,0.5,'Life expectancy')
In [69]:
plt.scatter(life_data.GDP, life_data['Life expectancy '])
plt.xlabel('GDP')
plt.ylabel('Life expectancy')
Out[69]:
Text(0,0.5,'Life expectancy')
In [74]:
plt.scatter(life_data[' BMI '], life_data['Life expectancy '])
plt.xlabel('BMI')
plt.ylabel('Life expectancy')
Out[74]:
Text(0,0.5,'Life expectancy')
In [75]:
plt.scatter(life_data['under-five deaths '], life_data['Life expectancy '])
plt.xlabel('under-five deaths')
plt.ylabel('Life expectancy')
Out[75]:
Text(0,0.5,'Life expectancy')
In [77]:
plt.scatter(life_data['Alcohol'], life_data['Life expectancy '])
plt.xlabel('Alcohol')
plt.ylabel('Life expectancy')
Out[77]:
Text(0,0.5,'Life expectancy')
In [88]:
plt.scatter(life_data['Adult Mortality'], life_data['Life expectancy '])
plt.xlabel('Adult Mortality')
plt.ylabel('Life expectancy')
Out[88]:
Text(0,0.5,'Life expectancy')
In [89]:
plt.scatter(life_data['Schooling'], life_data['Life expectancy '])
plt.xlabel('Schooling')
plt.ylabel('Life expectancy')
Out[89]:
Text(0,0.5,'Life expectancy')
In [91]:
plt.scatter(life_data['percentage expenditure'], life_data['Life expectancy '])
plt.xlabel('Percentage Healthcare expenditure')
plt.ylabel('Life expectancy')
Out[91]:
Text(0,0.5,'Life expectancy')

Using scatter plots, we plot Life Expectancy against some of the other variables to see whether there is any correlation between them.
There seems to be a positive correlation between Life Expectancy and each of Percentage of Healthcare Expenditure, Schooling, GDP and BMI, and a negative one between Life Expectancy and both Adult Mortality and HIV/AIDS. There does not seem to be any correlation between Life Expectancy and either Alcohol or under-five deaths.
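
The visual impressions from the scatter plots can be backed by numbers. A minimal sketch, assuming a small made-up frame `toy` (the values are invented, not taken from the WHO data), computes the Pearson correlation of each column with the target. On the real data the target column is named 'Life expectancy ' with a trailing space.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the WHO data.
toy = pd.DataFrame({
    "Life expectancy": [55.0, 60.0, 68.0, 74.0, 80.0],
    "Schooling": [6.0, 8.5, 11.0, 13.0, 16.0],
    "Adult Mortality": [400.0, 310.0, 200.0, 120.0, 70.0],
})

# Pearson correlation of every column with the target; the target's
# correlation with itself is dropped, and the rest is sorted ascending.
corr_with_target = (
    toy.corr()["Life expectancy"].drop("Life expectancy").sort_values()
)
print(corr_with_target)
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate none; a curved relationship can still score low, which is one reason to keep the plots as well.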

In [86]:
plt.figure(figsize = (14, 10))
sns.heatmap(life_data.corr(), annot = True)
Out[86]:
<matplotlib.axes._subplots.AxesSubplot at 0x2050f36cc88>

Now we plot the correlation matrix, visualizing it with a heatmap. The legend shows that warmer colors indicate higher, positive correlation, while colder colors indicate low or negative correlation.
There is a very high correlation between thinness of 5-9 year-olds and that of 1-19 year-olds, between population and both infant deaths and under-five deaths, and between schooling and income composition of resources. On the other hand, Life expectancy and Adult Mortality are very highly negatively correlated.
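
Strongly correlated pairs can also be read off the correlation matrix programmatically instead of by eye. The sketch below uses synthetic columns whose names merely echo the real ones; the 0.9 threshold is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "thinness 1-19": base,
    "thinness 5-9": base + rng.normal(scale=0.05, size=200),  # near duplicate
    "Alcohol": rng.normal(size=200),                           # unrelated
})

corr = toy.corr().abs()
# Keep only the part above the diagonal so each pair is listed once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
strong = pairs[pairs > 0.9]  # 0.9 is an arbitrary cut-off
print(strong)
```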

Part 4. Preprocessing the data

The raw data is not suitable for building a model right away, so some preprocessing is done. First, the Status of the country is turned into numerical form with the get_dummies function, which gives 2 new columns; the original column is dropped. Second, the data is grouped by country and the mean values over the 2000 – 2015 period are computed.
Then the Life expectancy column is removed to form the life_labels variable (the output), and the rest is stored as the life_features variable. The table contains some null values; the isnull function is used to find them, marking each with the boolean True. Below that, the number of null values in each separate column is displayed. They are mostly located in the Population and GDP columns.
Next, the missing values are filled with the mean of their respective column. This creates some distortion, but the alternative of removing parts of the table would shrink the data, so it is avoided here because the number of rows is not that high. The final shape of life_features is 193 rows by 20 columns. Finally, considering the large differences in the scale of the columns, the features are scaled with MinMaxScaler.
Then the data is split into a training part of 70% and a testing part of 30%. Cross-validation is set up with a 5-fold split.
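
The 5-fold cross-validation mentioned above could look roughly like the following sketch. It runs on synthetic data from make_regression with the same shape as life_features (193 by 20), and the seed, the shuffling, and the R² scoring are assumptions. Unlike the notebook, which scales all rows before splitting, this variant places MinMaxScaler inside a pipeline so that the scaler is refit on the training part of each fold.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in with the same shape as life_features (193 x 20).
X, y = make_regression(n_samples=193, n_features=20, noise=5.0, random_state=0)

# Five folds; shuffling and the seed are illustrative choices.
k_fold = KFold(n_splits=5, shuffle=True, random_state=0)

# Scaling happens inside the pipeline, so each fold fits its own scaler.
pipeline = make_pipeline(MinMaxScaler(), LinearRegression())
scores = cross_val_score(pipeline, X, y, cv=k_fold, scoring="r2")

print(scores)
print(scores.mean())
```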

In [95]:
life_labels = life_data['Life expectancy ']
life_features = life_data.drop('Life expectancy ', axis = 1)
In [7]:
life_features.isnull().head()
Out[7]:
Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles BMI under-five deaths Polio Total expenditure Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling Developed Developing
Country
Afghanistan False False False False False False False False False False False False False False False False False False False False
Albania False False False False False False False False False False False False False False False False False False False False
Algeria False False False False False False False False False False False False False False False False False False False False
Angola False False False False False False False False False False False False False False False False False False False False
Antigua and Barbuda False False False False False False False False False False False False False True False False False False False False
In [13]:
life_features.isnull().sum()
Out[13]:
Adult Mortality                    10
infant deaths                       0
Alcohol                             2
percentage expenditure              0
Hepatitis B                         9
Measles                             0
 BMI                                4
under-five deaths                   0
Polio                               0
Total expenditure                   2
Diphtheria                          0
 HIV/AIDS                           0
GDP                                30
Population                         48
 thinness  1-19 years               4
 thinness 5-9 years                 4
Income composition of resources    17
Schooling                          13
Developed                           0
Developing                          0
dtype: int64
In [14]:
life_labels.isnull().sum()
Out[14]:
10
In [98]:
life_features.fillna(value = life_features.mean(), inplace = True)
Out[98]:
Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles BMI under-five deaths Polio Total expenditure Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling Developed Developing
Country
Afghanistan 269.062500 78.2500 0.014375 34.960110 64.562500 2362.2500 15.51875 107.5625 48.375000 8.252500 52.312500 0.10000 340.015425 9.972260e+06 16.58125 15.58125 0.415375 8.212500 0 1
Albania 45.062500 0.6875 4.848750 193.259091 98.000000 53.3750 49.06875 0.9375 98.125000 5.945625 98.062500 0.10000 2119.726679 6.969116e+05 1.61875 1.70000 0.709875 12.137500 0 1
Algeria 108.187500 20.3125 0.406667 236.185241 78.000000 1943.8750 48.74375 23.5000 91.750000 4.604000 91.875000 0.10000 2847.853392 2.164983e+07 6.09375 5.97500 0.694875 12.712500 0 1
Angola 328.562500 83.7500 5.740667 102.100268 70.222222 3561.3125 18.01875 132.6250 46.125000 3.919333 47.687500 2.36875 1975.143045 1.014710e+07 6.19375 6.66875 0.458375 8.043750 0 1
Antigua and Barbuda 127.500000 0.0000 7.949333 1001.585226 98.266667 0.0000 38.42500 0.0000 96.937500 4.791333 98.312500 0.12500 9759.305728 1.257006e+07 3.42500 3.37500 0.488625 8.843750 0 1
Argentina 106.000000 10.1250 7.966667 773.038981 81.285714 2.0000 54.98125 11.3750 93.375000 6.912667 92.375000 0.10000 6998.575103 2.012120e+07 1.07500 0.95000 0.794125 16.506250 0 1
Armenia 117.375000 1.0000 3.702667 131.007419 87.562500 274.2500 44.70625 1.0000 89.125000 4.861333 87.375000 0.10000 1999.986318 1.015165e+06 2.03750 2.11250 0.698375 11.787500 0 1
Australia 63.187500 1.0000 10.155333 5332.226473 93.400000 103.9375 55.86250 1.3750 86.750000 8.836667 86.875000 0.10000 34637.565047 4.587010e+06 0.66875 0.62500 0.918125 20.037500 1 0
Austria 65.750000 0.0000 12.236000 4928.439188 81.062500 77.2500 48.28750 0.0000 86.000000 4.715333 86.750000 0.10000 33827.476309 6.474880e+06 1.73125 1.93750 0.862375 15.387500 1 0
Azerbaijan 115.187500 6.0625 0.992000 131.148747 59.214286 598.1875 43.84375 7.1875 75.812500 5.679333 82.312500 0.10000 3302.947807 2.157370e+06 2.90000 2.94375 0.702687 11.350000 0 1
Bahamas 152.062500 0.0000 10.271333 0.000000 83.600000 0.0000 52.88125 0.0000 95.000000 6.706000 95.750000 0.11875 7223.452050 1.257006e+07 2.55000 2.51875 0.737938 12.425000 0 1
Bahrain 65.375000 0.0000 1.934667 602.087140 98.125000 6.6875 51.65625 0.0000 98.000000 3.807333 98.062500 0.10000 11191.247421 1.257006e+07 6.10625 5.95000 0.807937 14.112500 0 1
Bangladesh 141.375000 153.6250 0.010000 32.320972 77.615385 4649.9375 12.87500 201.5625 87.500000 2.854667 92.750000 0.10000 443.561481 4.298516e+07 13.77500 14.33125 0.517875 8.693750 0 1
Barbados 77.625000 0.0000 8.206667 389.076995 84.066667 0.0000 45.97500 0.0000 80.312500 6.122000 89.937500 0.25625 12017.098529 1.257006e+07 3.95625 3.91875 0.770437 14.893750 0 1
Belarus 218.750000 0.5625 13.497333 329.046455 88.937500 24.8125 54.74375 0.8750 89.875000 6.192000 92.187500 0.10000 3811.525348 6.371867e+06 2.21250 2.33750 0.743250 14.750000 0 1
Belgium 70.187500 0.2500 11.042667 2392.432657 74.500000 81.5625 50.89375 1.0000 97.750000 5.962000 97.312500 0.10000 16915.306000 2.884043e+06 0.86250 0.85625 0.877750 16.787500 1 0
Belize 155.500000 0.0000 6.252667 365.843490 94.750000 0.0000 37.67500 0.0000 95.500000 5.039333 95.187500 0.43125 3933.012174 1.703929e+05 3.55625 3.48750 0.693937 12.456250 0 1
Benin 269.375000 25.0000 1.000000 43.991956 62.571429 1116.8750 19.61250 39.2500 67.687500 4.636000 67.937500 1.70625 561.167968 3.942979e+06 8.25000 8.13750 0.438062 8.850000 0 1
Bhutan 230.250000 0.6875 0.278667 134.764070 94.125000 80.2500 17.58125 0.9375 89.375000 5.346000 93.875000 0.35625 1353.986946 4.925849e+05 17.21875 17.92500 0.183875 10.225000 0 1
Bolivia (Plurinational State of) 177.500000 10.7500 3.212000 0.000000 82.875000 7.6250 44.81875 14.0000 81.687500 5.626000 87.375000 0.15625 7223.452050 1.257006e+07 1.30000 1.18750 0.633750 13.962500 0 1
Bosnia and Herzegovina 64.937500 0.0000 4.405333 215.457459 71.666667 319.1875 48.96250 0.0000 78.312500 8.742000 78.500000 0.10000 2245.026024 1.986993e+06 2.78125 2.75625 0.450375 12.225000 0 1
Botswana 448.125000 2.0000 4.670000 334.266971 87.312500 237.8750 32.24375 3.1250 96.187500 5.510667 95.812500 16.52500 4498.285431 1.063867e+06 7.43750 7.81250 0.630375 12.137500 0 1
Brazil 150.687500 68.2500 7.213333 390.695595 96.187500 95.3750 47.06250 77.2500 98.312500 8.019333 97.937500 0.10000 6143.161794 8.812807e+07 3.05000 3.01250 0.710187 14.162500 0 1
Brunei Darussalam 67.062500 0.0000 0.378667 1276.879485 97.875000 7.3125 29.71875 0.0000 97.500000 2.820000 96.812500 0.10000 19744.808102 1.257006e+07 6.20000 5.67500 0.839375 14.106250 0 1
Bulgaria 125.500000 0.9375 10.865333 350.745204 94.500000 1530.3125 54.50000 1.0000 94.250000 7.236000 94.000000 0.10000 4938.981821 5.290924e+06 2.13125 2.15625 0.754625 13.725000 1 0
Burkina Faso 252.500000 44.7500 4.241333 52.045581 81.100000 5091.7500 15.50000 79.9375 69.000000 6.038000 77.312500 1.83750 410.372034 6.143695e+06 7.41875 6.98125 0.229688 5.406250 0 1
Burundi 291.562500 23.2500 4.103333 15.335498 93.333333 1349.5000 15.31250 35.5625 86.625000 7.089333 85.750000 3.08750 137.815321 3.915447e+06 8.01875 8.01250 0.327875 7.481250 0 1
Cabo Verde 116.187500 0.0000 3.448000 188.632987 80.642857 0.1250 24.37500 0.0000 84.375000 4.755333 79.000000 0.53125 2023.541039 2.926002e+05 8.01250 7.97500 0.570000 12.200000 0 1
Cambodia 196.375000 16.7500 1.486000 33.796561 79.600000 1880.8125 15.36250 20.5000 74.187500 5.740667 72.062500 1.02500 466.196878 7.145967e+06 10.14375 10.88125 0.491937 9.875000 0 1
Cameroon 294.875000 54.9375 4.141333 44.462300 76.818182 2979.2500 23.61875 86.1875 72.500000 4.719333 69.437500 6.21250 781.016004 9.967292e+06 6.55000 6.60625 0.470000 8.906250 0 1
...
Suriname 166.000000 0.0000 4.984000 508.494033 85.272727 0.0000 50.15625 0.0000 78.250000 6.662667 82.562500 0.76250 4781.216822 1.909166e+05 3.51250 3.45625 0.481062 11.825000 0 1
Swaziland 339.000000 2.6250 4.463333 278.099713 76.000000 49.1250 25.50000 3.6875 89.000000 7.308667 83.750000 32.94375 2165.090838 4.573031e+05 6.47500 6.63125 0.515688 10.350000 0 1
Sweden 59.187500 0.0000 6.926667 4438.163154 59.200000 18.9375 56.25000 0.0000 98.312500 9.932667 98.312500 0.10000 29334.990639 5.514868e+06 1.35000 1.30625 0.893125 15.868750 1 0
Switzerland 55.750000 0.0000 10.338000 9801.810377 78.518282 397.5000 51.43750 0.0000 95.375000 6.087333 94.562500 0.10000 57362.874601 5.913242e+06 0.53750 0.39375 0.911062 15.393750 1 0
Syrian Arab Republic 139.625000 8.0000 0.804667 39.299486 64.125000 295.6875 47.31250 9.5625 64.812500 3.946667 48.437500 0.10000 1087.435774 6.741445e+06 6.45000 6.25625 0.618062 10.981250 0 1
Tajikistan 177.562500 10.6875 0.330667 17.277192 84.428571 208.1875 33.11250 12.7500 88.312500 5.648000 89.250000 0.25625 335.841725 4.751355e+06 3.80625 3.85000 0.583812 10.681250 0 1
Thailand 160.375000 12.1250 6.131333 401.259963 97.000000 4311.9375 21.59375 14.1250 98.312500 3.757333 98.187500 0.38750 3494.781702 3.117051e+07 8.54375 8.70000 0.694688 12.550000 0 1
The former Yugoslav republic of Macedonia 60.812500 0.0000 1.858667 0.000000 83.545455 74.7500 52.90625 0.0000 95.500000 7.536000 94.875000 0.10000 7223.452050 1.257006e+07 2.43125 2.45625 0.455750 12.381250 0 1
Timor-Leste 170.375000 2.3750 0.235333 21.187523 76.000000 85.0625 14.55000 3.0000 63.214286 1.646667 64.214286 0.10000 551.710649 4.601956e+05 10.52500 11.69375 0.517625 10.700000 0 1
Togo 311.312500 14.1875 0.947333 23.705604 76.625000 532.5000 17.99375 21.4375 67.187500 5.243333 74.312500 3.86250 317.204092 3.722046e+06 7.86875 7.73750 0.445937 10.650000 0 1
Tonga 129.625000 0.0000 1.167333 323.809343 75.500000 0.2500 62.94375 0.0000 76.875000 5.206000 70.312500 0.10000 1981.555186 2.658981e+04 0.10000 0.10000 0.698813 14.218750 0 1
Trinidad and Tobago 163.375000 0.0000 6.044667 559.314872 54.928571 0.0000 37.73125 0.2500 76.437500 5.199333 66.500000 0.62500 7741.748090 7.761331e+05 6.13750 6.50625 0.753375 12.331250 0 1
Tunisia 18.750000 3.2500 1.310667 366.741260 96.750000 33.1875 48.25625 3.6875 97.687500 6.052667 97.687500 0.10000 3044.081488 3.274493e+06 6.40625 6.33750 0.692625 14.056250 0 1
Turkey 98.375000 26.9375 1.421333 253.417178 87.062500 5272.9375 56.41250 32.2500 80.812500 5.618000 80.750000 0.10000 3983.917722 3.350135e+07 5.01250 4.85000 0.703313 12.675000 0 1
Turkmenistan 214.812500 6.5625 2.654667 187.781988 97.285714 15.1250 38.11875 8.0625 95.562500 2.828667 96.375000 0.10000 2511.611540 2.635550e+06 3.33750 3.36250 0.211625 9.831250 0 1
Tuvalu 164.796448 0.0000 0.010000 78.281203 9.000000 0.0000 79.30000 0.0000 9.000000 16.610000 9.000000 0.10000 3542.135890 1.819000e+03 0.20000 0.10000 0.629502 0.000000 0 1
Uganda 300.187500 89.6875 8.050667 47.692108 63.428571 12394.6250 15.52500 138.8750 65.437500 8.542000 64.875000 7.65000 421.048496 1.554059e+07 6.26250 6.25625 0.445312 10.593750 0 1
Ukraine 205.750000 5.0625 7.369333 169.427330 54.625000 5395.1875 50.89375 5.9375 81.437500 6.777333 70.000000 0.53750 1577.293329 1.000493e+07 2.57500 2.67500 0.716563 14.606250 0 1
United Arab Emirates 67.062500 1.0000 1.750000 1886.731590 93.875000 75.6250 53.80625 1.0000 95.062500 3.036667 94.625000 0.10000 22110.366986 1.257006e+07 5.18125 4.96875 0.819563 12.812500 0 1
United Kingdom of Great Britain and Northern Ireland 70.375000 3.6250 11.131250 0.000000 78.518282 715.7500 55.38750 4.0625 92.875000 8.534000 92.875000 0.10000 7223.452050 1.257006e+07 0.75000 0.50625 0.629502 11.894097 1 0
United Republic of Tanzania 304.437500 95.2500 3.582667 0.000000 73.500000 3348.5625 17.31875 143.1875 83.375000 4.701333 74.687500 7.27500 7223.452050 1.257006e+07 7.42500 7.35625 0.629502 11.894097 0 1
United States of America 58.187500 26.1875 8.579333 0.000000 81.375000 130.6250 58.45000 30.7500 82.125000 15.863333 95.125000 0.10000 7223.452050 1.257006e+07 0.73125 0.60625 0.629502 11.894097 1 0
Uruguay 119.937500 0.5625 6.172667 621.838919 94.312500 0.0000 52.92500 0.7500 94.250000 8.750000 89.125000 0.10000 7192.584875 2.396771e+06 1.60000 1.54375 0.765625 15.231250 0 1
Uzbekistan 184.812500 21.9375 1.608667 44.373450 95.642857 208.4375 34.80625 25.6875 98.562500 5.638000 98.437500 0.20625 651.092359 9.036317e+05 3.14375 3.17500 0.603000 11.643750 0 1
Vanuatu 137.875000 0.0000 0.806667 282.325746 56.125000 20.8750 44.25625 0.0000 66.187500 3.928667 59.062500 0.10000 2000.245518 1.230962e+05 1.56875 1.49375 0.367500 10.568750 0 1
Venezuela (Bolivarian Republic of) 163.000000 9.3750 7.420000 0.000000 66.250000 165.0000 54.48750 10.7500 74.687500 4.998667 68.500000 0.10000 7223.452050 1.257006e+07 1.65000 1.56250 0.726812 12.787500 0 1
Viet Nam 126.562500 29.1875 3.087333 0.000000 87.538462 4232.9375 11.18750 36.5000 94.937500 5.977333 91.750000 0.14375 7223.452050 1.257006e+07 14.92500 15.62500 0.627063 11.512500 0 1
Yemen 211.812500 39.3750 0.047333 0.000000 55.687500 2761.1875 33.48750 51.6250 67.125000 5.005333 72.625000 0.10000 7223.452050 1.257006e+07 13.83125 13.75000 0.475500 8.506250 0 1
Zambia 354.312500 33.4375 2.239333 89.650407 69.818182 6563.8125 17.45000 52.3750 64.375000 5.824000 74.250000 11.93125 811.811841 6.260246e+06 6.88125 6.76250 0.498437 11.212500 0 1
Zimbabwe 462.375000 26.5625 4.482000 20.364271 70.562500 923.0000 25.13750 40.8125 75.625000 6.158667 75.187500 23.26250 410.980194 8.021343e+06 7.01250 6.98750 0.439125 9.825000 0 1

193 rows × 20 columns
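
Mean imputation explains why the same value, 1.257006e+07, appears in the Population column for every country whose population was missing. A minimal sketch on a made-up frame `toy` shows the mechanism:

```python
import numpy as np
import pandas as pd

# Invented values; only the pattern of missing entries matters here.
toy = pd.DataFrame({
    "GDP": [100.0, np.nan, 300.0],
    "Population": [np.nan, 2.0e6, 4.0e6],
})

# Each NaN is replaced by the mean of its own column; NaNs are skipped
# when the column means are computed.
filled = toy.fillna(toy.mean())
print(filled)
```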

In [97]:
life_labels.fillna(value = life_labels.mean(), inplace = True)
In [9]:
stats.describe(life_features[1:])
Out[9]:
DescribeResult(nobs=192, minmax=(array([  1.87500000e+01,   0.00000000e+00,   1.00000000e-02,
         0.00000000e+00,   8.00000000e+00,   0.00000000e+00,
         5.20000000e+00,   0.00000000e+00,   9.00000000e+00,
         1.64666667e+00,   9.00000000e+00,   1.00000000e-01,
         1.36183210e+02,   2.92000000e+02,   1.00000000e-01,
         1.00000000e-01,   1.31687500e-01,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00]), array([  5.50062500e+02,   1.36668750e+03,   1.34973333e+01,
         9.80181038e+03,   9.90000000e+01,   6.58579375e+04,
         8.73000000e+01,   1.81250000e+03,   9.90000000e+01,
         1.72400000e+01,   9.90000000e+01,   3.29437500e+01,
         5.73628746e+04,   4.21467691e+08,   2.71000000e+01,
         2.79437500e+01,   9.31437500e-01,   2.00375000e+01,
         1.00000000e+00,   1.00000000e+00])), mean=array([  1.64253396e+02,   2.85745443e+01,   4.46296480e+00,
         7.12321318e+02,   7.85909681e+01,   2.30174674e+03,
         3.96797168e+01,   3.96419271e+01,   8.26078218e+01,
         6.01668050e+00,   8.23842541e+01,   1.67047526e+00,
         7.25930328e+03,   1.25835883e+07,   4.62089964e+00,
         4.65416339e+00,   6.30617733e-01,   1.19132722e+01,
         1.66666667e-01,   8.33333333e-01]), variance=array([  8.75843142e+03,   1.28540496e+04,   1.48700520e+01,
         2.10950227e+06,   3.38370018e+02,   6.25178778e+07,
         3.01949462e+02,   2.36800172e+04,   2.74638648e+02,
         4.83699092e+00,   2.92218598e+02,   1.89712358e+01,
         9.65744006e+07,   1.09787565e+15,   1.57725109e+01,
         1.65888768e+01,   3.03882176e-02,   1.03601660e+01,
         1.39616056e-01,   1.39616056e-01]), skewness=array([  1.19225145,   9.3630654 ,   0.5920435 ,   3.27460365,
        -1.36320782,   6.05254594,   0.09453142,   9.01254987,
        -1.3261097 ,   1.66458352,  -1.43066862,   4.23443451,
         2.46889639,  10.16055662,   1.72955815,   1.8023565 ,
        -0.48106083,  -0.71599891,   1.78885438,  -1.78885438]), kurtosis=array([   1.604733  ,  101.90959278,   -0.86402143,   12.36028988,
          2.20027664,   38.55513895,   -0.66005587,   94.30411285,
          1.85630801,    6.59116886,    2.06805175,   20.55716829,
          6.62435538,  119.83739182,    5.27091348,    5.62745981,
         -0.34978933,    1.50234464,    1.2       ,    1.2       ]))
In [99]:
min_max_scaler = MinMaxScaler()
life_features = min_max_scaler.fit_transform(life_features)
In [11]:
life_features
Out[11]:
array([[  4.71121045e-01,   5.72552248e-02,   3.24378429e-04, ...,
          4.09856519e-01,   0.00000000e+00,   1.00000000e+00],
       [  4.95235855e-02,   5.03041112e-04,   3.58762543e-01, ...,
          6.05739239e-01,   0.00000000e+00,   1.00000000e+00],
       [  1.68333137e-01,   1.48625783e-02,   2.94103109e-02, ...,
          6.34435434e-01,   0.00000000e+00,   1.00000000e+00],
       ..., 
       [  3.63369015e-01,   2.88105364e-02,   2.76802926e-03, ...,
          4.24516532e-01,   0.00000000e+00,   1.00000000e+00],
       [  6.31572756e-01,   2.44660905e-02,   1.65290890e-01, ...,
          5.59575795e-01,   0.00000000e+00,   1.00000000e+00],
       [  8.34960593e-01,   1.94356793e-02,   3.31570362e-01, ...,
          4.90330630e-01,   0.00000000e+00,   1.00000000e+00]])
In [100]:
life_features_train, life_features_test, life_labels_train, life_labels_test = train_test_split(
        life_features, life_labels, train_size = 0.7, test_size = 0.3)

Part 5. Linear Regression and additions

In [101]:
linear_model = LinearRegression()
linear_model.fit(life_features_train, life_labels_train)
Out[101]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [102]:
print('R_square score on the training: %.2f' % linear_model.score(life_features_train, life_labels_train))
R_square score on the training: 0.92
In [103]:
linear_model_predict = linear_model.predict(life_features_test)
In [32]:
print('Coefficients: \n', linear_model.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(life_labels_test, linear_model_predict))
print("Mean absolute error: %.2f"
      % mean_absolute_error(life_labels_test, linear_model_predict))
print('R_square score: %.2f' % r2_score(life_labels_test, linear_model_predict))
Coefficients: 
 [ -26.81746051  160.65752164    3.37662659    5.17479785   -1.62130018
    0.26189121    3.75961357 -180.19924226    1.39153602    1.99816095
    5.43624984   -4.59596039   -1.35215782   12.43058537   20.18640175
  -18.82270723    5.73376054    2.86398351   -0.28926014    0.28926014]
Mean squared error: 9.80
Mean absolute error: 2.32
R_square score: 0.87

Several algorithms will be tried out, starting with classical linear regression.
The model is first fitted on the training data, where its R square is 0.92. Its R square is then checked on the test data; the score is 0.87 in the run recorded at the time of writing. We also calculate the mean absolute error (MAE, the average absolute difference between the predicted and the real values) at 2.32 and the mean squared error (MSE, the average of the squared differences) at 9.80. Because the train/test split is random, these numbers change from run to run. Next we modify the initial model: we use Ridge regression, Lasso regression and finally ElasticNet to see whether the score can be improved.
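
To make the three metrics concrete, the sketch below computes MAE, MSE and R square by hand and checks them against the sklearn functions used in this notebook. The four life-expectancy values are made-up illustration numbers, not taken from the WHO data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted life expectancies (years), for illustration only.
y_true = np.array([70.0, 65.0, 80.0, 55.0])
y_pred = np.array([72.0, 64.0, 77.0, 59.0])

errors = y_true - y_pred                      # [-2, 1, 3, -4]

mae = np.mean(np.abs(errors))                 # average absolute error: 2.5
mse = np.mean(errors ** 2)                    # average squared error: 7.5

# R square = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)                  # 30
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 325
r2 = 1 - ss_res / ss_tot                      # about 0.908

print("MAE: %.2f  MSE: %.2f  R_square: %.3f" % (mae, mse, r2))
```

The MSE penalizes large errors more heavily than the MAE, which is why a single 4-year miss contributes 16 of the 30 squared-error units here.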

In [104]:
scoring = make_scorer(r2_score)
grid_cv = GridSearchCV(Ridge(),
              param_grid={'alpha': range(0, 10), 'max_iter' : [10, 100, 1000]},
              scoring=scoring, cv=5, refit=True)
grid_cv.fit(life_features_train, life_labels_train)
print("Best Parameters: " + str(grid_cv.best_params_))
result = grid_cv.cv_results_
print("R^2 score on training data: %.2f" %grid_cv.score(life_features_train, life_labels_train))
print("R^2 score: %.2f"
      % r2_score(life_labels_test, grid_cv.best_estimator_.predict(life_features_test)))
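# NOTE: linear_model_predict holds the predictions of the plain LinearRegression
# fitted above, so the two errors below describe that model, not the tuned Ridge.
print("Mean squared error: %.2f"
      % mean_squared_error(life_labels_test, linear_model_predict))
print("Mean absolute error: %.2f"
      % mean_absolute_error(life_labels_test, linear_model_predict))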
C:\Users\Stefan\Anaconda3\lib\site-packages\scipy\linalg\basic.py:223: RuntimeWarning: scipy.linalg.solve
Ill-conditioned matrix detected. Result is not guaranteed to be accurate.
Reciprocal condition number: 2.0868024066249365e-18
  ' condition number: {}'.format(rcond), RuntimeWarning)
Best Parameters: {'alpha': 1, 'max_iter': 10}
R^2 score on training data: 0.90
R^2 score: 0.91
Mean squared error: 6.05
Mean absolute error: 1.83

Ridge regression uses L2 regularization to shrink the coefficient weights; the strength of the shrinkage is controlled by the hyperparameter alpha. Increasing alpha decreases the weights of the coefficients.
We performed a grid search with 5-fold cross-validation on the Ridge regression, with alpha taking the integer values 0 to 9 and max iterations of 10, 100 and 1000. The best parameters are alpha = 1 and max iterations = 10. The R square on the training data is 90 %, compared to 92 % for the standard linear model, and the R square on the test data improves to 91 % versus 87 %. The MSE of 6.05 and the MAE of 1.83 printed by this cell are computed from linear_model_predict, the predictions of the plain linear model on the current split, so they do not measure the Ridge model itself. They differ from the 9.80 and 2.32 quoted earlier apparently because that cell (In [32]) comes from an earlier run with a different random split.
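
The following is a minimal sketch, on synthetic data, of the two points made above: larger alpha values shrink the Ridge coefficients, and the test errors of a tuned model should be computed from that model's own predictions. The data, the alpha values and the names ridge_predict and ridge_metrics are assumptions made for this illustration; they are not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression problem (illustration only, not the WHO data).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([4.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 1) Shrinkage: the L2 norm of the coefficients falls as alpha grows.
alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = [np.linalg.norm(Ridge(alpha=a).fit(X_train, y_train).coef_)
              for a in alphas]
for a, norm in zip(alphas, coef_norms):
    print("alpha = %8.2f   ||coef|| = %.3f" % (a, norm))

# 2) Grid search over strictly positive alphas, scored with R square.
grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
                    scoring="r2", cv=5)
grid.fit(X_train, y_train)

# Evaluate the tuned estimator with its OWN predictions on the test set.
ridge_predict = grid.best_estimator_.predict(X_test)
ridge_metrics = {
    "r2": r2_score(y_test, ridge_predict),
    "mse": mean_squared_error(y_test, ridge_predict),
    "mae": mean_absolute_error(y_test, ridge_predict),
}
print("Best Parameters:", grid.best_params_)
print(ridge_metrics)
```

Starting the alpha grid above zero avoids fitting an unregularized model inside the search, which may be one way to sidestep the ill-conditioned matrix warnings shown above.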

In [107]:
scoring = make_scorer(r2_score)
grid_cv = GridSearchCV(Lasso(),
              param_grid={'alpha': range(0, 10), 'max_iter' : [10, 100, 1000]},
              scoring=scoring, cv=5, refit=True)
grid_cv.fit(life_features_train, life_labels_train)
print("Best Parameters: " + str(grid_cv.best_params_))
result = grid_cv.cv_results_
print("R^2 score on training data: %.2f" % grid_cv.score(life_features_train, life_labels_train))
print("R^2 score: %.2f"
      % r2_score(life_labels_test, grid_cv.best_estimator_.predict(life_features_test)))
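# NOTE: linear_model_predict holds the predictions of the plain LinearRegression
# fitted above, so the two errors below describe that model, not the tuned Lasso.
print("Mean squared error: %.2f"
      % mean_squared_error(life_labels_test, linear_model_predict))
print("Mean absolute error: %.2f"
      % mean_absolute_error(life_labels_test, linear_model_predict))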
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
Best Parameters: {'alpha': 0, 'max_iter': 10}
R^2 score on training data: 0.92
R^2 score: 0.92
Mean squared error: 6.05
Mean absolute error: 1.83
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:739: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  self.best_estimator_.fit(X, y, **fit_params)

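Lasso regression uses L1 regularization, which can reduce some coefficient weights to exactly zero. Again the regularization is controlled with alpha: increasing it shrinks the weights and sets more of them to zero. For very small alpha values the max iterations hyperparameter may need to be increased for the solver to converge, as the warnings indicate.
The best parameters for the Lasso regression are alpha = 0 and max iterations = 10. With alpha = 0 no regularization is applied, so the selected model is effectively an ordinary linear regression, which is the estimator the warnings advise using instead. The R square is a little higher, at 92 % on the training data and 92 % on the test data. The printed MSE and MAE are the same as in the Ridge cell because they are again computed from linear_model_predict rather than from the Lasso predictions.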
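
The sparsity effect of the L1 penalty can be sketched on synthetic data as follows. The data, the true coefficients and the alpha values are assumptions chosen for illustration; only three of the ten features actually influence the target. The alpha_max formula relies on sklearn's Lasso objective, in which the squared error is divided by 2n.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (illustration only): 3 informative features out of 10.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([5.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Smallest alpha at which all Lasso coefficients become exactly zero,
# computed on centred data because Lasso fits an intercept by default.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
alpha_max = np.max(np.abs(Xc.T @ yc)) / len(y)

# Strictly positive alphas only; alpha = 0 is discouraged for Lasso.
alphas = [0.01, 0.5, 2.0, 2 * alpha_max]
zero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    zero_counts.append(int(np.sum(model.coef_ == 0)))
    print("alpha = %6.2f   coefficients set to zero: %d of %d"
          % (alpha, zero_counts[-1], X.shape[1]))
```

With the weakest penalty the three informative coefficients stay non-zero, while at twice alpha_max every coefficient is zero and the model predicts only the mean of the target.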

In [108]:
scoring = make_scorer(r2_score)
grid_cv = GridSearchCV(ElasticNet(),
              param_grid={'alpha': range(0, 10), 'max_iter' : [10, 100, 1000], 'l1_ratio' : [0.1, 0.4, 0.8]},
              scoring=scoring, cv=5, refit=True)
grid_cv.fit(life_features_train, life_labels_train)
print("Best Parameters: " + str(grid_cv.best_params_))
result = grid_cv.cv_results_
print("R^2 score on training data: %.2f" % grid_cv.score(life_features_train, life_labels_train))
print("R^2 score: %.2f"
      % r2_score(life_labels_test, grid_cv.best_estimator_.predict(life_features_test)))
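# NOTE: linear_model_predict holds the predictions of the plain LinearRegression
# fitted above, so the two errors below describe that model, not the tuned ElasticNet.
print("Mean squared error: %.2f"
      % mean_squared_error(life_labels_test, linear_model_predict))
print("Mean absolute error: %.2f"
      % mean_absolute_error(life_labels_test, linear_model_predict))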
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:458: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
Best Parameters: {'alpha': 0, 'l1_ratio': 0.1, 'max_iter': 10}
R^2 score on training data: 0.92
R^2 score: 0.92
Mean squared error: 6.05
Mean absolute error: 1.83
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:739: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  self.best_estimator_.fit(X, y, **fit_params)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:477: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\Stefan\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)

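The ElasticNet combines the L1 and L2 regularization of the two previous regression types (Lasso and Ridge).
Finally, the grid search for ElasticNet shows that it performs as well as the Lasso regression, with alpha = 0, max iterations = 10 and the L1 ratio set to 0.1. The R square and the errors remain the same as for the Lasso Regression. Note that with alpha = 0 the penalty term vanishes, so the model is effectively an unregularized linear regression fitted by coordinate descent, which is what the warnings above point out.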
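As a hedged sketch, the same kind of search can be restricted to strictly positive alpha values, so that the penalty stays active and coordinate descent has room to converge. The synthetic data from make_regression, the grid values and the helper name search_elastic_net are assumptions for illustration, not the settings used above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split


def search_elastic_net(X_train, y_train):
    """Grid-search ElasticNet over positive alphas only (hypothetical grid)."""
    param_grid = {
        "alpha": [0.001, 0.01, 0.1, 1.0],  # alpha > 0 keeps the penalty active
        "l1_ratio": [0.1, 0.5, 0.9],       # mix between L2 (near 0) and L1 (near 1)
    }
    # A generous max_iter gives coordinate descent room to converge.
    grid = GridSearchCV(ElasticNet(max_iter=10000), param_grid,
                        scoring="r2", cv=5)
    grid.fit(X_train, y_train)
    return grid


# Synthetic stand-in for the life expectancy features and labels.
X, y = make_regression(n_samples=200, n_features=5, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = search_elastic_net(X_train, y_train)
print("Best Parameters: " + str(grid.best_params_))
print("R^2 score: %.2f" % grid.score(X_test, y_test))
```

If the search still prefers the smallest alpha on the real data, that is a hint that plain LinearRegression is the simpler choice.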

Part 5 Linear Regression with Polynomial Features

PolynomialFeatures has been used with degree 2 and interaction_only = True, so it adds only the pairwise products of distinct input variables (no squared terms).
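To make the effect of interaction_only concrete, here is a minimal sketch on a single made-up row with three features; the values 2, 3 and 5 are assumptions chosen so the products are easy to read.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One toy row with three features a=2, b=3, c=5 (made-up values).
X = np.array([[2.0, 3.0, 5.0]])

# Degree 2 with interaction_only=True: bias, originals, pairwise products.
interactions = PolynomialFeatures(2, interaction_only=True)
X_inter = interactions.fit_transform(X)
# Columns: 1, a, b, c, a*b, a*c, b*c
print(X_inter)

# Full degree-2 expansion for comparison: also contains a^2, b^2, c^2.
full = PolynomialFeatures(2)
X_full = full.fit_transform(X)
print(X_full.shape)
```

With n input features the interaction-only transform yields 1 + n + n(n-1)/2 columns, so the number of columns grows quickly and can approach or exceed the number of training rows.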

In [109]:
quad_feature_transformer = PolynomialFeatures(2, interaction_only = True)
quad_feature_transformer.fit(life_features_train)
life_features_train_quad = quad_feature_transformer.transform(life_features_train)
life_features_test_quad = quad_feature_transformer.transform(life_features_test)
In [110]:
poly_model_quad = LinearRegression()
poly_model_quad.fit(life_features_train_quad, life_labels_train)
accuracy_score_quad = poly_model_quad.score(life_features_train_quad, life_labels_train)
print(accuracy_score_quad)
1.0

The Linear Regression is scored on the training data with the new polynomial features. The R square is 1, meaning the model reproduces the training data exactly, which suggests overfitting. A prediction for the life_features_test_quad variable is then made.

In [111]:
poly_model_quad_predict = poly_model_quad.predict(life_features_test_quad)
In [112]:
print("Mean squared error: %.2f"
      % mean_squared_error(life_labels_test, poly_model_quad_predict))
print("Mean absolute error: %.2f"
      % mean_absolute_error(life_labels_test, poly_model_quad_predict))
print('R_square score: %.2f' % r2_score(life_labels_test, poly_model_quad_predict))
Mean squared error: 1620.72
Mean absolute error: 14.84
R_square score: -18.45

All the errors are significantly higher than those of the previous models, and the R square is negative in this case. A negative R square means the model predicts the test data worse than a constant prediction of the mean label would. This is the worst performing model so far.
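The definition R^2 = 1 - SS_res / SS_tot can be checked by hand on a tiny made-up example. The sketch below uses three invented values and deliberately bad predictions; it shows that the score drops below zero as soon as the residual sum of squares exceeds the total sum of squares.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]
y_pred = [3.0, 2.0, 1.0]  # deliberately bad predictions (made-up values)

# Residual sum of squares: 4 + 0 + 4 = 8
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# Total sum of squares around the mean (2.0): 1 + 0 + 1 = 2
mean_true = sum(y_true) / len(y_true)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

manual_r2 = 1 - ss_res / ss_tot  # 1 - 8 / 2 = -3
print(manual_r2, r2_score(y_true, y_pred))
```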

Part 5 Decision Tree Regression

In [113]:
decision_tree_model = DecisionTreeRegressor()
decision_tree_fit = decision_tree_model.fit(life_features_train, life_labels_train)
decision_tree_score = cross_val_score(decision_tree_fit, life_features_train, life_labels_train, cv = 5)
print("mean cross validation score: %.2f"  % np.mean(decision_tree_score))
print("score without cv: %.2f" % decision_tree_fit.score(life_features_train, life_labels_train))
print("R^2 score on the test data %.2f"% r2_score(life_labels_test, decision_tree_fit.predict(life_features_test)))
mean cross validation score: 0.77
score without cv: 1.00
R^2 score on the test data 0.80

Now we try the Decision Tree Regression, evaluated with 5-fold cross validation.
The R square on the training data is 1, meaning that the algorithm has learned the data by heart. With cross validation the figure declines to 77%, and on the test data we get 80%.
Now we use the algorithm to predict the values for life_features_test.
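The gap between the training score and the cross validation score can be reproduced on synthetic data. This is only a sketch: the data from make_regression and the variable names are assumptions, not the life expectancy data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the training features and labels.
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)

tree = DecisionTreeRegressor(random_state=0)

# cross_val_score clones the estimator and refits it on each of the 5 folds,
# so every score comes from rows the fitted copy has not seen.
cv_scores = cross_val_score(tree, X, y, cv=5)

# Scoring on the same rows used for fitting rewards memorisation.
train_score = tree.fit(X, y).score(X, y)

print("mean cross validation score: %.2f" % np.mean(cv_scores))
print("score without cv: %.2f" % train_score)
```

Because cross_val_score works on clones, it does not matter whether the estimator passed in has been fitted beforehand.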

In [114]:
decision_tree_model_predict = decision_tree_model.predict(life_features_test)
In [118]:
scoring = make_scorer(r2_score)
grid_cv = GridSearchCV(DecisionTreeRegressor(),
              param_grid={'min_samples_split': range(2, 10)},
              scoring=scoring, cv=5, refit=True)
grid_cv.fit(life_features_train, life_labels_train)
print("Best Parameters: " + str(grid_cv.best_params_))
result = grid_cv.cv_results_
print("R^2 score on training data: %.2f"  % grid_cv.best_estimator_.score(life_features_train, life_labels_train))
print("R^2 score: %.2f"
      % r2_score(life_labels_test, grid_cv.best_estimator_.predict(life_features_test)))
# Note: the two errors below use decision_tree_model_predict, i.e. the
# predictions of the untuned tree from In [114], not of grid_cv.best_estimator_.
print("Mean squared error: %.2f"
      % mean_squared_error(life_labels_test, decision_tree_model_predict))
print("Mean absolute error: %.2f"
      % mean_absolute_error(life_labels_test, decision_tree_model_predict))
Best Parameters: {'min_samples_split': 9}
R^2 score on training data: 0.98
R^2 score: 0.81
Mean squared error: 16.59
Mean absolute error: 2.71

After performing grid search with the minimum samples split in the range from 2 to 9, we get a best value of 9.
The R square on the training data is 98%, so the algorithm has nearly learned the data by heart. On the test data we get an R square of 81%. The reported MAE of 2.71 and MSE of 16.59 are computed from the predictions of the untuned tree. This method is worse than the Elastic Net Regression.
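In the cell above, the R square comes from grid_cv.best_estimator_, while the MSE and MAE come from decision_tree_model_predict, the predictions of the untuned tree. A hedged sketch of one way to keep all three metrics on the same tuned model is shown below; the synthetic data and the name tuned_predict are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the life expectancy features and labels.
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    param_grid={"min_samples_split": range(2, 10)},  # tries 2..9
                    scoring="r2", cv=5, refit=True)
grid.fit(X_train, y_train)

# Predict once with the refitted best estimator and reuse it for every metric.
tuned_predict = grid.best_estimator_.predict(X_test)

print("Best Parameters: " + str(grid.best_params_))
print("R^2 score: %.2f" % r2_score(y_test, tuned_predict))
print("Mean squared error: %.2f" % mean_squared_error(y_test, tuned_predict))
print("Mean absolute error: %.2f" % mean_absolute_error(y_test, tuned_predict))
```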

Part 6 Random Forest Regression

In [119]:
random_forest_model = RandomForestRegressor()
random_forest_fit = random_forest_model.fit(life_features_train, life_labels_train)
random_forest_score = cross_val_score(random_forest_fit, life_features_train, life_labels_train, cv = 5)
print("mean cross validation score: %.2f"
       % np.mean(random_forest_score))
print("score without cv: %.2f"
      % random_forest_fit.score(life_features_train, life_labels_train))
print("R^2 score on the test data %.2f"
      %r2_score(life_labels_test, random_forest_fit.predict(life_features_test)))
mean cross validation score: 0.88
score without cv: 0.98
R^2 score on the test data 0.92

Now we use the Random Forest Regression.
The algorithm reaches an R square of 98% on the training data without cross validation and 88% with it; the value is 92% on the test data.
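A random forest draws bootstrap samples and feature subsets at random, so its scores can differ slightly from run to run. As a minimal sketch on synthetic data, fixing random_state makes two fits agree exactly; the seed 42, n_estimators=50 and the variable names are assumptions, not settings from the cells above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the life expectancy features and labels.
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two forests with the same seed build identical trees.
forest_a = RandomForestRegressor(n_estimators=50, random_state=42)
forest_b = RandomForestRegressor(n_estimators=50, random_state=42)

predict_a = forest_a.fit(X_train, y_train).predict(X_test)
predict_b = forest_b.fit(X_train, y_train).predict(X_test)

print("identical predictions:", np.allclose(predict_a, predict_b))
print("R^2 score on the test data %.2f" % forest_a.score(X_test, y_test))
```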

In [120]:
random_forest_model_predict = random_forest_model.predict(life_features_test)
In [125]:
scoring = make_scorer(r2_score)
grid_cv = GridSearchCV(RandomForestRegressor(),
              param_grid={'min_samples_split': range(2, 10)},
              scoring=scoring, cv=5, refit=True)
grid_cv.fit(life_features_train, life_labels_train)
print("Best Parameters: " + str(grid_cv.best_params_))
result = grid_cv.cv_results_
print("R^2 score on training data: %.2f"  % grid_cv.best_estimator_.score(life_features_train, life_labels_train))
print("R^2 score: %.2f"
      % r2_score(life_labels_test, grid_cv.best_estimator_.predict(life_features_test)))
# Note: the two errors below use random_forest_model_predict, i.e. the
# predictions of the untuned forest from In [120], not of grid_cv.best_estimator_.
print("Mean squared error: %.2f"
      % mean_squared_error(life_labels_test, random_forest_model_predict))
print("Mean absolute error: %.2f"
      % mean_absolute_error(life_labels_test, random_forest_model_predict))
Best Parameters: {'min_samples_split': 2}
R^2 score on training data: 0.98
R^2 score: 0.91
Mean squared error: 6.71
Mean absolute error: 1.89

After performing grid search in the range 2 to 9 for the minimum samples split, we get 2 for the split. The R square on the training data is 98%, while on the test data it is 91%. The reported MAE of 1.89 and MSE of 6.71 are computed from the predictions of the untuned forest.

Part 7 Conclusion

After comparing all the algorithms, we can conclude that the Lasso and the Elastic Net Regression offer the best results on the test data, and their results are the same:

  1. Best Parameters: {'alpha': 0, 'max_iter': 10}
  2. R square on the test data of 92%
  3. MAE of 1.83
  4. MSE of 6.05
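A comparison like this is easier to read when every model is scored by the same routine. The sketch below collects the three test metrics into one table; the helper compare_models, the model settings and the synthetic data are assumptions for illustration and do not reproduce the figures reported here.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return its test metrics as one table (hypothetical helper)."""
    rows = []
    for name, model in models.items():
        predict = model.fit(X_train, y_train).predict(X_test)
        rows.append({
            "model": name,
            "R^2": r2_score(y_test, predict),
            "MAE": mean_absolute_error(y_test, predict),
            "MSE": mean_squared_error(y_test, predict),
        })
    return pd.DataFrame(rows).set_index("model")


# Synthetic stand-in for the life expectancy features and labels.
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
summary = compare_models(models, X_train, X_test, y_train, y_test)
print(summary.round(2))
```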

Part 8 Sources

The following sources have been used:

  1. https://www.kaggle.com/kumarajarshi/life-expectancy-who/data
  2. Introduction to Machine Learning with Python by Andreas C. Müller & Sarah Guido
  3. Labs of the course
  4. Stack Overflow
  5. Lectures of the course
