Run the cell to import the packages.
import pandas as pd
import numpy as np
Data Loading
Fill in the command to load your CSV dataset "weather.csv" with pandas.
weather = pd.read_csv('weather.csv', sep=',')
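A quick way to sanity-check the load step without the real file is to parse a small in-memory CSV. The three rows below are made-up stand-ins for weather.csv (only the column names follow the notebook), so treat this as a sketch of what to look at after loading, not as real data.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for weather.csv
csv_text = """Date,Location,MinTemp,RainToday,RainTomorrow
2008-12-01,Albury,13.4,No,No
2008-12-02,Albury,7.4,No,Yes
2008-12-03,Sydney,,Yes,No
"""

# read_csv accepts any file-like object, so StringIO works like a path
sample = pd.read_csv(io.StringIO(csv_text), sep=",")

# Empty fields are parsed as NaN; count them per column
missing_per_column = sample.isna().sum()
print(sample.dtypes)
print(missing_per_column)
```

Checking the dtypes and missing-value counts right after loading shows early which columns will need encoding or imputation later.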
Data Analysis
Get the shape of the dataset and print it.
Get the column names as a list and print it.
Describe the dataset to understand its basic statistics.
Print the first three rows of the dataset.
data_size = weather.shape
print(data_size)
weather_col_names = list(weather.columns)
print(weather_col_names)
print(weather.describe())
print(weather.head(3))
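To see what each of these calls returns, here is a minimal sketch on a made-up four-row frame (the values are illustrative only). Note that describe() summarizes only the numeric columns of a mixed frame by default.

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration
toy = pd.DataFrame({
    "MinTemp": [13.4, 7.4, 12.9, 9.2],
    "Rainfall": [0.6, 0.0, 0.0, 1.0],
    "RainToday": ["No", "No", "No", "Yes"],
})

n_rows, n_cols = toy.shape          # shape is a (rows, columns) tuple
col_names = list(toy.columns)       # column labels as a plain Python list
summary = toy.describe()            # count, mean, std, min, quartiles, max
first_three = toy.head(3)           # first three rows

print(n_rows, n_cols)
print(col_names)
print(summary)
print(first_three)
```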
Target Identification
Execute the cell below to extract the target variable, RainTomorrow. A value of Yes means it will rain tomorrow; No means it will not.
weather_target = weather['RainTomorrow']
print(weather_target)
Feature Identification
In our case, analyzing the dataset suggests that columns like Date might be irrelevant, since the raw date string is not itself a weather measurement.
Since RainTomorrow is our target variable, we will be removing it from the feature set.
cols_to_drop = ['Date', 'RainTomorrow']
weather_feature = weather.drop(cols_to_drop, axis=1)
print(weather_feature.head(5))
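As a small illustration of the drop step, the sketch below uses a made-up three-row frame. It shows that drop() returns a new frame and leaves the original untouched, which is why the target can still be read from weather afterwards.

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration
toy = pd.DataFrame({
    "Date": ["2008-12-01", "2008-12-02", "2008-12-03"],
    "MinTemp": [13.4, 7.4, 12.9],
    "RainTomorrow": ["No", "Yes", "No"],
})

cols_to_drop = ["Date", "RainTomorrow"]
features = toy.drop(cols_to_drop, axis=1)  # axis=1 drops columns, not rows

print(list(features.columns))
print(list(toy.columns))  # the original frame still has all three columns
```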
Categorical Data
To identify the categorical variables in the data, run the following command in the cell below:
weather_categorical = weather.select_dtypes(include=[object])
print(weather_categorical.head(15))
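The sketch below shows the same selection on a made-up frame and adds a per-column count of distinct values, which can help decide which columns are worth one-hot encoding. The string columns are built with an explicit object dtype as an assumption, so the example behaves the same regardless of how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical toy frame; explicit object dtype is an assumption for portability
toy = pd.DataFrame({
    "Location": pd.Series(["Albury", "Sydney", "Albury"], dtype=object),
    "WindGustDir": pd.Series(["W", "NNW", "W"], dtype=object),
    "MinTemp": [13.4, 7.4, 12.9],
})

# Keep only the object-typed (text) columns
categorical = toy.select_dtypes(include=[object])

# Number of distinct values in each categorical column
cardinality = categorical.nunique()

print(list(categorical.columns))
print(cardinality)
```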
Convert to boolean
Assign the column RainToday to the variable yes_no_cols and run the cell below to print the first 5 rows of weather_feature.
yes_no_cols = ["RainToday"]
weather_feature[yes_no_cols] = weather_feature[yes_no_cols] == 'Yes'
print(weather_feature.head(5))
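One thing worth knowing about the comparison above: a missing RainToday value compares unequal to 'Yes', so it silently becomes False. The sketch below demonstrates this on made-up data and shows one possible alternative, a dictionary map, that keeps the gap as NaN so the later imputation step can handle it. The alternative is an illustration, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing entry
toy = pd.DataFrame({
    "RainToday": pd.Series(["Yes", "No", np.nan, "Yes"], dtype=object),
})
yes_no_cols = ["RainToday"]

# Same comparison as the notebook: NaN == 'Yes' evaluates to False
as_bool = toy[yes_no_cols] == "Yes"

# Alternative sketch: map to 1.0/0.0 and leave missing values as NaN
mapped = toy["RainToday"].map({"Yes": 1.0, "No": 0.0})

print(as_bool["RainToday"].tolist())
print(mapped.tolist())
```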
One Hot Encoding
Execute the cell below to perform one-hot encoding.
weather_dumm = pd.get_dummies(
    weather_feature,
    columns=["Location", "WindGustDir", "WindDir9am", "WindDir3pm"],
    prefix=["Location", "WindGustDir", "WindDir9am", "WindDir3pm"],
)
# np.float was removed in NumPy 1.24; the built-in float is equivalent
weather_matrix = weather_dumm.values.astype(float)
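To make the effect of one-hot encoding concrete, here is a minimal sketch on a made-up frame with a single categorical column. Each distinct value becomes its own indicator column, and the untouched numeric column stays in front.

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration
toy = pd.DataFrame({
    "Location": pd.Series(["Albury", "Sydney", "Albury"], dtype=object),
    "MinTemp": [13.4, 7.4, 12.9],
})

# One indicator column per distinct Location value
dummies = pd.get_dummies(toy, columns=["Location"], prefix=["Location"])

# Convert to a plain float matrix, as the notebook does
matrix = dummies.values.astype(float)

print(list(dummies.columns))
print(matrix)
```

Each row has exactly one 1.0 among its Location indicator columns, so the original category can always be recovered from the encoded row.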
Imputing Missing Values
Impute the missing values using the following parameters.
from sklearn.impute import SimpleImputer

# The verbose argument was removed in scikit-learn 1.3, so it is omitted here
imp = SimpleImputer(missing_values=np.nan, strategy='mean', fill_value=None, copy=True)
weather_matrix = imp.fit_transform(weather_matrix)
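Mean imputation replaces each NaN with the mean of the observed values in the same column. A minimal sketch on a made-up 3x2 matrix makes the arithmetic easy to follow.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical matrix with one gap in each column
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp.fit_transform(X)

# statistics_ holds the per-column means learned during fit
print(imp.statistics_)
print(X_filled)
```

Column 0 has observed values 1 and 3, so its gap is filled with 2; column 1 has 10 and 20, so its gap is filled with 15.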
Standardization
Run the cell below to perform standardization.
from sklearn.preprocessing import StandardScaler

# Standardize the data by removing the mean and scaling to unit variance
scaler = StandardScaler()
# Fit to data, then transform it
weather_matrix = scaler.fit_transform(weather_matrix)
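Standardization rescales each column to z = (x - mean) / std, so columns on very different scales become comparable. The sketch below checks this on a made-up matrix whose two columns differ by a factor of 100.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix with columns on very different scales
X = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# After scaling, every column has mean 0 and (population) std 1
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)

print(scaler.mean_)
print(col_means, col_stds)
```

This matters for the linear SVM used later, whose decision boundary is sensitive to feature scale.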
Train and Test Data
Split the data for training and testing (90% train, 10% test).
from sklearn.model_selection import train_test_split

seed = 5000
train_data, test_data, train_label, test_label = train_test_split(
    weather_matrix, weather_target, test_size=0.1, random_state=seed
)
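The following sketch shows what test_size and random_state do, using a made-up 20-row dataset. With test_size=0.1 two rows go to the test set, and reusing the same seed reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 2 features, alternating labels
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array(["No", "Yes"] * 10)
seed = 5000

train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=0.1, random_state=seed
)

# Splitting again with the same seed gives the identical partition
_, test_data_again, _, _ = train_test_split(
    X, y, test_size=0.1, random_state=seed
)

print(len(train_data), len(test_data))
```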
SVM Classification
Initialize the SVM classifier with the following parameters.
Train the model with train_data and train_label.
Now predict the output with test_data.
Evaluate the classifier with the score from test_data and test_label.
Print the score.
from sklearn.svm import SVC

classifier = SVC(kernel="linear", C=0.025, random_state=seed)
classifier = classifier.fit(train_data, train_label)
weather_predicted_target = classifier.predict(test_data)
# score() returns the mean accuracy on the given test data and labels
score = classifier.score(test_data, test_label)
print('SVM Classifier : ', score)
with open('output.txt', 'w') as file:
    file.write(str(np.mean(score)))
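For a classifier, score() is the mean accuracy: the fraction of test samples whose predicted label matches the true label. The sketch below confirms this on synthetic two-cluster data (generated here purely for illustration, not taken from the weather dataset) by recomputing the accuracy by hand.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, well-separated clusters; an assumption for illustration only
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 0.5, size=(50, 2)),
    rng.normal(2.0, 0.5, size=(50, 2)),
])
y = np.array(["No"] * 50 + ["Yes"] * 50)
seed = 5000

train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=0.1, random_state=seed
)

classifier = SVC(kernel="linear", C=0.025, random_state=seed)
classifier.fit(train_data, train_label)

predicted = classifier.predict(test_data)
score = classifier.score(test_data, test_label)

# Accuracy computed by hand: share of predictions equal to the true labels
manual_accuracy = np.mean(predicted == test_label)
print(score, manual_accuracy)
```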
Random Forest Classifier
Fit a Random Forest classifier to the dataset using the following parameters.
Train the model with train_data and train_label.
Now predict the output with test_data.
Evaluate the classifier with score from test_data and test_label.
from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=10, random_state=seed)
classifier = classifier.fit(train_data, train_label)
weather_predicted_target = classifier.predict(test_data)
# score() returns the mean accuracy on the given test data and labels
score = classifier.score(test_data, test_label)
print('Random Forest Classifier : ', score)
with open('output1.txt', 'w') as file:
    file.write(str(np.mean(score)))
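To show what the three parameters control, here is a sketch on synthetic data from make_classification (12 features, chosen only so that max_features=10 is valid; it is not the weather dataset). n_estimators sets the number of trees, max_depth caps the depth of each tree, and max_features limits how many features each split may consider.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with 12 features; an assumption for illustration only
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=5, random_state=0
)
seed = 5000

train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=0.1, random_state=seed
)

classifier = RandomForestClassifier(
    max_depth=5, n_estimators=10, max_features=10, random_state=seed
)
classifier.fit(train_data, train_label)

score = classifier.score(test_data, test_label)
n_trees = len(classifier.estimators_)            # one fitted tree per estimator
deepest = max(t.get_depth() for t in classifier.estimators_)
importances = classifier.feature_importances_    # normalized to sum to 1

print(score, n_trees, deepest)
```

The feature_importances_ array can be paired with the one-hot column names to see which inputs the forest relied on most.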