Write a program that takes a file and classifies its type as one of: HTML / SystemVerilog / C++ / Python

Hi there. I have just started with machine learning and am working through a simple example to learn. I want to classify the files on my disk by file type using a classifier. The code I have written is:

import sklearn
import numpy as np

#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata

x_train = mydata.script
y_train = mydata.label

#print x_train
#print y_train
x_test = mydata.script

from sklearn import tree
classi = tree.DecisionTreeClassifier()

classi.fit(x_train, y_train)

predictions = classi.predict(x_test)
print predictions

Running it prints the data set and then raises this error:

  script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
    classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5.  0.  1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

If anyone can help me fix the code, I would really appreciate it!
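
For reference, the ValueError comes from passing mydata.script, a 1D pandas Series, to fit(), which expects a 2D array of shape (n_samples, n_features). A minimal sketch of one possible fix follows. It assumes that all three numeric columns (script, class, div) are meant to be features, which the post does not state, and it rebuilds the three rows from the printed output instead of reading file_format.csv so that it runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Stand-in for file_format.csv, rebuilt from the rows printed in the post.
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

# Assumption: every numeric column is a feature.
# Indexing with a list of column names returns a DataFrame (2D, shape (3, 3))
# rather than a Series (1D, shape (3,)), which is what fit() rejected.
feature_columns = ["script", "class", "div"]
x_train = mydata[feature_columns]
y_train = mydata["label"]

classi = DecisionTreeClassifier(random_state=0)
classi.fit(x_train, y_train)

# Predicting on the training rows only checks that the code runs end to end;
# it says nothing about accuracy on files the model has not seen.
predictions = classi.predict(x_train)
print(predictions)
```

If only the script column should be used, either mydata[["script"]] or mydata.script.values.reshape(-1, 1) gives the 2D shape the error message asks for. Note that the column named class has to be accessed with brackets, because class is a reserved word in Python and mydata.class is a syntax error.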
