
[Self-Study ML] 02. Working with Data - Training Set & Test Set, Sampling Bias, Data Preprocessing

n_young 2022. 4. 24. 09:00

Machine learning algorithms

Supervised learning vs. unsupervised learning

Supervised learning : the data comes with answers (targets), so the algorithm learns to predict the correct answer
Training requires both data and answers
data = input, answer = target; together they make up the training data
Unsupervised learning : uses only input data, with no targets
-> since no answers are used, it cannot learn to predict anything
Instead, it can help us understand or transform the data

+) Reinforcement learning : learns not from targets but from the rewards the algorithm receives as a result of its actions


02-1. Training Set and Test Set

To evaluate a machine learning algorithm properly, the data used for training and the data used for evaluation must be different
-> 1. prepare separate data for evaluation, or 2. set aside part of the data already prepared and use it ( V )

Data used for evaluation = test set, data used for training = training set

Evaluating a model on the same data it was trained on is not appropriate
It must be evaluated on data it did not see during training
To do this, part of the training data is set aside and used as the test set

Python lists combining the bream and smelt data

fish_length = [25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7, 31.0, 31.0, 
                31.5, 32.0, 32.0, 32.0, 33.0, 33.0, 33.5, 33.5, 34.0, 34.0, 34.5, 35.0, 
                35.0, 35.0, 35.0, 36.0, 36.0, 37.0, 38.5, 38.5, 39.5, 41.0, 41.0, 9.8, 
                10.5, 10.6, 11.0, 11.2, 11.3, 11.8, 11.8, 12.0, 12.2, 12.4, 13.0, 14.3, 15.0]
fish_weight = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0, 475.0, 500.0, 
                500.0, 340.0, 600.0, 600.0, 700.0, 700.0, 610.0, 650.0, 575.0, 685.0, 620.0, 680.0, 
                700.0, 725.0, 720.0, 714.0, 850.0, 1000.0, 920.0, 955.0, 925.0, 975.0, 950.0, 6.7, 
                7.5, 7.0, 9.7, 9.8, 8.7, 10.0, 9.9, 9.8, 12.2, 13.4, 12.2, 19.7, 19.9]

 

Iterate over the two Python lists with zip() to build a 2D list in which each fish's length and weight form one inner list

fish_data = [[l, w] for l, w in zip(fish_length, fish_weight)]
fish_target = [1]*35 + [0]*14  # 1 = bream, 0 = smelt

One fish's data = one sample
Bream (35) + smelt (14) -> 49 samples in total, with 2 features: length and weight
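A quick sanity check of the structure built above, in plain Python. To keep it short, it uses a shortened copy of the lists (`short_length` and `short_weight` are illustrative names holding the first three bream and the last two smelt from fish_length/fish_weight). It also counts the full target list.

```python
# Shortened copy of the lists above: the first 3 bream and the last 2 smelt
short_length = [25.4, 26.3, 26.5, 14.3, 15.0]
short_weight = [242.0, 290.0, 340.0, 19.7, 19.9]

# zip() pairs the i-th length with the i-th weight -> one [length, weight] sample per fish
short_data = [[l, w] for l, w in zip(short_length, short_weight)]
print(short_data[0])  # [25.4, 242.0]

# The full target list: 35 bream (1) followed by 14 smelt (0)
fish_target = [1] * 35 + [0] * 14
print(len(fish_target), fish_target.count(1), fish_target.count(0))  # 49 35 14
```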

Import scikit-learn's KNeighborsClassifier class and create a model object

from sklearn.neighbors import KNeighborsClassifier
kn = KNeighborsClassifier()



Use the first 35 samples as the training set and the remaining 14 as the test set

Index : the position of an element in an array
Slicing : selects several elements by giving an index range with a colon (:) in the middle
The element at the end index is not included [ start : last + 1 ]
The start can be omitted to begin from the first element [ : n ], and the end can be omitted to go to the last element [ n : ]
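Here are the slicing rules above applied to a throwaway list (`letters` is just an example name). The last check is the property the split relies on: `[:35]` and `[35:]` together cover every sample exactly once.

```python
# Indices:   0    1    2    3    4    5
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])  # ['b', 'c', 'd'] -> the element at index 4 is excluded
print(letters[:2])   # ['a', 'b']      -> start omitted: from the beginning
print(letters[4:])   # ['e', 'f']      -> end omitted: up to the last element

# Splitting at index n: [:n] and [n:] never overlap and together rebuild the list
n = 4
print(letters[:n] + letters[n:] == letters)  # True
```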

# training set inputs: indices 0~34 (first 35 samples)
train_input = fish_data[:35]
# training set targets: indices 0~34
train_target = fish_target[:35]

# test set inputs: index 35 to the end (remaining 14 samples)
test_input = fish_data[35:]
# test set targets: index 35 to the end
test_target = fish_target[35:]

Indices 0~34, the first 35 samples - training set
Indices 35~48, the remaining 14 samples - test set



Train the model by calling fit( ) with the training set
Evaluate the model by calling score( ) with the test set

kn.fit(train_input, train_target)
kn.score(test_input, test_target)

fish_data contains, in order, 35 bream samples followed by 14 smelt samples
So if the last 14 become the test set, it contains only smelt

The training set contains only bream, so whatever the test sample is, the model always classifies it as bream
But the test set contains only smelt
=> accuracy 0 %
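The 0% result follows from the targets alone. The sketch below looks only at fish_target. The constant prediction stands in for what the k-nearest neighbors model does here: with only bream in the training set, every neighbor it can find is a bream.

```python
# 35 bream (1) followed by 14 smelt (0), in the same order as fish_data
fish_target = [1] * 35 + [0] * 14

train_target = fish_target[:35]  # first 35 -> all bream
test_target = fish_target[35:]   # last 14  -> all smelt
print(set(train_target), set(test_target))  # {1} {0}

# A classifier that has only ever seen class 1 can only vote for class 1
predictions = [1] * len(test_target)
accuracy = sum(p == t for p, t in zip(predictions, test_target)) / len(test_target)
print(accuracy)  # 0.0
```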

 

Sampling bias

When the samples are not evenly mixed between the training set and the test set, the sampling is skewed to one side; this is called sampling bias
-> before splitting into training and test sets, either shuffle the data or draw samples evenly so both sets are representative

* NumPy (numpy)

A convenient library for easily creating and manipulating high-dimensional arrays <- it makes shuffling data or drawing samples from it easy
1D array - a line, 2D array - a plane, 3D array - a space
Unlike the usual xy coordinate system, the origin (index [0, 0]) is at the top left, and row indices grow downward
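Here is a small illustration of array dimensions and of the top-left origin (`line`, `plane` and `space` are example names):

```python
import numpy as np

line = np.array([1, 2, 3])        # 1D: a line of 3 values
plane = np.array([[1, 2, 3],
                  [4, 5, 6]])     # 2D: 2 rows x 3 columns
space = np.zeros((2, 2, 3))       # 3D: 2 stacked 2x3 planes

print(line.ndim, plane.ndim, space.ndim)  # 1 2 3
print(plane.shape)                        # (2, 3) -> (rows, columns)

# Index [0, 0] is the top-left element; row indices grow downward
print(plane[0, 0], plane[1, 0])           # 1 4
```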

Import the NumPy library

import numpy as np



Convert the Python lists into NumPy arrays

* NumPy array( ) function : converts a Python list into a NumPy array

input_arr = np.array(fish_data)
target_arr = np.array(fish_target)

 

 

Check the array size

* shape attribute : gives the size of the array (number of samples, number of features)

print(input_arr.shape)

(49, 2) => 49 samples, 2 features

 

Randomly select samples from the arrays to build the training and test sets

Caution ) elements at the same position in input_arr and target_arr must be selected together
-> shuffle the indices first, then use them to select samples from both input_arr and target_arr -> the selection is random, but each input stays paired with its target

* NumPy arange( ) function : returns an array of evenly spaced numbers from start up to end, in increments of step
np.arange(start (0 if omitted), stop (not included), step (1 if omitted))
* np.random.shuffle( ) function : shuffles the given array randomly, in place
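Here are a few concrete calls for the two functions above. They also show why the seed matters: re-seeding with the same value reproduces the same shuffle.

```python
import numpy as np

print(np.arange(5))         # [0 1 2 3 4]  (start 0, step 1, stop 5 excluded)
print(np.arange(2, 10, 3))  # [2 5 8]

# shuffle() reorders the array in place and returns None
np.random.seed(42)
a = np.arange(10)
np.random.shuffle(a)

# Same seed -> same order
np.random.seed(42)
b = np.arange(10)
np.random.shuffle(b)

print(np.array_equal(a, b))                       # True
print(np.array_equal(np.sort(a), np.arange(10)))  # True: same numbers, new order
```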

np.random.seed(42)  # fix the random seed first to get reproducible results
index = np.arange(49)
np.random.shuffle(index)

* Array indexing : selects several elements at once by passing an array of indices

Pass the first 35 entries of index to input_arr and target_arr to build a training set of 35 randomly chosen samples

train_input = input_arr[index[:35]]
train_target = target_arr[index[:35]]


Use the remaining 14 as the test set

test_input = input_arr[index[35:]]
test_target = target_arr[index[35:]]
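The sketch below shows, on toy arrays, why one shared index keeps inputs and targets aligned. `toy_input` and `toy_target` are made-up stand-ins for input_arr and target_arr, and a 4/2 split replaces 35/14.

```python
import numpy as np

# Toy stand-ins: sample i has inputs [i, 10*i] and target i % 2
toy_input = np.array([[i, 10 * i] for i in range(6)])
toy_target = np.array([i % 2 for i in range(6)])

np.random.seed(42)
index = np.arange(6)
np.random.shuffle(index)

# One shuffled index array selects rows from BOTH arrays, so pairs stay together
train_input, train_target = toy_input[index[:4]], toy_target[index[:4]]
test_input, test_target = toy_input[index[4:]], toy_target[index[4:]]

# Each selected row still carries its own target: row [i, 10*i] -> i % 2
print(all(t == row[0] % 2 for row, t in zip(train_input, train_target)))  # True

# index is a permutation, so [:4] and [4:] together cover every sample exactly once
print(np.array_equal(np.sort(index), np.arange(6)))  # True
```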

 

Check with a scatter plot

import matplotlib.pyplot as plt
plt.scatter(train_input[:,0], train_input[:,1])
plt.scatter(test_input[:,0], test_input[:,1])
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

Blue = training set, orange = test set

Second machine learning program

Train the k-nearest neighbors model

kn.fit(train_input, train_target)



Test the model

kn.score(test_input, test_target)



Compare the test set predictions with the actual targets

print(kn.predict(test_input))
print(test_target)

The value returned by the predict( ) method is a NumPy array
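For classifiers, scikit-learn's score( ) reports mean accuracy: the fraction of samples where the prediction matches the target. Because predict( ) returns a NumPy array, that fraction can be reproduced with an element-wise comparison. The arrays below are made-up values, not this post's actual output.

```python
import numpy as np

# Made-up predictions and true targets for 6 test samples
predicted = np.array([1, 0, 1, 1, 0, 1])
actual = np.array([1, 0, 0, 1, 0, 1])

matches = predicted == actual  # element-wise comparison -> boolean array
accuracy = matches.mean()      # True counts as 1, False as 0
print(accuracy)                # 0.8333333333333334 (5 of 6 correct)
```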


02-2. Data Preprocessing

Putting the model into practice > a fish 25 cm long weighing 150 g is a bream, but the model predicts smelt --> what went wrong?

Prepare the data with NumPy

fish_length = [25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7, 31.0, 31.0, 
                31.5, 32.0, 32.0, 32.0, 33.0, 33.0, 33.5, 33.5, 34.0, 34.0, 34.5, 35.0, 
                35.0, 35.0, 35.0, 36.0, 36.0, 37.0, 38.5, 38.5, 39.5, 41.0, 41.0, 9.8, 
                10.5, 10.6, 11.0, 11.2, 11.3, 11.8, 11.8, 12.0, 12.2, 12.4, 13.0, 14.3, 15.0]
fish_weight = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0, 475.0, 500.0, 
                500.0, 340.0, 600.0, 600.0, 700.0, 700.0, 610.0, 650.0, 575.0, 685.0, 620.0, 680.0, 
                700.0, 725.0, 720.0, 714.0, 850.0, 1000.0, 920.0, 955.0, 925.0, 975.0, 950.0, 6.7, 
                7.5, 7.0, 9.7, 9.8, 8.7, 10.0, 9.9, 9.8, 12.2, 13.4, 12.2, 19.7, 19.9]

 

Import the NumPy library

import numpy as np


Stand the given lists up as columns and join them side by side

Previously we iterated over the Python lists with a for loop and zip(), taking out elements one by one
and building a list of lists that holds each fish's length and weight by hand
-> use NumPy's column_stack( ) instead!

* column_stack( ) : stands each given list up as a column and joins the columns side by side, in order
The lists to join are passed together as a Python tuple
e.g.) np.column_stack(([1,2,3], [4,5,6]))
=> array([[1, 4],
          [2, 5],
          [3, 6]])

* A Python tuple is very similar to a list: its elements are ordered like a list's, but a tuple cannot be modified once created.
Using a tuple guarantees that the values passed to a function do not change ~> tuples are often used for passing arguments

fish_data = np.column_stack((fish_length, fish_weight))


When printed, a NumPy array is not shown as one long line like a list;
it is displayed neatly, with rows and columns aligned
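Here is a small check that column_stack( ) produces the same 2D structure as the list-comprehension approach from 02-1. It uses the first three bream from the lists above (`length`, `weight`, `by_hand` and `stacked` are illustrative names).

```python
import numpy as np

# First three bream from the lists above
length = [25.4, 26.3, 26.5]
weight = [242.0, 290.0, 340.0]

by_hand = np.array([[l, w] for l, w in zip(length, weight)])  # the 02-1 approach
stacked = np.column_stack((length, weight))                   # the NumPy approach

print(stacked.shape)                     # (3, 2): one row per fish, one column per feature
print(np.array_equal(by_hand, stacked))  # True
```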

Build the target data
> Build an array of 35 ones and an array of 14 zeros
Previously the target data was built by multiplying the single-element lists [1] and [0]
-> use NumPy's np.ones( ) and np.zeros( ) functions instead!

* np.ones( ), np.zeros( ) : create an array filled with the desired number of 1s or 0s, respectively
e.g.) print(np.ones(5)) => [1. 1. 1. 1. 1.]



> Join the two arrays
Use np.ones() and np.zeros() to create an array of 35 ones and an array of 14 zeros,
then join them with np.concatenate()
* np.concatenate( ) : joins arrays along the first dimension (axis)
The lists or arrays to join are passed together as a tuple

fish_target = np.concatenate((np.ones(35), np.zeros(14)))
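Here is a quick check of the target array built above. It confirms the length, the 35/14 split and the dtype. np.ones( ) and np.zeros( ) default to float64, which is why later outputs in this post show `1.` and `0.` rather than `1` and `0`.

```python
import numpy as np

fish_target = np.concatenate((np.ones(35), np.zeros(14)))

print(fish_target.shape)  # (49,)
print(fish_target.dtype)  # float64
print(int(fish_target.sum()), int((fish_target == 0).sum()))  # 35 14

# Same values as the list-multiplication version from 02-1
print(np.array_equal(fish_target, [1] * 35 + [0] * 14))  # True
```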


> np.column_stack( ) vs np.concatenate( )
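Here is the comparison named in the heading above, sketched on two small 1D arrays. column_stack( ) places the inputs next to each other as columns, giving one row per sample as for fish_data. concatenate( ) appends them end to end, as for fish_target.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

side_by_side = np.column_stack((a, b))  # each 1D array becomes a column
end_to_end = np.concatenate((a, b))     # joined along the first (and only) axis

print(side_by_side.shape)  # (3, 2)
print(end_to_end)          # [1 2 3 4 5 6]
print(end_to_end.shape)    # (6,)
```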


Split into training and test sets with scikit-learn

Shuffling the NumPy array indices by hand to split the data into training and test sets ... is tedious
-> use scikit-learn's train_test_split( ) function!



Import the train_test_split function

* train_test_split( ) : splits the given lists or arrays into training and test sets at a given ratio
It shuffles the data automatically before splitting
It lives in scikit-learn's model_selection module

from sklearn.model_selection import train_test_split



Split fish_data and fish_target

Usage : pass as many lists or arrays as you want to split
train_test_split() has its own random_state parameter for setting the random seed

train_input, test_input, train_target, test_target = train_test_split(fish_data, fish_target, random_state=42)

Passing the two arrays fish_data and fish_target -> each is split in two, so 4 arrays are returned in total
The first 2 are the input data (train_input, test_input), the last 2 are the target data (train_target, test_target)

By default, this function sets aside 25% of the data as the test set
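Here is where the 36/13 split that this post prints next comes from. This is a sketch of the arithmetic. It assumes scikit-learn rounds a fractional test size up and gives the rest to the training set; that matches the printed shapes, but check the train_test_split documentation for the exact rule.

```python
import math

n_samples = 49    # 35 bream + 14 smelt
test_size = 0.25  # the default fraction used when no size is given

n_test = math.ceil(test_size * n_samples)  # 0.25 * 49 = 12.25 -> rounded up to 13
n_train = n_samples - n_test               # the remaining 36

print(n_train, n_test)  # 36 13
```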

 

Print the size of the input data

* shape : the length of each axis of the array (an attribute, so no parentheses)

print(train_input.shape, test_input.shape)

=> (36, 2) (13, 2)
The input data is a 2D array with 2 columns



Print the size of the target data

print(train_target.shape, test_target.shape)

=> (36,) (13,)
The target data is a 1D array

The data was split into 36 training samples and 13 test samples



Print the data

print(test_target)

=> [1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Of the 13 test samples, 10 are bream (1) and 3 are smelt (0)
Originally there are 35 bream and 14 smelt -> a 2.5:1 ratio
But this test set's ratio is about 3.3:1
-> sampling bias
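Here are the two ratios above recomputed from the printed test_target, in plain Python:

```python
# test_target as printed above (1 = bream, 0 = smelt)
test_target = [1., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.]

bream = test_target.count(1.0)  # 10
smelt = test_target.count(0.0)  # 3

print(35 / 14)                  # 2.5  -> ratio in the full dataset
print(round(bream / smelt, 2))  # 3.33 -> ratio in this test set
```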

 

Split the data according to the class ratio

If the class ratio differs between the training and test sets, the model cannot learn some classes properly
-> passing the target data to the stratify parameter of train_test_split( ) splits the data so that both sets keep the class ratio
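This is a rough, hand-written NumPy sketch of what stratify does: each class is split separately, so it contributes to the test set in proportion to its size. `stratified_split` is a hypothetical helper for illustration only. scikit-learn's own implementation differs in detail (how it rounds and shuffles), so the exact samples chosen will not match train_test_split.

```python
import numpy as np

def stratified_split(target, test_fraction=0.25, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) keeping per-class proportions."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(target, return_counts=True)
    n_test = int(np.ceil(test_fraction * len(target)))  # 13 for 49 samples

    # Share the test slots between the classes in proportion to class size
    exact = n_test * counts / len(target)  # here about [3.71, 9.29]
    per_class = np.floor(exact).astype(int)
    leftover = n_test - per_class.sum()
    # Hand any remaining slots to the classes with the largest fractional parts
    for i in np.argsort(exact - per_class)[::-1][:leftover]:
        per_class[i] += 1

    train_idx, test_idx = [], []
    for cls, k in zip(classes, per_class):
        idx = np.flatnonzero(target == cls)  # positions of this class
        rng.shuffle(idx)
        test_idx.extend(idx[:k])
        train_idx.extend(idx[k:])
    return np.array(train_idx), np.array(test_idx)

fish_target = np.concatenate((np.ones(35), np.zeros(14)))
train_idx, test_idx = stratified_split(fish_target)

test_target = fish_target[test_idx]
print(len(train_idx), len(test_idx))                          # 36 13
print(int(test_target.sum()), int((test_target == 0).sum()))  # 9 4 -> 9:4 = 2.25:1
```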

train_input, test_input, train_target, test_target = train_test_split(fish_data, fish_target, stratify=fish_target, random_state=42)
print(test_target)

=> [0. 0. 1. 0. 1. 0. 1. 1. 1. 1. 1. 1. 1.]

One more smelt ends up in the test set, so its ratio becomes 9:4 = 2.25:1, much closer to the original 2.5:1

Data preparation done! Now let's look at the problem!


One suspicious bream

Train k-nearest neighbors

Train a k-nearest neighbors model on the data prepared above

from sklearn.neighbors import KNeighborsClassifier
kn = KNeighborsClassifier()
kn.fit(train_input, train_target)
kn.score(test_input, test_target)



Feed the model a new bream sample and check the result

A fish 25 cm long and weighing 150 g

print(kn.predict([[25, 150]]))

=> [0.]
We fed it bream data, but it was predicted as smelt ??????

 

Check with a scatter plot

import matplotlib.pyplot as plt
plt.scatter(train_input[:,0], train_input[:,1])
plt.scatter(25, 150, marker='^')  # marker='^' draws the new sample as a triangle
plt.xlabel('length')
plt.ylabel('weight')
plt.show()
train_input[:, 0]  all rows, column 0 (length)
train_input[:, 1]  all rows, column 1 (weight)
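Here is the `[:, column]` slicing above applied to three samples from this post's training data (`samples` is an illustrative name):

```python
import numpy as np

# Three samples: [length, weight]
samples = np.array([[25.4, 242.0],
                    [15.0, 19.9],
                    [14.3, 19.7]])

lengths = samples[:, 0]  # every row, column 0
weights = samples[:, 1]  # every row, column 1
print(lengths.tolist())  # [25.4, 15.0, 14.3]
print(weights.shape)     # (3,) -> a 1D array, ready to pass to plt.scatter
```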

k-nearest neighbors predicts the majority class among the surrounding samples
* kneighbors( ) : a KNeighborsClassifier method that finds the nearest neighbors of the given samples
It returns the distances to the neighbors and the indices of the neighbor samples
* n_neighbors : the KNeighborsClassifier parameter for the number of neighbors, default 5
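This is a from-scratch sketch of what kneighbors( ) and the majority vote compute, assuming plain Euclidean distance (Minkowski with p=2, KNeighborsClassifier's default metric). `knn_sketch` is a hypothetical helper, not scikit-learn's API. The seven training points are taken from the fish lists above, and the query is the [25, 150] fish.

```python
import numpy as np

def knn_sketch(train_x, train_y, query, k=5):
    """Hypothetical helper: distances/indices of the k nearest samples and the majority class."""
    dists = np.sqrt(((train_x - query) ** 2).sum(axis=1))  # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]                        # indices of the k smallest distances
    labels, votes = np.unique(train_y[nearest], return_counts=True)
    return dists[nearest], nearest, labels[np.argmax(votes)]

# A few fish from the lists above: 3 bream (1) and 4 smelt (0)
train_x = np.array([[25.4, 242.0], [26.3, 290.0], [26.5, 340.0],
                    [15.0, 19.9], [14.3, 19.7], [13.0, 12.2], [12.2, 12.2]])
train_y = np.array([1, 1, 1, 0, 0, 0, 0])

dists, idx, prediction = knn_sketch(train_x, train_y, np.array([25, 150]))
print(dists)         # about 92.0, 130.5, 130.7, 138.3, 138.4
print(train_y[idx])  # [1 0 0 0 0]
print(prediction)    # 0 -> smelt wins the vote 4 to 1
```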

distances, indexes = kn.kneighbors([[25, 150]])

 

Draw the neighbor samples separately from the rest of the training data

Use the indexes array

plt.scatter(train_input[:,0], train_input[:,1])
plt.scatter(25, 150, marker='^')
plt.scatter(train_input[indexes,0], train_input[indexes,1], marker='D')  # marker='D' draws diamonds
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

The 5 samples closest to the triangle are drawn as green diamonds

Check the data

print(train_input[indexes])

=> [[[ 25.4 242. ]
     [ 15.   19.9]
     [ 14.3  19.7]
     [ 13.   12.2]
     [ 12.2  12.2]]]

Check with the target data

print(train_target[indexes])

=> [[1. 0. 0. 0. 0.]]
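The printed arrays have an extra pair of brackets for a reason. kneighbors( ) takes a 2D array of queries and returns one row of indices per query, so for the single query [[25, 150]], indexes has shape (1, 5). Indexing with a (1, 5) index array then gives shape (1, 5, 2). In the sketch below, the training data and index values are made up for illustration.

```python
import numpy as np

train_input = np.arange(20).reshape(10, 2)  # stand-in data: 10 samples x 2 features
indexes = np.array([[3, 0, 7, 1, 5]])       # made-up result for ONE query -> shape (1, 5)

picked = train_input[indexes]
print(indexes.shape, picked.shape)  # (1, 5) (1, 5, 2)

# Taking row 0 drops the extra dimension
print(picked[0].shape)              # (5, 2)
```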

Among the nearest neighbors of the fish with length 25 cm and weight 150 g, smelt are the overwhelming majority (4 out of 5)
On the scatter plot the bream intuitively look closer, so why are the nearest neighbors smelt?????
-> print the distances array returned by the kneighbors() method!

Print the distances to the neighbor samples

* distances : the distances to the neighbor samples

print(distances)

=> [[ 92.00086956 130.48375378 130.73859415 138.32150953 138.39320793]]


Put the features on the same scale


[[ 92.00086956 130.48375378 130.73859415 138.32150953 138.39320793]]
The distance from the triangle to its closest sample is 92
The other nearest samples are at about 130 and 138

But on the plot, the ratio between the distances 92 and 130 looks wrong?????
-> the x-axis range is narrow (10~40) while the y-axis range is wide (0~1000)
-> so even a small step along the y-axis produces a large distance
-> let's set the x-axis range to 0~1000 as well
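The distances printed above can be reproduced by hand. Splitting each one into a length term and a weight term shows why weight dominates: a 10 cm difference in length contributes 100, while a 130 g difference in weight contributes about 17,000. `parts` is an illustrative helper name.

```python
import math

new_fish = (25, 150)   # (length, weight) of the suspicious bream
bream = (25.4, 242.0)  # its nearest neighbor
smelt = (15.0, 19.9)   # its second-nearest neighbor

def parts(a, b):
    """Squared length difference, squared weight difference, and the Euclidean distance."""
    dl2 = (a[0] - b[0]) ** 2
    dw2 = (a[1] - b[1]) ** 2
    return dl2, dw2, math.sqrt(dl2 + dw2)

print(parts(new_fish, bream))  # length term ~0.16, weight term 8464 -> distance ~92.0
print(parts(new_fish, smelt))  # length term 100, weight term ~16926 -> distance ~130.5
```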


Match the x-axis range to the y-axis range

* xlim( ) : sets the x-axis range
* ylim( ) : sets the y-axis range

plt.scatter(train_input[:,0], train_input[:,1])
plt.scatter(25,150, marker='^')
plt.scatter(train_input[indexes,0], train_input[indexes,1], marker='D')
plt.xlim((0,1000))
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

With the x- and y-axis ranges matched, all the data lines up almost vertically
-> the fish length (x-axis) has little influence; effectively only the fish weight (y-axis) is taken into account

The ranges of values of the two features (length and weight) are very different
This is described as the two features having different scales

If the features are expressed on different scales, the algorithm cannot predict correctly (especially distance-based algorithms)
To use such algorithms properly, the feature values must be brought to a common scale


* Data preprocessing

The work of bringing feature values to a common scale

Standard score (z-score)
Shows how many standard deviations each feature value lies from the mean
This lets features be compared on equal terms, regardless of the actual size of their values

* Variance : subtract the mean from each value, square the results, then take their average
* Standard deviation : the square root of the variance; it measures how spread out the data is
A standard score tells how many standard deviations a data point lies from the mean, which becomes the origin (0) after standardizing

How to compute : subtract the mean, then divide by the standard deviation
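Here are the definitions above written as formulas for one feature with values $x_1, \dots, x_n$. Dividing by $n$ (not $n-1$) matches np.std( )'s default (ddof=0).

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma}
```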

mean = np.mean(train_input, axis=0)
std = np.std(train_input, axis=0)

* np.mean( ) : mean
* np.std( ) : standard deviation
* train_input : an array of size (36, 2)
* axis : the axis along which the statistic is computed
Since the scale differs between features, the mean and standard deviation must be computed separately for each feature
axis=0 : computes the statistic of each column, going down the rows
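Here is a tiny example of what axis=0 means on a 3-sample, 2-feature array (`data` holds made-up values). axis=0 collapses the rows, giving one value per feature. axis=1 collapses the columns, giving one value per sample.

```python
import numpy as np

# Three samples x two features: [length, weight]
data = np.array([[10.0, 100.0],
                 [20.0, 300.0],
                 [30.0, 500.0]])

col_means = np.mean(data, axis=0)  # one value per column (feature): 20.0 and 300.0
row_means = np.mean(data, axis=1)  # one value per row (sample): 55.0, 160.0, 265.0
col_stds = np.std(data, axis=0)    # population std per feature (ddof=0 by default)

print(col_means.tolist(), row_means.tolist())
```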

Print the computed mean and standard deviation

print(mean, std)

=> [ 27.29722222 454.09722222] [ 9.98244253 323.29893931]

A mean and a standard deviation are obtained for each feature

Convert to standard scores

Subtract the mean from the original data and divide by the standard deviation to get standard scores

train_scaled = (train_input - mean) / std

The two means in mean are subtracted from every row of train_input
Then every row is divided by the two standard deviations in std

* Broadcasting : NumPy automatically stretches the smaller array (mean and std, shape (2,)) across every row of the larger one (train_input, shape (36, 2)), so the operation is applied element by element
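Here is broadcasting in action on a small made-up array. A (3, 2) array minus a (2,) array subtracts the same pair of values from every row. After standardizing, each column has mean 0 and standard deviation 1.

```python
import numpy as np

data = np.array([[10.0, 100.0],
                 [20.0, 300.0],
                 [30.0, 500.0]])  # shape (3, 2)
mean = np.mean(data, axis=0)      # shape (2,)
std = np.std(data, axis=0)        # shape (2,)

# (3, 2) - (2,): mean is applied to all 3 rows, then the same happens with std
scaled = (data - mean) / std

print(scaled.shape)  # (3, 2)
print(np.allclose(scaled.mean(axis=0), 0), np.allclose(scaled.std(axis=0), 1))  # True True
```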


Train the model on the preprocessed data

Scatter plot of the standardized data and the sample

Scatter plot of train_scaled (converted to standard scores) and the [25, 150] sample

plt.scatter(train_scaled[:,0], train_scaled[:,1])
plt.scatter(25, 150, marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

Why is one sample sitting all alone at the top right?????
-> the training set had mean subtracted and was divided by std, so its value range changed, but the sample [25, 150] is still on the original scale

Transform the sample with the training set's mean and std as well

Transform the sample [25, 150] on the same basis
Important!! ★ it must be transformed using the training set's mean and std ★

new = ([25, 150] - mean) / std
plt.scatter(train_scaled[:,0], train_scaled[:,1])
plt.scatter(new[0], new[1], marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

Almost identical to the scatter plot before converting to standard scores
What changed : the x- and y-axis ranges are now roughly -1.5 ~ 1.5
The two features of the training data now occupy similar ranges

Train again on the transformed dataset

kn.fit(train_scaled, train_target)


Transform the test set on the training set's basis as well

Caution ) when evaluating the test set after training,
the test set must be transformed with the training set's mean and std so that it is on the same scale as the data the model was trained on (and can be plotted on the same scale)

test_scaled = (test_input - mean) / std
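The rule "compute the statistics on the training set, reuse them everywhere else" can be packaged as a small class. `SimpleStandardScaler` below is a hypothetical sketch of that fit/transform pattern. scikit-learn ships a real equivalent, `StandardScaler` in `sklearn.preprocessing`. The data values are made up.

```python
import numpy as np

class SimpleStandardScaler:
    """Hypothetical sketch: learn mean/std from training data, then reuse them."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.mean_ = x.mean(axis=0)  # per-feature mean of the TRAINING data
        self.std_ = x.std(axis=0)    # per-feature std of the TRAINING data
        return self

    def transform(self, x):
        # Always uses the training statistics, whatever data is passed in
        return (np.asarray(x, dtype=float) - self.mean_) / self.std_

train = np.array([[10.0, 100.0], [20.0, 300.0], [30.0, 500.0]])
test = np.array([[25.0, 150.0]])

scaler = SimpleStandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)  # scaled with the TRAIN mean/std, not its own

print(test_scaled)
```

Scaling a one-sample test set with its own statistics would not work at all here, because its standard deviation is 0.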

 

Evaluate the model

kn.score(test_scaled, test_target)



Predict with the model

print(kn.predict([new]))

=> [1.]
Finally predicted as bream

Find the sample's k nearest neighbors with the kneighbors( ) method, then draw a scatter plot

Because the features were converted to standard scores, the k-nearest neighbors algorithm now measures distances correctly
The nearest neighbors change!

distances, indexes = kn.kneighbors([new])
plt.scatter(train_scaled[:,0], train_scaled[:,1])
plt.scatter(new[0], new[1], marker='^')
plt.scatter(train_scaled[indexes,0], train_scaled[indexes,1], marker='D')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()



All of the samples nearest to the triangle are bream!
With standardized features, the prediction no longer depends on each feature's scale and is stable!
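This sketch puts the pieces together in plain NumPy. It classifies the [25, 150] fish with a hand-written 5-nearest-neighbor vote, first on the raw features and then on standard scores. To stay self-contained, it uses all 49 fish from the lists above as training data instead of this post's 36-sample split, so the neighbors and statistics differ slightly from the outputs above. `predict_5nn` is a hypothetical helper, not scikit-learn's API.

```python
import numpy as np

fish_length = [25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7, 31.0, 31.0,
               31.5, 32.0, 32.0, 32.0, 33.0, 33.0, 33.5, 33.5, 34.0, 34.0, 34.5, 35.0,
               35.0, 35.0, 35.0, 36.0, 36.0, 37.0, 38.5, 38.5, 39.5, 41.0, 41.0, 9.8,
               10.5, 10.6, 11.0, 11.2, 11.3, 11.8, 11.8, 12.0, 12.2, 12.4, 13.0, 14.3, 15.0]
fish_weight = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0, 475.0, 500.0,
               500.0, 340.0, 600.0, 600.0, 700.0, 700.0, 610.0, 650.0, 575.0, 685.0, 620.0, 680.0,
               700.0, 725.0, 720.0, 714.0, 850.0, 1000.0, 920.0, 955.0, 925.0, 975.0, 950.0, 6.7,
               7.5, 7.0, 9.7, 9.8, 8.7, 10.0, 9.9, 9.8, 12.2, 13.4, 12.2, 19.7, 19.9]

full_data = np.column_stack((fish_length, fish_weight))
full_target = np.concatenate((np.ones(35), np.zeros(14)))

def predict_5nn(x, y, query):
    """Hypothetical helper: majority class among the 5 nearest samples (Euclidean distance)."""
    dists = np.sqrt(((x - query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:5]
    votes = y[nearest]
    return int(votes.sum() >= 3), votes  # 3+ bream votes out of 5 -> bream (1)

query = np.array([25.0, 150.0])

# 1) Raw features: differences in weight swamp differences in length
raw_pred, raw_votes = predict_5nn(full_data, full_target, query)

# 2) Standard scores, using the statistics of the data the vote is built on
mean, std = full_data.mean(axis=0), full_data.std(axis=0)
scaled_pred, scaled_votes = predict_5nn((full_data - mean) / std, full_target, (query - mean) / std)

print(raw_pred)     # 0 -> smelt (4 of the 5 neighbors are smelt)
print(scaled_pred)  # 1 -> bream (all 5 neighbors are bream)
```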




Reference book : 혼자 공부하는 머신러닝+딥러닝 (Machine Learning + Deep Learning for Self-Study), Haesun Park, Hanbit Media, 2020
