Classifying customer-uploaded photos by fruit type


Unsupervised learning when the target is unknown

Group photos with unknown targets by type

Unsupervised learning : algorithms used when there is no target

                       they learn something from the data without a person teaching them

 

06-1.  Clustering Algorithms

Preparing the fruit photo data

The image data is stored as an npy file, NumPy's native format for saving arrays

In Colab, a line that starts with ! is treated as a Linux shell command rather than Python code

wget downloads data from a remote address and saves it

!wget https://bit.ly/fruits_300 -O fruits_300.npy

 

Loading the file

To load an npy file in NumPy, pass the file name to np.load()

import numpy as np
import matplotlib.pyplot as plt

fruits = np.load('fruits_300.npy')

 

Checking the shape

print(fruits.shape)

(300, 100, 100)

-> (number of samples, image height, image width)

Each image is 100 x 100

 

Printing the first row of the first image

The array is 3-dimensional, so set the first two indices to 0

Omitting the last index or using the slicing operator : prints the first row of the first image

print(fruits[0,0,:])

This prints the values of the 100 pixels in the first row

The NumPy array holds grayscale photos, so each value is an integer from 0 to 255

 

Drawing the image

Matplotlib's imshow() draws an image stored as a NumPy array

Grayscale image -> set the cmap parameter to 'gray'

plt.imshow(fruits[0], cmap='gray')
plt.show()

Values near 0 are drawn dark and high values are drawn bright

This grayscale image was inverted when the photo was converted to a NumPy array

The white background of the photo (high values) became black (low values)

and the dark area where the apple actually is (low values) became bright (high values)

 

The object of interest is the apple, not the background

A pixel value of 0 produces an output of 0 in later computations, so it carries no meaning

A high pixel value produces a larger output, which makes it easier to give it meaning

The region of interest was converted to high values, so Matplotlib draws the background black

 

gray_r : inverting again

Setting the cmap parameter to 'gray_r' inverts the colors again so the image looks natural to the eye

plt.imshow(fruits[0],cmap='gray_r')
plt.show()

In this figure, bright areas are values near 0 and dark areas are values near 255

 

Displaying pineapple and banana images

* subplots() : arranges several plots in a grid, like an array

                      the two arguments are the number of rows and columns

* axs : the array that holds the subplots

fig, axs = plt.subplots(1,2)
axs[0].imshow(fruits[100], cmap='gray_r')
axs[1].imshow(fruits[200], cmap='gray_r')
plt.show()

The first 100 images are apples, the next 100 are pineapples, and the last 100 are bananas

 

 

Analyzing pixel values

When splitting the NumPy array, flatten each 100 x 100 image into a 1-dimensional array of length 10,000

-> Harder to display as an image, but convenient for array computations

 

Converting to 1-dimensional arrays

* .reshape() : merges dimensions / setting the first dimension to -1 lets NumPy infer it from the remaining dimensions

apple = fruits[0:100].reshape(-1, 100*100)
pineapple = fruits[100:200].reshape(-1, 100*100)
banana = fruits[200:300].reshape(-1, 100*100)

Each array now has shape (100, 10000)
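
The reshape step can be tried on a much smaller array. This is a minimal sketch on made-up data (the `images` array is a hypothetical stand-in, not the fruit dataset) showing that -1 is inferred from the total element count and that each row is one flattened image.

```python
import numpy as np

# Hypothetical stand-in for the fruit data: 6 tiny "images" of 2 x 3 pixels.
images = np.arange(6 * 2 * 3).reshape(6, 2, 3)

# -1 lets NumPy infer the first dimension from the total element count.
flat = images.reshape(-1, 2 * 3)

print(images.shape)  # (6, 2, 3)
print(flat.shape)    # (6, 6)

# Each row of `flat` is one image with its rows laid end to end.
first_image_flat = images[0].ravel()
print(np.array_equal(flat[0], first_image_flat))  # True
```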

 

 

Per-sample pixel mean

* .mean() : computes the mean

To compute the pixel mean of each sample, specify the axis to average over

* axis = 0 computes along the first axis (down the rows) / axis = 1 computes along the second axis (across the columns)

What we need is the mean of each sample

Each sample's values are laid out horizontally in one row, so set axis = 1

print(apple.mean(axis=1))

This prints the pixel mean of each of the 100 apple samples
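
The difference between the two axes is easiest to see on a tiny array. A minimal sketch with made-up values (the `samples` array is hypothetical): axis=1 gives one mean per sample, axis=0 gives one mean per pixel position.

```python
import numpy as np

# Hypothetical data: 2 samples (rows), 3 pixels each (columns).
samples = np.array([[0, 10, 20],
                    [30, 40, 50]])

per_sample = samples.mean(axis=1)  # one mean per row (per sample)
per_pixel = samples.mean(axis=0)   # one mean per column (per pixel position)

print(per_sample)  # [10. 40.]
print(per_pixel)   # [15. 25. 35.]
```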

 

 

Plotting a histogram to check the distribution of the means

* hist() : draws a histogram

* alpha : values below 1 make the bars transparent

* legend() : adds a legend to the histogram

plt.hist(np.mean(apple, axis=1), alpha=0.8)
plt.hist(np.mean(pineapple, axis=1), alpha=0.8)
plt.hist(np.mean(banana, axis=1), alpha=0.8)
plt.legend(['apple', 'pineapple', 'banana'])
plt.show()

 

 

 

Comparing per-pixel means

Compare the mean of each pixel rather than the mean of each sample

=> Compute the mean of each pixel across all samples

* bar() : draws a bar chart

This is a per-pixel mean, so use axis = 0

fig, axs = plt.subplots(1, 3, figsize=(20,5))
axs[0].bar(range(10000), np.mean(apple, axis=0))
axs[1].bar(range(10000), np.mean(pineapple, axis=0))
axs[2].bar(range(10000), np.mean(banana, axis=0))
plt.show()

The range of pixels with high values differs from fruit to fruit

 

Displaying the pixel means as images

Reshape the pixel means to 100 x 100 and display them as images to compare with the bar charts above

apple_mean = np.mean(apple, axis=0).reshape(100, 100)
pineapple_mean = np.mean(pineapple, axis=0).reshape(100, 100)
banana_mean = np.mean(banana, axis=0).reshape(100, 100)
fig, axs = plt.subplots(1, 3, figsize=(20, 5))
axs[0].imshow(apple_mean, cmap='gray_r')
axs[1].imshow(pineapple_mean, cmap='gray_r')
axs[2].imshow(banana_mean, cmap='gray_r')
plt.show()

 

Pixel values vary in magnitude depending on position

 

Picking photos close to the mean

Pick the photos closest to apple_mean, the mean of the apple photos

=> Absolute error : subtract apple_mean from every sample in the fruits array and average the absolute values

* abs() : computes absolute values / given an array, it returns an array of the same shape holding the absolute value of every element

abs_diff =np.abs(fruits - apple_mean)
abs_mean = np.mean(abs_diff, axis=(1,2))
print(abs_mean.shape)
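
The subtraction above relies on NumPy broadcasting a single 2-D image across a stack of images. A small sketch on made-up data (`images` and `reference` are hypothetical names) shows the shapes involved and why averaging over axes 1 and 2 leaves one error per image.

```python
import numpy as np

# Hypothetical data: 3 images of 2 x 2 pixels and one 2 x 2 reference image.
images = np.array([[[0, 0], [0, 0]],
                   [[1, 1], [1, 1]],
                   [[4, 0], [0, 4]]], dtype=float)
reference = np.ones((2, 2))

# (3, 2, 2) - (2, 2): the reference is broadcast across the first axis.
abs_diff = np.abs(images - reference)

# Averaging over axes 1 and 2 leaves one error value per image.
mean_abs_error = abs_diff.mean(axis=(1, 2))

print(abs_diff.shape)    # (3, 2, 2)
print(mean_abs_error)    # [1. 0. 2.]
```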

 

Plotting the samples in order of increasing error

* np.argsort() : returns the indices that sort the array from smallest to largest

Select the first 100 and draw them in a 10 x 10 grid

 

subplots() creates a 10 x 10 grid, 100 subplots in total

With this many subplots, set the overall figure size with figsize=(10,10) (the default is (6.4, 4.8) in current Matplotlib; releases before 2.0 used (8, 6))

A nested for loop fills the 10 rows and 10 columns with images

axis('off') hides the coordinate axes

apple_index = np.argsort(abs_mean)[:100]
fig, axs = plt.subplots(10, 10, figsize=(10,10))
for i in range(10):
  for j in range(10):
    axs[i, j].imshow(fruits[apple_index[i*10 + j]], cmap='gray_r')
    axs[i, j].axis('off')
plt.show()
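
np.argsort() returns positions rather than sorted values, which is why its result can be used to index the fruits array. A minimal sketch with made-up error values (the `errors` array is hypothetical):

```python
import numpy as np

# Hypothetical error values for five samples.
errors = np.array([0.7, 0.1, 0.9, 0.3, 0.5])

order = np.argsort(errors)    # indices that would sort `errors` ascending
closest_three = order[:3]     # indices of the three smallest errors

print(order)                  # [1 3 4 0 2]
print(errors[closest_three])  # [0.1 0.3 0.5]
```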

This completes the task of gathering fruit photos using the pixel values of grayscale photos

 

Clustering : the task of grouping similar samples together

Cluster : a group produced by a clustering algorithm

 


06-2.  K-Means

In real unsupervised learning, we do not know which fruit is in each photo

=>  The k-means clustering algorithm finds the mean values automatically

Each mean sits at the center of a cluster, so it is called a cluster center or centroid

 

How the k-means algorithm works

1. Choose k cluster centers at random

2. For each sample, find the nearest cluster center and assign the sample to that cluster

3. Move each cluster center to the mean of the samples in its cluster

4. Go back to step 2 and repeat until the cluster centers stop changing

 

K-means starts from randomly chosen cluster centers and gradually moves each one to the center of its nearest samples
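
The four steps can be written out directly in NumPy. This is an illustrative sketch only, not how scikit-learn's KMeans is implemented (for example, scikit-learn's default initialization is smarter than picking random samples); the function name `kmeans_sketch` and the toy data are made up for the example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain k-means following the four steps above (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: assign every sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its samples
        # (a center that lost all its samples is left where it is).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated hypothetical blobs in 2-D.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans_sketch(X, k=2)
print(labels)
print(centers)
```

Which blob ends up as label 0 depends on the random initialization, which mirrors the point that label numbers carry no meaning.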

 

The KMeans class

!wget https://bit.ly/fruits_300 -O fruits_300.npy

 

Preparing the array

To train a k-means model, convert the 3-dimensional array of shape (number of samples, height, width)

into a 2-dimensional array of shape (number of samples, height x width)

import numpy as np
fruits = np.load('fruits_300.npy')
fruits_2d = fruits.reshape(-1, 100*100)

 

KMeans

The k-means algorithm is implemented by the KMeans class in the sklearn.cluster module

* n_clusters : sets the number of clusters

This is unsupervised learning, so fit() takes no target data

from sklearn.cluster import KMeans
km = KMeans(n_clusters=3, random_state=42)
km.fit(fruits_2d)

 

labels_

The clustering result is stored in the labels_ attribute of the KMeans object

The labels_ array has the same length as the number of samples

Because n_clusters=3, each value in labels_ is 0, 1, or 2

print(km.labels_)

The label values 0, 1, 2 and their order carry no meaning

 

Counting the samples gathered under labels 0, 1, and 2

print(np.unique(km.labels_, return_counts=True))

(array([0, 1, 2], dtype=int32), array([111, 98, 91]))

 

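The three clusters hold 91, 98, and 111 samples

Label numbers are arbitrary and can change between runs or scikit-learn versions: the output above lists 111 samples under label 0 and 91 under label 2

The notes below follow a run in which label 0 has 91 samples, label 1 has 98, and label 2 has 111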

 

Displaying the images in each cluster

Write a utility function to show which images each cluster contains

draw_fruits() takes a 3-dimensional array of shape (number of samples, height, width) and draws the images 10 per row. figsize grows in proportion to the ratio parameter (ratio defaults to 1)

 

import matplotlib.pyplot as plt
def draw_fruits(arr, ratio=1):
  n = len(arr)         # n is the number of samples
  
  # Draw 10 images per row.
  # Divide the number of samples by 10 and round up to get the number of rows.
  rows = int(np.ceil(n/10))

  # With a single row, the number of columns equals the number of samples; otherwise it is 10.
  cols = n if rows < 2 else 10
  fig, axs = plt.subplots(rows, cols, figsize=(cols*ratio,rows*ratio),
                          squeeze=False)
  for i in range(rows):
    for j in range(cols):
      if i*10 + j < n:  # draw only up to n images
        axs[i, j].imshow(arr[i*10 + j], cmap='gray_r')
      axs[i, j].axis('off')
  plt.show()

 

Boolean indexing

km.labels_==0 -> positions in km.labels_ where the value is 0 become True, and all others become False

Boolean indexing : selecting elements with a boolean array

draw_fruits(fruits[km.labels_==0])

This displays all 91 images clustered under label 0
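
Boolean indexing and the label counts can both be checked on a short array. A minimal sketch with made-up labels (`labels` and `samples` are hypothetical, standing in for km.labels_ and fruits):

```python
import numpy as np

# Hypothetical cluster labels for six samples and the samples themselves.
labels = np.array([0, 2, 1, 0, 2, 0])
samples = np.array([10, 20, 30, 40, 50, 60])

mask = labels == 0      # True where the label is 0
print(mask)             # [ True False False  True False  True]
print(samples[mask])    # [10 40 60]

# np.unique with return_counts=True reports how many samples each label has.
values, counts = np.unique(labels, return_counts=True)
print(values, counts)   # [0 1 2] [3 1 2]
```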

 

draw_fruits(fruits[km.labels_==1])

 

draw_fruits(fruits[km.labels_==2])

The cluster with label 1 contains only bananas

The cluster with label 2 is mostly pineapples, with 9 apples and 2 bananas mixed in

 

 

Cluster centers

* cluster_centers_ : stores the final cluster centers found by the KMeans class

These are the centers of the fruits_2d samples, so reshape each one into a 100 x 100 2-dimensional array to display it as an image

draw_fruits(km.cluster_centers_.reshape(-1,100,100), ratio=3)

 

* transform() : converts training samples into distances to each cluster center

                        this means KMeans can be used as a feature transformer, like the StandardScaler class

 

Passing fruits_2d[100] gives an array of shape (10000,), which raises an error

Use the slicing operator to pass an array of shape (1, 10000)

print(km.transform(fruits_2d[100:101]))
[[5267.70439881 8837.37750892 3393.8136117 ]]

A single sample was passed, so the returned array is 2-dimensional with shape (1, number of clusters); the smallest distance, 3393.8, is to the cluster with label 2

 

Prediction

* predict() : returns the nearest cluster center as the predicted class
print(km.predict(fruits_2d[100:101]))

[2]

-> The sample is predicted as label 2, the pineapple cluster
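
The relationship between the distances and the predicted label can be reproduced by hand. This sketch uses made-up centers and a made-up sample (not the fitted km object) and assumes plain Euclidean distance: the prediction is the index of the smallest distance.

```python
import numpy as np

# Hypothetical cluster centers (3 clusters, 2 features) and one sample.
centers = np.array([[0.0, 0.0],
                    [3.0, 4.0],
                    [6.0, 8.0]])
sample = np.array([[3.0, 3.0]])   # 2-D input of shape (1, n_features)

# Distance from the sample to each center, shape (1, n_clusters).
distances = np.linalg.norm(sample[:, None, :] - centers[None, :, :], axis=2)

# The predicted label is the index of the smallest distance.
predicted = distances.argmin(axis=1)

print(distances.shape)  # (1, 3)
print(predicted)        # [1]
```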

 

draw_fruits(fruits[100:101])

* n_iter_ : stores the number of iterations the algorithm ran

print(km.n_iter_)

3

 

 

Finding the best k

K-means requires the number of clusters to be specified in advance

In practice the number of clusters is unknown, so how do we choose a suitable k?

 

K-means can measure the distance between each cluster center and the samples in that cluster

Inertia : the sum of the squared distances -> a measure of how tightly the samples in each cluster are gathered

 

Elbow method : find the best number of clusters by watching how inertia changes as the number of clusters increases

 

Plotting inertia while increasing the number of clusters shows a point where the rate of decrease bends

Beyond this point, adding clusters does not greatly improve how tightly the samples are packed

That is, inertia no longer drops much -> the bend looks like an elbow, hence the name elbow method
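
Inertia is simple enough to compute by hand. A minimal sketch on made-up 1-D data (the helper name `inertia` and the data are hypothetical) shows how sharply it drops when the number of clusters matches the structure of the data:

```python
import numpy as np

def inertia(X, labels, centers):
    """Sum of squared distances from each sample to its own cluster center."""
    diffs = X - centers[labels]   # each sample minus its assigned center
    return float(np.sum(diffs ** 2))

# Hypothetical 1-D data with two obvious groups.
X = np.array([[0.0], [2.0], [10.0], [12.0]])

# One cluster: every sample is measured against the overall mean (6.0).
one = inertia(X, np.zeros(4, dtype=int), np.array([[6.0]]))

# Two clusters: centers at 1.0 and 11.0.
two = inertia(X, np.array([0, 0, 1, 1]), np.array([[1.0], [11.0]]))

print(one, two)  # 104.0 4.0
```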

 

 

Best k using inertia

* inertia_ : the inertia, computed automatically by KMeans

Train the KMeans class 5 times, varying the number of clusters k from 2 to 6

After training with fit(), append the value stored in the inertia_ attribute to the inertia list

Plot the values stored in the inertia list

inertia = []
for k in range(2, 7):
  km = KMeans(n_clusters=k, random_state=42)
  km.fit(fruits_2d)
  inertia.append(km.inertia_)
plt.plot(range(2, 7), inertia)
plt.xlabel('k')
plt.ylabel('inertia')
plt.show()

The slope of the graph changes slightly at k = 3

 

 


 

06-3.  Principal Component Analysis

Reducing the size of uploaded photos without hurting clustering or classification

 

Dimensions and dimensionality reduction

The attributes of the data are called 'features'

A fruit photo has 10,000 pixels, so it has 10,000 features

In machine learning these features are also called 'dimensions'

10,000 features = 10,000 dimensions  -> reducing the dimensions saves storage space

 

Dimensionality reduction : selecting the subset of features that best represents the data, which shrinks the data

                 and can improve the performance of supervised learning models

 

The reduced data can also be restored to the original dimensions with as little loss as possible

 

Principal component analysis (PCA) : a representative dimensionality reduction algorithm

 

Principal component analysis

PCA finds the directions in the data along which the variance is large

Variance : how widely the data is spread out

A direction of large variance is a vector that represents the data well

 

Principal component vector : a direction in the original data

                       it has as many elements as the original dataset has features

 

A principal component has the same dimension as the original data, while data transformed with the components has fewer dimensions

 

After the first principal component is found, the next direction is the one perpendicular to it with the largest variance

That vector is the second principal component

As many principal components can be found as there are original features (in practice, no more than the smaller of the number of samples and the number of features)
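
The idea of mutually perpendicular directions ordered by variance can be checked with a singular value decomposition. This is a sketch on random made-up data, assuming that the rows of Vt from an SVD of the mean-centered data serve as the principal components; it is not a claim about how the PCA class is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 3 features, stretched mostly along one axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

X_centered = X - X.mean(axis=0)
# Rows of Vt are unit-length directions ordered by decreasing variance.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt

# Variance of the data projected onto each component.
projected_var = (X_centered @ components.T).var(axis=0)

print(components.shape)                         # (3, 3)
print(np.round(components @ components.T, 6))   # approximately the identity matrix
print(projected_var)                            # decreasing values
```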

 

The PCA class

!wget https://bit.ly/fruits_300 -O fruits_300.npy
import numpy as np
fruits = np.load('fruits_300.npy')
fruits_2d = fruits.reshape(-1, 100*100)

 

scikit-learn provides the PCA algorithm as the PCA class in the sklearn.decomposition module

Specify n_components when creating the PCA object

* n_components :  the number of principal components

This is unsupervised learning, so fit() takes no target values

from sklearn.decomposition import PCA
pca = PCA(n_components=50)
pca.fit(fruits_2d)

 

* components_ : stores the principal components found by the PCA class

print(pca.components_.shape)

(50, 10000)

Because n_components = 50, the first dimension of pca.components_ is 50

That is, 50 principal components were found

The second dimension is always the number of features in the original data, 10000

 

Drawing the principal components with draw_fruits()

draw_fruits(pca.components_.reshape(-1, 100, 100))

The principal components are the directions of largest variance in the original data, in order

They can be thought of as capturing certain characteristics of the dataset

 

With the principal components found, the original data can be projected onto them to reduce the number of features from 10,000 to 50

 

Reducing dimensions with the principal components

* transform() : reduces the dimensions

fruits_pca = pca.transform(fruits_2d)
print(fruits_pca.shape)

(300, 50)

 

fruits_2d had shape (300, 10000); the pca model with 50 principal components converts it to an array of shape (300, 50)
fruits_pca is data with 50 features

 

Reconstructing the original data

Reducing 10,000 features to 50 inevitably loses some information

Because the data was projected onto the directions of largest variance, much of the original data can be reconstructed

 

* inverse_transform() : restores the original feature space

fruits_inverse = pca.inverse_transform(fruits_pca)
print(fruits_inverse.shape)

(300, 10000)

 

Displaying the reconstructed data

fruits_reconstruct = fruits_inverse.reshape(-1, 100, 100)
for start in [0, 100, 200]:
  draw_fruits(fruits_reconstruct[start:start+100])
  print("\n")

The 50 features were chosen to preserve as much variance as possible,

so almost everything is restored well

 

If the maximum number of principal components were used, the original data could be reconstructed exactly
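
Projection and reconstruction are both matrix products with the component vectors. A sketch on random made-up data (the helper `reconstruct` is hypothetical, and the components come from an SVD rather than from the PCA class) showing that keeping fewer components loses information while keeping all of them restores the data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # hypothetical data: 100 samples, 5 features

mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)

def reconstruct(n_keep):
    components = Vt[:n_keep]               # (n_keep, 5)
    reduced = (X - mean) @ components.T    # (100, n_keep): the compressed data
    return reduced @ components + mean     # (100, 5): back in the original space

err_2 = np.abs(X - reconstruct(2)).max()
err_5 = np.abs(X - reconstruct(5)).max()
print(err_2 > err_5)   # True: fewer components lose more information
print(err_5 < 1e-10)   # True: keeping every component restores the data
```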

 

 

Explained variance

Explained variance : a value recording how well each principal component represents the variance of the original data

* explained_variance_ratio_ : the explained variance ratio of each principal component

The first principal component has the largest explained variance

Summing the ratios gives the total share of variance represented by the 50 principal components

print(np.sum(pca.explained_variance_ratio_))
plt.plot(pca.explained_variance_ratio_)
plt.show()

The first 10 principal components represent most of the variance

After that, each component explains a relatively small share
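
The ratios can be derived from the singular values of the centered data. This is a sketch on random made-up data, assuming the usual definition (variance along each component divided by the total variance); because all components are kept here, the ratios sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data whose features have very different spreads.
X = rng.normal(size=(500, 4)) * np.array([10.0, 3.0, 1.0, 0.1])

X_centered = X - X.mean(axis=0)
S = np.linalg.svd(X_centered, compute_uv=False)   # singular values, descending

explained_variance = S ** 2 / (len(X) - 1)        # variance along each component
ratio = explained_variance / explained_variance.sum()

print(ratio)             # the first entry is the largest
print(ratio[:2].sum())   # share of the variance kept by two components
```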

 

Using PCA with other algorithms

Classifying the three kinds of fruit photos -> logistic regression model, LogisticRegression

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()

 

A supervised learning model needs target values

Here apples are 0, pineapples are 1, and bananas are 2

Multiplying a Python list by an integer repeats its elements that many times

target = np.array([0]*100 + [1]*100 + [2]*100)
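
A quick check of the list-repetition trick and of the resulting target array:

```python
import numpy as np

# A list times an integer repeats the list's elements that many times.
print([0] * 3)               # [0, 0, 0]
print([0] * 2 + [1] * 2)     # [0, 0, 1, 1]

# The same pattern at full size gives one label per fruit image.
target = np.array([0] * 100 + [1] * 100 + [2] * 100)
print(target.shape)          # (300,)
print(np.bincount(target))   # [100 100 100]
```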

 

To gauge the performance of the logistic regression model

* cross_validate() : runs cross-validation

from sklearn.model_selection import cross_validate
scores = cross_validate(lr, fruits_2d, target)
print(np.mean(scores['test_score']))
print(np.mean(scores['fit_time']))

0.9966666666666667

0.9233800888061523

 

The cross-validation score is very high, about 0.997

With 10,000 features and only 300 samples, it is easy to end up with an overfitted model

 

* fit_time : the training time of each cross-validation fold

It takes about 0.92 seconds

 

Comparing with the reduced data

scores = cross_validate(lr, fruits_pca, target)
print(np.mean(scores['test_score']))
print(np.mean(scores['fit_time']))

1.0

0.02978053092956543

 

Accuracy is 100% even though only 50 features were used

Training time dropped to about 0.03 seconds

 

Reducing the dimensions of the training data with PCA saves storage space and also speeds up model training

 

Passing an explained variance ratio

* n_components : the number of principal components

                 a desired explained variance ratio can be given instead; the PCA class then finds components automatically until that ratio is reached

                 pass the ratio as a float between 0 and 1 instead of a component count

pca = PCA(n_components=0.5)
pca.fit(fruits_2d)

 

How many principal components were found

print(pca.n_components_)

2

-> Just 2 features are enough to represent 50% of the variance in the original data
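
Choosing the number of components from a target ratio amounts to a cumulative sum. This sketch uses made-up ratios and illustrates the idea only; it does not reproduce the PCA class's exact rule for edge cases such as ties.

```python
import numpy as np

# Hypothetical explained-variance ratios, sorted from largest to smallest.
ratios = np.array([0.42, 0.10, 0.07, 0.05, 0.03])

cumulative = np.cumsum(ratios)   # running total of explained variance
target = 0.5

# Index of the first component at which the running total reaches the target.
n_needed = int(np.argmax(cumulative >= target)) + 1

print(cumulative)
print(n_needed)   # 2
```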

 

Transforming the original data

fruits_pca = pca.transform(fruits_2d)
print(fruits_pca.shape)

(300, 2)

With 2 principal components, the transformed data has shape (300, 2)

 

Cross-validation result

scores = cross_validate(lr, fruits_pca, target)
print(np.mean(scores['test_score']))
print(np.mean(scores['fit_time']))

0.99

0.03771648406982422

99% accuracy with only 2 features

 

Finding clusters with k-means on the dimension-reduced data

from sklearn.cluster import KMeans
km = KMeans(n_clusters=3, random_state=42)
km.fit(fruits_pca)
print(np.unique(km.labels_, return_counts=True))

(array([0, 1, 2], dtype=int32), array([ 91, 99, 110]))

The clusters found contain 91, 99, and 110 samples respectively

 

Displaying the images for the labels KMeans found

for label in range(0,3):
  draw_fruits(fruits[km.labels_ == label])
  print("\n")

 

Scatter plot by cluster

Another benefit of reducing dimensions : visualization

Reducing to 3 or fewer dimensions makes the data relatively easy to plot

 

Use km.labels_ to split the data by cluster and draw a scatter plot

for label in range(0, 3):
  data = fruits_pca[km.labels_ == label]
  plt.scatter(data[:,0], data[:,1])
plt.legend(['apple', 'banana', 'pineapple'])
plt.show()

The scatter plots of the clusters are very well separated

 

 

 


Reference : 혼자 공부하는 머신러닝+딥러닝, 박해선, 한빛미디어, 2020
