입출고, 반품 데이터

Excel 데이터 구조

시트 목록: info, date, 출고, 입고, 반품
특징 정리
- idx가 같으면 fruit, date도 같다.
- idx, fruit가 같아도 location은 다를 수 있다.
- type은 입고, 출고, 미출고, 반품이 있다.
- date는 입고 날짜를 의미
- 중복 데이터 가능

  
import pandas as pd
info = pd.read_excel('test.xlsx', sheet_name='info')
date = pd.read_excel('test.xlsx', sheet_name='date')
df_출고 = pd.read_excel('test.xlsx', sheet_name='출고')
df_입고 = pd.read_excel('test.xlsx', sheet_name='입고')
df_반품 = pd.read_excel('test.xlsx', sheet_name='반품')

작성일자 변경

date

index	연월	작성일
0	1111-11-01	1111-01-18

  
date['연월'] = '2025-06-01'
date['작성일'] = '2025-07-18'

date

index	연월	작성일
0	2025-06-01	2025-07-18

출고

idx 값이 비어있으면 info 시트에서 값 찾아 넣기

  
df_출고.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87 entries, 0 to 86
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   idx     77 non-null     object
 1   fruit   87 non-null     object
 2   date    87 non-null     int64 
 3   type    87 non-null     object

10개의 idx값이 없다.

  
a = info.set_index('fruit')['idx'].to_dict()
a

{'Apple': '02_Ap',
 'Strawberry': '03_St',
 'Lemon': '04_Le',
 'Mango': '05_Ma',
 'Grape': '06_Gr',
 'Orange': '07_Or',
 'Kiwi': '08_Ki',
 'Pineapple': '09_Pi',
 'Melon\xa0': '10_Me',
 'Peach\xa0': '11_Pe',
 'Watermelon\xa0': '12_Wa',
 'Guava': '25_Gu',
 'Pear': '14_Pe',
 'Banana': '15_Ba',
 'Blueberry': '16_Bl',
 'Coconut': '17_Co',
 'Cherry': '18_Ch',
 'Grapefruit': '19_Gr',
 'Durian': '20_Du',
 'Mangosteen': '21_Ma',
 'Apricot': '22_Ap',
 'Lychee': '23_Ly'} 

  
df_출고[df_출고['idx'].isnull()]

index	idx	fruit	date	type
6	NaN	Coconut	250717	출고
9	NaN	Mangosteen	250721	출고
14	NaN	Grapefruit	250719	출고
24	NaN	Mango	250705	출고
29	NaN	Kiwi	250708	출고
40	NaN	Grapefruit	250719	출고
46	NaN	Kiwi	250708	출고
61	NaN	Blueberry	250716	출고
83	NaN	Pear	250714	출고
84	NaN	Banana	250715	출고

  
df_출고['idx'] = df_출고['idx'].fillna(df_출고['fruit'].map(a)) 

  
df_출고[df_출고['idx'].isnull()]

index	idx	fruit	date	type

  
df_출고[df_출고['type']=='미출고']

index	idx	fruit	date	type
11	15_Ba	Banana	250715	미출고
43	19_Gr	Grapefruit	250719	미출고
57	11_Pe	Peach	250711	미출고
70	09_Pi	Pineapple	250709	미출고
86	18_Ch	Cherry	250718	미출고

  
df_출고.loc[df_출고['type'] == '미출고', 'idx'] = '-'

*처음에 df_출고[df_출고['type']=='미출고']['idx']='-'로 작성했었다.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead 라고 뜬다.

  
df_출고[df_출고['type']=='미출고']

index	idx	fruit	date	type
11	-	Banana	250715	미출고
43	-	Grapefruit	250719	미출고
57	-	Peach	250711	미출고
70	-	Pineapple	250709	미출고
86	-	Cherry	250718	미출고

입고

type이 반품인 경우 returns 값이 비어있을 경우 [반품] 에서 찾아 입력

  
df_입고[(df_입고['type'] == '반품') & (df_입고['returns'].isna())]

index	idx	fruit	date	returns	type
4	08_Ki	Kiwi	250708	NaN	반품
12	25_Gu	Guava	250713	NaN	반품
20	15_Ba	Banana	250715	NaN	반품
37	12_Wa	Watermelon	250712	NaN	반품
56	18_Ch	Cherry	250718	NaN	반품
58	15_Ba	Banana	250715	NaN	반품
78	14_Pe	Pear	250714	NaN	반품
80	16_Bl	Blueberry	250716	NaN	반품
102	12_Wa	Watermelon	250712	NaN	반품

  
lo = df_반품.groupby(['fruit','location']).size().unstack(fill_value=0)
lo['count'] = lo.sum(axis=1)
lo

fruit	1호점	2호점	3호점	4호점	5호점	count
Apple	1	0	0	0	1	2
Apricot	0	0	0	0	1	1
Banana	1	1	0	1	1	4
Blueberry	0	1	0	2	0	3
Cherry	1	1	0	1	1	4
Durian	1	2	1	0	0	4
Grape	1	0	1	1	0	3
Grapefruit	0	0	0	1	0	1
Guava	0	0	0	2	4	6
Kiwi	0	0	0	2	0	2
Lychee	2	0	0	0	0	2
Mango	0	0	0	0	1	1
Melon	0	0	0	2	0	2
Orange	0	0	1	1	2	4
Peach	0	1	1	0	1	3
Pear	1	0	1	0	0	2
Strawberry	0	0	0	0	1	1
Watermelon	0	0	0	0	3	3

참고) Counter

각 원소가 몇 번씩 나오는지 출력

  
from collections import Counter

df = x.copy()
for fruit in df['fruit'].unique(): 
  all_returns = df_반품[df_반품['fruit'] == fruit]['location'].dropna().tolist()
  all_counter = Counter(all_returns)
  print(all_counter)

Counter({'4호점': 2})
Counter({'5호점': 4, '4호점': 2})
Counter({'5호점': 1, '1호점': 1, '2호점': 1, '4호점': 1})
Counter({'5호점': 3})
Counter({'4호점': 1, '2호점': 1, '1호점': 1, '5호점': 1})
Counter({'1호점': 1, '3호점': 1})
Counter({'4호점': 2, '2호점': 1})

입고 시트에서 type이 반품인 경우 returns 값이 비어있을 때

반품 시트의 과일명과 일치하는 location을 찾아 returns의 데이터에 값을 넣는 과정을 진행했다.

여기서 문제는 같은 과일이더라도 location이 다르다 라는 것이다.

어떤 식으로 값을 넣을까 고민하다가 Counter를 통해서

반품 시트에서 과일별 location 개수를 Counter로 저장하고

입고 시트에서 returns 값이 이미 있는 경우 해당 location의 개수를 -1한다.

Counter에서 value가 0이상인 값들만 결측치에 하나씩 넣는다.

  
for fruit in df_입고[df_입고['type'] == '반품']['fruit'].unique():

    refund_locs = df_반품[df_반품['fruit'] == fruit]['location'].tolist()
    refund_counter = Counter(refund_locs) # {location : count}

    existing_returns = df_입고[
        (df_입고['fruit'] == fruit) & (df_입고['type'] == '반품') 
        & (df_입고['returns'].notna())]['returns'].tolist()

    for loc in existing_returns: # notna_returns list 
        refund_counter[loc] -= 1

    fill_loc = []
    for loc, count in refund_counter.items(): # 데이터의 key, value 값 튜플
        if count > 0:
            fill_loc.extend([loc] * count)

    nan_index = df_입고[
        (df_입고['fruit'] == fruit) & (df_입고['type'] == '반품') 
        & (df_입고['returns'].isna())].index.tolist()

    for idx, loc in zip(nan_index, fill_loc):
        df_입고.at[idx, 'returns'] = loc

  
expected = df_반품.groupby(['fruit', 'location']).size().unstack(fill_value=0)
actual = df_입고[df_입고['type'] == '반품'].groupby(['fruit', 'returns']).size().unstack(fill_value=0)

comparison = expected.subtract(actual, fill_value=0)
comparison['diff'] = comparison.abs().sum(axis=1)
comparison

fruit	1호점	2호점	3호점	4호점	5호점	diff
Apple	0	0	0	0	0	0
Apricot	0	0	0	0	0	0
Banana	0	0	0	0	0	0
Blueberry	0	0	0	0	0	0
Cherry	0	0	0	0	0	0
Durian	0	0	0	0	0	0
Grape	0	0	0	0	0	0
Grapefruit	0	0	0	0	0	0
Guava	0	0	0	0	0	0
Kiwi	0	0	0	0	0	0
Lychee	0	0	0	0	0	0
Mango	0	0	0	0	0	0
Melon	0	0	0	0	0	0
Orange	0	0	0	0	0	0
Peach	0	0	0	0	0	0
Pear	0	0	0	0	0	0
Strawberry	0	0	0	0	0	0
Watermelon	0	0	0	0	0	0

모두 알맞게 들어갔다.

반품이 아닌 경우 returns ‘-‘로 표시하기

df_입고

.	idx	fruit	date	returns	type
0	02_Ap	Apple	250702	1호점	반품
1	03_St	Strawberry	250703	2호점	입고
2	05_Ma	Mango	250705	4호점	입고
3	06_Gr	Grape	250706	3호점	반품
4	08_Ki	Kiwi	250708	4호점	반품
…	…	…	…	…	…
99	05_Ma	Mango	250705	3호점	입고
100	06_Gr	Grape	250706	1호점	입고
101	07_Or	Orange	250707	5호점	-
102	12_Wa	Watermelon	250712	5호점	반품
103	21_Ma	Mangosteen	250721	4호점	입고

104 rows × 5 columns

  
df_입고['type'].value_counts()

type	count
입고	54
반품	48
’-‘	2

총 56개를 수정해야한다.

  
df_입고['returns'].value_counts()

returns	count
5호점	27
4호점	22
3호점	21
1호점	19
2호점	15

returns는 1~5호점까지 존재한다.

  
df_입고.loc[df_입고['type'] != '반품', 'returns'] = '-'

  
df_입고['returns'].value_counts()

returns	count
’-‘	56
5호점	16
4호점	13
1호점	8
2호점	6
3호점	5

56개의 데이터가 ‘-‘ 로 표시된 것을 확인할 수 있다.

type이 ‘-‘인 경우 idx도 ‘-‘ 표시

  
df_입고['type'].value_counts()

type	count
입고	54
반품	48
’-‘	2

’-‘로 되어있는 데이터가 2개 있다.

  
df_입고['idx'].unique()

array(['02_Ap', '03_St', '05_Ma', '06_Gr', '08_Ki', '10_Me', '11_Pe',
       '25_Gu', '15_Ba', '20_Du', '22_Ap', '23_Ly', '18_Ch', '04_Le',
       '07_Or', '12_Wa', '09_Pi', '21_Ma', '14_Pe', '16_Bl', '17_Co',
       '19_Gr'], dtype=object)

  
df_입고.loc[df_입고['type']=='-', 'idx'] = '-'

  
df_입고['idx'].unique()

array(['02_Ap', '03_St', '05_Ma', '06_Gr', '08_Ki', '10_Me', '11_Pe',
       '25_Gu', '15_Ba', '20_Du', '22_Ap', '23_Ly', '18_Ch', '04_Le', '-',
       '07_Or', '12_Wa', '09_Pi', '21_Ma', '14_Pe', '16_Bl', '17_Co',
       '19_Gr'], dtype=object)

’-‘ 가 추가되었다.

참고) Counter

  
from collections import Counter

c = Counter(['A', 'B', 'A', 'C'])
print(c)  # Counter({'A': 2, 'B': 1, 'C': 1})

print(c['A'])  # 2

print(c['D'])  # 0

c['A'] -= 1
print(c)  # Counter({'A': 1, 'B': 1, 'C': 1})

REFERENCE

Counter

파이썬 collections 모듈의 Counter 사용법

입출고, 반품 데이터

Excel 데이터 구조

작성일자 변경

출고

idx 값이 비어있으면 info 시트에서 값 찾아 넣기

입고

type이 반품인 경우 returns 값이 비어있을 경우 [반품] 에서 찾아 입력

참고) Counter

반품이 아닌 경우 returns ‘-‘로 표시하기

type이 ‘-‘인 경우 idx도 ‘-‘ 표시

참고) Counter

Further Reading

상호작용과 성과 분석

데이터 분석

분석 방법 정리