Reading DB Data with Parallel Python (pp)


Unknown user, 2016. 4. 18. 23:00

Test environment: CentOS 7 / MariaDB 10.1.12 / Python 3.5

Before the test, I inserted 1,926,144 rows into the test table.

Test 1 – Reading the entire table

 
import pymysql
import sys
import pp

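def fet(n):
  # n is unused in this test: the line that appends it to the query is commented out
  conn = pymysql.connect(host='192.168.219.153', port=3306, user='root', passwd='root', db='test',charset='utf8',autocommit=True)
  cur = conn.cursor()
  sql = "select * from test"
  #sql += n
  # execute() returns the number of rows produced by the query
  return cur.execute(sql)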

ppservers = ()

if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
    # Creates jobserver with ncpus workers
    job_server = pp.Server(ncpus, ppservers=ppservers)
else:
    # Creates jobserver with automatically detected number of workers
    job_server = pp.Server(ppservers=ppservers)

print("Starting pp with %s workers" % job_server.get_ncpus())

job1 = job_server.submit(fet, ('005380', ),(),("pymysql",))


result = job1()

print("result : %s" % result)

job_server.print_stats()

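I wrote a function that reads the entire table and submitted it to pp so that it could run in parallel.

If the function uses a module, the module name has to be passed to submit() as follows, otherwise an error occurs.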

job1 = job_server.submit(fet, ('005380', ),(),("pymysql",))
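
The positional arguments in that call are easier to read when spelled out by keyword. The following is a minimal sketch, assuming pp is installed and that submit() accepts args, depfuncs and modules as keywords in that order. The names distance and run_on_worker are hypothetical and only used for illustration. The function depends on math rather than pymysql so that it can be tried without a database.

```python
import math

try:
    import pp  # Parallel Python; optional so the sketch still imports without it
except ImportError:
    pp = None


def distance(x, y):
    # Uses the math module, so "math" must be listed in modules= when submitted
    return math.sqrt(x * x + y * y)


def run_on_worker(x, y):
    """Run distance() on one local pp worker and return its result."""
    if pp is None:
        raise RuntimeError("pp is not installed")
    server = pp.Server(ncpus=1, ppservers=())
    try:
        job = server.submit(distance, args=(x, y), depfuncs=(), modules=("math",))
        return job()  # calling the job object blocks until the result is ready
    finally:
        server.destroy()


if __name__ == "__main__":
    print(distance(3, 4))
    if pp is not None:
        print(run_on_worker(3, 4))
```

The worker runs the function in a separate process, which is why a module imported at the top of the script is not automatically visible inside it.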

[python@localhost test]$ python test4.py 1
Starting pp with 1 workers
result : 1926144
Job execution statistics:
job count | % of all jobs | job time sum | time per job | job server
1 | 100.00 | 37.7016 | 37.701622 | local
Time elapsed since server creation 37.70819640159607
0 active tasks, 1 cores

[python@localhost ~]$ ps -ef | grep ppworker | grep -v grep
python 5925 5924 99 01:09 pts/1 00:00:22 /usr/local/bin/python3.5 -u -m ppworker 2>/dev/null

[python@localhost test]$ python test4.py 2
Starting pp with 2 workers
result : 1926144
Job execution statistics:
job count | % of all jobs | job time sum | time per job | job server
1 | 100.00 | 35.6303 | 35.630299 | local
Time elapsed since server creation 35.63610649108887
0 active tasks, 2 cores

[python@localhost ~]$ ps -ef | grep ppworker | grep -v grep
python 5914 5913 99 01:07 pts/1 00:00:03 /usr/local/bin/python3.5 -u -m ppworker 2>/dev/null
python 5915 5913 1 01:07 pts/1 00:00:00 /usr/local/bin/python3.5 -u -m ppworker 2>/dev/null

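pp starts as many ppworker processes as the number of workers requested, but only one of them actually connects to MariaDB, as the process list below shows. The elapsed time is also nearly the same in both runs (about 37.7 s with one worker and 35.6 s with two).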

MariaDB [(none)]> show processlist \G
*************************** 1. row ***************************
Id: 65
User: root
Host: localhost
db: NULL
Command: Query
Time: 0
State: init
Info: show processlist
Progress: 0.000
*************************** 2. row ***************************
Id: 66
User: root
Host: 192.168.219.153:53260
db: test
Command: Query
Time: 1
State: Writing to net
Info: select * from test
Progress: 0.000
2 rows in set (0.00 sec)
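
The effect seen in Test 1 is not specific to pp: a pool of workers can only spread out as many jobs as it is given. The sketch below illustrates this with the standard library's ThreadPoolExecutor as a stand-in for pp (an assumption made so that it runs without pp or a database). fake_query and workers_used are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(label):
    """Stand-in for a long-running SELECT; reports which worker ran it."""
    time.sleep(0.05)
    return label, threading.get_ident()


def workers_used(labels, max_workers):
    """Submit one job per label and return the set of worker ids that ran them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fake_query, label) for label in labels]
        return {future.result()[1] for future in futures}


# One job on a pool of two workers: only one worker ever does anything.
one_job = workers_used(["select * from test"], max_workers=2)

# Four jobs on the same pool size: the jobs are shared between the workers.
four_jobs = workers_used(["a", "b", "c", "d"], max_workers=2)

print(len(one_job), "worker used for 1 job")
print(len(four_jobs), "worker(s) used for 4 jobs")
```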

Test 2 – Reading the entire table in partitions

 
import pymysql
import sys
import pp

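def fet(n):
  conn = pymysql.connect(host='192.168.219.153', port=3306, user='root', passwd='root', db='test',charset='utf8',autocommit=True)
  cur = conn.cursor()
  # n is appended unquoted, so the server receives e.g. "where CD=005380"
  sql = "select * from test where CD="
  sql += n
  # execute() returns the number of rows produced by the query
  return cur.execute(sql)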

ppservers = ()

if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
    # Creates jobserver with ncpus workers
    job_server = pp.Server(ncpus, ppservers=ppservers)
else:
    # Creates jobserver with automatically detected number of workers
    job_server = pp.Server(ppservers=ppservers)

print("Starting pp with %s workers" % job_server.get_ncpus())

job1 = job_server.submit(fet, ('005380', ),(),("pymysql",))
job2 = job_server.submit(fet, ('015760', ),(),("pymysql",))
job3 = job_server.submit(fet, ('015760', ),(),("pymysql",))
job4 = job_server.submit(fet, ('051910', ),(),("pymysql",))

result1 = job1()
result2 = job2()
result3 = job3()
result4 = job4()

print("result : %s" % result1)
print("result : %s" % result2)
print("result : %s" % result3)
print("result : %s" % result4)


job_server.print_stats()

The function takes the value for the WHERE clause as a parameter, and a total of four jobs are submitted so that together they fetch the entire data set.
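
The same partition-by-key idea can be tried without a MariaDB server. The following is a minimal sketch, assuming an SQLite file as a stand-in database and a thread pool as a stand-in for pp. The table contents, the price column, and the names build_db and fetch_count are made up for illustration. Each job opens its own connection, as fet() does, and binds the code as a query parameter instead of concatenating it into the SQL string.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_db(path):
    """Create a small demo table with 100 rows per code (made-up data)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE test (CD TEXT, price INTEGER)")
    rows = [(cd, i) for cd in ("005380", "015760", "051910") for i in range(100)]
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def fetch_count(path, cd):
    """Open a private connection, read one partition, return its row count."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("SELECT * FROM test WHERE CD = ?", (cd,)).fetchall()
        return len(rows)
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    build_db(db_path)
    codes = ["005380", "015760", "051910"]
    # One job per code; the pool runs up to two of them at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        counts = list(pool.map(lambda cd: fetch_count(db_path, cd), codes))

print(dict(zip(codes, counts)))
print("total rows read:", sum(counts))
```

With pymysql the placeholder is %s rather than ?, for example cur.execute("select * from test where CD=%s", (n,)). Binding the value this way sends it as a quoted string, which differs from the unquoted comparison used in the test above.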

[python@localhost test]$ python test5.py 1
Starting pp with 1 workers
result : 420864
result : 443648
result : 418944
Job execution statistics:
job count | % of all jobs | job time sum | time per job | job server
4 | 100.00 | 33.5512 | 8.387789 | local
Time elapsed since server creation 33.55826544761658
0 active tasks, 1 cores

[python@localhost test]$ python test5.py 2
Starting pp with 2 workers
result : 420864
result : 443648
result : 418944
Job execution statistics:
job count | % of all jobs | job time sum | time per job | job server
4 | 100.00 | 40.8357 | 10.208929 | local
Time elapsed since server creation 20.490312337875366
0 active tasks, 2 cores

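Unlike Test 1, using parallel workers made the read faster: the elapsed time dropped from about 33.6 s with one worker to about 20.5 s with two. The process list also shows that MariaDB received one connection per worker process.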

MariaDB [(none)]> show processlist \G
*************************** 1. row ***************************
Id: 67
User: root
Host: 192.168.219.153:53261
db: test
Command: Query
Time: 3
State: Writing to net
Info: select * from test where CD=005380
Progress: 0.000
*************************** 2. row ***************************
Id: 68
User: root
Host: 192.168.219.153:53262
db: test
Command: Query
Time: 3
State: Sending data
Info: select * from test where CD=015760
Progress: 0.000
*************************** 3. row ***************************
Id: 69
User: root
Host: localhost
db: NULL
Command: Query
Time: 0
State: init
Info: show processlist
Progress: 0.000
3 rows in set (0.00 sec)
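
The timings printed by the two Test 2 runs can be turned into a speedup figure. The short calculation below uses only the numbers from that output. Dividing speedup by the worker count to get an efficiency is a common convention, not something pp reports.

```python
# Values copied from the test5.py output
elapsed_1_worker = 33.55826544761658   # "Time elapsed" with 1 worker
elapsed_2_workers = 20.490312337875366  # "Time elapsed" with 2 workers
job_time_sum_2_workers = 40.8357        # "job time sum" with 2 workers

# How much faster the 2-worker run finished
speedup = elapsed_1_worker / elapsed_2_workers

# Fraction of the ideal 2x speedup that was achieved
efficiency = speedup / 2

# Average number of jobs running at once during the 2-worker run
avg_concurrency = job_time_sum_2_workers / elapsed_2_workers

print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.0%}")
print(f"average concurrency: {avg_concurrency:.2f}")
```

The job time sum grew from 33.5512 s to 40.8357 s, so each query ran slower when two were active, but both workers were busy for almost the whole run.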

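Conclusion
Parallel Python runs parallel work in units of functions, and I confirmed that each submitted function call runs on a single worker process.
To benefit from running in parallel, the work therefore has to be split into multiple jobs.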
