
IT To do and To was

Jan 5, 2022_Crawling, school classes start again


Wednesday [gel nails]

 

1. Fetch browser info with an API

2. Getting gel nails haha

3. Why am I so anxious..

 

1. How to fetch it!

import requests
import json

# fetch the response
response = requests.get('https://api.github.com/')

# print(type(response))
# print(response.status_code)

# if response.status_code == 200:
#     print(response.content)
print(response.text)

# response.text is a JSON string, so parse it with json.loads()
# (json.dumps() goes the other way: Python object -> JSON string)
json_object = json.loads(response.text)
print(json_object['current_user_url'])

# save the parsed object back out to a file
with open("json_object.json", "w", encoding="utf-8") as f:
    json.dump(json_object, f, ensure_ascii=False, indent=2)
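
As an aside, requests can skip the manual parsing step entirely: response.json() decodes the body in one call. A minimal sketch of the same fetch:

import requests

response = requests.get('https://api.github.com/')
data = response.json()  # decodes the JSON body straight into a dict
print(data['current_user_url'])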

 

import requests
from bs4 import BeautifulSoup

response = requests.get("http://www.summet.com/dmsi/html/codesamples/addresses.html")

html_doc = response.text

soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title)          # searching the parsed HTML document, not the raw str
# print(soup.title.name)   # the tag's name
# print(soup.title.string) # the tag's value
# print(soup.title.text)   # same tag value as above

# print(soup.ul.li)
# print(soup.find('h1'))
# print(soup.find_all('li'))  # finds every match; return type is list

# print(response.content)  # (left over from peeking at Naver)
# print(response.text)
# response = requests.get("https://httpbin.org/get")  # GET against this endpoint
# response = requests.get("https://httpbin.org/put")
# response = requests.get("https://httpbin.org/ip")
# print(response, type(response))
# print(response.status_code)
# print(response.encoding)
# print(response.content)
# print(response.text)
<title>Sample Addresses!</title>
li_list = soup.find_all('li')
# print(li_list, type(li_list))

# for i in li_list:
#     print(type(i))
#     print(i.get_text())
response = requests.get("https://search.naver.com/search.naver?where=nexearch&sm=top_hty&fbm=1&ie=utf8&query=bts")
html_doc = response.text

soup = BeautifulSoup(html_doc, "html.parser")
# soup.find_all('a', {'href': '#newsstand'})   # attribute values can be passed as a dict
# soup.find_all('button', {'type': 'button'})
# soup.select('a')  # select() makes walking down to children easy with the > combinator
# copy the selector from the browser (right-click > Copy > Copy selector)
soup.select('#header > div.special_bg > div > div.service_area > a.link_jrnaver > span')
# soup.select('#main_pack > section.sc_new.sp_nnews._prs_nws_all > div > div.group_news > ul')
# soup.find_all('a', {'class': 'news_tit'})
tag = soup.select_one('#main_pack > section.sc_new.sp_nnews._prs_nws_all > div > div.group_news > ul')

news_list = tag.select('li > div > div > a')  # walk down to each news link

for n in news_list:
    print(n.get_text())                  # str
    print(n.get_attribute_list('href'))  # attributes come back wrapped in a list
    # print(n.get_attribute_list('title'))
한 벌에 12만원…BTS도 놀란 ‘BTS 잠옷’ 가격
['https://www.chosun.com/economy/industry-company/2022/01/04/5RBS7KTE6JFWVNJX4BCKC4VKJ4/?utm_source=naver&utm_medium=referral&utm_campaign=naver-news']
BTS 뷔가 부른 OST, 한국 첫 빌보드 ‘핫 100‘ 진입
['https://www.seoul.co.kr/news/newsView.php?id=20220105500019&wlog_tag3=naver']
네이버웹툰, BTS 슈퍼캐스팅 옥외광고 코엑스 케이팝 스퀘어에 공개
['http://www.fnnews.com/news/202201040847465379']
BTS 후보 오른 그래미 시상식, 오미크론 여파 연기 가능성
['http://yna.kr/AKR20220105005100075?did=1195m']
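
The headline links also carry the news_tit class used in the commented find_all above; matching on that class tends to survive layout changes better than a long copied selector. A minimal sketch reusing this cell's soup:

# pair each headline with its link via the news_tit class
for a in soup.find_all('a', {'class': 'news_tit'}):
    print(a.get_text(), a.get('href'))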
response = requests.get("https://www.daangn.com/search/%EB%85%B8%ED%8A%B8%EB%B6%81")
html_doc = response.text
soup = BeautifulSoup(html_doc, "html.parser")  # BeautifulSoup turns the str into an HTML document

tag = soup.select_one('#result > div:nth-child(1)')
# print(tag)
# news_list = tag.select('div > p > article > a > div > img > div > div > span')
news_list = tag.select('span.article-title')
for n in news_list:
    print(n.get_text())

soup.find_all('span', {'class': 'article-title'})
삼성 노트북, 미니노트북, 11인치 노트북, 아티브북m(nt110s1j-k11s) 12만원
고사양 노트북, 게이밍노트북, 노트북, 영상편집 노트북
노트북 HP 파빌리온 14 N291TX  i5 노트북(+냉각팬달린 노트북받침)
캉골 크로스백 노트북백 노트북크로스백 노트북가방
고급형 144hz 게이밍 노트북 MSI I7 8750H  GTX1060 노트북 팝니다.
뉴발란스노트북가방, 뉴발란스노트북백팩
!pip install selenium
Requirement already satisfied: selenium in c:\users\bit\anaconda3\lib\site-packages (4.1.0)
from selenium import webdriver

driver = webdriver.Chrome('driver/chromedriver')  # chromedriver must live under this subdirectory
driver.get("https://nid.naver.com/nidlogin.login?mode=form&url=https%3A%2F%2Fwww.naver.com")

# driver.save_screenshot('001.png')  # screen capture
elem_login = driver.find_element_by_id('id')
elem_login.clear()
elem_login.send_keys("haryulpark")

elem_login = driver.find_element_by_id("pw")
elem_login.clear()  # clear() wipes whatever value is already there
elem_login.send_keys("********")  # password redacted

xpath = """//*[@id="log.login"]"""
driver.find_element_by_xpath(xpath).click()
xpath = """/html/body/div/div/div[1]/div[1]/a[1]"""
driver.find_element_by_xpath(xpath).click()
C:\Users\BIT\AppData\Local\Temp/ipykernel_8512/1579777362.py:2: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome('driver/chromedriver')  # chromedriver must live under this subdirectory
C:\Users\BIT\AppData\Local\Temp/ipykernel_8512/1579777362.py:6: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  elem_login = driver.find_element_by_id('id')
C:\Users\BIT\AppData\Local\Temp/ipykernel_8512/1579777362.py:10: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  elem_login = driver.find_element_by_id("pw")
C:\Users\BIT\AppData\Local\Temp/ipykernel_8512/1579777362.py:14: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  driver.find_element_by_xpath(xpath).click()
C:\Users\BIT\AppData\Local\Temp/ipykernel_8512/1579777362.py:16: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  driver.find_element_by_xpath(xpath).click()
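
Those DeprecationWarnings point at Selenium 4's locator API. A minimal sketch of the same steps in the new style, assuming the same driver path (a Service object replaces the bare executable path, and find_element(By...) replaces the find_element_by_* family):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service('driver/chromedriver'))
driver.get("https://nid.naver.com/nidlogin.login?mode=form&url=https%3A%2F%2Fwww.naver.com")

elem_login = driver.find_element(By.ID, 'id')  # was find_element_by_id('id')
elem_login.clear()
elem_login.send_keys("haryulpark")

driver.find_element(By.XPATH, '//*[@id="log.login"]').click()  # was find_element_by_xpath(...)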
import requests
import re
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome('driver/chromedriver')  # chromedriver must live under this subdirectory
driver.get("https://comic.naver.com/index")

clickAcomic = driver.find_element_by_xpath('//*[@id="genreRecommand"]/li[3]/div[1]/a/span')

clickAcomic.click()  # pick the Spider-Man title

contantOpen = driver.find_element_by_xpath('//*[@id="content"]/div[2]/div/a[1]')

contantOpen.click()  # open the second chapter

contantselect = driver.find_element_by_xpath('//*[@id="content"]/table/tbody/tr[2]/td[1]/a/span')

contantselect.click()  # select an arbitrary episode

driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')  # scroll all the way to the bottom

r = driver.current_url  # grab the current url
response = requests.get(r)
html_doc = response.text
soup = BeautifulSoup(html_doc, "html.parser")

tag = soup.select_one('#cbox_module_wai_u_cbox_content_wrap_tabpanel > ul > li.u_cbox_comment.cbox_module__comment_425681884._user_id_no_4Lp3h > div.u_cbox_comment_box > div > div.u_cbox_text_wrap')

# o = response.text
# p = re.compile(".*[가-힣]+.*")  # lines containing Hangul
# print(p.findall(o))

# with open("hi.txt", "w", encoding="utf-8") as comment:
#     for n in comment_list:
#         comment.write('%s \n' % n)

# search_box.send_keys('11')
# driver.implicitly_wait(3)

# search_box.send_keys(Keys.RETURN)
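
One caveat with the last step: requests.get() on current_url fetches a fresh copy of the page without the comments that JavaScript loads after the clicks, so the select_one above can come back empty. Parsing the HTML the browser itself has rendered avoids that. A minimal sketch, matching on the u_cbox_text_wrap class from the long selector above (that the class alone is enough is an assumption, not verified against the live page):

# parse the DOM Selenium actually rendered, JS-loaded comments included
soup = BeautifulSoup(driver.page_source, "html.parser")

for c in soup.select('div.u_cbox_text_wrap'):
    print(c.get_text(strip=True))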

 

2. Getting gel nails done, since Jun oppa just might like them..

 

3. Why am I this anxious? Is it because I don't know if I'm doing well right now? Because I can't tell if I'm keeping up? Because I'm not sure any of this will actually help..? Maybe all three..?

 

tomorrow wish list

 

. Attend at least 3 classes

. Schedule the plans with Mijeong

. Think about how to bring it up (no more than 30 minutes on it)

. Review Python
