
Parsing s pomoschyu Python. Veb-skraping v deystvii (Parsing with Python: Web Scraping in Action)
Description
This updated third edition not only introduces you to web scraping but also serves as a comprehensive guide to collecting practically any kind of data from the modern Internet.
Part I focuses on the mechanics of web scraping: using Python to send requests to web servers, process their responses, and automate interaction with websites.
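As a rough illustration of the request/response cycle described for Part I, here is a minimal sketch using only Python's standard `urllib.request` module. It is not taken from the book; the `fetch` helper name and the `User-Agent` value are hypothetical choices for this example.

```python
from urllib.request import Request, urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body decoded as text (hypothetical helper)."""
    # Illustrative User-Agent; some servers reject Python's default one.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would look like `html = fetch("https://example.com/")`; the returned string can then be handed to an HTML parser.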
Part II explores more specific tools and applications that will come in handy in any web scraping scenario. The book shows how to:
- parse complex HTML pages;
- build web crawlers with the Scrapy framework;
- store the data you scrape;
- read and extract data from documents;
- clean and normalize poorly formatted data;
- read and write natural language;
- navigate forms and login pages;
- scrape JavaScript-driven pages and work with APIs;
- write and use image-to-text software;
- avoid scraper traps and bot blockers;
- test your own websites with scrapers.
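To give a flavor of the first item, parsing HTML pages, here is a small sketch that collects links from a page. It uses the standard library's `html.parser` rather than a third-party parser, so it is not necessarily the approach the book takes; `LinkExtractor` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's base URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an `href` are skipped, and relative paths such as `/a` are resolved against `base_url`.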
System requirements
File format: ePUB
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use reading software that can open ePUB files, e.g., Adobe Digital Editions or FBReader, both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Before downloading, install the free app Adobe Digital Editions (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks adjust automatically to fit the small display.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark embedded in the eBook can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.