linw1995 / data_extractor

Data Extractor

Combine XPath, CSS selectors, and JSONPath for web data extraction.

Quickstarts

Installation

Install the stable version from PyPI.

pip install "data-extractor[jsonpath-extractor]"  # for extracting JSON data
pip install "data-extractor[lxml]"  # for extracting HTML data

Or install the latest version from GitHub.

pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"

Extract JSON data

Extracting JSON data requires one of the supported optional JSONPath dependencies, such as jsonpath-extractor.

Install one of them to extract JSON data.

Extract HTML(XML) data

Extracting HTML (XML) data currently relies on the following optional dependencies:

lxml for using XPath
cssselect for using CSS-Selectors
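
To make the idea of XPath-based extraction concrete, here is a minimal sketch. It does not use data_extractor's own HTML extractors, which this section does not show. It assumes well-formed markup and uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, as a stand-in for lxml. The names MARKUP and extract_names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; it must be well-formed XML for ElementTree.
MARKUP = (
    "<html><body>"
    "<div class='user'><span class='name'>john</span></div>"
    "<div class='user'><span class='name'>jack</span></div>"
    "</body></html>"
)


def extract_names(markup: str) -> list:
    """Return the text of every span.name nested in a div.user."""
    root = ET.fromstring(markup)
    # ElementTree understands simple attribute predicates such as [@class='user'].
    spans = root.findall(".//div[@class='user']/span[@class='name']")
    return [span.text for span in spans]


print(extract_names(MARKUP))
```

With lxml installed, the same path expression could be evaluated with full XPath 1.0 support. cssselect exists to translate CSS selectors into XPath expressions of this kind.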

Usage

from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    name_ = Field(JSONExtractor("name"), name="name")
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
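
For readers who want to see what the declarative items above compute, the following plain-Python sketch reproduces the same result by hand. It is not how data_extractor is implemented. The helper names extract_user and extract_users are hypothetical, and the behaviour is inferred only from the assertion in the usage example.

```python
from typing import Any, Dict, List


def extract_user(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hand-written equivalent of the User item for a single record."""
    return {
        # Field(JSONExtractor("name"), name="name"): the result key is "name".
        "name": record["name"],
        # Field(JSONExtractor("age"), default=17): fall back to 17 if absent.
        "age": record.get("age", 17),
        # count = Count(): the nested item reads from the same record.
        "count": {
            "followings": record["countFollowings"],
            "fans": record["countFans"],
        },
    }


def extract_users(document: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Equivalent of JSONExtractor("data.users[*]") with is_many=True."""
    return [extract_user(user) for user in document["data"]["users"]]
```

The declarative version keeps the paths, defaults, and output names next to each field, so the mapping can be read and reused without hand-written traversal code like this.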

Changelog

v0.7.0

65d1fce Fix: Create JSONExtractor with wrong subtype
407cd78 New: Make lxml and cssselect optional (#61)
