The Wayback Machine - https://web.archive.org/web/20200905065713/https://github.com/linw1995/data_extractor/
Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.

README.rst

Data Extractor

license Pypi Status Python version Package version PyPI - Downloads GitHub last commit Code style: black Build Status codecov Documentation Status

Combine XPath, CSS Selectors and JSONPath for Web data extracting.

Quickstarts

Installation

Install the stable version from PYPI.

pip install "data-extractor[jsonpath-extractor]"  # for extracting JSON data
pip install "data-extractor[lxml]"  # for extracting HTML data

Or install the latest version from Github.

pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"

Extract JSON data

Currently supports to extract JSON data with below optional dependencies

install one dependency of them to extract JSON data.

Extract HTML(XML) data

Currently supports to extract HTML(XML) data with below optional dependencies

Usage

from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    name_ = Field(JSONExtractor("name"), name="name")
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]

Changelog

v0.7.0

  • 65d1fce Fix:Create JSONExtractor with wrong subtype
  • 407cd78 New:Make lxml and cssselect optional (#61)
You can’t perform that action at this time.