The Wayback Machine - https://web.archive.org/web/20200908091239/https://github.com/NVIDIA/NVTabular/pull/260

[REVIEW] Update documentation #260

Merged
merged 1 commit into from Sep 1, 2020

Conversation

@benfred
Collaborator

benfred commented Sep 1, 2020

  • Fix missing images in intro/readme
  • Add Dataset api documentation
  • Add JoinExternal op
  • Fix dataloader docs to point to new nvtabular.loader module
  • Fix ColumnSimilarity / TensorFlow / PyTorch paths
  • Other minor RST fixes
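The dataloader change called out above moves the docs to the new `nvtabular.loader` namespace; the loaders themselves are importable as below (an illustrative sketch, not from the PR: the try/except guard is ours, since `nvtabular.loader.tensorflow` only imports when NVTabular's optional TensorFlow dependencies are installed):

```python
# After this PR, the dataloader docs point at nvtabular.loader.tensorflow,
# which is where KerasSequenceLoader / KerasSequenceValidater live.
# The guard is illustrative: the import only succeeds with nvtabular
# and TensorFlow installed.
try:
    from nvtabular.loader.tensorflow import KerasSequenceLoader, KerasSequenceValidater
except ImportError:
    # nvtabular (or its TensorFlow extras) not installed in this environment
    KerasSequenceLoader = KerasSequenceValidater = None

print("loader available:", KerasSequenceLoader is not None)
```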
@benfred changed the title from "Update documentation" to "[REVIEW] Update documentation" Sep 1, 2020
@nvidia-merlin-bot
Collaborator

nvidia-merlin-bot commented Sep 1, 2020

Click to view CI Results
GitHub pull request #260 of commit 14f502ac7a32e6bdfec8b721817a12c90d66b871, no merge conflicts.
Running as SYSTEM
Setting status of 14f502ac7a32e6bdfec8b721817a12c90d66b871 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/719/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/260/*:refs/remotes/origin/pr/260/* # timeout=10
 > git rev-parse 14f502ac7a32e6bdfec8b721817a12c90d66b871^{commit} # timeout=10
Checking out Revision 14f502ac7a32e6bdfec8b721817a12c90d66b871 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 14f502ac7a32e6bdfec8b721817a12c90d66b871 # timeout=10
Commit message: "Update documentation"
 > git rev-list --no-walk 619a6d8f9de3a6579f7281d2200d0bd293ee4f96 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins461013832381397249.sh
Could not find conda environment: rapids
You can list all discoverable environments with `conda info --envs`.

Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6997215041172362619.sh
Could not find conda environment: rapids
You can list all discoverable environments with `conda info --envs`.

@jperez999
Collaborator

jperez999 commented Sep 1, 2020

rerun tests

@nvidia-merlin-bot
Collaborator

nvidia-merlin-bot commented Sep 1, 2020

Click to view CI Results
GitHub pull request #260 of commit 14f502ac7a32e6bdfec8b721817a12c90d66b871, no merge conflicts.
Running as SYSTEM
Setting status of 14f502ac7a32e6bdfec8b721817a12c90d66b871 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/720/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/260/*:refs/remotes/origin/pr/260/* # timeout=10
 > git rev-parse 14f502ac7a32e6bdfec8b721817a12c90d66b871^{commit} # timeout=10
Checking out Revision 14f502ac7a32e6bdfec8b721817a12c90d66b871 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 14f502ac7a32e6bdfec8b721817a12c90d66b871 # timeout=10
Commit message: "Update documentation"
 > git rev-list --no-walk 14f502ac7a32e6bdfec8b721817a12c90d66b871 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6385913297262911330.sh
Could not find conda environment: rapids
You can list all discoverable environments with `conda info --envs`.

Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6827982333217709989.sh
Could not find conda environment: rapids
You can list all discoverable environments with `conda info --envs`.

@jperez999
Collaborator

jperez999 commented Sep 1, 2020

rerun tests

@nvidia-merlin-bot
Collaborator

nvidia-merlin-bot commented Sep 1, 2020

Click to view CI Results
GitHub pull request #260 of commit 14f502ac7a32e6bdfec8b721817a12c90d66b871, no merge conflicts.
Running as SYSTEM
Setting status of 14f502ac7a32e6bdfec8b721817a12c90d66b871 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/721/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/260/*:refs/remotes/origin/pr/260/* # timeout=10
 > git rev-parse 14f502ac7a32e6bdfec8b721817a12c90d66b871^{commit} # timeout=10
Checking out Revision 14f502ac7a32e6bdfec8b721817a12c90d66b871 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 14f502ac7a32e6bdfec8b721817a12c90d66b871 # timeout=10
Commit message: "Update documentation"
 > git rev-list --no-walk 14f502ac7a32e6bdfec8b721817a12c90d66b871 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2279265822925618214.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1
collected 435 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 31%]
tests/unit/test_notebooks.py ..F. [ 32%]
tests/unit/test_ops.py ................................................. [ 43%]
........................................................................ [ 60%]
...................................... [ 69%]
tests/unit/test_s3.py .. [ 69%]
tests/unit/test_tf_dataloader.py ............ [ 72%]
tests/unit/test_torch_dataloader.py ..................... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=================================== FAILURES ===================================
_____________________________ test_rossman_example _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_rossman_example0')

def test_rossman_example(tmpdir):
    pytest.importorskip("nvtabular.loader.tensorflow")
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "train.csv"))
    _get_random_rossmann_data(1000).to_csv(os.path.join(tmpdir, "valid.csv"))
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir)

    notebook_path = os.path.join(
        dirname(TEST_PATH), "examples", "rossmann-store-sales-example.ipynb"
    )
>       _run_notebook(tmpdir, notebook_path, lambda line: line.replace("EPOCHS = 25", "EPOCHS = 1"))

tests/unit/test_notebooks.py:51:


tests/unit/test_notebooks.py:92: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/opt/conda/envs/rapids/lib/python3.7/subprocess.py:411: in check_output
**kwargs).stdout


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/opt/conda/envs/rapids/bin/python', '/tmp/pytest-of-jenkins/pytest-0/test_rossman_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f2220b24290>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                   output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/opt/conda/envs/rapids/bin/python', '/tmp/pytest-of-jenkins/pytest-0/test_rossman_example0/notebook.py']' returned non-zero exit status 1.

/opt/conda/envs/rapids/lib/python3.7/subprocess.py:512: CalledProcessError
----------------------------- Captured stderr call -----------------------------
2020-09-01 17:27:34.885932: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/lib:/opt/conda/envs/rapids/lib
2020-09-01 17:27:34.886062: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/lib:/opt/conda/envs/rapids/lib
2020-09-01 17:27:34.886077: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
  File "/tmp/pytest-of-jenkins/pytest-0/test_rossman_example0/notebook.py", line 100, in <module>
    from nvtabular.loader.tensorflow import KerasSequenceLoader, KerasSequenceValidater
  File "/var/jenkins_home/workspace/nvtab_docs/nvtabular/nvtabular/loader/tensorflow.py", line 24, in <module>
    from nvtabular.loader.tf_utils import configure_tensorflow, get_dataset_schema_from_feature_columns
  File "/var/jenkins_home/workspace/nvtab_docs/nvtabular/nvtabular/loader/tf_utils.py", line 8, in <module>
    from ..io import device_mem_size
ImportError: cannot import name 'device_mem_size' from 'nvtabular.io' (/var/jenkins_home/workspace/nvtab_docs/nvtabular/nvtabular/io/__init__.py)
=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41129 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 28392 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 30912 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 31276 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 29344 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 30240 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:97: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 7 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 11 46 9 85% 94->95, 95, 100->102, 102, 107->108, 108, 116->117, 117, 125->137, 130->135, 135-137, 212->213, 213, 226->227, 227-229, 247->248, 248
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 5 50 5 95% 100->102, 102-104, 112->114, 114, 140->141, 141, 236->238, 244->249
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 11 60 9 92% 71->72, 72, 90->91, 91, 95->87, 115->116, 116, 135->136, 136, 146-147, 158, 233->235, 248->249, 249, 253->254, 254, 271->272, 272-273
nvtabular/loader/tensorflow.py 112 35 46 11 65% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 76->77, 77, 78->83, 83, 242-253, 264->265, 265, 284->285, 285, 292->293, 293, 294->297, 297, 302->303, 303, 311-313, 316-318, 326, 329-337
nvtabular/loader/tf_utils.py 51 24 20 5 45% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48, 60-65, 75-88
nvtabular/loader/torch.py 33 1 4 0 97% 115
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 356 63 188 37 79% 147->148, 148, 156->161, 161, 171->172, 172, 216->217, 217, 260->261, 261, 264->270, 286-287, 291-296, 300, 340->341, 341-343, 345->346, 346, 347->348, 348, 366->369, 369, 380->381, 381, 387->390, 413->414, 414-415, 417->418, 418-419, 421->422, 422-438, 440->444, 444, 448->449, 449, 450->451, 451, 458->459, 459, 462->465, 465->466, 466, 469->473, 473-476, 486->487, 487, 489->492, 494->511, 511-514, 537->538, 538, 541->542, 542, 543->544, 544, 551->552, 552, 553->556, 556, 663->664, 664, 665->666, 666, 687->702, 727->732, 730->731, 731, 741->738, 746->738
nvtabular/ops/clip.py 25 3 10 4 80% 36->37, 37, 45->46, 46, 50->52, 52->53, 53
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 71->72, 72
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 55->56, 56, 85->86, 86
nvtabular/ops/filter.py 17 1 2 1 89% 42->43, 43
nvtabular/ops/groupby_statistics.py 79 3 30 3 94% 145->146, 146, 150->172, 179->180, 180, 204
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 31->32, 32-34, 35->38, 38
nvtabular/ops/join_external.py 66 4 26 5 90% 81->82, 82, 83->84, 84, 98->101, 101, 114->118, 155->156, 156
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 49->50, 50, 51->52, 52
nvtabular/ops/logop.py 17 1 4 1 90% 43->44, 44
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 49->50, 50, 57->56, 90->91, 91, 100->102, 102-103
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 9 0 0 0 100%
nvtabular/ops/target_encoding.py 70 2 10 2 95% 150->151, 151-152, 202->205
nvtabular/ops/transform_operator.py 41 3 10 2 90% 42-46, 69->71, 88->89, 89
nvtabular/utils.py 17 3 6 3 74% 22->23, 23, 25->26, 26, 33->34, 34
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 47 230 24 86% 99->103, 103, 106->108, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 367->366, 375->378, 378, 399-414, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 658->649, 724->735, 735, 758-788, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2596 278 1000 155 86%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 86.01%
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_rossman_example - subprocess.Called...
============ 1 failed, 434 passed, 20 warnings in 435.85s (0:07:15) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7100975337793644757.sh

@benfred
Collaborator Author

benfred commented Sep 1, 2020

rerun tests

@nvidia-merlin-bot
Collaborator

nvidia-merlin-bot commented Sep 1, 2020

Click to view CI Results
GitHub pull request #260 of commit 14f502ac7a32e6bdfec8b721817a12c90d66b871, no merge conflicts.
Running as SYSTEM
Setting status of 14f502ac7a32e6bdfec8b721817a12c90d66b871 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/722/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/260/*:refs/remotes/origin/pr/260/* # timeout=10
 > git rev-parse 14f502ac7a32e6bdfec8b721817a12c90d66b871^{commit} # timeout=10
Checking out Revision 14f502ac7a32e6bdfec8b721817a12c90d66b871 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 14f502ac7a32e6bdfec8b721817a12c90d66b871 # timeout=10
Commit message: "Update documentation"
 > git rev-list --no-walk 14f502ac7a32e6bdfec8b721817a12c90d66b871 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7116433485589701120.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1
collected 435 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 31%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 43%]
........................................................................ [ 60%]
...................................... [ 69%]
tests/unit/test_s3.py .. [ 69%]
tests/unit/test_tf_dataloader.py ............ [ 72%]
tests/unit/test_torch_dataloader.py ..................... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing
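The fix for this FutureWarning is to switch from the private `pandas.util.testing` module to the public `pandas.testing` API. A minimal sketch of the replacement (the DataFrames here are illustrative, not from the test suite):

```python
import pandas as pd
import pandas.testing as tm  # public replacement for pandas.util.testing

a = pd.DataFrame({"x": [1, 2, 3]})
b = pd.DataFrame({"x": [1, 2, 3]})

# Raises AssertionError on mismatch, returns None when the frames match
tm.assert_frame_equal(a, b)
```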

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37165 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)
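This DeprecationWarning is silenced by passing an explicit dtype when constructing a Series that may be empty. A sketch of the pattern (the `bool` dtype is an assumption for a mask; cudf's actual fix may differ):

```python
import pandas as pd

# Without a dtype, an empty Series currently defaults to float64 and warns
# that the future default will be object; an explicit dtype avoids both.
mask = pd.Series([], dtype="bool")
```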

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 29568 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 29204 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 31136 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[devices1-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 30240 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:76: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:97: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 7 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 94->95, 95, 107->108, 108, 116->117, 117, 125->137, 130->135, 135-137, 212->213, 213, 227->228, 228-229, 247->248, 248
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 5 50 5 95% 100->102, 102-104, 112->114, 114, 140->141, 141, 236->238, 244->249
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 188 8 60 5 95% 71->72, 72, 135->136, 136, 146-147, 158, 233->235, 248->249, 249, 271->272, 272-273
nvtabular/loader/tensorflow.py 112 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 264->265, 265, 284->285, 285, 292->293, 293, 294->297, 297, 302->303, 303
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 356 54 188 37 81% 147->148, 148, 156->161, 161, 171->172, 172, 216->217, 217, 260->261, 261, 264->270, 340->341, 341-343, 345->346, 346, 347->348, 348, 366->369, 369, 380->381, 381, 387->390, 413->414, 414-415, 417->418, 418-419, 421->422, 422-438, 440->444, 444, 448->449, 449, 450->451, 451, 458->459, 459, 462->465, 465->466, 466, 469->473, 473-476, 486->487, 487, 489->492, 494->511, 511-514, 537->538, 538, 541->542, 542, 543->544, 544, 551->552, 552, 553->556, 556, 663->664, 664, 665->666, 666, 687->702, 727->732, 730->731, 731, 741->738, 746->738
nvtabular/ops/clip.py 25 3 10 4 80% 36->37, 37, 45->46, 46, 50->52, 52->53, 53
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 71->72, 72
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 55->56, 56, 85->86, 86
nvtabular/ops/filter.py 17 1 2 1 89% 42->43, 43
nvtabular/ops/groupby_statistics.py 79 3 30 3 94% 145->146, 146, 150->172, 179->180, 180, 204
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 31->32, 32-34, 35->38, 38
nvtabular/ops/join_external.py 66 4 26 5 90% 81->82, 82, 83->84, 84, 98->101, 101, 114->118, 155->156, 156
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 49->50, 50, 51->52, 52
nvtabular/ops/logop.py 17 1 4 1 90% 43->44, 44
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 49->50, 50, 57->56, 90->91, 91, 100->102, 102-103
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 9 0 0 0 100%
nvtabular/ops/target_encoding.py 70 2 10 2 95% 150->151, 151-152, 202->205
nvtabular/ops/transform_operator.py 41 3 10 2 90% 42-46, 69->71, 88->89, 89
nvtabular/utils.py 17 3 6 3 74% 22->23, 23, 25->26, 26, 33->34, 34
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 230 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 658->649, 724->735, 735, 758-788, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2596 218 1000 149 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.68%
================= 435 passed, 20 warnings in 463.57s (0:07:43) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6533288000845737064.sh

@benfred benfred requested a review from jperez999 Sep 1, 2020
@benfred benfred merged commit a2c87ed into NVIDIA:main Sep 1, 2020
1 check passed
Jenkins Unit Test Run Success
@benfred benfred deleted the benfred:update_docs branch Sep 1, 2020