
Referring to this post: Spark structured streaming with python. I would like to import 'col' in Python 3.5:

from pyspark.sql.functions import col

However, I get an error saying "unresolved reference to col". I've installed the pyspark library, so I'm wondering: has 'col' been removed from pyspark, and if so, how can I import it?


4 Answers


Try installing 'pyspark-stubs'; I had the same problem in PyCharm, and installing it resolved the issue.


2 Comments

conda install -c conda-forge pyspark-stubs worked for me
Thanks, it helped me to get rid of this annoying issue.

It turns out to be IntelliJ IDEA's problem. Even though it shows an unresolved reference, my program still runs without any problem from the command line.

2 Comments

This answer explains this behaviour very well.
The reason is correct, but it doesn't provide the actual solution.

Functions like col are not explicitly defined in the Python source; they are generated dynamically at import time.

Static analysis tools like pylint will therefore also report an error for them.

So the easiest way to use them is something like this:

from pyspark.sql import functions as F

F.col("colname")
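For instance, here is a minimal runnable sketch; the local SparkSession and the toy DataFrame are assumptions made purely for illustration:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical local session and toy data, only to show F.col in context
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([("a", 1), ("b", 2)], ["name", "value"])

# F.col resolves at runtime even though the IDE cannot see a 'col' definition
df.filter(F.col("value") > 1).select(F.col("name")).show()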

These functions are generated by the following code in python/pyspark/sql/functions.py:

_functions = {
    'lit': _lit_doc,
    'col': 'Returns a :class:`Column` based on the given column name.',
    'column': 'Returns a :class:`Column` based on the given column name.',
    'asc': 'Returns a sort expression based on the ascending order of the given column name.',
    'desc': 'Returns a sort expression based on the descending order of the given column name.',

    'upper': 'Converts a string expression to upper case.',
    'lower': 'Converts a string expression to lower case.',
    'sqrt': 'Computes the square root of the specified float value.',
    'abs': 'Computes the absolute value.',

    'max': 'Aggregate function: returns the maximum value of the expression in a group.',
    'min': 'Aggregate function: returns the minimum value of the expression in a group.',
    'count': 'Aggregate function: returns the number of items in a group.',
    'sum': 'Aggregate function: returns the sum of all values in the expression.',
    'avg': 'Aggregate function: returns the average of the values in a group.',
    'mean': 'Aggregate function: returns the average of the values in a group.',
    'sumDistinct': 'Aggregate function: returns the sum of distinct values in the expression.',
}

def _create_function(name, doc=""):
    """ Create a function for aggregator by name"""
    def _(col):
        sc = SparkContext._active_spark_context
        jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _

for _name, _doc in _functions.items():
    globals()[_name] = since(1.3)(_create_function(_name, _doc))
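
To see why static analyzers are confused, here is a stripped-down sketch of the same pattern with no Spark dependency; the names, docstrings, and function bodies below are made up purely for illustration:

_docs = {
    'col': 'placeholder docstring',
    'upper': 'placeholder docstring',
}

def _create_function(name, doc=""):
    # Return a stand-in function; the real PySpark version delegates to the JVM
    def _(column_name):
        return "%s(%s)" % (name, column_name)
    _.__name__ = name
    _.__doc__ = doc
    return _

# The names only come into existence when this loop runs at import time,
# so a linter scanning the source never sees a top-level 'col' definition.
for _name, _doc in _docs.items():
    globals()[_name] = _create_function(_name, _doc)

print(col('colname'))  # runs fine, yet static analysis flags 'col' as unresolved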



It seems to be an issue with the PyCharm editor; I was also able to run a program using trim() through the Python console.

