The integration of Spark and Delta Lake tables is seamless for the most part. However, I ran into issues with the unit tests that create and update tables when they are run together with all the other existing unit tests:
    @classmethod
    @since(0.4)
    def isDeltaTable(cls, sparkSession, identifier):
        """
        Check if the provided `identifier` string, in this case a file path,
        is the root of a Delta table using the given SparkSession.
        :param sparkSession: SparkSession to use to perform the check
        :param path: location of the table
        :return: If the table is a delta table or not
        :rtype: bool
        Example::
            DeltaTable.isDeltaTable(spark, "/path/to/table")
        """
        assert sparkSession is not None
>       return sparkSession._sc._jvm.io.delta.tables.DeltaTable.isDeltaTable(
            sparkSession._jsparkSession, identifier)
E       TypeError: 'JavaPackage' object is not callable

../../../anaconda3/envs/rfa/lib/python3.8/site-packages/delta/tables.py:433: TypeError
The call to isDeltaTable fails with the TypeError shown above.
The Spark session is global to the entire running process, which in this case is started by pytest. Below is the relevant Spark Delta Lake configuration:
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.databricks.delta.schema.autoMerge.enabled", "true") \
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
Side note:
Adding the following, or replacing spark.sql.catalog.spark_catalog with it, yields varying results.
.config("spark.sql.catalog.local", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
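To make the setup easier to reason about, here is a minimal sketch of how the three main settings above might be applied to a session builder. The names DELTA_CONF, apply_delta_conf and build_session are illustrative and not part of my actual test setup; only the keys and values come from the configuration quoted above. The sketch also assumes the Delta Lake JAR is already available to the JVM, which the configuration above does not show.

```python
# Delta-related settings quoted in the post, kept in one place.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.databricks.delta.schema.autoMerge.enabled": "true",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def apply_delta_conf(builder, conf=DELTA_CONF):
    """Apply each key/value pair to a SparkSession builder and return it."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def build_session(app_name="unit-tests"):
    """Create (or reuse) the process-wide session; requires pyspark."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master("local[1]")
    # getOrCreate() returns an existing session if one is already running.
    return apply_delta_conf(builder).getOrCreate()
```

Since getOrCreate() hands back an already-running session when one exists, static settings such as spark.sql.extensions may not take effect if some other test module created the session first. That would be consistent with the behaviour described here, though I have not confirmed it is the cause.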
Workaround
The current workaround is to run the Delta Lake-specific unit tests in a separate pytest invocation, so that they get their own process and Spark session:
pytest . --ignore=path\to\test\delta_lake_tests.py
pytest path\to\test\delta_lake_tests.py
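The two invocations can also be wrapped in a small script so they are always run together. This is only a sketch: build_commands and run_all are hypothetical helper names, the path is the same placeholder as in the commands above, and it launches pytest through the current interpreter (python -m pytest) rather than the pytest executable.

```python
import subprocess
import sys

# Placeholder path, as in the commands above.
DELTA_TESTS = r"path\to\test\delta_lake_tests.py"


def build_commands(delta_tests=DELTA_TESTS, root="."):
    """Return the two pytest invocations, each meant to run in its own process."""
    base = [sys.executable, "-m", "pytest"]
    return [
        base + [root, f"--ignore={delta_tests}"],  # everything except the Delta tests
        base + [delta_tests],                      # the Delta tests only
    ]


def run_all():
    """Run both invocations one after the other and return the highest exit code."""
    codes = [subprocess.run(cmd).returncode for cmd in build_commands()]
    return max(codes)
```

Calling sys.exit(run_all()) from a script gives a single non-zero exit status if either run fails, while each run still gets a fresh process and therefore a fresh Spark session.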
Note that adding custom pytest markers to categorize the tests and running them by marker (-m) also fails, even though only the selected tests are run. Collecting the tests seems to "pollute" the Spark session.
@pytest.mark.delta_table
def test__write(self):
    ...
pytest . -m delta_table
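For completeness, a sketch of how the marker used above could be registered in a conftest.py, using pytest's pytest_configure hook and config.addinivalue_line. The description text in MARKER_LINE is my own wording. Registering the marker only avoids the unknown-marker warning; it does not change the failure.

```python
# conftest.py (sketch)

# Marker name from the example above; the description is illustrative.
MARKER_LINE = "delta_table: tests that create or update Delta Lake tables"


def pytest_configure(config):
    """Register the custom marker so pytest recognises it."""
    config.addinivalue_line("markers", MARKER_LINE)
```

As far as I understand, -m deselects tests only after collection, so every test module is still imported. Any module-level code that creates a Spark session would therefore still run, which may explain why selecting by marker does not isolate the Delta Lake tests.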