Worksheets
Working inside Snowflake means running data quality checks from Snowflake Python Worksheets.
Requirements
- stage: a stage under a data schema in Snowflake
- cuallee.whl: the wheel distribution from the PyPI index with the latest version of cuallee
- wheel_loader.py: a Python script developed by Snowflake Labs
- anaconda: dependencies added to your worksheet
- warehouse: where to run the Python worksheet
Steps
- Make sure that you have enabled and accepted the Anaconda Python Packages terms and conditions under Admin > Billing & Terms
- Select a schema in your Snowflake instance and create a stage; it does not matter whether it is internal or external. Let's call it DEMO_STAGE
- Go to the PyPI index and download the built distribution of cuallee. At the time of this writing the file available is: cuallee-0.8.5-py3-none-any.whl
- Upload your .whl file into the DEMO_STAGE either via the CLI or through the UI
- Download the wheel_loader.py script from Snowflake Labs
- Upload your wheel_loader.py file into the DEMO_STAGE either via the CLI or through the UI
- Create a new worksheet using the + sign in Snowflake Worksheets and select Python Worksheet
- In the top right corner of your worksheet, don't forget to select the warehouse to be used to execute this worksheet
- In the top left corner of your worksheet, select the database schema that contains the DEMO_STAGE
- Next to the schema selection and the settings drop-down menu, click the packages drop-down menu
- 2 tabs will be available: Anaconda Packages and Stage Packages
- In the Anaconda Packages tab, add the following library dependencies required by cuallee:
  - colorama==0.4.6
  - pandas==1.5.3
  - pygments==2.15.1
  - requests==2.31.0
  - toolz==0.12.0
  - snowflake-snowpark-python==1.11.1
- In the Stage Packages tab, add the following library dependencies to use cuallee:
  - @demo_stage/cuallee-0.8.5-py3-none-any.whl
  - @demo_stage/wheel_loader.py
- After completing the package setup for both Anaconda and Stage, the added libraries should appear at the bottom of the drop-down, under Installed Packages
- At this point you are ready to go! Below is a snippet to test the use of cuallee inside Snowflake
    # cuallee
    # checks inside snowflake demo
    import snowflake.snowpark as snowpark
    import wheel_loader

    def main(session: snowpark.Session):
        # Your code goes here, inside the "main" handler.
        wheel_loader.load('cuallee-0.8.5-py3-none-any.whl')
        from cuallee import Check, CheckLevel, Control

        check = Check(CheckLevel.WARNING, "Custom", session=session)
        tableName = 'snowflake_sample_data.tpch_sf100.lineitem'
        dataframe = session.table(tableName)
        check.is_greater_than("L_QUANTITY", 2)
        check.is_legit("L_COMMENT")

        # Return value will appear in the Results tab.
        return Control.completeness(dataframe, session=session).union(check.validate(dataframe))
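The two upload steps above (the wheel and wheel_loader.py) can also be scripted instead of using the CLI or the UI. The following is a minimal sketch, assuming you already have a Snowpark session created elsewhere and that the stage is named demo_stage as in the steps. The helper names stage_package_paths and upload_artifacts are hypothetical and not part of cuallee or Snowpark; the only Snowpark call used is session.file.put.

```python
from pathlib import Path


def stage_package_paths(stage: str, filenames: list[str]) -> list[str]:
    """Return the '@stage/file' entries to type into the Stage Packages tab."""
    stage_ref = "@" + stage.lstrip("@")
    # Only the base file name ends up in the stage path, not the local folder.
    return [f"{stage_ref}/{Path(name).name}" for name in filenames]


def upload_artifacts(session, stage: str, local_files: list[str]) -> list[str]:
    """Upload local files to a stage and return their stage package paths.

    `session` is assumed to be a snowflake.snowpark.Session (hypothetical helper,
    not an API of cuallee).
    """
    stage_ref = "@" + stage.lstrip("@")
    for local_file in local_files:
        # auto_compress=False keeps the original file name (no .gz suffix),
        # so the staged name matches the name referenced in the worksheet.
        session.file.put(local_file, stage_ref, auto_compress=False, overwrite=True)
    return stage_package_paths(stage, local_files)
```

Calling upload_artifacts(session, "demo_stage", ["cuallee-0.8.5-py3-none-any.whl", "wheel_loader.py"]) would return the two paths listed in the Stage Packages step.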
Notebooks
Requirements
- pip install cuallee
- pip install cuallee[snowpark] or pip install snowflake-snowpark-python
- Set environment variables to start a session
  - SF_ACCOUNT: obtained by clicking the bottom left part of your Snowflake account and selecting Copy account url
    - Then remove the https:// part and also the snowflakecomputing.com part of the URL
    - It should end up looking something like this: SF_ACCOUNT=1234567.region-name.cloud
  - SF_USER: your Snowflake username
  - SF_PASSWORD: your Snowflake password
  - SF_ROLE: your Snowflake role, e.g. ACCOUNTADMIN
  - SF_WAREHOUSE: your designated warehouse for running data quality checks, e.g. COMPUTE_WH
  - SF_DATABASE: your database selection for running checks, e.g. SNOWFLAKE_SAMPLE_DATA
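As an illustration of the variables above, here is a minimal sketch that derives the SF_ACCOUNT value from a copied account URL and collects the SF_* variables into Snowpark connection parameters. The function names account_from_url, connection_parameters and create_session are hypothetical helpers, not part of cuallee, and how cuallee itself reads these variables is not shown here. The only Snowpark call used is Session.builder.configs(...).create().

```python
import os
from urllib.parse import urlparse


def account_from_url(account_url: str) -> str:
    """Strip the https:// prefix and the snowflakecomputing.com suffix."""
    parsed = urlparse(account_url)
    # urlparse only fills netloc when a scheme is present; otherwise use the path.
    host = (parsed.netloc or parsed.path).strip("/")
    suffix = ".snowflakecomputing.com"
    if host.endswith(suffix):
        host = host[: -len(suffix)]
    return host


def connection_parameters(env=None) -> dict:
    """Map the SF_* environment variables to Snowpark connection parameters."""
    env = os.environ if env is None else env
    keys = ("account", "user", "password", "role", "warehouse", "database")
    missing = [f"SF_{key.upper()}" for key in keys if not env.get(f"SF_{key.upper()}")]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[f"SF_{key.upper()}"] for key in keys}


def create_session(env=None):
    """Open a Snowpark session from the SF_* variables (requires Snowpark)."""
    # Imported lazily so the helpers above also work without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(connection_parameters(env)).create()
```

For example, account_from_url("https://1234567.region-name.cloud.snowflakecomputing.com") returns "1234567.region-name.cloud", which is the value expected in SF_ACCOUNT.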
-