#pandasprofiling #python #pandas #dataquality #azuredatabricks #azuredatafactory #azuredataengineer #databricks #dataanalysis
In this session, we discuss how to perform data profiling with the ydata-profiling library. For the demo we used Jupyter, but you can apply the same steps in Databricks to data stored in your Azure Storage location.
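For the Databricks case, one possible approach is sketched below. It is an illustration under stated assumptions, not the method shown in the video. It assumes the data sits in ADLS Gen2 and is read through an abfss:// URI, that the cluster already has access to the storage account, and that ydata-profiling is installed on the cluster. The helper names abfss_path and profile_spark_df, and the container and account names in the usage comments, are hypothetical. The Spark DataFrame (or a sample of it) is converted to pandas and passed to ProfileReport, the same class used in the sample code below.

```python
from typing import Optional


def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Hypothetical helper: build an ADLS Gen2 URI of the form
    abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


def profile_spark_df(sdf, title: str, sample_fraction: Optional[float] = None):
    """Hypothetical helper: profile a Spark DataFrame by converting it to pandas.

    toPandas() collects every row to the driver, so sampling first keeps
    driver memory bounded for large tables.
    """
    # Imported lazily; assumes ydata-profiling is installed on the cluster
    from ydata_profiling import ProfileReport

    if sample_fraction is not None:
        # Random sample of roughly sample_fraction of the rows; the seed makes it repeatable
        sdf = sdf.sample(fraction=sample_fraction, seed=42)
    return ProfileReport(sdf.toPandas(), title=title, explorative=True)


# Example use in a Databricks notebook, where `spark` and `displayHTML` come from the runtime
# (container "raw" and account "mystorageaccount" are placeholders):
# path = abfss_path("raw", "mystorageaccount", "data_quality/Selected_Online_Sport_Wagering_Data.csv")
# sdf = spark.read.csv(path, header=True, inferSchema=True)
# report = profile_spark_df(sdf, title="Quality_Test", sample_fraction=0.1)
# displayHTML(report.to_html())
```

Sampling is optional. For small tables, profiling the full data gives exact counts, while for large tables a sample trades some precision for a report that fits in driver memory.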
Link to the ydata-profiling page: https://pypi.org/project/ydata-profil...
Link to the CSV data set: https://www.kaggle.com/datasets/matto...
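Sample Code:
# Install the library (in a Jupyter or Databricks notebook cell, %pip installs into the current kernel)
%pip install ydata-profiling

import pandas as pd
from ydata_profiling import ProfileReport

# Load the CSV data set into a pandas DataFrame (raw string keeps the Windows backslashes literal)
df1 = pd.read_csv(r"D:\Data_Quality\Selected_Online_Sport_Wagering_Data.csv")

# Build the profiling report; explorative=True turns on the more detailed analyses
report = ProfileReport(df1, title="Quality_Test", explorative=True)

# Save the report as a standalone HTML file
report.to_file(r"D:\Data_Quality\Data_results.html")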
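To show what a few of the numbers in the report mean, here is a minimal pandas-only sketch that computes some per-column statistics that data profiling typically covers: data type, missing values, distinct values, and duplicate rows. It is not how ydata-profiling computes them internally. The function names quick_profile and duplicate_row_count are hypothetical.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with a few basic profiling statistics."""
    n_rows = len(df)
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),          # pandas dtype of each column
        "missing": df.isna().sum(),              # count of NaN/None cells
        "distinct": df.nunique(dropna=True),     # distinct non-missing values
    })
    # Percentage of missing cells per column (guard against an empty DataFrame)
    summary["missing_pct"] = (summary["missing"] / n_rows * 100).round(2) if n_rows else 0.0
    return summary


def duplicate_row_count(df: pd.DataFrame) -> int:
    """Hypothetical helper: number of rows that exactly repeat an earlier row."""
    return int(df.duplicated().sum())


if __name__ == "__main__":
    sample = pd.DataFrame({"a": [1, 1, None, 4], "b": ["x", "x", "y", None]})
    print(quick_profile(sample))
    print("duplicate rows:", duplicate_row_count(sample))
```

Running quick_profile(df1) on the DataFrame from the sample code gives a quick text check. The HTML report goes further, adding distributions, correlations, and alerts for each column.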
#dataprofiling
#dataengineeringessentials
#dataengineering
#dataengineer
#pandas #pyspark
#KnowledgeShare
#ydata-quality
#automateddataprofiling