
Spark cheatsheet

Mount S3 bucket

def mountBucket(accesskey, secretkey, bucketName, mountFolder):
    ACCESS_KEY_ID = accesskey
    SECRET_ACCESS_KEY = secretkey
    print("Mounting", bucketName)
    try:
        # Unmount the data in case it was already mounted.
        dbutils.fs.unmount(mountFolder)
    except:
        # If it fails to unmount it most likely wasn't mounted in the first place
        print("Directory not unmounted: ", mountFolder)
    finally:
        # Lastly,…
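For reference, here is a minimal sketch of the unmount-then-mount pattern that the excerpt begins. It assumes the Databricks `dbutils.fs.mount` call with an `s3a://` source URI that embeds the credentials. The helper names `build_s3a_source` and `remount_bucket` are hypothetical, and `dbutils` is passed in explicitly so the sketch can be exercised outside a notebook. This is not the elided remainder of the original function.

```python
from urllib.parse import quote


def build_s3a_source(access_key, secret_key, bucket_name):
    """Build an s3a:// URI with embedded credentials (hypothetical helper)."""
    # Secret keys may contain "/" or "+", so percent-encode every reserved character.
    encoded_secret = quote(secret_key, safe="")
    return f"s3a://{access_key}:{encoded_secret}@{bucket_name}"


def remount_bucket(dbutils, access_key, secret_key, bucket_name, mount_folder):
    """Unmount mount_folder if it is mounted, then mount the bucket there (assumed flow)."""
    try:
        dbutils.fs.unmount(mount_folder)
    except Exception:
        # Most likely the folder was not mounted in the first place.
        print("Directory not unmounted:", mount_folder)
    source = build_s3a_source(access_key, secret_key, bucket_name)
    dbutils.fs.mount(source, mount_folder)
    return source
```

Embedding keys in the URI is the simplest variant to show; in practice an instance profile or a secret scope may be a safer choice than passing raw keys around.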


Data in Real Scenarios

Log Management: logs are events with timestamps. Multi-cloud for regulatory reasons; it can also handle the failover of a single cloud.
Elasticsearch: full-text search, distributed(?), implicit index.
URBN: fashion product attribution; planning & forecasting; search correlation (Fashion-MNIST data set).
LinkedIn: scaling machine learning using a graph database (Neo4j), second degree; predicting future…
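The "second degree" note presumably refers to second-degree connections (connections of connections). Assuming that reading, the idea can be sketched in plain Python over an adjacency dictionary. The toy network below is hypothetical and stands in for a real graph database query.

```python
def second_degree(graph, person):
    """Return the people exactly two hops away from `person` in an undirected graph."""
    first = set(graph.get(person, ()))
    second = set()
    for friend in first:
        second.update(graph.get(friend, ()))
    # Exclude the person themselves and their direct connections.
    return second - first - {person}


# Hypothetical toy network, not data from the post.
network = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan", "eve"],
    "dan": ["bob", "cat"],
    "eve": ["cat"],
}

print(sorted(second_degree(network, "ann")))  # ['dan', 'eve']
```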