Big Data · Data Engineering · industry

Data in Real Scenarios

Log management: logs are events with timestamps. Multi-cloud: adopted for regulatory reasons, and it can handle the failover of a single cloud. Elasticsearch: full-text search; distributed? implicit index. URBN: fashion product attribution; planning and forecasting; search correlation (Fashion-MNIST dataset). LinkedIn: scaling machine learning using a graph database (Neo4j), second degree. predicting future…

Big Data · data science · machine learning · programming

Apache Hadoop (projects)

Questions: setInputFormat; comparator; top-k frequent words. Hadoop system: Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. HDFS (Hadoop Distributed File System): data storage (data splitting and data replication). MapReduce (data processing): how to leverage a job; how nodes communicate; how to deal with node…
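
The "top-k frequent words" question can be illustrated outside Hadoop with a small in-memory imitation of the map, shuffle, and reduce steps. This is only a sketch in plain Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, top_k_words) and the tokenization rule are assumptions made for illustration, and a real job would distribute these steps across nodes.

```python
import heapq
import re
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Group values by key, imitating what happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def top_k_words(lines, k):
    """Return the k most frequent words as (word, count) pairs."""
    pairs = (pair for line in lines for pair in map_phase(line))
    totals = [reduce_phase(w, c) for w, c in shuffle(pairs).items()]
    # Highest count first; ties are broken alphabetically so the
    # result is deterministic.
    return heapq.nsmallest(k, totals, key=lambda wc: (-wc[1], wc[0]))


if __name__ == "__main__":
    sample = ["the cat and the dog", "the dog barks"]
    print(top_k_words(sample, 2))
```

Selecting the top k with a heap avoids sorting every distinct word, which matters when the vocabulary is much larger than k.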

Academic · Big Data · machine learning · Networking

Academic Activities and Meetups

By attending academic activities, we can get access to the most cutting-edge research topics and techniques. Even if we cannot match the impressive work presented by the big names, it is still worth our time to understand the basic ideas of the research community, which may help us stay curious about the field.…