Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Social Networking Analytics for Calbee (SNAC) Team #6 Bill Cheng Sabina Del Rosso Stephen Hom Omede Firouz Stacy Hsueh Wei Jiang Thoranis Karnasuta CLIENT EER/SCHEMA NORMALIZATION QUERIES Professor Ken Goldberg. IEOR 115. December 9, 2011. DATABASE Client Background: Calbee San Francisco • CALBEE, Inc. is one of the largest snack companies in Japan – Company based on the premise of good health • Calbee, San Francisco is the company’s first US-based flagship store – Founded in early 2011 – Located in Westfield Mall • Active in social media – Website, Facebook, Twitter CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Image from calbeeshop.com Current Infrastructure • Currently do not keep track of social media hits on any site • Use Point of Sale for sales data and employee clock-ins CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Image from http://www.unrealstudio.com Database Objectives • Handle future expansion into e-commerce • Increase social media marketing in targeted demographics • View effect of promotions on sales and social media to help better cater future promotions • Provide a foundation to maximize profits – Logistic management using integer programming – Data mining and machine learning to predict sales CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE EER Diagram CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Relational Design Schema Relational Design Schema (46 relations) Promotion/Sales/Retail: Relations Numbered 0-9 1. PRODUCT(ProdID, Name, IsSour, IsSweet, IsSalty, IsSavory, ManufCost, RetailPrice) 2. PURCHASE(PurchaseID, ProdID1, PromoID3a, CustID6a, StoreID4a, EmpID5a, Timestamp, ipAddress) 3a. PROMOTION(PromoID, PromoCode, StoreID, StartDate, EndDate, Discount) 3b. PROMOTION_SPREAD_VIA_TWITTER(PromoID3a, TweetID10c) 3c. PROMOTION_SPREAD_VIA_F(PromoID3a, F_CID11c) 3d. PROMOTION_SPREAD_VIA_G+(PromoID3a, G_CID12c) 3e. PROMOTION_SPREAD_VIA_S(PromoID3a, S_DID13c) 3f. PROMOTION_SPREAD_VIA_B(PromoID3a, BPost_ID14a) 3g. PROMOTION_INFO_VIA_W(PromoID3a, url15) 4a. STORE(StoreID, AddressNo,StreetName, City, Country, ZipCode, PhoneNo) 4b. STORE_CARRIES(StoreID4a, ProdID1, Stock) 5a. EMPLOYEE(EmpID, LName, FName, Position, FavProdID1, StoreID4, AddressNo,StreetName, City, State, Country, ZipCode, SSN) 5b. EMPLOYEE_IS_FRIEND(EmpID5a, T_UID10a, F_UID11a, G_UID12a, S_UID13a) 5c. EMPLOYEE_IS_CUSTOMER(EmpID5a, CustID6a) 6a. CUSTOMER(CustID, LName, FName, AddressNo, StreetName, City, State, Country, ZipCode, FavProd1, BirthDate) 6b. CUSTOMER_IS_FRIEND(CustID6a, T_UID10a, F_UID11a, G_UID12a, S_UID13a) 8a. PRODUCT_AD(P_Ad_ID, ProductID1, DateBeginAd, DateEndAd, F_or_G_Ad) 8b. STORE_AD(S_Ad_ID, Store_ID, DateBeginAd, DateEndAd, F_or_G_Ad) 8c. F_P_AD_CLICKED(P_Ad, F_UID, Timestamp, ipAddress) 8d. G_P_AD_CLICKED(P_Ad_ID, G_UID, Timestamp, ipAddress) 8e. F_S_AD_CLICKED(S_Ad_ID, F_UID, Timestamp, ipAddress) 8f. G_S_AD_CLICKED(S_Ad_ID, G_UID, Timestamp, ipAddress) CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Relational Design Schema Cont. Social Media: Relations Numbered 10-19 10a. T_USER(T_UID, T_Username, Fname, Lname, City, State, BirthDate, Email) 10b. T_FOLLOWING(T_UID10a, Follower_T_UID10a, DateBeganFollowing) 10c. TWEET(TweetID, T_UID10a, Auth_T_UID10a, TextStr, Timestamp) 11a. F_USER(F_UID, Fname, Lname, City, State, BirthDate, Email) 11b. F_FRIENDS(F_UID11a, Friend_F_UID11a, DateBecameFriends) 11c. F_COMMENT(F_CID, Auth_F_UID11a, On_F_CID11c, TextStr, Timestamp) 11d. F_LIKE(F_CID11c, F_UID11a, Timestamp) 12a. G_USER(G_UID, Fname, Lname, City, State, BirthDate, Email) 12b. G_FRIENDS(G_UID12a, Friend_G_UID12a, DateBecameFriends) 12c. G_COMMENT(G_CID, Auth_G_UID12a, On_G_CID12c, TextStr, Timestamp) 12d. G_LIKE(G_CID12c, G_UID12a, Timestamp) 13a. S_USER(S_UID, Fname, Lname, City, State, BirthDate, Email) 13b. S_FOLLOWING(S_UID13a, Follower_S_UID13a, DateBeganFollowing) 13c. S_DISCOVERY(S_DID, S_UID13a, url, Timestamp) 13d. S_REVIEW(S_DID13c, S_UID13a, TextStr, Like/Dislike, Timestamp) 14a. BLOG_POST(url, BPost_ID, Author_Emp_ID5a, TextStr, Timestamp) 14b. BLOG_COMMENT(BComment_ID, url, BPost_ID14a, TextStr, Timestamp, ipAddress) 14c. ASSOCIATE_IP_T(T_UID10a, Timestamp, ipAddress) 14d. ASSOCIATE_IP_FB(F_UID11a, Timestamp, ipAddress) 14e. ASSOCIATE_IP_G(G_UID12a, Timestamp, ipAddress) 14f. ASSOCIATE_IP_S(S_UID13a, Timestamp, ipAddress) 15. MAIN_WEBSITE(url, link_to_html_file, Timestamp) CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Relational Design Schema Cont. Other Data: Relations Numbered 20-29 20a. GOOGLE_TREND(GT_ID, word, city, country, day, hits) 20b. RELATED_TREND(word, Related_Prod_ID1) CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Access Table Relationships CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Access Table Relationships Cont. CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Normalization Analysis: 1NF Removal of a multi-valued attribute (flavor): CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Normalization Analysis: 2NF Removal of a partial FD: {PromoID} {PromoCode, StoreID, StartDate, EndDate, Discount} CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Normalization Analysis: 3NF Removal of a transitive FD: {T_UID} {T_Username, Fname, Lname, City, State, BirthDate, Email} CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Normalization Analysis: BCNF Removal of a FD with a non-superkey attribute on the LHS: {PromoCode} {StartDate, EndDate, Discount} CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 1: Popular Product Stock Find out the most talked about products in a city and their quantities (stock). This will help us determine which products to move around to balance inventories in expectation of sale increases. Data can be exported to a solver to do a shipment problem. CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 1: SQL SELECT Product.ProdID, Product.ProdName, (SELECT COUNT(F_Comment.F_CID) FROM F_Comment WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*') AS Hits, Store.City, Store_Carries.StoreID AS Store, Store_Carries.Stock AS Stock FROM Product, Store, Store_Carries WHERE (((Product.ProdID)=[Store_Carries].[ProdID]) AND ((Store.StoreID)=[Store_Carries].[StoreID])) ORDER BY Product.ProdName; CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 1: Output CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 1: Data Analysis • We have a list of stores and their stock of different products – Transportation problem to encourage similar levels of stock – Minimize shipments, shipping costs, etc. • Subject to: No outliers (stores with low stock) Possible shipment constraints Possible traffic constraints Etc. CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 1: Data Analysis (AMPL) CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 1: Data Analysis CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 2: Promo Social Networking Consider a promotion. Compare product social network comments in a given city two weeks before, during, and two week after a promotion to judge its effectiveness. Order by the return on the investment. CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 2: SQL SELECT Promotion.PromoID, (SELECT COUNT(*) FROM F_Comment, Product WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND Product.ProdID = Promotion.ProdID AND F_Comment.Timestamp < Promotion.StartDate AND F_Comment.Timestamp > Promotion.StartDate - 14) AS HitsBefore, (SELECT COUNT(*) FROM F_Comment, Product WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND Product.ProdID = Promotion.ProdID AND F_Comment.Timestamp < Promotion.EndDate AND F_Comment.Timestamp > Promotion.StartDate) AS HitsDuring, (SELECT COUNT(*) FROM F_Comment, Product WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND Product.ProdID = Promotion.ProdID AND F_Comment.Timestamp < Promotion.EndDate + 14 AND F_Comment.Timestamp > Promotion.EndDate) AS HitsAfter, (SELECT SUM(Promotion.Discount*Product.RetailPrice) FROM Product) AS PromoCost FROM Promotion ORDER BY PromoCost; CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 3: Priority Customers Use friendship data to rate friends by how many recommendations they have made. Determine how many of a person's friends became friends with us after they became friends with us. In this way, we identify possible priority customers of Calbee to target for special advertisements and promotions. CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 3: SQL SELECT F.F_UID, ( SELECT COUNT(*) FROM F_Friends AS F2 WHERE F2.F_UID = F.F_UID AND EXISTS( SELECT F3.DateBecameFriends FROM F_FRIENDS F3 WHERE F3.Friend_F_UID = 1 AND F3.F_UID = F2.Friend_F_UID AND F3.DateBecameFriends > F.DateBecameFriends)) AS friendCount FROM F_Friends AS F WHERE F.Friend_F_UID = 1; CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 3: Data Analysis CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 4: Google Trends and Stocks Determine priority stores which don’t stock products that they should, as determined by google trend word popularity. For a given google trend word, find the top 5 cities in which the word is most searched in year 2011. Then, find stores in those cities and which related products they do not stock. This will help us identify how to improve inventory. CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 4: SQL SELECT Store.StoreID AS Store, Store.City AS City, Product.ProdID AS Prod FROM Store, Store_Carries, Product WHERE Store.City IN (SELECT TOP 5 Google_Trend.City FROM Google_Trend WHERE Google_Trend.Word = 'test') AND Store_Carries.StoreID = Store.StoreID AND Store_Carries.ProdID = Product.ProdID AND Store_Carries.Stock = 0 AND Product.ProdID IN (SELECT Related_Trend.Related_Prod_ID FROM Related_Trend WHERE Related_Trend.Word = 'test'); CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 5: Social Network and Purchases Gather Social Networking, Google Trend, and Purchase data over time to formulate predictive models. For a given product, find the number of social network hits of a product, the related trend word hits, and the number of purchases in that product for a given city on a given day. In this way, we can use social network 'buzz' and trend data to predict purchases as a function of time and city. Order by product then timestamp. CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 5: SQL SELECT Product.ProdID, Product.ProdName, Purchase.Timestamp, (SELECT COUNT(F_Comment.F_CID) FROM F_Comment WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND F_Comment.Timestamp = Purchase.Timestamp) AS SocialNetworkHits, (SELECT SUM(Google_Trend.hits) FROM Google_Trend WHERE Google_Trend.word = Product.ProdName AND Google_Trend.Timestamp = Purchase.Timestamp) AS TrendHits FROM Product, Purchase WHERE Purchase.ProdID = Product.ProdID ORDER BY Purchase.ProdID, Timestamp; CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 5: Output CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 5: Data Analysis CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 5: Data Analysis • Social media networking, Google Trends, and Purchases data used predictively – Group into weekly vectors – Extract significant data using Principle Component Analysis to project onto 2 dimensions. – Cluster data using K-Means See if we can predict future sales using machine learning CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Query 5: Data Analysis CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Login Interface Employees login here: CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Switchboard Allows employees to insert data in forms or run selected query CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Forms: New Employee Enter information on new Calbee employees CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE Questions? Thank you! CLIENT EER/SCHEMA NORMALIZATION QUERIES DATABASE