GOJEK: A MASTER CASE STUDY
YOU CAN'T AFFORD NOT TO READ THIS!
Gojek is an Indonesian firm that offers a wide range of services through its mobile app, contributing over $7 billion to the economy with 900K registered merchants, 190M app downloads, and 2M drivers fulfilling 180K orders in 120 minutes. The company provides transport and logistics services like Go-ride, Go-car, Go-send, Go-box, and Go-transit. For food and shopping, there are options like Go-mall, Go-mart, and Go-med. Payments can be made using Go-pay, Go-bills, Paylater, Go-pulsa, and more. Daily needs are met through GoFitness, while businesses can use Go-biz for growth. News and entertainment services include Go-tix, Go-play, Go-games, and Go-news.
The Central Analytics and Science Team (CAST) at Gojek works on analyzing data generated through these services, providing insights and solutions for business problems. The team includes analysts, data scientists, and engineers who develop deep analytics solutions and machine learning systems. They focus on solving day-to-day business problems, conducting root cause analysis, and informing top management about key metrics and product decisions.
Working through this case study as a member of CAST has two learning objectives: conducting root cause analysis on the growth drivers and challenges an organization faces, using tools like Pandas for the data analysis, and optimizing marketing budgets with profit as the main metric. This approach helps address business problems effectively and drive sustainable growth for the organization.
The Data Science Blogathon article shows, with clear instructions, how to use the PuLP solver for linear programming (LP) problems. It also includes a simple regression exercise, part of which is predicting total_cbv using multiple linear regression and cross-validation. The main problem discussed is maximizing revenue for GOJEK in Q2 2016. The article provides steps to create a model for each service, with a focus on pre-processing steps such as removing GO-TIX, keeping only cancelled orders, and imputing missing values.
The evaluation metric is MAPE, validated with a 3-fold scheme. Questions are posed about the values of the predictors for specific services and dates, the one-hot encoded variables, the pre-processed data for a particular service, and the forecast-period MAPE for each service. The article also asks for graphs showing the performance of each validation fold.
Overall, the article presents a comprehensive analysis of the data to understand what happened in Q1 2016 and how to maximize revenue in Q2 2016 for GOJEK. It covers problem statements, solutions, and detailed steps for data analysis using linear regression, cross-validation, and pre-processing techniques. The focus is on utilizing data science methodologies to provide insights and recommendations for management decisions.
The GO-FOOD service in Surabaya had a 20% increase in completed orders last month compared to the previous month. The manager needs to analyze the sudden growth to ensure continued success in the future. To evaluate this growth quantitatively, methods such as regression analysis can be used. Understanding customer behavior is crucial, and datasets such as Table A, Table B, and others provide valuable information for analysis. Researching blogs and whitepapers on the company’s website can offer insights into the company’s expectations.
Root Cause Analysis (RCA) involves importing and analyzing sales data to identify growth drivers and challenges faced by the organization. Data is imported using the Pandas and NumPy libraries and prepared through steps such as converting date formats, filtering the data to specific months, and renaming months for clarity. By understanding the trends and patterns in the data, the manager can make informed decisions to sustain the growth of the GO-FOOD service in Surabaya.
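The preparation steps above can be sketched with Pandas as follows. This is a minimal illustration, not the article's notebook: the column names (date, service, order_status, total_cbv) and the sample rows are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sales rows; column names and values are assumptions.
sales = pd.DataFrame({
    "date": ["2016-01-15", "2016-02-03", "2016-03-21", "2016-04-02"],
    "service": ["GO-FOOD", "GO-RIDE", "GO-FOOD", "GO-SEND"],
    "order_status": ["Completed", "Cancelled", "Completed", "Completed"],
    "total_cbv": [120_000, 45_000, 98_000, 60_000],
})

# Convert the text dates to pandas datetimes.
sales["date"] = pd.to_datetime(sales["date"], format="%Y-%m-%d")

# Keep only the first quarter (January to March).
q1 = sales[sales["date"].dt.month.isin([1, 2, 3])].copy()

# Replace month numbers with readable labels.
month_names = {1: "Jan", 2: "Feb", 3: "Mar"}
q1["month"] = q1["date"].dt.month.map(month_names)

print(q1[["date", "month", "service", "order_status"]])
```

Taking a copy after filtering keeps the later column assignment from touching the original frame.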
In the first quarter of 2016, overall revenue at the group level increased by 14%, showing positive growth. The revenue breakdown by service revealed that Ride, Food, Shop, and Send collectively contribute over 90% of net revenue each month. However, only Ride, Food, and Send are essential contributors to revenue, which highlights potential growth opportunities for the other services. The analysis therefore focused on the top 3 services, following the 80:20 rule, and found growth in Ride and Send against a decline in Food revenue. The completed-orders data showed 19% growth for Ride and 25% for Send, while Food experienced a 7% decline. These findings indicate areas of success and concern within the services, emphasizing the need for strategic planning to maximize revenue and growth potential.
The analysis of canceled rides in Q1 2016 showed a 6% increase in lost revenue, prompting a recommendation to reduce this to less than 5%. The examination of orders revealed that in March, Food, Ride, and Send had 17%, 15%, and 13% of total orders canceled respectively. Notably, Food improved its order completion rate from 69% in January to 83% in March, indicating significant progress.
Further investigation looked at each service's share of all canceled orders in March: Ride accounted for the most at 72%, followed by Food (17%) and Send (6%). Recommendations for optimizing budget spends in Q2 include focusing on reducing cancelations in Ride through product interventions and new features, increasing net revenue in Food by reducing costs and cancelations, and addressing concerns in Send to improve revenue growth.
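The cancellation figures above use two different denominators: the share of a service's own orders that were canceled, and a service's share of all canceled orders. A small sketch of both calculations is shown below. The order counts and column names are invented for illustration and will not reproduce the article's percentages.

```python
import pandas as pd

# Hypothetical March order counts per service and status.
orders = pd.DataFrame({
    "service": ["Ride", "Ride", "Food", "Food", "Send", "Send"],
    "order_status": ["Completed", "Cancelled"] * 3,
    "num_orders": [850, 150, 415, 85, 87, 13],
})

# One row per service, one column per order status.
counts = orders.pivot_table(
    index="service", columns="order_status",
    values="num_orders", aggfunc="sum",
)

total_per_service = counts["Cancelled"] + counts["Completed"]

# View 1: what fraction of each service's own orders was canceled.
counts["cancel_rate_pct"] = 100 * counts["Cancelled"] / total_per_service

# View 2: what fraction of all canceled orders each service accounts for.
counts["share_of_cancellations_pct"] = (
    100 * counts["Cancelled"] / counts["Cancelled"].sum()
)

print(counts.round(1))
```

A large service can dominate the second view even when its own cancellation rate is moderate, which is why the two numbers are worth reading side by side.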
The Business team has allocated a budget of 40 billion for Q2, with specific growth targets for each service. For Go-Box, it costs 40M to acquire 100 more bookings, with a maximum growth target of 7% in Q2. By optimizing budget spends and implementing targeted strategies, the company can maximize profits and drive growth across its services.
The budget data is imported and inspected for null values, shape, and column information. The sales data is transformed to include a new month column and filtered to the first quarter of the year and to completed orders. A pivot table is then created to analyze the revenue and number of orders for each service in the first quarter. Finally, the sales data is merged with the budget data to create an optimization DataFrame that shows the revenue, cost per 100 incremental bookings, maximum growth rate, and completed orders for each service.
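A minimal sketch of this pivot-and-merge step is shown below. The column names and all values are assumptions, except the Go-Box cost of 40M per 100 bookings and its 7% maximum growth, which are quoted in the text.

```python
import pandas as pd

# Hypothetical Q1 sales rows (one row per service, status and month).
sales = pd.DataFrame({
    "service": ["GO-BOX", "GO-BOX", "GO-SEND", "GO-SEND", "GO-SEND"],
    "order_status": ["Completed", "Cancelled", "Completed", "Completed", "Completed"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Mar"],
    "total_cbv": [500_000, 450_000, 60_000, 70_000, 65_000],
    "num_orders": [10, 9, 30, 35, 32],
})

# Budget table; the GO-SEND row is made up.
budget = pd.DataFrame({
    "service": ["GO-BOX", "GO-SEND"],
    "cost_per_100_inc_booking": [40_000_000, 50_000_000],
    "max_growth_pct": [7, 6],
})

# Quick checks on the budget table: nulls and shape.
print(budget.isnull().sum())
print(budget.shape)

# Keep completed orders only, then total revenue and orders per service.
completed = sales[sales["order_status"] == "Completed"]
q1_summary = completed.pivot_table(
    index="service", values=["total_cbv", "num_orders"], aggfunc="sum"
).reset_index()

# Attach the budget columns to build the optimization DataFrame.
opt_df = q1_summary.merge(budget, on="service", how="left")
print(opt_df)
```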
Achieving the maximum growth rate for every service would require an additional 247B on top of the 40B budget, so the growth targets cannot all be met with the given budget. Achieving at least 10% of the maximum growth rate for every service, however, fits within the budget: the remaining budget, which is significantly higher in this scenario, is displayed along with the percentage of budget utilization. The DataFrame includes columns for the percentage needed to achieve 10% of the maximum growth rate, the maximum growth rate to be achieved, and the Q1 average order value for each service.
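The feasibility check above can be sketched as simple arithmetic. The cost model here is an assumption (incremental orders are Q1 orders times the growth rate, paid for in blocks of 100), and the order counts and the Send row are hypothetical, so the totals will not match the article's 247B figure.

```python
import pandas as pd

BUDGET = 40_000_000_000  # Q2 budget quoted in the text

# Hypothetical inputs; only the Box cost and growth cap come from the text.
opt_df = pd.DataFrame({
    "service": ["Box", "Send"],
    "q1_completed_orders": [63_750, 2_000_000],
    "cost_per_100_inc_booking": [40_000_000, 50_000_000],
    "max_growth_pct": [7, 6],
})

# Assumed cost model: incremental orders are bought in blocks of 100.
incremental_at_max = opt_df["q1_completed_orders"] * opt_df["max_growth_pct"] / 100
opt_df["cost_at_max_growth"] = (
    incremental_at_max / 100 * opt_df["cost_per_100_inc_booking"]
)
# Cost is linear in growth, so 10% of max growth costs 10% as much.
opt_df["cost_at_10pct_of_max"] = 0.10 * opt_df["cost_at_max_growth"]

cost_all_max = opt_df["cost_at_max_growth"].sum()
cost_all_min = opt_df["cost_at_10pct_of_max"].sum()

shortfall_at_max = cost_all_max - BUDGET   # positive means over budget
remaining_at_min = BUDGET - cost_all_min   # positive means budget left over
utilization_pct = 100 * cost_all_min / BUDGET

print(f"Extra budget needed for max growth: {shortfall_at_max:,.0f}")
print(f"Budget left at 10% of max growth:   {remaining_at_min:,.0f}")
print(f"Utilization at 10% of max growth:   {utilization_pct:.1f}%")
```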
The data is then prepared for linear optimization with the PuLP library in Python, which involves preparing the data for both simulation and optimization. The inputs are the services offered, the cost per 100 incremental bookings, Q1 completed orders, the minimum growth percentage required and the maximum allowed, the profit per order, and the growth percentages to simulate. For each simulated growth percentage, the pipeline calculates the total new orders for Q2, the incremental orders, the Q2 profits, the Q2 costs, and the overall profitability.
The pre-optimization data pipeline builds the simulation by generating all percentage growth values and merging them with the optimization data, then filtering the result to the rows that satisfy the constraints. The objective is to maximize profit while ensuring that the cash burn does not exceed 40 billion, so that the budget is used effectively and growth rates stay within the business constraints.
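One way to build such a simulation table is a cross join between the services and the candidate growth percentages. The sketch below is an assumption about how the pipeline could look, not the article's code: the column names, the Send row, and the profit-per-order values are hypothetical, and the Box order count is chosen only so that 1% growth costs 255M, the figure quoted for Box.

```python
import pandas as pd

# Hypothetical optimization inputs (see the caveats above).
opt_df = pd.DataFrame({
    "service": ["Box", "Send"],
    "q1_orders": [63_750, 2_000_000],
    "cost_per_100_inc_booking": [40_000_000, 50_000_000],
    "min_growth_pct": [0.7, 0.6],
    "max_growth_pct": [7, 6],
    "profit_per_order": [40_000, 2_000],
})

# Candidate growth rates: whole percentages from 1% to 60%.
growth = pd.DataFrame({"growth_pct": list(range(1, 61))})

# Every service paired with every candidate growth rate.
sim = opt_df.merge(growth, how="cross")

# Keep only the growth rates each service is allowed to take.
in_range = (sim["growth_pct"] >= sim["min_growth_pct"]) & (
    sim["growth_pct"] <= sim["max_growth_pct"]
)
sim = sim[in_range].copy()

# Orders, profit and cost implied by each candidate growth rate.
sim["incremental_orders"] = sim["q1_orders"] * sim["growth_pct"] / 100
sim["q2_orders"] = sim["q1_orders"] + sim["incremental_orders"]
sim["q2_profit"] = sim["q2_orders"] * sim["profit_per_order"]
sim["q2_cost"] = sim["incremental_orders"] / 100 * sim["cost_per_100_inc_booking"]
sim["overall_profit"] = sim["q2_profit"] - sim["q2_cost"]

print(sim[["service", "growth_pct", "q2_cost", "overall_profit"]])
```

Each remaining row is one candidate decision for the optimizer, with its cost and profit already computed.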
The optimization dataset can be illustrated with one service, Box. The minimum growth required for Box is 0.7% (10% of its 7% maximum), and the maximum is 7%. Because the simulated growth values are whole percentages, the optimizer selects the optimal growth rate between 1% and 7%. Achieving 1% growth results in a cash burn of 255M and an overall profit of 2.4B. The objective is to maximize profit in this scenario.
The optimization problem is initialized as a maximization problem. The inputs, such as the candidate growth percentages and their costs, are defined along with the decision variables. The objective function maximizes profit over the dataset. Constraints ensure that exactly one growth percentage is selected for each service and that the total budget is not exceeded. The optimization problem is then solved, and the optimal solution is displayed.
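Written out, a problem of this shape might look like the following. The symbols are illustrative assumptions rather than the article's notation: x_{s,g} equals 1 when growth percentage g is chosen for service s, p_{s,g} and c_{s,g} are the profit and cost of that choice, and B is the total budget.

```latex
\[
\begin{aligned}
\max_{x} \quad & \sum_{s}\sum_{g} p_{s,g}\, x_{s,g} \\
\text{subject to} \quad & \sum_{g} x_{s,g} = 1 \quad \text{for every service } s, \\
& \sum_{s}\sum_{g} c_{s,g}\, x_{s,g} \le B, \\
& x_{s,g} \in \{0, 1\}.
\end{aligned}
\]
```

If the variables are declared continuous instead of binary, the last line becomes 0 <= x_{s,g} <= 1 and the solver may split a service across two growth percentages.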
In summary, the dataset aims to optimize the growth rate for the Box product while maximizing profit within specified constraints, resulting in an optimal solution.
The objective is to maximize profit, at a 10% profit per order, within the available budget of 40 billion; the optimal objective value comes out to 98731060158.842. To solve this linear programming problem, the first step is to initialize it as a maximization problem by passing LpMaximize to LpProblem. Next, the decision variables are created using LpVariable.dicts with a continuous variable type. There are 60 unique growth percentages considered, ranging from 1% to 60%. The objective function is then added to the problem, calculating the profit for each growth percentage. Constraints are added to ensure that only one growth percentage is selected for each service and that the total cost does not exceed the budget. Overall, understanding how to write out an LP problem is crucial to solving it effectively and maximizing profit within the given constraints.
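The steps described above can be sketched with PuLP on a toy problem. This is a hedged illustration, not the article's model: the two services, the three growth options, and every cost and profit value are hypothetical and scaled down. The sketch also departs from the text in one respect: it builds binary variables in a plain dictionary so that each service picks exactly one option, whereas the article creates continuous variables with LpVariable.dicts.

```python
import pulp

BUDGET = 1_000  # hypothetical, scaled-down budget

# Hypothetical cost and profit for each (service, growth %) option.
cost = {
    ("Box", 1): 200, ("Box", 2): 400, ("Box", 3): 600,
    ("Send", 1): 300, ("Send", 2): 600, ("Send", 3): 900,
}
profit = {
    ("Box", 1): 1300, ("Box", 2): 1600, ("Box", 3): 1900,
    ("Send", 1): 2500, ("Send", 2): 3000, ("Send", 3): 3500,
}
services = sorted({s for s, _ in cost})

# 1. Initialize a maximization problem.
prob = pulp.LpProblem("q2_budget_allocation", pulp.LpMaximize)

# 2. One binary decision variable per (service, growth %) option.
x = {
    (s, g): pulp.LpVariable(f"pick_{s}_{g}", cat="Binary")
    for (s, g) in cost
}

# 3. Objective: total profit of the selected options.
prob += pulp.lpSum(profit[k] * x[k] for k in x), "total_profit"

# 4. Exactly one growth percentage per service.
for s in services:
    prob += pulp.lpSum(x[k] for k in x if k[0] == s) == 1, f"one_growth_{s}"

# 5. Total cost must stay within the budget.
prob += pulp.lpSum(cost[k] * x[k] for k in x) <= BUDGET, "budget"

# Solve with the bundled CBC solver, without solver log output.
prob.solve(pulp.PULP_CBC_CMD(msg=False))

chosen = {s: g for (s, g), var in x.items() if var.value() > 0.5}
total_cost = sum(cost[(s, g)] for s, g in chosen.items())

print(pulp.LpStatus[prob.status], chosen, pulp.value(prob.objective), total_cost)
```

Enumerating the feasible pairs by hand gives the same answer: 2% growth for both services uses the full budget of 1,000 for a profit of 4,600.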
Constraint _C12 is the total budget constraint of 40 billion across 279 different services. There is no restriction on individual service spending; the only spending constraint is this overall budget limit. Each service must also meet the minimum growth percentage defined in the optimization DataFrame, which creates 279 separate constraints. The solver identifies the optimal solution, with a profit of 98,731,060,158.842. The analysis then categorizes services based on their maximum growth rates, with the corresponding percentages listed for each service category. The total cash burn amounts to 39,999,532,404, leaving 467,596 of the budget unused, for a maximized profit of 98,731,060,158.
The next solution, for the regression exercise, involves importing and processing a dataset: converting dates to pandas datetime, deriving month and training/testing columns, and analyzing order statuses and service categories. Further data manipulation involves merging datasets, handling null values, and adding a day-of-week column for deeper analysis.
In the Q1 2016 data (data frame 3), the date column was used to derive the day of the month and whether the day falls on a weekend. The data frame was filtered to canceled orders only and, for every service, cross-joined with the dates from January 1 to April 1 so that there is a row to predict for every day. NULL values were replaced with 0, and additional columns for the day of the week and a binary weekend/weekday flag were created. The GO-TIX service was filtered out, and one-hot encoding was applied to the month and day-of-week columns.
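These pre-processing steps can be sketched in Pandas as below. The column names and sample rows are assumptions, and the calendar is shortened to four days to keep the output small (the text uses January 1 to April 1).

```python
import pandas as pd

# Hypothetical canceled-order totals per service and day.
cancelled = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-02"]),
    "service": ["GO-FOOD", "GO-FOOD", "GO-TIX"],
    "total_cbv": [100.0, 250.0, 80.0],
})

# Drop the GO-TIX service.
cancelled = cancelled[cancelled["service"] != "GO-TIX"]

# Every service paired with every calendar day.
calendar = pd.DataFrame({"date": pd.date_range("2016-01-01", "2016-01-04", freq="D")})
services = pd.DataFrame({"service": ["GO-FOOD", "GO-RIDE"]})
grid = services.merge(calendar, how="cross")

# Days without canceled orders become explicit rows with 0.
full = grid.merge(cancelled, on=["service", "date"], how="left")
full["total_cbv"] = full["total_cbv"].fillna(0)

# Calendar features derived from the date column.
full["day_of_month"] = full["date"].dt.day
full["day_of_week"] = full["date"].dt.day_name()
full["is_weekend"] = (full["date"].dt.dayofweek >= 5).astype(int)
full["month"] = full["date"].dt.month_name()

# One-hot encode the categorical calendar columns.
encoded = pd.get_dummies(full, columns=["month", "day_of_week"])
print(encoded.head())
```

The cross join is what guarantees a row for every service and day, so the model can produce a prediction even for days with no recorded cancelations.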
Necessary libraries were imported, and a list of columns, training, predictor, and target variables were defined. Model 1 focused on the GO-FOOD service, where separate train and test data frames were created. The data was standardized using StandardScaler. A custom function for NMAPE (Normalized Mean Absolute Percentage Error) was defined, converted into a scorer, and used in a KFold cross-validation procedure. A Linear Regression model was created and evaluated using the NMAPE scorer, and the performance was reported with the mean and standard deviation of the scores.
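A minimal sketch of this evaluation loop with scikit-learn is shown below. Several things are assumptions: the NMAPE formula (total absolute error divided by total absolute actuals, times 100) is one common normalized variant and may differ from the article's definition, the data is synthetic, and wrapping StandardScaler and LinearRegression in a pipeline is a design choice made here so that scaling is refit inside each fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nmape(y_true, y_pred):
    """Assumed NMAPE: sum of absolute errors over sum of absolute actuals, in %."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100 * np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()


# Synthetic stand-in for one service's predictors and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 3))
y = 1000 + X @ np.array([50.0, -20.0, 10.0]) + rng.normal(scale=5.0, size=90)

# Lower NMAPE is better, so the scorer returns negated values.
scorer = make_scorer(nmape, greater_is_better=False)

model = make_pipeline(StandardScaler(), LinearRegression())
cv = KFold(n_splits=3, shuffle=True, random_state=42)

# Flip the sign back to report NMAPE as a positive percentage.
scores = -cross_val_score(model, X, y, scoring=scorer, cv=cv)
print(f"NMAPE per fold: {scores.round(3)}")
print(f"Mean: {scores.mean():.3f}, Std: {scores.std():.3f}")
```

The same loop can be repeated per service by swapping in that service's train data.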
Successful case studies can positively impact businesses by showcasing analytical skills, reasoning, and practical knowledge. Recruiters focus on the approach taken, the structure, and business analytics expertise rather than just the answers. This article offers a clear framework for data analysts through a real business case study example. Key points include using a bottom-up approach when unfamiliar with the data, analyzing sales numbers to identify growth challenges, providing concise recommendations, and letting the data tell a story rather than just report numbers. For instance, the top services contribute most of the revenue, but issues like ride completion and cancellations need addressing. Optimization with PuLP becomes simpler when a complex problem is first broken down on paper before the solution is coded. Overall, case studies can demonstrate valuable analytical skills and decision-making abilities that benefit businesses.