— Scene I —
What days and times tend to be busiest at Plato's Pizzas?

Intro

Maven Analytics regularly posts sample datasets and associated challenges available for budding data analysts to practice and hone their skills. In order to further develop my own and craft an enticing portfolio I am analysing, visualising, and discussing their Pizza Place Sales dataset.

As an experienced software engineer I find working with spreadsheets more difficult and tedious than programming equivalents. However, I've decided to eat my frog on this one and answer the first question in the challenge using spreadsheets.

I'm using Google Sheets as the basic facilities are free and I can open up the project to the world.

Objective

Reading the Pizza Challenge's full description I have extracted the following quote which gives us a high level goal with clear outcomes. In my experience a clear goal or set of desired outcomes is essential for directing, constraining, and focusing any work; data analysis or otherwise. It also aids in determining when we've done enough to deliver and present:

Hoping you can analyze the data and put together a report to help us find opportunities to drive more sales and work more efficiently.

— Mario Maven (Manager, Plato's Pizza)

Commentary

It's highly beneficial to know the organisational goals too, for example Increase revenue and profit or Create a better society for pizza lovers. I will assume the former is priority.

To provide a good report and set of recommendations I would need these overarching intentions otherwise my suggestions may be counter productive or I may fail to communicate relevant information.

For example, one may be able to drive more sales  by reducing the price of each pizza but the increase in customers might not offset the price reduction. If the objective is to Make great pizza more accessible  then we're progressing towards our goal. But not so great if the objective is to Increase revenue and profit.

These high level intentions guide initiative as to which leads are likely fruitful and which are dead ends. A common mistake among leaders and managers is the failure to fully, effectively, and honestly communincate clear purpose and intention when they assign tasks.

If those actually working on the task don't precisely know why they are doing it or have little confidence it will contibute to the achieving the desired outcomes then motivation is likely to suffer and they will be restricted in their ability to innovate in the face of unexpected obstacles.

Skills

  • General spreadsheet competency (Google Sheets)
  • Creating and using pivot tables
  • Static cell and cross sheet referencing
  • Basic statistics and statistical considerations
  • Basic spreadsheet functions
    • CONCATERNATE
    • VLOOKUP
    • QUERY
    • TEXT
    • IF
    • ISOWEEKNUM
    • ...more...
  • Generating and editing visualisations
    • Line graphs
    • Bar charts
    • Pie charts

Menu

— Act I —

The Data

The data is licensed under Public Domain and was sourced from the Pizza Place Sales dataset in the Maven Analytics Data Playground. Maven Analytics references Vincent Arel-Bundock as their source. He pulled together a broad selection of *Rdatasets available at vincentarelbundock.github.io/Rdatasets.

*Vincent Arel-Bundock: Rdatasets is a collection of 1892 datasets which were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.

Entity relationship diagram of the data tables

orders

Field Description
order_id Unique identifier for each order placed by a table
date Date the order was placed (entered into the system prior to cooking & serving)
time Time the order was placed (entered into the system prior to cooking & serving)

order_details

Field Description
order_details_id Unique identifier for each pizza placed within each order (pizzas of the same type and size are kept in the same row, and the quantity increases)
order_id Foreign key that ties the details in each order to the order itself
pizza_id Foreign key that ties the pizza ordered to its details, like size and price
quantity Quantity ordered for each pizza of the same type and size

pizzas

Field Description
pizza_id Unique identifier for each pizza (constituted by its type and size)
pizza_type_id Foreign key that ties each pizza to its broader pizza type
size Size of the pizza (Small, Medium, Large, X Large, or XX Large)
price Price of the pizza in USD

pizza_types

Field Description
pizza_type_id Unique identifier for each pizza type
name Name of the pizza as shown in the menu
category Category that the pizza fall under in the menu (Classic, Chicken, Supreme, or Veggie)
ingredients Comma-delimited ingredients used in the pizza as shown in the menu (they all include Mozzarella Cheese, even if not specified; and they all include Tomato Sauce, unless another sauce is specified)

Relationships

A brief inspection reveals the orders and order_details are a normalistion of a combined table. The same abstraction seems to have been applied to pizzas and pizza_types . If I were working in SQL I'd be joining these tables regularly during analysis.

I also see further normalisation is possible if needed. For example, pizza_types.category could easily be its own dictionary table linked through primary and foreign keys. pizza_types.ingredients likewise with a new linking table to map each pizza type to a set of ingredients. However, I wouldn't recommend embarking on further normalisation until a clear need arises.

What is an 'order'?

I'd like to draw your attention to the term order . It does not mean a single pizza but a single order of pizzas placed by a table; one or more pizzas per order. The data does not include information about the tables and seating so we will have to assume each group of customers only makes a single order of pizzas per visit.

When are orders placed?

A restaurant may use one of many payment models and ordering technologies, e.g. some require payment before eating while others after. The metadata describes the date and time , within the orders table, as being entered into the system prior to cooking & serving which I missed on the first pass. This is important information. Why? Because it tells us the time spent waiting for food, eating, and after meal chats will not some calculations.

I'd prefer to gather this information directly from floor staff (the cooks, servers, and onsite managers) because the tidy processes described on paper are often not followed in practice. This might be due to laziness, lack of training, ambiguousness, poor communication, or just general uncorrected drift overtime. But often it's because neat and highly structured processes, usually put in place by management lacking localised knowledge, do not allow floor staff to simultaneously do their job, meet performance expectations, and stay sane.

If we want accurate and relevant results from our analysis then we must know the truth, no matter how painful, as to what the actual processes used are. Not what management thinks is going on or what ought to be the case.

— Act II —

Aggregate by the hour

After downloading and importing the CSV files into Google Sheets, I produced a pivot table grouping all orders over the course of the year by the hour they were placed within in the day.

After summing each group I employed a naive but good enough computation (total_orders ÷ number of days the pizzeria was open within the year) to compute each hour's mean.

Pivot table of average pizza orders for each hour in the day

A quick count of distinct days gives 358 for the year (2015). Only seven days when the restaurant was shut as 2015 was not a leap year. In total we get ~59.6 as the mean number of orders per day.

I can't move on without mentioning some obvious outliers. The very low total orders at 09:00, 10:00, and 23:00 is something I should investigate. While I expected low order volumes at the start and end of the day, I did not expect such extreme values. However, the data points seem to have negligible impact on the presented aggregation and upcoming visualisation so I've left them in.

— Act III —

Visualise the day

Visualising the previous table as a line graph we get a rough picture of a typical day at Plato's Pizzas.

Line graph of average pizza orders for each hour in the day

Choosing charts

I went with the line graph but a 15 or 30 minute interval histogram would also work well here. I used simple shaded blocks to annotate the key areas. You can imagine adjusting these to suit your definition of lunch and dinner.

Mongolian camel

What appears to be a poorly drawn Mongolian camel is actually a bimodal distribution of pizza order times over the course of a working day. The peeks appearing around 12:30 and 18:00.

I'm quite confident they represent the generalised lunch and dinner times of customers. While it's not surprising, it is useful to confirm that Plato's customers match the standard eating profile of western cutlure.

I also note the difference in time ranges for lunch and dinner. Dinner times are more spread out while only a little shy of lunch's peak. This indicates dinner is busier overall, however, I wouldn't push that conclusion without some calculations and a visualisation to better show it.

— Act IV —

The dividing line

Before we can determine if lunch or dinners are more popular we need to define when lunch time ends and dinner time begins. This is not a difficult question in the circumstances but we shouldn't be complacent. There needs to be a decision, maybe an agreement, and always a communication of definitions as preferences and assumptions vary.

I'm going to pick 14:30 as the dividing line between lunch and dinner. 14:30 is the trough of the two modes and the times align fairly well with western tradition. I dismissed the idea of an overlapping divide as it would be overly complicated and confusing for all stakeholders, particularly myself. Lesser factors can be taken into account but their impact would be negligible. For example, a worker's lunch is often a student's breakfast. We can't account for everything.

Pizza chart

Pie chart of meal type for each order, lunch or dinner Pivot table of meal type for each order, lunch or dinner

The above table and derived pie chart (pizza chart!) show the distribution of lunches and dinners over the course of 2015 using the mentioned time ranges. It's clear from the pivot table that pizza dinners are much more common than lunches but a pie chart intuitively comunicates the proportions of each.

I think pie charts are great for communicating simple proportions such as these but they get excessive aggro. While I agree they're limited as a visualisation, I disagree with those who call for their eradication. When we only have two or three proportions to compare a pie chart remains effective without clutter.

Limitations of aggregations

The analysis so far averages all orders over the whole year. While this provides an overall picture, it won't show potentially important variability within and across the weeks and months. Lets looker deeper.

— Act V —

Visualise the year

The bar chart below presents the total [adjusted] number of orders in each month. Adjustments are for their length (31 ÷ number of days in month × total orders in month) .

Normally one would calculate and plot the means but I wanted to include the total orders each month while also enabling easier judgement of month-on-month and seasonal differences.

Bar chart showing that number of orders in each month adjusted for the number of days in the month

Noiselessness

This is interesting. Totals for January through June are exceedingly consistent. I'm not sure what to make of it*. On the one hand consistency is favoured for it enables confident projections and good for tracking generalised improvements. On the other hand, no specific area shines out as high growth potential.

*As an after thought. This is probably a machine generated dataset so we should expect unrealistic levels of noiselessness in some places.

Baselines

There's some variation in the second half of the year (July through December) but only by a small amount. Is this due to exploitable factors or just noise? If we had access to data from prior years we could set a baseline and perhaps account for noise.

Baselines are essential, you can't derive much good insight without them. More than that, they are core to anyone hoping to make effective judgements. Objectively derived baselines help mitigate the noise and sometimes bias in our evaluations by providing a verified anchor for comparison.

Speculation

In the circumstance, speculating on reasons for small variations with such little knowledge of the pizza restaurant industry and the pizzeria's local culture is unlikely to be fruitful. I'd be better off continuing to answer the question and possible an explanation will appear. Failing that I'd present the current analysis to Plato's Pizza staff and ask for their input. Being on the frontline they will have valuable local knowledge not obtainable through generalised data.

My time working with user researchers and user experience professionals has taught me customer facing staff have a much better grasp of unquantifiable knowledge about customer behaviours which should never be dismissed without good reason. It is in combining generalised knowledge, that gained from collating and analysing pizza sales, and localised knowledge, the kind you get from working on the restaurant floor and interacting with people, that the most valuable insights are unearthed.

Each kind of knowledge is useful alone but in combining them we shore up the weaknesses of one with the strengths of the other. We can illuminate connections hidden from our individual myopic perspectives and often gain ground in resolving conflicts of opinion.

— Act VI —

A year of pizza

Separating the visualisation from Act III into the individual months of the year gives us the set of graphs below.

Dreary me

It's not particularly interesting. At a glance we can see the bimodal distribution is present within each month. There are noticable variations at peak dinner time but there doesn't appear to be a strong discernable deviation from the yearly aggregation.

In my opinion the level of variation shown is expected. If we assume a mean of about five orders per hour at approximately peak dinner time (~4pm to ~8pm) then that would total at ~600 data points (5 orders * 4 hours * 30 days) or ~150 data points for each dinner time hour. While 600 is enough to be confident in the circumstances, I'd still expect a perceptible amount of noise from month to month. We could try to quantify variabilities, certainties, and error margins but I really don't think it's needed.

Trade offs

Good economisation of time and effort has a big impact in software engineering world. The majority of leaders and engineers decide on such trade offs in a rather intuitive and somewhat unconscious manner. As a mere apprentice in the realm of analytics I may be asked to look into an area of interest or answer some questions, however, I must make decisions at the lower levels as to precisely what should be investigated and analysed to maximise value while minimising time and effort.

My time is a finite resource and I have limited mental energy to use in a given day so I want to minimise time spent walking down wrong paths. I really like optimising myself economically. One of the outcomes I am hoping for, in a data focused working environment, is the ability to intuitively make quick and effective decisions about which promising data leads to follow and which to skip.

Outliers or inliers

The monthly breakdown also gives some more insight into the outliers mentioned at the start of the analysis. They are spread across the months rather than clustered. This is evidence there is nothing special beyond the small size of their time category. Perhaps they are orders by staff before the resturant opens to the public or maybe early orders at these times are allowed but quite rare.

Knowing the precise opening and closing times for each day would really help a lot here. For instance, if the restaurant actually opens at 09:00 or 10:00 then we might recommend opening at 11:00 instead. The cost of staffing the earlier hours is unlikely to be outweighing the revenue drop. However, the staff may still be required to come in earlier to prepare bases and ingredients for the lunch time rush. In which case they may as well open and serve the few morning pizza enthusiasts.

So should I keep or remove them? I've no evidence that they are not part of normal business so they shall remain for now. However, I may be able to find some insight if I include the specifics of the order. Joining the orders and order_details tables is perhaps the next port of call after looking at the typical week. I will likely be switching to a more programtic approach for that.

A mystery wrapped in an enigma

If they are orders that don't have any details associated with them then it's a good argument for their removal. However, it's possible that orders were taken but the specific details failed to be recorded. For this we could view some system logs or ask some one familiar with the system and it's quirks. Being a sample dataset we have no such luxury but it might be prudent to analyse and visualise both scenarios. If there is a difference in results the best outcome is that they make no difference to the conclusions and recommendations.

I'm coming to the conclusion that being able to chase and incorporate information outside neat rows and columns is essential to good data analysis. This is where my past focus on agile and lean skills for navigating uncertainty become highly advantageous. A skilled analyst is also a good detective; combining knowledge of vastly different kinds to gainer insights and draw more accurate and useful conclusions.

— Act VII —

Visualise the month

The line graph below presents the [adjusted] average number of pizza orders in each day of the month for the entire dataset. The last few days have been adjusted for the varying length of different months. As before, this is to enable an easier visual comparison between days. The precise weightings were calculated as:

  • 1st - 28th: total orders ÷ 12 months
  • 29th & 30th: total orders ÷ 11 months (no February)
  • 31st: total orders ÷ 7 months (only those with 31 days)
Line graph showing that number of orders on each day of a typical month adjusted for the varying number of days in different months

Nothing strikes me as particularly interesting here. Most variation can probably be attributed to noise. I expected some spiking or troughing on North American national holidays; Plato's Pizza is in New Jersey. While this is true for some dates, Christmas as an example, there doesn't seem to be much of a correlation at all.

Peaks

There are three notable peaks, the 1st, 15th, and 27th. Looking closer I can't find any reason for the slight peak in orders. I can speculate or, as a good scientist, produce some hypotheses and attempt to disprove them in accordance with Karl Popper's falsifiability principle.

The 1st day of the month has some significance as the beginning of a new month and the new year; the 15th is considered the halfway point in the year; and finally, the 27th could be explained by pizza lovers receiving their end of the month paychecks and possible pizza withdrawal cravings after closures on specific dates in September, October, and December. While these aren't particularly convincing explanations they could form part of a discussion with Plato's Pizza staff about the busyness of particular days.

Troughs

Looking into the dates I see four of the seven days Plato's was closed in 2015 were in October. Specifically, the 5th, 12th, 19th, and 26th which match up to the visual troughs on the graph. Looking back at Act V we now have an explaination for why October has the lowest order volume. In September there is no data for the 24th and 25th which explains why the 25th has the greatest trough of them all (also closed on Christmas day).

— Act VIII —

Visualise the week

The bar chart below presents the average orders per day of the week across the whole year. It's adjusted to account for closed days for a better visual comparison.

Bar chart showing average orders per day of week adjusted so days the restaurant was closed do not distort results

Thank God it's Friday

It's clear that cravings for pizza pick up towards the end of the week with a Friday peak. While Saturday is slightly busier than most days it's still lower than both Thursday and Friday. I would have expected Saturday to be on par with Friday and at least beating Thursday considering people tend to be more free at weekends than weekdays.

For customers who prefer a quieter experience a Sunday is a good choice. Although, we should check the distribution of orders over the course of the day before communicating this. It's possible certain times on busier days are actually quieter than at any point on less busier days.

Oh God its Monday

The pivot table below shows the data used to produce the bar chart.

Pivot table showing average orders per day of week

The Work weeks containing day column contains the number of weeks in the year in which Plato's Pizza was open for that particular day of the week. We require this to calculate the Average orders per day of the week correctly. Dividing all Total orders by 52 would unfairly include days where the restaurant was closed.

This table also provides unintended but interesting insight into the closed days. Four of seven are on Monday, one on Thursday, and two on Friday. Noting that Christmas day in 2015 was on a Friday, two thirds of the remaining closed days are Mondays. These Mondays are all in October so this rules out bank holidays as a reason for shutting (except perhaps for Columbus Day). Unfortunately, this only makes the closures more mysterious.

— Act IX —

A week of pizza

Separating the visualisation from Act III into the individual days of the week gives the set of graphs below.

Lost lunch

Monday to Friday have the familiar bimodal distribution seen in previous orders plots. However, the difference between the two peaks is noticable greater. Lunch clearly being busier during the weekdays.

Saturday and Sunday is where things start to get interesting. Pizza lunches lose favour at the weekends while pizza dinners maintain their popularity. This also suggests the average weekend day has less orders than the average weekday. Lets see.

— Act X —

Weekdays Ⲭ Weekends

By aggregating the weekdays and weekends we get a clearer view of the lower weekend lunchtime demand. I currently see the reduced volume of weekend lunch time pizza eaters as unmet capacity with potential.

Slow weekends

Below we have confirmation that the average weekday does indeed receive more orders than the average weekend. Not by much but perhaps enough to justify a weekend lunch marketing initiative.

Bar chart showing that the average weekday pizza order count exceeds the average weekend day

Chasing reasons

We should first check the number of pizzas requested per order. It is entirely possible that the reduction in orders is the result of less seating available as bigger parties of pizza lovers congregate on their day off. The order_details table will need to be consulted on this one.

The answer might not lie within the provided dataset at all. My immediate thought is to survey lunch time customers across the whole week about why they choose Plato's Pizza on that day and why not another day, weekday or weekend. We can't just ask weekend customers why they visited on the weekend as we'd be prone to survivorship bias. Asking weekday visitors why they didn't choose to come at the weekend will help avoid such traps.

Another level

The need for more sophisticated analysis and tooling mounts. I can do SQL joins half asleep but the thought of trying to correctly combine tables with groupings in spreadsheet software does not motiviate.

— Act XI —

Summary of discovery

To summarise what we've discovered about Plato's Pizza and their customers ordering habits in 2015:

  • The restaurant was open for 358 days out of 365. Closed for 7 days.
  • With an average of 59.6 customers per day without much variation.
  • We found some possible outlying data points during the hours of 09:00, 10:00, and 23:00. They were spread out across the year with no discernable pattern or reason as to why those times contained only a few orders.
  • We discovered that order times are bimodally distrbuted. Peaks appear at lunch and dinner which is around 12:30 and 18:00 respectively.
  • Customers are twice as likely to order a pizza dinner than lunch but the variation of dinner time orders is significantly greater than lunch.
  • *The monthly total order volumes were unnervingly consistent across the whole year once the closures were accounted for, i.e. almost devoid of noise.
  • We didn't come to a conclusion as to why Plato's was closed on a majority of the days it was. However, we did discover that the restaurant was shut on every Monday in October representing two thirds of the unexplained closures.
  • We did find out that pizza orders increase towards the end of the week, peaking on Friday and dipping on Sunday.
  • Finally, we discovered the average weekday has more orders than the average weekend day. Furthermore, it was the lack of weekend lunch orders that brought the weekend order totals down. The is potential for Plato's Pizzas to drive more sales here.

*Actually, we shouldn't be that surprised as this is probably a machine generated dataset.

Back to top