Power And Dollar

What Is the Cumulative Number of Infected in the USA on 3/12?

Globalization has brought us advantages as well as disadvantages, although we have seldom considered the disadvantages other than the higher liquidity of jobs (i.e. job opportunities moving to the other side of the globe).  Epidemics are now recognized as another disadvantage as global travel becomes more common.  Not long ago, the coronavirus was still a distant harm.  It is now a clear and present danger to everyone, regardless of geography.

In the earlier articles, we predicted the fatality rate among the concluded patients as well as the proportion of patients who remain in treatment.  Can we predict the cumulative infected in the near future?  How far ahead can we predict?

The earlier articles used two methods for prediction:

  • Second order difference moving average
  • Exponential function

In this article, we find that the second order difference moving average method is sensitive to the input data.  In this case, the data are too unstable for this method to give us good results.  From the graph below (Figure 1), one can see that the exponential function is a good choice for prediction.  Therefore, we will focus on the predictions by the exponential function.

CV_USA_1
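As a rough illustration of prediction by exponential function, the sketch below fits a curve of the form a * exp(b * t) to a short series of daily counts and extrapolates one day ahead.  The function names and the sample numbers are hypothetical, and fitting by least squares on the logarithm of the counts is an assumption; the fitting procedure behind the figures in this article may differ.

```python
import numpy as np

def fit_exponential(counts):
    """Fit counts[t] ~ a * exp(b * t) for t = 0, 1, ..., n-1.

    Assumption: ordinary least squares on log(counts), which may not be
    the fitting procedure used for the article's figures.
    """
    counts = np.asarray(counts, dtype=float)
    t = np.arange(len(counts))
    # polyfit returns the highest-degree coefficient first: slope, then intercept.
    b, log_a = np.polyfit(t, np.log(counts), 1)
    return np.exp(log_a), b

def predict(a, b, t):
    """Evaluate the fitted curve at day index t (may lie beyond the sample)."""
    return a * np.exp(b * np.asarray(t, dtype=float))

# Hypothetical sample growing by exactly 30% per day (not real case counts).
sample = [100 * 1.3 ** day for day in range(5)]
a, b = fit_exponential(sample)
next_day = predict(a, b, len(sample))  # one day past the end of the sample
```

With real counts the fit will not be exact, and the choice between fitting the raw counts and fitting their logarithms changes how much weight the most recent days receive.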

Most people prefer to have as many data records as possible.  The general idea is that more data produce better predictions.  In this article, we check whether this idea works in our favor.

However, having too much data comes with a trade-off: the model becomes insensitive to the more recent data.  Therefore, in this evolving coronavirus scenario, it becomes even more important to identify the optimal number of days of records to use, based on how many days ahead we want to predict.

We use three days of data to predict the cumulative infected in the States for the next few days.  We then use four days, five days, and so on up to fourteen days, and compare the error rates.  For example, to make predictions starting March 7th, we use the data from March 4th to March 6th (3 days), from March 3rd to March 6th (4 days), from March 2nd to March 6th (5 days), etc.  Thus, there are 12 experiments where the first prediction date is March 7th.  In case March 7th happens to favor the data set, we also run another 12 experiments where the first prediction date is March 6th, again ranging from 3 days to 14 days in a sample.  In total, we try six different first prediction dates.  In other words, we have 6 experiments with a 3 day sample, 6 experiments with a 4 day sample, etc.  Therefore, we have a total of 72 experiments to test which sample size works better.
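The grid of experiments can be enumerated with a short sketch like the one below.  The names are hypothetical, and the choice of six consecutive first prediction dates ending on March 7th is an assumption; only March 7th and March 6th are stated explicitly above.

```python
from datetime import date, timedelta

SAMPLE_SIZES = range(3, 15)  # 3 day through 14 day samples
# Assumption: the six first prediction dates are consecutive days ending March 7th.
FIRST_PREDICTION_DATES = [date(2020, 3, 7) - timedelta(days=k) for k in range(6)]

def build_experiments():
    """Return (first_prediction_date, sample_size, first_sample_day, last_sample_day)."""
    experiments = []
    for first_pred in FIRST_PREDICTION_DATES:
        # The sample always ends the day before the first prediction date.
        last_sample_day = first_pred - timedelta(days=1)
        for n in SAMPLE_SIZES:
            first_sample_day = last_sample_day - timedelta(days=n - 1)
            experiments.append((first_pred, n, first_sample_day, last_sample_day))
    return experiments

experiments = build_experiments()
print(len(experiments))  # 12 sample sizes x 6 first prediction dates
```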

These experiments then make predictions for dates that are not part of the data samples, and the errors are measured to assess their accuracy.  The last day of data used in these experiments is March 6th.  The last day of data used to make the predictions for March 12th is March 11th.

We compare one sample size against another by the difference between the maximum and the minimum error percentages of its predictions (MaxMin).  The smaller the MaxMin, the better.
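A minimal sketch of the MaxMin measure follows.  It assumes that the error percentage is the signed difference between prediction and actual value, divided by the actual value; the function names and the numbers are hypothetical and are not taken from the experiments.

```python
def error_percentages(predicted, actual):
    """Signed percentage error of each prediction relative to the actual value."""
    return [100.0 * (p - a) / a for p, a in zip(predicted, actual)]

def max_min(errors):
    """MaxMin: spread between the largest and the smallest error percentage."""
    return max(errors) - min(errors)

# Hypothetical predictions and observations (not the article's data).
predicted = [110.0, 150.0, 180.0]
actual = [100.0, 120.0, 200.0]
errors = error_percentages(predicted, actual)
spread = max_min(errors)
```

A small spread means the errors are consistent with one another, even if they are all biased in the same direction.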

In Figure 2 below, we observe four different lines, each of which measures the MaxMin.  Each point on a MaxMin line is the average of the MaxMin among the 6 experiments that have the same number of days in the sample, and the line connects these averages as the sample varies from 3 days to 14 days.
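The averaging behind one MaxMin line can be sketched as a simple group-by on the sample size.  The function name and the MaxMin values below are hypothetical placeholders, not results from the 72 experiments.

```python
from collections import defaultdict

def average_max_min(results):
    """results: iterable of (sample_size, max_min_value) pairs, one per experiment.

    Returns a dict mapping each sample size to the average MaxMin of its experiments.
    """
    grouped = defaultdict(list)
    for sample_size, value in results:
        grouped[sample_size].append(value)
    return {size: sum(vals) / len(vals) for size, vals in sorted(grouped.items())}

# Hypothetical MaxMin values in percent (two experiments per sample size for brevity).
results = [(3, 22.0), (3, 26.0), (5, 8.0), (5, 12.0)]
line = average_max_min(results)
```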

We find that for Day 1 predictions, using only the most recent data (namely the 3 day and 4 day samples) does not give us better predictions, since the 3 day samples give us a MaxMin above 20%.  The 5 day samples give us better results.  As we increase the sample from 6 days, MaxMin rises and eventually peaks at 10 days.  MaxMin then decreases as we add more data records to the sample.

In fact, the 5 day samples also give us a lower MaxMin for the Day 2, Day 3 and Day 4 predictions.  However, relying on only 5 days of data may be too aggressive.  Therefore, we opt for the 11 day and 12 day samples to make our predictions.

They give us a MaxMin much lower than the peaks, while the samples are not as long as 14 days and not as aggressive as 5 days.

CV_USA_2

The exponential function typically over-predicts because the more recent data are greater in magnitude, and minimizing the errors means that the bigger numbers in the data set have a larger influence, be it exponential increase or exponential decay.  Therefore, we use the predictions as upper bounds.  As the cumulative infected increases, the error produced by the exponential function increases at a greater speed, so we are interested in a lower bound as well.  We use the average MaxMin to obtain the lower bounds.
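One possible reading of how the two bounds relate is sketched below: the exponential prediction serves as the upper bound, and the lower bound is that prediction reduced by the average MaxMin percentage.  This formula is an assumption made for illustration, and the function name and numbers are hypothetical; the bounds reported in this article may have been computed differently.

```python
def prediction_bounds(prediction, avg_max_min_pct):
    """Return (lower, upper) bounds for one prediction.

    Assumption: the lower bound is the prediction reduced by the average
    MaxMin percentage. This is one reading of the text, not a confirmed formula.
    """
    upper = prediction
    lower = prediction * (1.0 - avg_max_min_pct / 100.0)
    return lower, upper

# Hypothetical prediction of 1000 cases with an average MaxMin of 20%.
lower, upper = prediction_bounds(1000.0, 20.0)
```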

The predictions from 3 days sample to 14 days sample are presented in Figure 3.

CV_USA_3

The lowest upper bound is 983, produced by the 14 day sample.  The largest upper bound is 2451, by the 4 day sample.  The lowest lower bound is 639, by the 14 day sample, whereas the largest lower bound is 2069, by the 4 day sample.  The 11 day and 12 day samples produce lower bounds of 1011 and 1273, respectively.

Human intervention, such as quarantine, will affect the cumulative infected predictions.  Quarantine may take forms other than the kind practiced in China, where people are mandated to remain at home and the mandate is enforced by police.  New York has called in the National Guard at New Rochelle.  Companies have mandated that employees work from home.  Some schools in New York are closed, which effectively makes parents stay at home as well.  These are not quarantine per se, but they are definitely human interventions that affect the cumulative infected.

 

March 11, 2020 - Posted by | Current Events
