Monte Carlo Simulation with Python: building a histogram on the fly
I have a conceptual question on building a histogram on the fly with
Python. I am trying to figure out if there is a good algorithm or maybe an
existing package.
I wrote a function, which runs a Monte Carlo simulation, gets called
1,000,000,000 times, and returns a 64 bit floating number at the end of
each run. Below is the said function:
def MonteCarlo(df,head,span):
# Pick initial truck
rnd_truck = np.random.randint(0,len(df))
full_length = df['length'][rnd_truck]
full_weight = df['gvw'][rnd_truck]
# Loop using other random trucks until the bridge is full
while True:
rnd_truck = np.random.randint(0,len(df))
full_length += head + df['length'][rnd_truck]
if full_length > span:
break
else:
full_weight += df['gvw'][rnd_truck]
# Return average weight per feet on the bridge
return(full_weight/span)
df is a Pandas dataframe object having columns labeled as 'length' and
'gvw', which are truck lengths and weights, respectively. head is the
distance between two consecutive trucks, span is the bridge length. The
function randomly places trucks on the bridge as long as the total length
of the truck train is less than the bridge length. Finally, calculates the
average weight of the trucks existing on the bridge per foot (total weight
existing on the bridge divided by the bridge length).
As a result I would like to build a tabular histogram showing the
distribution of the returned values, which can be plotted later. I had
some ideas in mind:
Keep collecting the returned values in a numpy vector, then use existing
histogram functions once the MonteCarlo analysis is completed. This would
not be feasable, since if my calculation is correct, I would need 7.5 GB
of memory for that vector only (1,000,000,000 64 bit floats ~ 7.5 GB)
Initialize a numpy array with a given range and number of bins. Increase
the number of items in the matching bin by one at the end of each run. The
problem is, I do not know the range of values I would get. Setting up a
histogram with a range and an appropriate bin size is an unknown. I also
have to figure out how to assign values to the correct bins, but I think
it is doable.
Do it somehow on the fly. Modify ranges and bin sizes each time the
function returns a number. This would be too tricky to write from scratch
I think.
Well, I bet there may be a better way to handle this problem. Any ideas
would be welcome!
On a second note, I tested running the above function for 1,000,000,000
times only to get the largest value that is computed (the code snippet is
below). And this takes around an hour when span = 200. The computation
time would increase if I run it for longer spans (the while loop runs
longer to fill the bridge with trucks). Is there a way to optimize this
you think?
max_w = 0
i = 1
while i < 1000000000:
if max_w < MonteCarlo(df_basic, 15., 200.):
max_w = MonteCarlo(df_basic, 15., 200.)
i += 1
print max_w
Thanks!
No comments:
Post a Comment