Я построил цепь Маркова, с помощью которой я могу моделировать повседневную жизнь людей (схемы активности). Каждый день симуляции делится на 144 временных шага, и человек может выполнять одно из четырнадцати действий. Это: В гостях - работа (1) В гостях - отдых (2) В гостях - шопинг (3) Сон (4) Готовка (5) Использование посудомоечной машины (6) Стирка (7) Уборка пылесосом (8) Просмотр телевизора (9) Использование ПК (10) Личная гигиена (11) Глажка (12) Прослушивание музыки (13) Другое (14)
Я рассчитал переходные вероятности для цепи Маркова по набору данных, содержащему в общей сложности 226 дневников людей, которые документировали свою деятельность в течение 10-минутного интервала. Набор данных можно найти здесь: https://www.dropbox.com/s/tyjvh810eg5brkr/data?dl=0
Теперь я смоделировал 4000 дней с помощью цепи Маркова и хочу проверить результаты с исходным набором данных.
Для этого я анализирую результаты для действий по отдельности и рассчитываю ожидаемое значение, а также доверительный интервал 95% для каждого временного шага. Затем я проверяю, лежит ли среднее значение активности из исходного набора данных в пределах этого интервала, и вычисляю количество всех точек, которые не лежат в пределах верхнего или нижнего порога доверительного интервала.
Однако для каждой симуляции я получаю отклонения разной высоты, иногда в диапазоне 1%, иногда более 10%.
Интересно, как это возможно и возможно ли вообще?
Я смоделировал 4x 4000 дней и получил следующие результаты:
Первая часть кода здесь (извините, но она слишком длинная):
# import of libraries
# First, create a 143xarray for transition probability matrix since the first state of the activity pattern will be determined by the starting probability which will be calculated later.
data= data(Dataframe vom dataset see link above)
# in case activity is not observed in time step t, return 0 instead divide by 0
def a1(x, y):
try:
return x / y
except ZeroDivisionError:
return 0
#create matrix for transition probabilities
matrix = np.empty(shape=(143, 14, 14))
#iterate through each time step and calculate probabilities for each activity (from state j to state m)
#save matrix as 3D-array
h=data
for i in range(0, 143):
for j in range(1, 15):
for m in range(1, 15):
a = a1(len(h[(h.iloc[:, i] == j) & (h.iloc[:, i + 1] == m)]), len(h[h.iloc[:, i] == j]))
matrix[i, j - 1, m - 1] = a
np.save(einp_wi_week, matrix)
# now calculate starting probabilities for time step 0 (initial start activity)
matrix_original_probability_starting_state= np.empty(shape=(14, 1))
for i in range(0, 1):
for j in range(1, 15):
a = a1(len(h[(h.iloc[:, i] == j)]), len(h.iloc[:, i]))
matrix_original_probability_starting_state[j - 1, i] = a
np.save(einp_wi_week_start, matrix_original_probability_starting_state)
# import transition probabilities for markov-chain (I have several worksheets to keep track of my approach and to document the results)
# MARKOV-CHAINS TO GENERATE ACTIVITY PATTERNS
# for every time step, the markov-chain can be either in one of these states / activities
states = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"]
# Possible sequences of state changes (since transferring from state 1 to state 11, for instance has the same code as transferring from 11 to 1, all state transfers for 1 to a number higher than 9 is adjusted in the transfer code)
transitionName = [["11", "12", "13", "14", "15", "16", "17", "18", "19", "0110", "0111", "0112", "0113", "0114"], ["21", "22", "23", "24", "25", "26", "27", "28", "29", "210", "211", "212", "213", "214"], ["31", "32", "33", "34", "35", "36", "37", "38", "39", "310", "311", "312", "313", "314"], ["41", "42", "43", "44", "45", "46", "47", "48", "49", "410", "411", "412", "413", "414"], ["51", "52", "53", "54", "55", "56", "57", "58", "59", "510", "511", "512", "513", "514"], ["61", "62", "63", "64", "65", "66", "67", "68", "69", "610", "611", "612", "613", "614"], ["71", "72", "73", "74", "75", "76", "77", "78", "79", "710", "711", "712", "713", "714"], ["81", "82", "83", "84", "85", "86", "87", "88", "89", "810", "811", "812", "813", "814"], ["91", "92", "93", "94", "95", "96", "97", "98", "99", "910", "911", "912", "913", "914"], ["101", "102", "103", "104", "105", "106", "107", "108", "109", "1010", "1011", "1012", "1013", "1014"], ["111", "112", "113", "114", "115", "116", "117", "118", "119", "1110", "1111", "1112", "1113", "1114"], ["121", "122", "123", "124", "125", "126", "127", "128", "129", "1210", "1211", "1212", "1213", "1214"], ["131", "132", "133", "134", "135", "136", "137", "138", "139", "1310", "1311", "1312", "1313", "1314"], ["141", "142", "143", "144", "145", "146", "147", "148", "149", "1410", "1411", "1412", "1413", "1414"]]
def activity_forecast(simulation_days):
# activitypattern is an empty array that stores the sequence of states in the simulation.
activitypattern = []
for m in range(0,simulation_days):
# for each simulation day, the activitypattern for all 144 time steps (10-min resolution) must be generated
# pro determines the transition probabilities matrix that is used for the simulation day and can be changed with regards to the day simulated (weekday, Friday, Saturday, Sunday).
# Need to be changed for the validation in order to validate activity patterns for weekdays, Fridays, Saturdays, and Sundays (see transition probability matrices above and comments later)
pro = einp_wi_week
#Determine starting activity. Therefore use vector with starting probabilities for all activites, generated in "Clustern_Ausgangsdaten..."
activityToday = np.random.choice(states,replace=True,p=einp_wi_week_start[:,0])
# create array were the sequence of states for simulation day is stored. This array will be appended to the activity pattern array
activityList = [activityToday]
#since the first activity state for each simulation day is determined, we have to generate activity states for the 143 following time steps
for i in range(0,143):
#create sequence of activities
if activityToday == "1":
change = np.random.choice(transitionName[0],replace=True,p=pro[i,0,:])
if change == "11":
activityList.append("1")
pass
elif change == "12":
activityToday = "2"
activityList.append("2")
elif change == "13":
activityToday = "3"
activityList.append("3")
elif change == "14":
activityToday = "4"
activityList.append("4")
elif change == "15":
activityToday = "5"
activityList.append("5")
elif change == "16":
activityToday = "6"
activityList.append("6")
elif change == "17":
activityToday = "7"
activityList.append("7")
elif change == "18":
activityToday = "8"
activityList.append("8")
elif change == "19":
activityToday = "9"
activityList.append("9")
elif change == "0110":
activityToday = "10"
activityList.append("10")
elif change == "0111":
activityToday = "11"
activityList.append("11")
elif change == "0112":
activityToday = "12"
activityList.append("12")
elif change == "0113":
activityToday = "13"
activityList.append("13")
else:
activityToday = "14"
activityList.append("14")
elif activityToday == "2":
change = np.random.choice(transitionName[1],replace=True,p=pro[i,1,:])
if change == "22":
activityList.append("2")
pass
elif change == "21":
activityToday = "1"
activityList.append("1")
elif change == "23":
activityToday = "3"
activityList.append("3")
elif change == "24":
activityToday = "4"
activityList.append("4")
elif change == "25":
activityToday = "5"
activityList.append("5")
elif change == "26":
activityToday = "6"
activityList.append("6")
elif change == "27":
activityToday = "7"
activityList.append("7")
elif change == "28":
activityToday = "8"
activityList.append("8")
elif change == "29":
activityToday = "9"
activityList.append("9")
elif change == "210":
activityToday = "10"
activityList.append("10")
elif change == "211":
activityToday = "11"
activityList.append("11")
elif change == "212":
activityToday = "12"
activityList.append("12")
elif change == "213":
activityToday = "13"
activityList.append("13")
else:
activityToday = "14"
activityList.append("14")
[code is too long - code needs to be written for each activity (1-14)]
activitypattern.append(activityList)
# save the files in order to use them later for comparison(and also for documentation since they can change for each simulation)
df=pd.DataFrame(activitypattern)
df.to_csv("Activitypatterns_synthetic_one_person_ft_aggregated_week_4000_days_new",index=False)
#call function
activity_forecast(4000)
Буду рад любым советам/отзывам.
Спасибо, Феликс