Опрос результатов марковского моделирования - помощь и отзывы приветствуются

Я построил цепь Маркова, с помощью которой я могу моделировать повседневную жизнь людей (схемы активности). Каждый день симуляции делится на 144 временных шага, и человек может выполнять одно из четырнадцати действий. Это: В гостях - работа (1) В гостях - отдых (2) В гостях - шопинг (3) Сон (4) Готовка (5) Использование посудомоечной машины (6) Стирка (7) Уборка пылесосом (8) Просмотр телевизора (9) Использование ПК (10) Личная гигиена (11) Глажка (12) Прослушивание музыки (13) Другое (14)

Я рассчитал переходные вероятности для цепи Маркова по набору данных, содержащему в общей сложности 226 дневников людей, которые документировали свою деятельность в течение 10-минутного интервала. Набор данных можно найти здесь: https://www.dropbox.com/s/tyjvh810eg5brkr/data?dl=0

Теперь я смоделировал 4000 дней с помощью цепи Маркова и хочу проверить результаты с исходным набором данных.

Для этого я анализирую результаты для действий по отдельности и рассчитываю ожидаемое значение, а также доверительный интервал 95% для каждого временного шага. Затем я проверяю, лежит ли среднее значение активности из исходного набора данных в пределах этого интервала, и вычисляю количество всех точек, которые не лежат в пределах верхнего или нижнего порога доверительного интервала.

Однако для каждой симуляции я получаю отклонения разной высоты, иногда в диапазоне 1%, иногда более 10%.

Интересно, как это возможно и возможно ли вообще?

Я смоделировал 4x 4000 дней и получил следующие результаты:

Графические результаты первого моделирования

Результаты моделирования

Первая часть кода здесь (извините, но она слишком длинная):

# import of libraries

# First, create a 143xarray for transition probability matrix since the first state of the activity pattern will be determined by the starting probability which will be calculated later.

data= data(Dataframe vom dataset see link above)

# in case activity is not observed in time step t, return 0 instead divide by 0

def a1(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        return 0

#create matrix for transition probabilities
matrix = np.empty(shape=(143, 14, 14))

#iterate through each time step and calculate probabilities for each activity (from state j to state m)

#save matrix as 3D-array
h=data
for i in range(0, 143):
    for j in range(1, 15):
        for m in range(1, 15):
            a = a1(len(h[(h.iloc[:, i] == j) & (h.iloc[:, i + 1] == m)]), len(h[h.iloc[:, i] == j]))
            matrix[i, j - 1, m - 1] = a
np.save(einp_wi_week, matrix)

# now calculate starting probabilities for time step 0 (initial start activity)

matrix_original_probability_starting_state= np.empty(shape=(14, 1))
for i in range(0, 1):
    for j in range(1, 15):
        a = a1(len(h[(h.iloc[:, i] == j)]), len(h.iloc[:, i]))
        matrix_original_probability_starting_state[j - 1, i] = a
np.save(einp_wi_week_start, matrix_original_probability_starting_state)


# import transition probabilities for markov-chain (I have several worksheets to keep track of my approach and to document the results)


# MARKOV-CHAINS TO GENERATE ACTIVITY PATTERNS


# for every time step, the markov-chain can be either in one of these states / activities
states = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"]

# Possible sequences of state changes (since transferring from state 1 to state 11, for instance has the same code as transferring from 11 to 1, all state transfers for 1 to a number higher than 9 is adjusted in the transfer code)
transitionName = [["11", "12", "13", "14", "15", "16", "17", "18", "19", "0110", "0111", "0112", "0113", "0114"], ["21", "22", "23", "24", "25", "26", "27", "28", "29", "210", "211", "212", "213", "214"], ["31", "32", "33", "34", "35", "36", "37", "38", "39", "310", "311", "312", "313", "314"], ["41", "42", "43", "44", "45", "46", "47", "48", "49", "410", "411", "412", "413", "414"], ["51", "52", "53", "54", "55", "56", "57", "58", "59", "510", "511", "512", "513", "514"], ["61", "62", "63", "64", "65", "66", "67", "68", "69", "610", "611", "612", "613", "614"], ["71", "72", "73", "74", "75", "76", "77", "78", "79", "710", "711", "712", "713", "714"], ["81", "82", "83", "84", "85", "86", "87", "88", "89", "810", "811", "812", "813", "814"], ["91", "92", "93", "94", "95", "96", "97", "98", "99", "910", "911", "912", "913", "914"], ["101", "102", "103", "104", "105", "106", "107", "108", "109", "1010", "1011", "1012", "1013", "1014"], ["111", "112", "113", "114", "115", "116", "117", "118", "119", "1110", "1111", "1112", "1113", "1114"], ["121", "122", "123", "124", "125", "126", "127", "128", "129", "1210", "1211", "1212", "1213", "1214"], ["131", "132", "133", "134", "135", "136", "137", "138", "139", "1310", "1311", "1312", "1313", "1314"], ["141", "142", "143", "144", "145", "146", "147", "148", "149", "1410", "1411", "1412", "1413", "1414"]]


def activity_forecast(simulation_days):
    # activitypattern is an empty array that stores the sequence of states in the simulation.
    activitypattern = []
    for m in range(0,simulation_days):
        # for each simulation day, the activitypattern for all 144 time steps (10-min resolution) must be generated

        # pro determines the transition probabilities matrix that is used for the simulation day and can be changed with regards to the day simulated (weekday, Friday, Saturday, Sunday). 

        # Need to be changed for the validation in order to validate activity patterns for weekdays, Fridays, Saturdays, and Sundays (see transition probability matrices above and comments later)

        pro = einp_wi_week

        #Determine starting activity. Therefore use vector with starting probabilities for all activites, generated in "Clustern_Ausgangsdaten..."
        activityToday = np.random.choice(states,replace=True,p=einp_wi_week_start[:,0])



        # create array were the sequence of states for simulation day is stored. This array will be appended to the activity pattern array

        activityList = [activityToday]

        #since the first activity state for each simulation day is determined, we have to generate activity states for the 143 following time steps
        for i in range(0,143):
            #create sequence of activities
            if activityToday == "1":
                change = np.random.choice(transitionName[0],replace=True,p=pro[i,0,:])
                if change == "11":
                    activityList.append("1")
                    pass
                elif change == "12":
                    activityToday = "2"
                    activityList.append("2")
                elif change == "13":
                    activityToday = "3"
                    activityList.append("3")
                elif change == "14":
                    activityToday = "4"
                    activityList.append("4")
                elif change == "15":
                    activityToday = "5"
                    activityList.append("5")
                elif change == "16":
                    activityToday = "6"
                    activityList.append("6")
                elif change == "17":
                    activityToday = "7"
                    activityList.append("7")
                elif change == "18":
                    activityToday = "8"
                    activityList.append("8")
                elif change == "19":
                    activityToday = "9"
                    activityList.append("9")
                elif change == "0110":
                    activityToday = "10"
                    activityList.append("10")
                elif change == "0111":
                    activityToday = "11"
                    activityList.append("11")
                elif change == "0112":
                    activityToday = "12"
                    activityList.append("12")
                elif change == "0113":
                    activityToday = "13"
                    activityList.append("13")
                else:
                    activityToday = "14"
                    activityList.append("14")
            elif activityToday == "2":
                change = np.random.choice(transitionName[1],replace=True,p=pro[i,1,:])           
                if change == "22":
                    activityList.append("2")
                    pass
                elif change == "21":
                    activityToday = "1"
                    activityList.append("1")
                elif change == "23":
                    activityToday = "3"
                    activityList.append("3")
                elif change == "24":
                    activityToday = "4"
                    activityList.append("4")
                elif change == "25":
                    activityToday = "5"
                    activityList.append("5")
                elif change == "26":
                    activityToday = "6"
                    activityList.append("6")
                elif change == "27":
                    activityToday = "7"
                    activityList.append("7")
                elif change == "28":
                    activityToday = "8"
                    activityList.append("8")
                elif change == "29":
                    activityToday = "9"
                    activityList.append("9")
                elif change == "210":
                    activityToday = "10"
                    activityList.append("10")
                elif change == "211":
                    activityToday = "11"
                    activityList.append("11")
                elif change == "212":
                    activityToday = "12"
                    activityList.append("12")
                elif change == "213":
                    activityToday = "13"
                    activityList.append("13")
                else:
                    activityToday = "14"
                    activityList.append("14")
            [code is too long - code needs to be written for each activity (1-14)]

        activitypattern.append(activityList)
    # save the files in order to use them later for comparison(and also for documentation since they can change for each simulation)
    df=pd.DataFrame(activitypattern)
    df.to_csv("Activitypatterns_synthetic_one_person_ft_aggregated_week_4000_days_new",index=False)

#call function
activity_forecast(4000)

Буду рад любым советам/отзывам.

Спасибо, Феликс


person Felix Ziegler    schedule 30.03.2019    source источник
comment
Феликс, здесь очень интересное моделирование и анализ. Можете ли вы опубликовать код, который вы написали для получения этих результатов? В частности, расчет отклонения симуляции?   -  person wbadart    schedule 30.03.2019
comment
Конечно, но код слишком длинный. Попробую разделить на несколько частей. Я надеюсь, что это хорошо :)   -  person Felix Ziegler    schedule 30.03.2019
comment
Привет @wbadart. У тебя уже было время проверить код? Что-нибудь здесь может быть не так?   -  person Felix Ziegler    schedule 04.04.2019


Ответы (1)


Вот еще код :)

# Activity patterns have been saved

# Now, use original data and calculate the probability for each time step

k=data
matrix_original_one_person_wi_week= np.empty(shape=(14, 144))
for i in range (0,144):
    for j in range (1,15):
        a=a1(len(k[(k.iloc[:,i]==j)]),len(k.iloc[:,i]))
        matrix_original_one_person_wi_week[j-1,i]=a

#create dataframe
matrix_df_activities_original_one_person_winter_week=pd.DataFrame(matrix_original_one_person_wi_week)
print(matrix_df_activities_original_one_person_winter_week)


# Activity profiles have been generated for [4000] simulation days and stored as csv file.

# loading data
activities_one_person_winter_week_4000_days=pd.read_csv("XXX", delimiter=";")

# load timesteps csv (10-min-resolution array) to create array with 10-min time steps as x-axis
timesteps = pd.read_csv("XXX", delimiter=";", header=None)
timesteps_array = timesteps[0].values.tolist()

# now transform to dataframe
df_activities_one_person_winter_week_4000_days=pd.DataFrame(activities_one_person_winter_week_4000_days)

# upper and lower threshold for the 95% confidence interval are stored within a single array.
confidence_upper = []
confidence_lower = []

# now iterate through each time step to get the mean value and the upper and lower threshold

# this is done for each activity

# for each column of the dataframe, change values of activities to 1 (activity observed) or 0 (other activity observed)

# for changing the activity for which the values should be calculated, just change the the first value within np.where (a==)

for i in range(0,144):
    a = df_activities_one_person_winter_week_4000_days.iloc[:,i]

    #look for specifc value (first number) and replace it with (1) the rest ist (0), activity starts with 1, not zero!
    count = np.where(a == 1, 1, 0)
    mean = count.mean()

    #confidence interval is 0.95
    confidence = 0.95

    # number of observations
    n = len(count)

    #standard error
    std_err = stats.sem(count)

    # calculate value for lower and upper threshold
    threshold_value = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)

    # add / subtract threshold value from mean value mean
    upper_threshold = mean + threshold_value
    lower_threshold = mean - threshold_value

    #append values to array
    confidence_upper.append(upper_threshold*100)
    confidence_lower.append(lower_threshold*100)

#now we need to calculate the mean value for each activity for each time step from the original data

# just change the first value in iloc for the activity you want to have the probabilities for (its always "a" (activity number defined in np.where) - 1)

original_data_value=np.array(matrix_df_activities_original_one_person_winter_week.iloc[0,:])   




# Now count the frequency how many data points of the original distributiona are outside of the confidence interval

# first multiply value by 100 in order to compare (since other arrays are in XX% format)
#original_data_value_percentage=[]
#for i in range(0,144):
    #a=original_data_value[i]
    #original_data_value_percentage.append(a)
#original_data_value_percentage

# now create array where all values that are lower/upper of confidence interval are stored
count_frequency=[]

#iterate through each time step
for i in range(0,144):
    if (original_data_value[i]*100)<confidence_lower[i]:
        count_frequency.append(original_data_value[i]*100)
        print("lower",i, (original_data_value[i]*100),confidence_lower[i])
    elif (original_data_value[i]*100)>confidence_upper[i]:
        count_frequency.append(original_data_value[i]*100)
        print("upper", i, (original_data_value[i]*100),confidence_upper[i])
    else:
        pass

print((len(count_frequency)/144)*100)



person Felix Ziegler    schedule 30.03.2019