jffs2 "scheduling while atomic" error on kernel 2.6

Has anyone seen the "scheduling while atomic" BUG with a jffs2 filesystem? It appears to happen in the jffs2 garbage-collector (gcd) kernel thread. What is the exact fix? This is on kernel 2.6.39. If preemption is not configured, the problem does not occur. Thanks in advance.

[72940.050000] BUG: scheduling while atomic: sync_supers/74/0x00000002
[72940.060000] 4 locks held by sync_supers/74:
[72940.060000]  #0:  (&type->s_umount_key#14){......}, at: [<c00ae9b0>] sync_supers+0x68/0xf0
[72940.070000]  #1:  (&type->s_lock_key){......}, at: [<c010e9a8>] jffs2_write_super+0x14/0x40
[72940.080000]  #2:  (&(&c->erase_completion_lock)->rlock){......}, at: [<c010baa0>] jffs2_garbage_collect_pass+0x1fc/0x790
[72940.090000]  #3:  (&c->alloc_sem){......}, at: [<c010baac>] jffs2_garbage_collect_pass+0x208/0x790
[72940.100000] Modules linked in:
[72940.110000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0214940>] (schedule+0x54/0x3c0)
[72940.210000] BUG: spinlock cpu recursion on CPU#0, rm/5927
[72940.210000]  lock: c306e560, .magic: dead4ead, .owner: sync_supers/74, .owner_cpu: 0
[72940.220000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0152d48>] (do_raw_spin_lock+0x60/0x120)
[72940.230000] [<c0152d48>] (do_raw_spin_lock+0x60/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[72940.240000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[72940.250000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[72940.260000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[72940.270000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[72940.280000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72940.290000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[72940.290000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[72940.300000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[72940.310000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72940.320000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[72940.330000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)

[72945.070000] BUG: spinlock lockup on CPU#0, rm/5927, c306e560
[72945.080000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0152dc8>] (do_raw_spin_lock+0xe0/0x120)
[72945.090000] [<c0152dc8>] (do_raw_spin_lock+0xe0/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[72945.100000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[72945.110000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[72945.120000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[72945.130000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[72945.140000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72945.150000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[72945.150000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[72945.160000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[72945.170000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72945.180000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[72945.190000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)

[73016.460000] BUG: soft lockup - CPU#0 stuck for 63s! [rm:5927]
[73016.460000] Modules linked in:
[73016.460000] 
[73016.460000] Pid: 5927, comm:                   rm
[73016.460000] CPU: 0    Tainted: G        W    (2.6.39.2 #506)
[73016.460000] PC is at do_raw_spin_lock+0x90/0x120
[73016.460000] LR is at do_raw_spin_lock+0xac/0x120
[73016.460000] pc : [<c0152d78>]    lr : [<c0152d94>]    psr: 80000013
[73016.460000] sp : c36f9cb0  ip : c00334e0  fp : 00da47f0
[73016.460000] r10: 00000028  r9 : c3701384  r8 : 03144800
[73016.460000] r7 : 00000000  r6 : c36f8000  r5 : 001b16c7  r4 : c306e560
[73016.460000] r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 : 00000001
[73016.460000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[73016.460000] Control: 0005317f  Table: 82b90000  DAC: 00000015
[73016.460000] Kernel panic - not syncing: softlockup: hung tasks
[73016.460000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0048740>] (panic+0x5c/0x19c)
[73016.460000] [<c0048740>] (panic+0x5c/0x19c) from [<c0078420>] (watchdog_timer_fn+0x11c/0x15c)
[73016.460000] [<c0078420>] (watchdog_timer_fn+0x11c/0x15c) from [<c006467c>] (hrtimer_run_queues+0x11c/0x1e8)
[73016.460000] [<c006467c>] (hrtimer_run_queues+0x11c/0x1e8) from [<c00538dc>] (run_local_timers+0xc/0x1c)
[73016.460000] [<c00538dc>] (run_local_timers+0xc/0x1c) from [<c0053b30>] (update_process_times+0x2c/0x4c)
[73016.460000] [<c0053b30>] (update_process_times+0x2c/0x4c) from [<c006bd70>] (tick_handle_periodic+0x18/0x84)
[73016.460000] [<c006bd70>] (tick_handle_periodic+0x18/0x84) from [<c003d890>] (lpc32xx_timer_interrupt+0x7c/0xa0)
[73016.460000] [<c003d890>] (lpc32xx_timer_interrupt+0x7c/0xa0) from [<c0078780>] (handle_irq_event_percpu+0x2c/0x19c)
[73016.460000] [<c0078780>] (handle_irq_event_percpu+0x2c/0x19c) from [<c007892c>] (handle_irq_event+0x3c/0x5c)
[73016.460000] [<c007892c>] (handle_irq_event+0x3c/0x5c) from [<c007aadc>] (handle_level_irq+0xbc/0xf0)
[73016.460000] [<c007aadc>] (handle_level_irq+0xbc/0xf0) from [<c0028078>] (asm_do_IRQ+0x78/0xa0)
[73016.460000] [<c0028078>] (asm_do_IRQ+0x78/0xa0) from [<c00330e8>] (__irq_svc+0x48/0x90)
[73016.460000] Exception stack(0xc36f9c68 to 0xc36f9cb0)
[73016.460000] 9c60:                   00000001 00000001 00000000 00000000 c306e560 001b16c7
[73016.460000] 9c80: c36f8000 00000000 03144800 c3701384 00000028 00da47f0 c00334e0 c36f9cb0
[73016.460000] 9ca0: c0152d94 c0152d78 80000013 ffffffff
[73016.460000] [<c00330e8>] (__irq_svc+0x48/0x90) from [<c0152d78>] (do_raw_spin_lock+0x90/0x120)
[73016.460000] [<c0152d78>] (do_raw_spin_lock+0x90/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[73016.460000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[73016.460000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[73016.460000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[73016.460000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[73016.460000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[73016.460000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[73016.460000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[73016.460000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[73016.460000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[73016.460000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[73016.460000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)

Note: another trace, taken without kernel debugging enabled, shows:

BUG: scheduling while atomic: jffs2_gcd_mtd7/288/0x00000002
[<c002e4e4>] (unwind_backtrace+0x0/0xe4) from [<c0200a64>] (schedule+0x54/0x3ec)
...
BUG: scheduling while atomic: sync_supers/72/0x00000002
[<c002e4e4>] (unwind_backtrace+0x0/0xe4) from [<c0200a64>] (schedule+0x54/0x3ec)

asked by minghua, 19.06.2013


Answers (1)


The semaphore is acquired while the spinlock is held. c->alloc_sem is a mutex, and mutex_lock() may sleep; sleeping inside the erase_completion_lock spinlock section (an atomic, preemption-disabled context) is exactly what the "scheduling while atomic" check reports, which is also why the bug only shows up when preemption is configured. Once the acquisition order is swapped, the problem disappears. The problem was fixed in later kernels.

#2: (&(&c->erase_completion_lock)->rlock), jffs2_garbage_collect_pass+0x1fc/0x790
#3: (&c->alloc_sem), jffs2_garbage_collect_pass+0x208/0x790

The code is in fs/jffs2/gc.c. Swapping lines 228 and 229 resolves the problem.

227         D1(printk(KERN_DEBUG "No progress from erasing blocks; doing GC anyway\n"));
228         spin_lock(&c->erase_completion_lock);
229         mutex_lock(&c->alloc_sem);
230 }
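
For clarity, this is roughly what the same lines look like after the swap (a sketch of the reordering described above; nothing else around these lines changes):

227         D1(printk(KERN_DEBUG "No progress from erasing blocks; doing GC anyway\n"));
228         mutex_lock(&c->alloc_sem);               /* may sleep, so take it first, outside the spinlock */
229         spin_lock(&c->erase_completion_lock);    /* enter the atomic section only after the mutex is held */
230 }

With the mutex taken first and the spinlock second, nothing can sleep while the spinlock is held, so the atomic-context warning and the resulting spinlock lockup go away.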
answered by minghua, 27.06.2013
comment
Seeing something similar with kernel 2.6.39.4 running on ARM. Any details on how you pinned down the problem? - BackDoorNoBaby; 14.04.2017
comment
I don't remember very well. But I do remember sifting through kernel commits touching the jffs2 files. I may have found the fix in one of those commits. Or maybe I noticed the wrong ordering of the mutex and the spinlock in the trace and verified it by going through the kernel commits. - minghua; 05.05.2017