commit 3f45898cffc4e386952f3e4821810500adccea1f Author: Greg Kroah-Hartman Date: Fri Aug 21 13:07:46 2020 +0200 Linux 5.7.17 Tested-by: Guenter Roeck Tested-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit adc8db21719668a4bf25ef0bef72c575d0a81cf5 Author: hersen wu Date: Sun Jul 19 17:21:59 2020 -0400 drm/amd/display: dchubbub p-state warning during surface planes switch commit 8b0379a85762b516c7b46aed7dbf2a4947c00564 upstream. [Why] ramp_up_dispclk_with_dpp is to change dispclk, dppclk and dprefclk according to bandwidth requirement. call stack: rv1_update_clocks --> update_clocks --> dcn10_prepare_bandwidth / dcn10_optimize_bandwidth --> prepare_bandwidth / optimize_bandwidth. before change dcn hw, prepare_bandwidth will be called first to allow enough clock, watermark for change, after end of dcn hw change, optimize_bandwidth is executed to lower clock to save power for new dcn hw settings. below is sequence of commit_planes_for_stream: step 1: prepare_bandwidth - raise clock to have enough bandwidth step 2: lock_doublebuffer_enable step 3: pipe_control_lock(true) - make dchubp register change will not take effect right way step 4: apply_ctx_for_surface - program dchubp step 5: pipe_control_lock(false) - dchubp register change take effect step 6: optimize_bandwidth --> dc_post_update_surfaces_to_stream for full_date, optimize clock to save power at end of step 1, dcn clocks (dprefclk, dispclk, dppclk) may be changed for new dchubp configuration. but real dcn hub dchubps are still running with old configuration until end of step 5. this need clocks settings at step 1 should not less than that before step 1. this is checked by two conditions: 1. if (should_set_clock(safe_to_lower , new_clocks->dispclk_khz, clk_mgr_base->clks.dispclk_khz) || new_clocks->dispclk_khz == clk_mgr_base->clks.dispclk_khz) 2. request_dpp_div = new_clocks->dispclk_khz > new_clocks->dppclk_khz the second condition is based on new dchubp configuration. dppclk for new dchubp may be different from dppclk before step 1. for example, before step 1, dchubps are as below: pipe 0: recout=(0,40,1920,980) viewport=(0,0,1920,979) pipe 1: recout=(0,0,1920,1080) viewport=(0,0,1920,1080) for dppclk for pipe0 need dppclk = dispclk new dchubp pipe split configuration: pipe 0: recout=(0,0,960,1080) viewport=(0,0,960,1080) pipe 1: recout=(960,0,960,1080) viewport=(960,0,960,1080) dppclk only needs dppclk = dispclk /2. dispclk, dppclk are not lock by otg master lock. they take effect after step 1. during this transition, dispclk are the same, but dppclk is changed to half of previous clock for old dchubp configuration between step 1 and step 6. This may cause p-state warning intermittently. [How] for new_clocks->dispclk_khz == clk_mgr_base->clks.dispclk_khz, we need make sure dppclk are not changed to less between step 1 and 6. for new_clocks->dispclk_khz > clk_mgr_base->clks.dispclk_khz, new display clock is raised, but we do not know ratio of new_clocks->dispclk_khz and clk_mgr_base->clks.dispclk_khz, new_clocks->dispclk_khz /2 does not guarantee equal or higher than old dppclk. we could ignore power saving different between dppclk = displck and dppclk = dispclk / 2 between step 1 and step 6. as long as safe_to_lower = false, set dpclk = dispclk to simplify condition check. CC: Stable Signed-off-by: Hersen Wu Reviewed-by: Aric Cyr Acked-by: Eryk Brol Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit bab191af99bd97db80ba19aafeea8081949bbe53 Author: Stylon Wang Date: Tue Jun 30 17:55:29 2020 +0800 drm/amd/display: Fix dmesg warning from setting abm level commit c5892a10218214d729699ab61bad6fc109baf0ce upstream. [Why] Setting abm level does not correctly update CRTC state. As a result no surface update is added to dc stream state and triggers warning. [How] Correctly update CRTC state when setting abm level property. CC: Stable Signed-off-by: Stylon Wang Reviewed-by: Nicholas Kazlauskas Acked-by: Eryk Brol Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit bac04cf4ea5cd6819828b33567f23431e2468708 Author: Sandeep Raghuraman Date: Thu Aug 6 22:52:20 2020 +0530 drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume commit f87812284172a9809820d10143b573d833cd3f75 upstream. Reproducing bug report here: After hibernating and resuming, DPM is not enabled. This remains the case even if you test hibernate using the steps here: https://www.kernel.org/doc/html/latest/power/basic-pm-debugging.html I debugged the problem, and figured out that in the file hardwaremanager.c, in the function, phm_enable_dynamic_state_management(), the check 'if (!hwmgr->pp_one_vf && smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev) && adev->in_suspend)' returns true for the hibernate case, and false for the suspend case. This means that for the hibernate case, the AMDGPU driver doesn't enable DPM (even though it should) and simply returns from that function. In the suspend case, it goes ahead and enables DPM, even though it doesn't need to. I debugged further, and found out that in the case of suspend, for the CIK/Hawaii GPUs, smum_is_dpm_running(hwmgr) returns false, while in the case of hibernate, smum_is_dpm_running(hwmgr) returns true. For CIK, the ci_is_dpm_running() function calls the ci_is_smc_ram_running() function, which is ultimately used to determine if DPM is currently enabled or not, and this seems to provide the wrong answer. I've changed the ci_is_dpm_running() function to instead use the same method that some other AMD GPU chips do (e.g Fiji), which seems to read the voltage controller. I've tested on my R9 390 and it seems to work correctly for both suspend and hibernate use cases, and has been stable so far. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208839 Signed-off-by: Sandeep Raghuraman Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 65ed6b7dacdb90099ebd50ca79bb72eaaf74a275 Author: Xin Xiong Date: Sun Jul 19 23:45:45 2020 +0800 drm: fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi commit a34a0a632dd991a371fec56431d73279f9c54029 upstream. drm_dp_mst_allocate_vcpi() invokes drm_dp_mst_topology_get_port_validated(), which increases the refcount of the "port". These reference counting issues take place in two exception handling paths separately. Either when “slots” is less than 0 or when drm_dp_init_vcpi() returns a negative value, the function forgets to reduce the refcnt increased drm_dp_mst_topology_get_port_validated(), which results in a refcount leak. Fix these issues by pulling up the error handling when "slots" is less than 0, and calling drm_dp_mst_topology_put_port() before termination when drm_dp_init_vcpi() returns a negative value. Fixes: 1e797f556c61 ("drm/dp: Split drm_dp_mst_allocate_vcpi") Cc: # v4.12+ Signed-off-by: Xiyu Yang Signed-off-by: Xin Tan Signed-off-by: Xin Xiong Reviewed-by: Lyude Paul Signed-off-by: Lyude Paul Link: https://patchwork.freedesktop.org/patch/msgid/20200719154545.GA41231@xin-virtual-machine Signed-off-by: Greg Kroah-Hartman commit ff4ca77f631ac57013333eecd8ff57e38abb9506 Author: Marius Iacob Date: Sat Aug 1 15:34:46 2020 +0300 drm: Added orientation quirk for ASUS tablet model T103HAF commit b5ac98cbb8e5e30c34ebc837d1e5a3982d2b5f5c upstream. Signed-off-by: Marius Iacob Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20200801123445.1514567-1-themariusus@gmail.com Signed-off-by: Greg Kroah-Hartman commit d79c3af5ef6e9dc2223545a7db1bb88a5eb08d4f Author: Tomi Valkeinen Date: Thu Jun 4 11:02:14 2020 +0300 drm/tidss: fix modeset init for DPI panels commit a72a6a16d51034045cb6355924b62221a8221ca3 upstream. The connector type for DISPC's DPI videoport was set the LVDS instead of DPI. This causes any DPI panel setup to fail with tidss, making all DPI panels unusable. Fix this by using correct connector type. Signed-off-by: Tomi Valkeinen Fixes: 32a1795f57eecc39749017 ("drm/tidss: New driver for TI Keystone platform Display SubSystem") Cc: stable@vger.kernel.org # v5.7+ Link: https://patchwork.freedesktop.org/patch/msgid/20200604080214.107159-1-tomi.valkeinen@ti.com Reviewed-by: Jyri Sarha Signed-off-by: Greg Kroah-Hartman commit 2eb33aa7e9a11fd7a7d1543de6b36e54868ca1d9 Author: Tomi Valkeinen Date: Thu Jun 18 12:51:52 2020 +0300 drm/omap: force runtime PM suspend on system suspend commit ecfdedd7da5d54416db5ca0f851264dca8736f59 upstream. Use SET_LATE_SYSTEM_SLEEP_PM_OPS in DSS submodules to force runtime PM suspend and resume. We use suspend late version so that omapdrm's system suspend callback is called first, as that will disable all the display outputs after which it's safe to force DSS into suspend. Signed-off-by: Tomi Valkeinen Link: https://patchwork.freedesktop.org/patch/msgid/20200618095153.611071-1-tomi.valkeinen@ti.com Acked-by: Tony Lindgren Fixes: cef766300353 ("drm/omap: Prepare DSS for probing without legacy platform data") Cc: stable@vger.kernel.org # v5.7+ Tested-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit d0b3df5fdc668c07bb47cb3082124eb823066141 Author: Alex Deucher Date: Thu Aug 6 10:49:39 2020 -0400 drm/amdgpu: fix ordering of psp suspend The ordering of psp_tmr_terminate() and psp_asd_unload() got reversed when the patches were applied to stable. This patch does not exist in Linus' tree because the ordering is correct there. It got reversed when the patches were applied to stable. This patch is for stable only. Fixes: 22ff658396b446 ("drm/amdgpu: asd function needs to be unloaded in suspend phase") Fixes: 2c41c968c6f648 ("drm/amdgpu: add TMR destory function for psp") Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org # 5.7.x Cc: Huang Rui Signed-off-by: Greg Kroah-Hartman commit cb22808ac2759706a410d89e1219de611a3b2fe4 Author: Imre Deak Date: Mon Jun 8 00:25:21 2020 +0300 drm/dp_mst: Fix the DDC I2C device registration of an MST port commit d8bd15b37d328a935a4fc695fed8b19157503950 upstream. During the initial MST probing an MST port's I2C device will be registered using the kdev of the DRM device as a parent. Later after MST Connection Status Notifications this I2C device will be re-registered with the kdev of the port's connector. This will also move inconsistently the I2C device's sysfs entry from the DRM device's sysfs dir to the connector's dir. Fix the above by keeping the DRM kdev as the parent of the I2C device. Ideally the connector's kdev would be used as a parent, similarly to non-MST connectors, however that needs some more refactoring to ensure the connector's kdev is already available early enough. So keep the existing (initial) behavior for now. Cc: Signed-off-by: Imre Deak Reviewed-by: Stanislav Lisovskiy Reviewed-by: Lyude Paul Link: https://patchwork.freedesktop.org/patch/msgid/20200607212522.16935-2-imre.deak@intel.com Signed-off-by: Greg Kroah-Hartman commit 1a0cfb082770269f642d6a27bba952c6a2bdf722 Author: Denis Efremov Date: Mon Jun 8 18:17:28 2020 +0300 drm/panfrost: Use kvfree() to free bo->sgts commit 114427b8927a4def2942b2b886f7e4aeae289ccb upstream. Use kvfree() to free bo->sgts, because the memory is allocated with kvmalloc_array() in panfrost_mmu_map_fault_addr(). Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations") Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov Reviewed-by: Steven Price Signed-off-by: Steven Price Link: https://patchwork.freedesktop.org/patch/msgid/20200608151728.234026-1-efremov@linux.com Signed-off-by: Greg Kroah-Hartman commit 5c4f75b93e2263ba5594a3eaa51745786a8565ce Author: Chris Wilson Date: Mon May 25 16:14:59 2020 +0100 drm/i915/gt: Force the GT reset on shutdown commit 7c4541a37bbbf83c0f16f779e85eb61d9348ed29 upstream. Before we return control to the system, and letting it reuse all the pages being accessed by HW, we must disable the HW. At the moment, we dare not reset the GPU if it will clobber the display, but once we know the display has been disabled, we can proceed with the reset as we shutdown the module. We know the next user must reinitialise the HW for their purpose. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/489 Signed-off-by: Chris Wilson Cc: stable@kernel.org Reviewed-by: Mika Kuoppala Link: https://patchwork.freedesktop.org/patch/msgid/20200525151459.12083-1-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman commit f3b04fa3539b26262c98fdcd5254470d55ffc752 Author: Sowjanya Komatineni Date: Mon Jan 13 23:24:24 2020 -0800 ASoC: tegra: Enable audio mclk during tegra_asoc_utils_init() commit ff5d18cb04f4ecccbcf05b7f83ab6df2a0d95c16 upstream. Tegra PMC clock clk_out_1 is dedicated for audio mclk from Tegra30 through Tegra210 and currently Tegra clock driver keeps the audio mclk enabled. With the move of PMC clocks from clock driver into pmc driver, audio mclk enable from clock driver is removed and this should be taken care of by the audio driver. tegra_asoc_utils_init() calls tegra_asoc_utils_set_rate() and audio mclk rate configuration is not needed during init and the rate is actually set during the ->hw_params() callback. So, this patch removes tegra_asoc_utils_set_rate() call and just leaves the audio mclk enabled. Signed-off-by: Sowjanya Komatineni Tested-by: Dmitry Osipenko Reviewed-by: Dmitry Osipenko Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman commit 22c672583bf7aa8f30c43fbf58f7365a4134a4e5 Author: Sowjanya Komatineni Date: Mon Jan 13 23:24:23 2020 -0800 ASoC: tegra: Add audio mclk parent configuration commit 1e4e0bf136aa4b4aa59c1e6af19844bd6d807794 upstream. Tegra PMC clock clk_out_1 is dedicated for audio mclk from Tegra30 through Tegra210 and currently Tegra clock driver does the initial parent configuration for audio mclk and keeps it enabled by default. With the move of PMC clocks from clock driver into PMC driver, audio clocks parent configuration can be specified through the device tree using assigned-clock-parents property and audio mclk control should be taken care of by the audio driver. This patch has implementation for parent configuration when default parent configuration through assigned-clock-parents property is not specified in the device tree. Tested-by: Dmitry Osipenko Reviewed-by: Dmitry Osipenko Reviewed-by: Sameer Pujar Signed-off-by: Sowjanya Komatineni Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman commit 59fdffe6cfaceaa2a22bda7ea05b4c151f9fc264 Author: Sowjanya Komatineni Date: Mon Jan 13 23:24:17 2020 -0800 ASoC: tegra: Use device managed resource APIs to get the clock commit 0de6db30ef79b391cedd749801a49c485d2daf4b upstream. tegra_asoc_utils uses clk_get() to get the clock and clk_put() to free them explicitly. This patch updates it to use device managed resource API devm_clk_get() so the clock will be automatically released and freed when the device is unbound and removes tegra_asoc_utils_fini() as its no longer needed. Tested-by: Dmitry Osipenko Reviewed-by: Dmitry Osipenko Reviewed-by: Sameer Pujar Signed-off-by: Sowjanya Komatineni Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman commit ffc16bab11243dbfd6a48f585f6f620b374fa7b6 Author: Hugh Dickins Date: Thu Aug 6 23:26:22 2020 -0700 khugepaged: retract_page_tables() remember to test exit commit 18e77600f7a1ed69f8ce46c9e11cad0985712dfa upstream. Only once have I seen this scenario (and forgot even to notice what forced the eventual crash): a sequence of "BUG: Bad page map" alerts from vm_normal_page(), from zap_pte_range() servicing exit_mmap(); pmd:00000000, pte values corresponding to data in physical page 0. The pte mappings being zapped in this case were supposed to be from a huge page of ext4 text (but could as well have been shmem): my belief is that it was racing with collapse_file()'s retract_page_tables(), found *pmd pointing to a page table, locked it, but *pmd had become 0 by the time start_pte was decided. In most cases, that possibility is excluded by holding mmap lock; but exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged checks khugepaged_test_exit() after acquiring mmap lock: khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so, for example. But retract_page_tables() did not: fix that. The fix is for retract_page_tables() to check khugepaged_test_exit(), after acquiring mmap lock, before doing anything to the page table. Getting the mmap lock serializes with __mmput(), which briefly takes and drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on mm_users makes sure we don't touch the page table once exit_mmap() might reach it, since exit_mmap() will be proceeding without mmap lock, not expecting anyone to be racing with it. Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Mike Kravetz Cc: Song Liu Cc: [4.8+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 9dc36a1d70e9cccfc1a0b2f2c37929ee789a425f Author: Geert Uytterhoeven Date: Fri Aug 14 14:42:45 2020 +0200 sh: landisk: Add missing initialization of sh_io_port_base [ Upstream commit 0c64a0dce51faa9c706fdf1f957d6f19878f4b81 ] The Landisk setup code maps the CF IDE area using ioremap_prot(), and passes the resulting virtual addresses to the pata_platform driver, disguising them as I/O port addresses. Hence the pata_platform driver translates them again using ioport_map(). As CONFIG_GENERIC_IOMAP=n, and CONFIG_HAS_IOPORT_MAP=y, the SuperH-specific mapping code in arch/sh/kernel/ioport.c translates I/O port addresses to virtual addresses by adding sh_io_port_base, which defaults to -1, thus breaking the assumption of an identity mapping. Fix this by setting sh_io_port_base to zero. Fixes: 37b7a97884ba64bf ("sh: machvec IO death.") Signed-off-by: Geert Uytterhoeven Signed-off-by: Rich Felker Signed-off-by: Sasha Levin commit 6db9914337d217b7c6aa71b743e9a8cb8c0ddb2c Author: Zhang Rui Date: Tue Aug 11 23:31:47 2020 +0800 perf/x86/rapl: Fix missing psys sysfs attributes [ Upstream commit 4bb5fcb97a5df0bbc0a27e0252b1e7ce140a8431 ] This fixes a problem introduced by commit: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface") that perf event sysfs attributes for psys RAPL domain are missing. Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface") Signed-off-by: Zhang Rui Signed-off-by: Ingo Molnar Reviewed-by: Kan Liang Reviewed-by: Len Brown Acked-by: Jiri Olsa Link: https://lore.kernel.org/r/20200811153149.12242-2-rui.zhang@intel.com Signed-off-by: Sasha Levin commit 8222753bd467c95c3218698fe332326fc9cf7952 Author: Daniel Díaz Date: Wed Aug 12 17:15:17 2020 -0500 tools build feature: Quote CC and CXX for their arguments [ Upstream commit fa5c893181ed2ca2f96552f50073786d2cfce6c0 ] When using a cross-compilation environment, such as OpenEmbedded, the CC an CXX variables are set to something more than just a command: there are arguments (such as --sysroot) that need to be passed on to the compiler so that the right set of headers and libraries are used. For the particular case that our systems detected, CC is set to the following: export CC="aarch64-linaro-linux-gcc --sysroot=/oe/build/tmp/work/machine/perf/1.0-r9/recipe-sysroot" Without quotes, detection is as follows: Auto-detecting system features: ... dwarf: [ OFF ] ... dwarf_getlocations: [ OFF ] ... glibc: [ OFF ] ... gtk2: [ OFF ] ... libbfd: [ OFF ] ... libcap: [ OFF ] ... libelf: [ OFF ] ... libnuma: [ OFF ] ... numa_num_possible_cpus: [ OFF ] ... libperl: [ OFF ] ... libpython: [ OFF ] ... libcrypto: [ OFF ] ... libunwind: [ OFF ] ... libdw-dwarf-unwind: [ OFF ] ... zlib: [ OFF ] ... lzma: [ OFF ] ... get_cpuid: [ OFF ] ... bpf: [ OFF ] ... libaio: [ OFF ] ... libzstd: [ OFF ] ... disassembler-four-args: [ OFF ] Makefile.config:414: *** No gnu/libc-version.h found, please install glibc-dev[el]. Stop. Makefile.perf:230: recipe for target 'sub-make' failed make[1]: *** [sub-make] Error 2 Makefile:69: recipe for target 'all' failed make: *** [all] Error 2 With CC and CXX quoted, some of those features are now detected. Fixes: e3232c2f39ac ("tools build feature: Use CC and CXX from parent") Signed-off-by: Daniel Díaz Reviewed-by: Thomas Hebb Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Daniel Borkmann Cc: Jiri Olsa Cc: John Fastabend Cc: KP Singh Cc: Martin KaFai Lau Cc: Namhyung Kim Cc: Song Liu Cc: Stephane Eranian Cc: Yonghong Song Link: http://lore.kernel.org/lkml/20200812221518.2869003-1-daniel.diaz@linaro.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit c3973cdad124f9e8453814554dd4e8316c7d671a Author: Vincent Whitchurch Date: Mon Aug 10 15:34:04 2020 +0200 perf bench mem: Always memset source before memcpy [ Upstream commit 1beaef29c34154ccdcb3f1ae557f6883eda18840 ] For memcpy, the source pages are memset to zero only when --cycles is used. This leads to wildly different results with or without --cycles, since all sources pages are likely to be mapped to the same zero page without explicit writes. Before this fix: $ export cmd="./perf stat -e LLC-loads -- ./perf bench \ mem memcpy -s 1024MB -l 100 -f default" $ $cmd 2,935,826 LLC-loads 3.821677452 seconds time elapsed $ $cmd --cycles 217,533,436 LLC-loads 8.616725985 seconds time elapsed After this fix: $ $cmd 214,459,686 LLC-loads 8.674301124 seconds time elapsed $ $cmd --cycles 214,758,651 LLC-loads 8.644480006 seconds time elapsed Fixes: 47b5757bac03c338 ("perf bench mem: Move boilerplate memory allocation to the infrastructure") Signed-off-by: Vincent Whitchurch Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Cc: kernel@axis.com Link: http://lore.kernel.org/lkml/20200810133404.30829-1-vincent.whitchurch@axis.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit 4e4dd9ae19f61179fd40f3cd9f156c34a3d2d202 Author: Dinghao Liu Date: Thu Aug 13 15:46:30 2020 +0800 ALSA: echoaudio: Fix potential Oops in snd_echo_resume() [ Upstream commit 5a25de6df789cc805a9b8ba7ab5deef5067af47e ] Freeing chip on error may lead to an Oops at the next time the system goes to resume. Fix this by removing all snd_echo_free() calls on error. Fixes: 47b5d028fdce8 ("ALSA: Echoaudio - Add suspend support #2") Signed-off-by: Dinghao Liu Link: https://lore.kernel.org/r/20200813074632.17022-1-dinghao.liu@zju.edu.cn Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin commit d54df8d8a84321238e6f8557dd32a1b0eef8aa5c Author: Ondrej Mosnacek Date: Wed Aug 12 14:58:25 2020 +0200 crypto: algif_aead - fix uninitialized ctx->init [ Upstream commit 21dfbcd1f5cbff9cf2f9e7e43475aed8d072b0dd ] In skcipher_accept_parent_nokey() the whole af_alg_ctx structure is cleared by memset() after allocation, so add such memset() also to aead_accept_parent_nokey() so that the new "init" field is also initialized to zero. Without that the initial ctx->init checks might randomly return true and cause errors. While there, also remove the redundant zero assignments in both functions. Found via libkcapi testsuite. Cc: Stephan Mueller Fixes: f3c802a1f300 ("crypto: algif_aead - Only wake up when ctx->more is zero") Suggested-by: Herbert Xu Signed-off-by: Ondrej Mosnacek Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin commit 35a7872bebe83cb6d24b4f81fb6826ebadee1a22 Author: Andy Shevchenko Date: Thu Jul 23 16:02:46 2020 +0300 mfd: dln2: Run event handler loop under spinlock [ Upstream commit 3d858942250820b9adc35f963a257481d6d4c81d ] The event handler loop must be run with interrupts disabled. Otherwise we will have a warning: [ 1970.785649] irq 31 handler lineevent_irq_handler+0x0/0x20 enabled interrupts [ 1970.792739] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159 __handle_irq_event_percpu+0x162/0x170 [ 1970.860732] RIP: 0010:__handle_irq_event_percpu+0x162/0x170 ... [ 1970.946994] Call Trace: [ 1970.949446] [ 1970.951471] handle_irq_event_percpu+0x2c/0x80 [ 1970.955921] handle_irq_event+0x23/0x43 [ 1970.959766] handle_simple_irq+0x57/0x70 [ 1970.963695] generic_handle_irq+0x42/0x50 [ 1970.967717] dln2_rx+0xc1/0x210 [dln2] [ 1970.971479] ? usb_hcd_unmap_urb_for_dma+0xa6/0x1c0 [ 1970.976362] __usb_hcd_giveback_urb+0x77/0xe0 [ 1970.980727] usb_giveback_urb_bh+0x8e/0xe0 [ 1970.984837] tasklet_action_common.isra.0+0x4a/0xe0 ... Recently xHCI driver switched to tasklets in the commit 36dc01657b49 ("usb: host: xhci: Support running urb giveback in tasklet context"). The handle_irq_event_* functions are expected to be called with interrupts disabled and they rightfully complain here because we run in tasklet context with interrupts enabled. Use a event spinlock to protect event handler from being interrupted. Note, that there are only two users of this GPIO and ADC drivers and both of them are using generic_handle_irq() which makes above happen. Fixes: 338a12814297 ("mfd: Add support for Diolan DLN-2 devices") Signed-off-by: Andy Shevchenko Signed-off-by: Lee Jones Signed-off-by: Sasha Levin commit f4b970a15e1fe2b5a91987f3c86334e7379becad Author: Dhananjay Phadke Date: Mon Aug 10 17:42:40 2020 -0700 i2c: iproc: fix race between client unreg and isr [ Upstream commit b1eef236f50ba6afea680da039ef3a2ca9c43d11 ] When i2c client unregisters, synchronize irq before setting iproc_i2c->slave to NULL. (1) disable_irq() (2) Mask event enable bits in control reg (3) Erase slave address (avoid further writes to rx fifo) (4) Flush tx and rx FIFOs (5) Clear pending event (interrupt) bits in status reg (6) enable_irq() (7) Set client pointer to NULL Unable to handle kernel NULL pointer dereference at virtual address 0000000000000318 [ 371.020421] pc : bcm_iproc_i2c_isr+0x530/0x11f0 [ 371.025098] lr : __handle_irq_event_percpu+0x6c/0x170 [ 371.030309] sp : ffff800010003e40 [ 371.033727] x29: ffff800010003e40 x28: 0000000000000060 [ 371.039206] x27: ffff800010ca9de0 x26: ffff800010f895df [ 371.044686] x25: ffff800010f18888 x24: ffff0008f7ff3600 [ 371.050165] x23: 0000000000000003 x22: 0000000001600000 [ 371.055645] x21: ffff800010f18888 x20: 0000000001600000 [ 371.061124] x19: ffff0008f726f080 x18: 0000000000000000 [ 371.066603] x17: 0000000000000000 x16: 0000000000000000 [ 371.072082] x15: 0000000000000000 x14: 0000000000000000 [ 371.077561] x13: 0000000000000000 x12: 0000000000000001 [ 371.083040] x11: 0000000000000000 x10: 0000000000000040 [ 371.088519] x9 : ffff800010f317c8 x8 : ffff800010f317c0 [ 371.093999] x7 : ffff0008f805b3b0 x6 : 0000000000000000 [ 371.099478] x5 : ffff0008f7ff36a4 x4 : ffff8008ee43d000 [ 371.104957] x3 : 0000000000000000 x2 : ffff8000107d64c0 [ 371.110436] x1 : 00000000c00000af x0 : 0000000000000000 [ 371.115916] Call trace: [ 371.118439] bcm_iproc_i2c_isr+0x530/0x11f0 [ 371.122754] __handle_irq_event_percpu+0x6c/0x170 [ 371.127606] handle_irq_event_percpu+0x34/0x88 [ 371.132189] handle_irq_event+0x40/0x120 [ 371.136234] handle_fasteoi_irq+0xcc/0x1a0 [ 371.140459] generic_handle_irq+0x24/0x38 [ 371.144594] __handle_domain_irq+0x60/0xb8 [ 371.148820] gic_handle_irq+0xc0/0x158 [ 371.152687] el1_irq+0xb8/0x140 [ 371.155927] arch_cpu_idle+0x10/0x18 [ 371.159615] do_idle+0x204/0x290 [ 371.162943] cpu_startup_entry+0x24/0x60 [ 371.166990] rest_init+0xb0/0xbc [ 371.170322] arch_call_rest_init+0xc/0x14 [ 371.174458] start_kernel+0x404/0x430 Fixes: c245d94ed106 ("i2c: iproc: Add multi byte read-write support for slave mode") Signed-off-by: Dhananjay Phadke Reviewed-by: Florian Fainelli Acked-by: Ray Jui Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin commit 8254b3b3299a47686495d3c7e1cd7ab8af6cb219 Author: Tiezhu Yang Date: Tue Aug 11 18:36:16 2020 -0700 test_kmod: avoid potential double free in trigger_config_run_type() [ Upstream commit 0776d1231bec0c7ab43baf440a3f5ef5f49dd795 ] Reset the member "test_fs" of the test configuration after a call of the function "kfree_const" to a null pointer so that a double memory release will not be performed. Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader") Signed-off-by: Tiezhu Yang Signed-off-by: Luis Chamberlain Signed-off-by: Andrew Morton Acked-by: Luis Chamberlain Cc: Alexei Starovoitov Cc: Al Viro Cc: Christian Brauner Cc: Chuck Lever Cc: David Howells Cc: David S. Miller Cc: Greg Kroah-Hartman Cc: Jakub Kicinski Cc: James Morris Cc: Jarkko Sakkinen Cc: J. Bruce Fields Cc: Jens Axboe Cc: Josh Triplett Cc: Kees Cook Cc: Lars Ellenberg Cc: Nikolay Aleksandrov Cc: Philipp Reisner Cc: Roopa Prabhu Cc: "Serge E. Hallyn" Cc: Sergei Trofimovich Cc: Sergey Kvachonok Cc: Shuah Khan Cc: Tony Vroon Cc: Christoph Hellwig Link: http://lkml.kernel.org/r/20200610154923.27510-4-mcgrof@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit f1162563717a5e5ee71b2114f3bb0c4d6b02c69a Author: Colin Ian King Date: Tue Aug 11 18:35:53 2020 -0700 fs/ufs: avoid potential u32 multiplication overflow [ Upstream commit 88b2e9b06381551b707d980627ad0591191f7a2d ] The 64 bit ino is being compared to the product of two u32 values, however, the multiplication is being performed using a 32 bit multiply so there is a potential of an overflow. To be fully safe, cast uspi->s_ncg to a u64 to ensure a 64 bit multiplication occurs to avoid any chance of overflow. Fixes: f3e2a520f5fb ("ufs: NFS support") Signed-off-by: Colin Ian King Signed-off-by: Andrew Morton Cc: Evgeniy Dushistov Cc: Alexey Dobriyan Link: http://lkml.kernel.org/r/20200715170355.1081713-1-colin.king@canonical.com Addresses-Coverity: ("Unintentional integer overflow") Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit 3a1af802a4fcf172e10580cd60faf3b7d257d80f Author: Eric Biggers Date: Tue Aug 11 18:35:39 2020 -0700 fs/minix: remove expected error message in block_to_path() [ Upstream commit f666f9fb9a36f1c833b9d18923572f0e4d304754 ] When truncating a file to a size within the last allowed logical block, block_to_path() is called with the *next* block. This exceeds the limit, causing the "block %ld too big" error message to be printed. This case isn't actually an error; there are just no more blocks past that point. So, remove this error message. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers Signed-off-by: Andrew Morton Cc: Alexander Viro Cc: Qiujun Huang Link: http://lkml.kernel.org/r/20200628060846.682158-7-ebiggers@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit d37181ce395124eeb42433a6f47a7c62b5c739ab Author: Eric Biggers Date: Tue Aug 11 18:35:36 2020 -0700 fs/minix: fix block limit check for V1 filesystems [ Upstream commit 0a12c4a8069607247cb8edc3b035a664e636fd9a ] The minix filesystem reads its maximum file size from its on-disk superblock. This value isn't necessarily a multiple of the block size. When it's not, the V1 block mapping code doesn't allow mapping the last possible block. Commit 6ed6a722f9ab ("minixfs: fix block limit check") fixed this in the V2 mapping code. Fix it in the V1 mapping code too. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers Signed-off-by: Andrew Morton Cc: Alexander Viro Cc: Qiujun Huang Link: http://lkml.kernel.org/r/20200628060846.682158-6-ebiggers@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit 9c49d4c5b7ebd30bfb4a004697d97a40b2656866 Author: Eric Biggers Date: Tue Aug 11 18:35:33 2020 -0700 fs/minix: set s_maxbytes correctly [ Upstream commit 32ac86efff91a3e4ef8c3d1cadd4559e23c8e73a ] The minix filesystem leaves super_block::s_maxbytes at MAX_NON_LFS rather than setting it to the actual filesystem-specific limit. This is broken because it means userspace doesn't see the standard behavior like getting EFBIG and SIGXFSZ when exceeding the maximum file size. Fix this by setting s_maxbytes correctly. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers Signed-off-by: Andrew Morton Cc: Alexander Viro Cc: Qiujun Huang Link: http://lkml.kernel.org/r/20200628060846.682158-5-ebiggers@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit b41dfa93f1e53fcf0e9154ec85012304f8968f91 Author: Tiezhu Yang Date: Tue Aug 11 18:34:47 2020 -0700 lib/test_lockup.c: fix return value of test_lockup_init() [ Upstream commit 3adf3bae0d612357da516d39e1584f1547eb6e86 ] Since filp_open() returns an error pointer, we should use IS_ERR() to check the return value and then return PTR_ERR() if failed to get the actual return value instead of always -EINVAL. E.g. without this patch: [root@localhost loongson]# ls no_such_file ls: cannot access no_such_file: No such file or directory [root@localhost loongson]# modprobe test_lockup file_path=no_such_file lock_sb_umount time_secs=60 state=S modprobe: ERROR: could not insert 'test_lockup': Invalid argument [root@localhost loongson]# dmesg | tail -1 [ 126.100596] test_lockup: cannot find file_path With this patch: [root@localhost loongson]# ls no_such_file ls: cannot access no_such_file: No such file or directory [root@localhost loongson]# modprobe test_lockup file_path=no_such_file lock_sb_umount time_secs=60 state=S modprobe: ERROR: could not insert 'test_lockup': Unknown symbol in module, or unknown parameter (see dmesg) [root@localhost loongson]# dmesg | tail -1 [ 95.134362] test_lockup: failed to open no_such_file: -2 Fixes: aecd42df6d39 ("lib/test_lockup.c: add parameters for locking generic vfs locks") Signed-off-by: Tiezhu Yang Signed-off-by: Andrew Morton Reviewed-by: Guenter Roeck Cc: Konstantin Khlebnikov Cc: Kees Cook Link: http://lkml.kernel.org/r/1595555407-29875-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit ee7798b00f97e14bc9c91d8ac47e207386930907 Author: Trond Myklebust Date: Tue Aug 11 13:36:32 2020 -0400 NFS: Fix flexfiles read failover [ Upstream commit 563c53e73b8b6ec842828736f77e633f7b0911e9 ] The current mirrored read failover code is correctly resetting the mirror index between failed reads, however it is not able to actually flip the RPC call over to the next RPC client. The end result is that we keep resending the RPC call to the same client over and over. The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a new RPC call, but we need to add the ability to pass in a mirror index so that we always retry the next mirror in the list. Fixes: 166bd5b889ac ("pNFS/flexfiles: Fix layoutstats handling during read failovers") Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit 4476b8282f0bdbf21c8a1e5d783ee11a0edfcaf2 Author: Jeffrey Mitchell Date: Wed Aug 5 12:23:19 2020 -0500 nfs: Fix getxattr kernel panic and memory overflow [ Upstream commit b4487b93545214a9db8cbf32e86411677b0cca21 ] Move the buffer size check to decode_attr_security_label() before memcpy() Only call memcpy() if the buffer is large enough Fixes: aa9c2669626c ("NFS: Client implementation of Labeled-NFS") Signed-off-by: Jeffrey Mitchell [Trond: clean up duplicate test of label->len != 0] Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit 2699807803f9e00c7a471f92601a60185adb59be Author: Wang Hai Date: Mon Aug 10 10:57:05 2020 +0800 net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init [ Upstream commit 50caa777a3a24d7027748e96265728ce748b41ef ] Fix the missing clk_disable_unprepare() before return from emac_clks_phase1_init() in the error handling case. Fixes: b9b17debc69d ("net: emac: emac gigabit ethernet controller driver") Reported-by: Hulk Robot Signed-off-by: Wang Hai Acked-by: Timur Tabi Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit fdeb41184c3283bc7019c44b619de9fe8688e598 Author: Krzysztof Kozlowski Date: Wed Aug 5 17:50:53 2020 +0200 s390/Kconfig: add missing ZCRYPT dependency to VFIO_AP [ Upstream commit 929a343b858612100cb09443a8aaa20d4a4706d3 ] The VFIO_AP uses ap_driver_register() (and deregister) functions implemented in ap_bus.c (compiled into ap.o). However the ap.o will be built only if CONFIG_ZCRYPT is selected. This was not visible before commit e93a1695d7fb ("iommu: Enable compile testing for some of drivers") because the CONFIG_VFIO_AP depends on CONFIG_S390_AP_IOMMU which depends on the missing CONFIG_ZCRYPT. After adding COMPILE_TEST, it is possible to select a configuration with VFIO_AP and S390_AP_IOMMU but without the ZCRYPT. Add proper dependency to the VFIO_AP to fix build errors: ERROR: modpost: "ap_driver_register" [drivers/s390/crypto/vfio_ap.ko] undefined! ERROR: modpost: "ap_driver_unregister" [drivers/s390/crypto/vfio_ap.ko] undefined! Reported-by: kernel test robot Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski Signed-off-by: Heiko Carstens Signed-off-by: Sasha Levin commit 07a4dc1435820ef9fcbb9767a0239336d24ca9bf Author: Wang Hai Date: Thu Jul 30 14:36:02 2020 +0800 s390/test_unwind: fix possible memleak in test_unwind() [ Upstream commit 75d3e7f4769d276a056efa1cc7f08de571fc9b4b ] test_unwind() misses to call kfree(bt) in an error path. Add the missed function call to fix it. Fixes: 0610154650f1 ("s390/test_unwind: print verbose unwinding results") Reported-by: Hulk Robot Signed-off-by: Wang Hai Acked-by: Ilya Leoshkevich Signed-off-by: Heiko Carstens Signed-off-by: Sasha Levin commit 64efb94072ae2e99da7886930c1321cf8d537f0f Author: Dan Carpenter Date: Fri Jun 26 13:39:59 2020 +0300 drm/vmwgfx: Fix two list_for_each loop exit tests [ Upstream commit 4437c1152ce0e57ab8f401aa696ea6291cc07ab1 ] These if statements are supposed to be true if we ended the list_for_each_entry() loops without hitting a break statement but they don't work. In the first loop, we increment "i" after the "if (i == unit)" condition so we don't necessarily know that "i" is not equal to unit at the end of the loop. In the second loop we exit when mode is not pointing to a valid drm_display_mode struct so it doesn't make sense to check "mode->type". Fixes: a278724aa23c ("drm/vmwgfx: Implement fbdev on kms v2") Signed-off-by: Dan Carpenter Reviewed-by: Roland Scheidegger Signed-off-by: Roland Scheidegger Signed-off-by: Sasha Levin commit 9b2dca760333887efcab2b1a7023c5f1abcf770f Author: Dan Carpenter Date: Fri Jun 26 13:34:37 2020 +0300 drm/vmwgfx: Use correct vmw_legacy_display_unit pointer [ Upstream commit 1d2c0c565bc0da25f5e899a862fb58e612b222df ] The "entry" pointer is an offset from the list head and it doesn't point to a valid vmw_legacy_display_unit struct. Presumably the intent was to point to the last entry. Also the "i++" wasn't used so I have removed that as well. Fixes: d7e1958dbe4a ("drm/vmwgfx: Support older hardware.") Signed-off-by: Dan Carpenter Reviewed-by: Roland Scheidegger Signed-off-by: Roland Scheidegger Signed-off-by: Sasha Levin commit dc799dbb98c726fc783e50badac4865e24af65d9 Author: Dan Carpenter Date: Mon Apr 6 17:45:52 2020 +0300 vdpa: Fix pointer math bug in vdpasim_get_config() [ Upstream commit cf16fe9243bfa2863491026fc727618c7c593c84 ] If "offset" is non-zero then we end up copying from beyond the end of the config because of pointer math. We can fix this by casting the struct to a u8 pointer. Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20200406144552.GF68494@mwanda Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang Signed-off-by: Sasha Levin commit eb3ec5bf7ad48b064387d3a63af21754008ce635 Author: Christophe Leroy Date: Mon Aug 10 08:48:22 2020 +0000 recordmcount: Fix build failure on non arm64 [ Upstream commit 3df14264ad9930733a8166e5bd0eccc1727564bb ] Commit ea0eada45632 leads to the following build failure on powerpc: HOSTCC scripts/recordmcount scripts/recordmcount.c: In function 'arm64_is_fake_mcount': scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use in this function) scripts/recordmcount.c:440: error: (Each undeclared identifier is reported only once scripts/recordmcount.c:440: error: for each function it appears in.) make[2]: *** [scripts/recordmcount] Error 1 Make sure R_AARCH64_CALL26 is always defined. Fixes: ea0eada45632 ("recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.") Signed-off-by: Christophe Leroy Acked-by: Steven Rostedt (VMware) Acked-by: Gregory Herrero Cc: Gregory Herrero Link: https://lore.kernel.org/r/5ca1be21fa6ebf73203b45fd9aadd2bafb5e6b15.1597049145.git.christophe.leroy@csgroup.eu Signed-off-by: Catalin Marinas Signed-off-by: Sasha Levin commit 9260a439b91059d9ab6752802f48494792fa1189 Author: Michael S. Tsirkin Date: Mon Aug 10 08:44:43 2020 -0400 vdpa_sim: init iommu lock [ Upstream commit 1e3e792650d2c0df8dd796906275b7c79e278664 ] The patch adding the iommu lock did not initialize it. The struct is zero-initialized so this is mostly a problem when using lockdep. Reported-by: kernel test robot Cc: Max Gurtovoy Fixes: 0ea9ee430e74 ("vdpasim: protect concurrent access to iommu iotlb") Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin commit b42f4550febec776275102da880155f1531a3948 Author: Colin Ian King Date: Thu Aug 6 15:35:34 2020 -0700 Input: sentelic - fix error return when fsp_reg_write fails [ Upstream commit ea38f06e0291986eb93beb6d61fd413607a30ca4 ] Currently when the call to fsp_reg_write fails -EIO is not being returned because the count is being returned instead of the return value in retval. Fix this by returning the value in retval instead of count. Addresses-Coverity: ("Unused value") Fixes: fc69f4a6af49 ("Input: add new driver for Sentelic Finger Sensing Pad") Signed-off-by: Colin Ian King Link: https://lore.kernel.org/r/20200603141218.131663-1-colin.king@canonical.com Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin commit 27770820e9fa1a649e2984b9f0b0c9f2eabac2a5 Author: Andrii Nakryiko Date: Tue Aug 4 17:47:57 2020 -0700 selftests/bpf: Prevent runqslower from racing on building bpftool [ Upstream commit 6bcaf41f9613278cd5897fc80ab93033bda8efaa ] runqslower's Makefile is building/installing bpftool into $(OUTPUT)/sbin/bpftool, which coincides with $(DEFAULT_BPFTOOL). In practice this means that often when building selftests from scratch (after `make clean`), selftests are racing with runqslower to simultaneously build bpftool and one of the two processes fail due to file being busy. Prevent this race by explicitly order-depending on $(BPFTOOL_DEFAULT). Fixes: a2c9652f751e ("selftests: Refactor build to remove tools/lib/bpf from include path") Signed-off-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20200805004757.2960750-1-andriin@fb.com Signed-off-by: Sasha Levin commit ca77100cb07abada01a5a392444c2a491d4dea90 Author: Pawan Gupta Date: Thu Jul 16 12:23:59 2020 -0700 x86/bugs/multihit: Fix mitigation reporting when VMX is not in use [ Upstream commit f29dfa53cc8ae6ad93bae619bcc0bf45cab344f7 ] On systems that have virtualization disabled or unsupported, sysfs mitigation for X86_BUG_ITLB_MULTIHIT is reported incorrectly as: $ cat /sys/devices/system/cpu/vulnerabilities/itlb_multihit KVM: Vulnerable System is not vulnerable to DoS attack from a rogue guest when virtualization is disabled or unsupported in the hardware. Change the mitigation reporting for these cases. Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Reported-by: Nelson Dsouza Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Pawan Gupta Signed-off-by: Ingo Molnar Reviewed-by: Tony Luck Acked-by: Thomas Gleixner Link: https://lore.kernel.org/r/0ba029932a816179b9d14a30db38f0f11ef1f166.1594925782.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Sasha Levin commit 55f60f18ed6c8bb4d26619d65e892cb64daca4e8 Author: Dilip Kota Date: Mon Aug 3 15:56:36 2020 +0800 x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC [ Upstream commit 7d98585860d845e36ee612832a5ff021f201dbaf ] Frequency descriptor of Lightning Mountain SoC doesn't have all the frequency entries so resulting in the below failure causing a kernel hang: Error MSR_FSB_FREQ index 15 is unknown tsc: Fast TSC calibration failed So, add all the frequency entries in the Lightning Mountain SoC frequency descriptor. Fixes: 0cc5359d8fd45 ("x86/cpu: Update init data for new Airmont CPU model") Fixes: 812c2d7506fd ("x86/tsc_msr: Use named struct initializers") Signed-off-by: Dilip Kota Signed-off-by: Ingo Molnar Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/211c643ae217604b46cbec43a2c0423946dc7d2d.1596440057.git.eswara.kota@linux.intel.com Signed-off-by: Sasha Levin commit 2f3292325816d0cba5988916357751d91a194660 Author: Dan Carpenter Date: Tue Aug 4 13:16:45 2020 +0300 md-cluster: Fix potential error pointer dereference in resize_bitmaps() [ Upstream commit e8abe1de43dac658dacbd04a4543e0c988a8d386 ] The error handling calls md_bitmap_free(bitmap) which checks for NULL but will Oops if we pass an error pointer. Let's set "bitmap" to NULL on this error path. Fixes: afd756286083 ("md-cluster/raid10: resize all the bitmaps before start reshape") Signed-off-by: Dan Carpenter Reviewed-by: Guoqing Jiang Signed-off-by: Song Liu Signed-off-by: Sasha Levin commit b1e0ac52fd4fdc8b16928a564cc8a73be85b5adc Author: Tero Kristo Date: Fri Jul 17 16:29:58 2020 +0300 watchdog: rti-wdt: balance pm runtime enable calls [ Upstream commit d5b29c2c5ba2bd5bbdb5b744659984185d17d079 ] PM runtime should be disabled in the fail path of probe and when the driver is removed. Fixes: 2d63908bdbfb ("watchdog: Add K3 RTI watchdog support") Signed-off-by: Tero Kristo Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20200717132958.14304-5-t-kristo@ti.com Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Sasha Levin commit cbc3b48eccd643d1a104fa00d4afad4c80400cdc Author: Krzysztof Sobota Date: Fri Jul 17 12:31:09 2020 +0200 watchdog: initialize device before misc_register [ Upstream commit cb36e29bb0e4b0c33c3d5866a0a4aebace4c99b7 ] When watchdog device is being registered, it calls misc_register that makes watchdog available for systemd to open. This is a data race scenario, because when device is open it may still have device struct not initialized - this in turn causes a crash. This patch moves device initialization before misc_register call and it solves the problem printed below. ------------[ cut here ]------------ WARNING: CPU: 3 PID: 1 at lib/kobject.c:612 kobject_get+0x50/0x54 kobject: '(null)' ((ptrval)): is not initialized, yet kobject_get() is being called. Modules linked in: k2_reset_status(O) davinci_wdt(+) sfn_platform_hwbcn(O) fsmddg_sfn(O) clk_misc_mmap(O) clk_sw_bcn(O) fsp_reset(O) cma_mod(O) slave_sup_notif(O) fpga_master(O) latency(O+) evnotify(O) enable_arm_pmu(O) xge(O) rio_mport_cdev br_netfilter bridge stp llc nvrd_checksum(O) ipv6 CPU: 3 PID: 1 Comm: systemd Tainted: G O 4.19.113-g2579778-fsm4_k2 #1 Hardware name: Keystone [] (unwind_backtrace) from [] (show_stack+0x18/0x1c) [] (show_stack) from [] (dump_stack+0xb4/0xe8) [] (dump_stack) from [] (__warn+0xfc/0x114) [] (__warn) from [] (warn_slowpath_fmt+0x50/0x74) [] (warn_slowpath_fmt) from [] (kobject_get+0x50/0x54) [] (kobject_get) from [] (get_device+0x1c/0x24) [] (get_device) from [] (watchdog_open+0x90/0xf0) [] (watchdog_open) from [] (misc_open+0x130/0x17c) [] (misc_open) from [] (chrdev_open+0xec/0x1a8) [] (chrdev_open) from [] (do_dentry_open+0x204/0x3cc) [] (do_dentry_open) from [] (path_openat+0x330/0x1148) [] (path_openat) from [] (do_filp_open+0x78/0xec) [] (do_filp_open) from [] (do_sys_open+0x130/0x1f4) [] (do_sys_open) from [] (ret_fast_syscall+0x0/0x28) Exception stack(0xd2ceffa8 to 0xd2cefff0) ffa0: b6f69968 00000000 ffffff9c b6ebd210 000a0001 00000000 ffc0: b6f69968 00000000 00000000 00000142 fffffffd ffffffff 00b65530 bed7bb78 ffe0: 00000142 bed7ba70 b6cc2503 b6cc41d6 ---[ end trace 7b16eb105513974f ]--- ------------[ cut here ]------------ WARNING: CPU: 3 PID: 1 at lib/refcount.c:153 kobject_get+0x24/0x54 refcount_t: increment on 0; use-after-free. Modules linked in: k2_reset_status(O) davinci_wdt(+) sfn_platform_hwbcn(O) fsmddg_sfn(O) clk_misc_mmap(O) clk_sw_bcn(O) fsp_reset(O) cma_mod(O) slave_sup_notif(O) fpga_master(O) latency(O+) evnotify(O) enable_arm_pmu(O) xge(O) rio_mport_cdev br_netfilter bridge stp llc nvrd_checksum(O) ipv6 CPU: 3 PID: 1 Comm: systemd Tainted: G W O 4.19.113-g2579778-fsm4_k2 #1 Hardware name: Keystone [] (unwind_backtrace) from [] (show_stack+0x18/0x1c) [] (show_stack) from [] (dump_stack+0xb4/0xe8) [] (dump_stack) from [] (__warn+0xfc/0x114) [] (__warn) from [] (warn_slowpath_fmt+0x50/0x74) [] (warn_slowpath_fmt) from [] (kobject_get+0x24/0x54) [] (kobject_get) from [] (get_device+0x1c/0x24) [] (get_device) from [] (watchdog_open+0x90/0xf0) [] (watchdog_open) from [] (misc_open+0x130/0x17c) [] (misc_open) from [] (chrdev_open+0xec/0x1a8) [] (chrdev_open) from [] (do_dentry_open+0x204/0x3cc) [] (do_dentry_open) from [] (path_openat+0x330/0x1148) [] (path_openat) from [] (do_filp_open+0x78/0xec) [] (do_filp_open) from [] (do_sys_open+0x130/0x1f4) [] (do_sys_open) from [] (ret_fast_syscall+0x0/0x28) Exception stack(0xd2ceffa8 to 0xd2cefff0) ffa0: b6f69968 00000000 ffffff9c b6ebd210 000a0001 00000000 ffc0: b6f69968 00000000 00000000 00000142 fffffffd ffffffff 00b65530 bed7bb78 ffe0: 00000142 bed7ba70 b6cc2503 b6cc41d6 ---[ end trace 7b16eb1055139750 ]--- Fixes: 72139dfa2464 ("watchdog: Fix the race between the release of watchdog_core_data and cdev") Reviewed-by: Guenter Roeck Reviewed-by: Alexander Sverdlin Signed-off-by: Krzysztof Sobota Link: https://lore.kernel.org/r/20200717103109.14660-1-krzysztof.sobota@nokia.com Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Sasha Levin commit f0bcc25314ff43f801d709214599884d4b1ac58f Author: Scott Mayhew Date: Sat Aug 1 07:10:39 2020 -0400 nfs: nfs_file_write() should check for writeback errors [ Upstream commit ce368536dd614452407dc31e2449eb84681a06af ] The NFS_CONTEXT_ERROR_WRITE flag (as well as the check of said flag) was removed by commit 6fbda89b257f. The absence of an error check allows writes to be continually queued up for a server that may no longer be able to handle them. Fix it by adding an error check using the generic error reporting functions. Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Scott Mayhew Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit 4c6b5e4fff8ba115f96167a9ee6e23350d76bb0b Author: Ewan D. Milne Date: Wed Jul 29 19:10:11 2020 -0400 scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport [ Upstream commit af6de8c60fe9433afa73cea6fcccdccd98ad3e5e ] We cannot wait on a completion object in the lpfc_nvme_targetport structure in the _destroy_targetport() code path because the NVMe/fc transport will free that structure immediately after the .targetport_delete() callback. This results in a use-after-free, and a crash if slub_debug=FZPU is enabled. An earlier fix put put the completion on the stack, but commit 2a0fb340fcc8 ("scsi: lpfc: Correct localport timeout duration error") subsequently changed the code to reference the completion through a pointer in the object rather than the local stack variable. Fix this by using the stack variable directly. Link: https://lore.kernel.org/r/20200729231011.13240-1-emilne@redhat.com Fixes: 2a0fb340fcc8 ("scsi: lpfc: Correct localport timeout duration error") Reviewed-by: James Smart Signed-off-by: Ewan D. Milne Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit dcce29b71001dab8af7dde8aefeeec7ea0f309fb Author: Stafford Horne Date: Tue Jun 16 06:19:46 2020 +0900 openrisc: Fix oops caused when dumping stack [ Upstream commit 57b8e277c33620e115633cdf700a260b55095460 ] When dumping a stack with 'cat /proc/#/stack' the kernel would oops. For example: # cat /proc/690/stack Unable to handle kernel access at virtual address 0x7fc60f58 Oops#: 0000 CPU #: 0 PC: c00097fc SR: 0000807f SP: d6f09b9c GPR00: 00000000 GPR01: d6f09b9c GPR02: d6f09bb8 GPR03: d6f09bc4 GPR04: 7fc60f5c GPR05: c00099b4 GPR06: 00000000 GPR07: d6f09ba3 GPR08: ffffff00 GPR09: c0009804 GPR10: d6f08000 GPR11: 00000000 GPR12: ffffe000 GPR13: dbb86000 GPR14: 00000001 GPR15: dbb86250 GPR16: 7fc60f63 GPR17: 00000f5c GPR18: d6f09bc4 GPR19: 00000000 GPR20: c00099b4 GPR21: ffffffc0 GPR22: 00000000 GPR23: 00000000 GPR24: 00000001 GPR25: 000002c6 GPR26: d78b6850 GPR27: 00000001 GPR28: 00000000 GPR29: dbb86000 GPR30: ffffffff GPR31: dbb862fc RES: 00000000 oGPR11: ffffffff Process cat (pid: 702, stackpage=d79d6000) Stack: Call trace: [<598977f2>] save_stack_trace_tsk+0x40/0x74 [<95063f0e>] stack_trace_save_tsk+0x44/0x58 [] proc_pid_stack+0xd0/0x13c [] proc_single_show+0x6c/0xf0 [] seq_read+0x1b4/0x688 [<2d6c7480>] do_iter_read+0x208/0x248 [<2182a2fb>] vfs_readv+0x64/0x90 This was caused by the stack trace code in save_stack_trace_tsk using the wrong stack pointer. It was using the user stack pointer instead of the kernel stack pointer. Fix this by using the right stack. Also for good measure we add try_get_task_stack/put_task_stack to ensure the task is not lost while we are walking it's stack. Fixes: eecac38b0423a ("openrisc: support framepointers and STACKTRACE_SUPPORT") Signed-off-by: Stafford Horne Signed-off-by: Sasha Levin commit e7bd1d8efc78044d4e8d4a1d74f627b44bf45203 Author: Jane Chu Date: Mon Aug 3 16:41:39 2020 -0600 libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr [ Upstream commit 7f674025d9f7321dea11b802cc0ab3f09cbe51c5 ] commit 7d988097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support") adds a sysfs_notify_dirent() to wake up userspace poll thread when the "overwrite" operation has completed. But the notification is issued before the internal dimm security state and flags have been updated, so the userspace poll thread wakes up and fetches the not-yet-updated attr and falls back to sleep, forever. But if user from another terminal issue "ndctl wait-overwrite nmemX" again, the command returns instantly. Link: https://lore.kernel.org/r/1596494499-9852-3-git-send-email-jane.chu@oracle.com Fixes: 7d988097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support") Cc: Dave Jiang Cc: Dan Williams Reviewed-by: Dave Jiang Signed-off-by: Jane Chu Signed-off-by: Vishal Verma Signed-off-by: Sasha Levin commit b8f7fe67f5f404a06e0330d2536d8d61fb3abbf2 Author: Jane Chu Date: Mon Aug 3 16:41:37 2020 -0600 libnvdimm/security: fix a typo [ Upstream commit dad42d17558f316e9e807698cd4207359b636084 ] commit d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute") introduced a typo, causing a 'nvdimm->sec.flags' update being overwritten by the subsequent update meant for 'nvdimm->sec.ext_flags'. Link: https://lore.kernel.org/r/1596494499-9852-1-git-send-email-jane.chu@oracle.com Fixes: d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute") Cc: Dan Williams Reviewed-by: Dave Jiang Signed-off-by: Jane Chu Signed-off-by: Vishal Verma Signed-off-by: Sasha Levin commit 72b983a34b8aeda8fd70a4e4c51f1a4d1277552e Author: Nicolas Saenz Julienne Date: Thu Jul 30 20:26:19 2020 +0200 clk: bcm2835: Do not use prediv with bcm2711's PLLs [ Upstream commit f34e4651ce66a754f41203284acf09b28b9dd955 ] Contrary to previous SoCs, bcm2711 doesn't have a prescaler in the PLL feedback loop. Bypass it by zeroing fb_prediv_mask when running on bcm2711. Note that, since the prediv configuration bits were re-purposed, this was triggering miscalculations on all clocks hanging from the VPU clock, notably the aux UART, making its output unintelligible. Fixes: 42de9ad400af ("clk: bcm2835: Add BCM2711_CLOCK_EMMC2 support") Reported-by: Nathan Chancellor Signed-off-by: Nicolas Saenz Julienne Link: https://lore.kernel.org/r/20200730182619.23246-1-nsaenzjulienne@suse.de Tested-by: Nathan Chancellor Reviewed-by: Florian Fainelli Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit 0e5b183b6d6936bf7256cd657f12d1f5beb8049e Author: Zhihao Cheng Date: Tue Jul 7 20:51:40 2020 +0800 ubifs: Fix wrong orphan node deletion in ubifs_jnl_update|rename [ Upstream commit 094b6d1295474f338201b846a1f15e72eb0b12cf ] There a wrong orphan node deleting in error handling path in ubifs_jnl_update() and ubifs_jnl_rename(), which may cause following error msg: UBIFS error (ubi0:0 pid 1522): ubifs_delete_orphan [ubifs]: missing orphan ino 65 Fix this by checking whether the node has been operated for adding to orphan list before being deleted, Signed-off-by: Zhihao Cheng Fixes: 823838a486888cf484e ("ubifs: Add hashes to the tree node cache") Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin commit cc03f4be0a15e70f56981750a9d84ab7613921ea Author: Scott Mayhew Date: Sat Aug 1 07:10:38 2020 -0400 nfs: ensure correct writeback errors are returned on close() [ Upstream commit 67dd23f9e6fbaf163431912ef5599c5e0693476c ] nfs_wb_all() calls filemap_write_and_wait(), which uses filemap_check_errors() to determine the error to return. filemap_check_errors() only looks at the mapping->flags and will therefore only return either -ENOSPC or -EIO. To ensure that the correct error is returned on close(), nfs{,4}_file_flush() should call filemap_check_wb_err() which looks at the errseq value in mapping->wb_err without consuming it. Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Scott Mayhew Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit f7002388d87325a7d43318e91f8dc0f703f39933 Author: Wolfram Sang Date: Sun Jul 26 18:16:06 2020 +0200 i2c: rcar: avoid race when unregistering slave [ Upstream commit c7c9e914f9a0478fba4dc6f227cfd69cf84a4063 ] Due to the lockless design of the driver, it is theoretically possible to access a NULL pointer, if a slave interrupt was running while we were unregistering the slave. To make this rock solid, disable the interrupt for a short time while we are clearing the interrupt_enable register. This patch is purely based on code inspection. The OOPS is super-hard to trigger because clearing SAR (the address) makes interrupts even more unlikely to happen as well. While here, reinit SCR to SDBS because this bit should always be set according to documentation. There is no effect, though, because the interface is disabled. Fixes: 7b814d852af6 ("i2c: rcar: avoid race when unregistering slave client") Signed-off-by: Wolfram Sang Reviewed-by: Niklas Söderlund Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin commit 706355158ebc04ff7bb6c7e988c95648d20f28e2 Author: Thomas Hebb Date: Sun Jul 26 21:08:14 2020 -0700 tools build feature: Use CC and CXX from parent [ Upstream commit e3232c2f39acafd5a29128425bc30b9884642cfa ] commit c8c188679ccf ("tools build: Use the same CC for feature detection and actual build") changed these assignments from unconditional (:=) to conditional (?=) so that they wouldn't clobber values from the environment. However, conditional assignment does not work properly for variables that Make implicitly sets, among which are CC and CXX. To quote tools/scripts/Makefile.include, which handles this properly: # Makefiles suck: This macro sets a default value of $(2) for the # variable named by $(1), unless the variable has been set by # environment or command line. This is necessary for CC and AR # because make sets default values, so the simpler ?= approach # won't work as expected. In other words, the conditional assignments will not run even if the variables are not overridden in the environment; Make will set CC to "cc" and CXX to "g++" when it starts[1], meaning the variables are not empty by the time the conditional assignments are evaluated. This breaks cross-compilation when CROSS_COMPILE is set but CC isn't, since "cc" gets used for feature detection instead of the cross compiler (and likewise for CXX). To fix the issue, just pass down the values of CC and CXX computed by the parent Makefile, which gets included by the Makefile that actually builds whatever we're detecting features for and so is guaranteed to have good values. This is a better solution anyway, since it means we aren't trying to replicate the logic of the parent build system and so don't risk it getting out of sync. Leave PKG_CONFIG alone, since 1) there's no common logic to compute it in Makefile.include, and 2) it's not an implicit variable, so conditional assignment works properly. [1] https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html Fixes: c8c188679ccf ("tools build: Use the same CC for feature detection and actual build") Signed-off-by: Thomas Hebb Acked-by: Jiri Olsa Cc: David Carrillo-Cisneros Cc: Ian Rogers Cc: Igor Lubashev Cc: Namhyung Kim Cc: Quentin Monnet Cc: Song Liu Cc: Stephane Eranian Cc: thomas hebb Link: http://lore.kernel.org/lkml/0a6e69d1736b0fa231a648f50b0cce5d8a6734ef.1595822871.git.tommyhebb@gmail.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit f46878f9aa5349383ef40989faf74dff1aa4663c Author: Rayagonda Kokatanur Date: Fri Jul 17 21:46:06 2020 -0700 pwm: bcm-iproc: handle clk_get_rate() return [ Upstream commit 6ced5ff0be8e94871ba846dfbddf69d21363f3d7 ] Handle clk_get_rate() returning 0 to avoid possible division by zero. Fixes: daa5abc41c80 ("pwm: Add support for Broadcom iProc PWM controller") Signed-off-by: Rayagonda Kokatanur Signed-off-by: Scott Branden Reviewed-by: Ray Jui Reviewed-by: Uwe Kleine-König Signed-off-by: Thierry Reding Signed-off-by: Sasha Levin commit 49f70356153baa42eaf781466647d5ab2a79837c Author: Qais Yousef Date: Thu Jul 16 12:03:47 2020 +0100 sched/uclamp: Fix a deadlock when enabling uclamp static key [ Upstream commit e65855a52b479f98674998cb23b21ef5a8144b04 ] The following splat was caught when setting uclamp value of a task: BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:49 cpus_read_lock+0x68/0x130 static_key_enable+0x1c/0x38 __sched_setscheduler+0x900/0xad8 Fix by ensuring we enable the key outside of the critical section in __sched_setscheduler() Fixes: 46609ce22703 ("sched/uclamp: Protect uclamp fast path code with static key") Signed-off-by: Qais Yousef Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20200716110347.19553-4-qais.yousef@arm.com Signed-off-by: Sasha Levin commit af3b204d2063ed58137efb69091fb03d196b70bb Author: Sagi Grimberg Date: Wed Jul 22 16:32:19 2020 -0700 nvme: fix deadlock in disconnect during scan_work and/or ana_work [ Upstream commit ecca390e80561debbfdb4dc96bf94595136889fa ] A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: 0d0b660f214d ("nvme: add ANA support") Reported-by: Anton Eidelman Signed-off-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin commit 4bcf99f0ee88d9763bde1060edfb5262de9c7573 Author: Xu Wang Date: Mon Jul 13 03:21:43 2020 +0000 clk: clk-atlas6: fix return value check in atlas6_clk_init() [ Upstream commit 12b90b40854a8461a02ef19f6f4474cc88d64b66 ] In case of error, the function clk_register() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Signed-off-by: Xu Wang Link: https://lore.kernel.org/r/20200713032143.21362-1-vulab@iscas.ac.cn Acked-by: Barry Song Fixes: 7bf21bc81f28 ("clk: sirf: re-arch to make the codes support both prima2 and atlas6") Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit 09144223f3e05955fac2f06bb7670f401e00451c Author: Konrad Dybcio Date: Sun Jul 26 13:12:05 2020 +0200 clk: qcom: gcc-sdm660: Fix up gcc_mss_mnoc_bimc_axi_clk [ Upstream commit 3386af51d3bcebcba3f7becdb1ef2e384abe90cf ] Add missing halt_check, hwcg_reg and hwcg_bit properties. These were likely omitted when porting the driver upstream. Signed-off-by: Konrad Dybcio Link: https://lore.kernel.org/r/20200726111215.22361-9-konradybcio@gmail.com Fixes: f2a76a2955c0 ("clk: qcom: Add Global Clock controller (GCC) driver for SDM660") Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit bfd05afee1d4de3d71d98d69937c4e63509aeb93 Author: Chao Yu Date: Fri Jul 24 18:21:36 2020 +0800 f2fs: compress: fix to update isize when overwriting compressed file [ Upstream commit 944dd22ea4475bd11180fd2f431a4a547ca4d8f5 ] We missed to update isize of compressed file in write_end() with below case: cluster size is 16KB - write 14KB data from offset 0 - overwrite 16KB data from offset 0 Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin commit 5a8e852af62c693fc2072c45db58d58cf9c81da0 Author: Wolfram Sang Date: Mon Jun 29 17:38:07 2020 +0200 i2c: rcar: slave: only send STOP event when we have been addressed [ Upstream commit 314139f9f0abdba61ed9a8463bbcb0bf900ac5a2 ] When the SSR interrupt is activated, it will detect every STOP condition on the bus, not only the ones after we have been addressed. So, enable this interrupt only after we have been addressed, and disable it otherwise. Fixes: de20d1857dd6 ("i2c: rcar: add slave support") Signed-off-by: Wolfram Sang Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin commit 31f383287c93399c798cd0e707c69ca4df7ea312 Author: Liu Yi L Date: Fri Jul 24 09:49:14 2020 +0800 iommu/vt-d: Enforce PASID devTLB field mask [ Upstream commit 5f77d6ca5ca74e4b4a5e2e010f7ff50c45dea326 ] Set proper masks to avoid invalid input spillover to reserved bits. Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan Signed-off-by: Lu Baolu Reviewed-by: Eric Auger Link: https://lore.kernel.org/r/20200724014925.15523-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin commit 9e31192d17a5db124d56c0234ddbea105381a7d5 Author: Jonathan Marek Date: Thu Jul 9 09:52:33 2020 -0400 clk: qcom: clk-alpha-pll: remove unused/incorrect PLL_CAL_VAL [ Upstream commit c8b9002f44e4a1d2771b2f59f6de900864b1f9d7 ] 0x44 isn't a register offset, it is the value that goes into CAL_L_VAL. Fixes: 548a909597d5 ("clk: qcom: clk-alpha-pll: Add support for Trion PLLs") Signed-off-by: Jonathan Marek Tested-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20200709135251.643-3-jonathan@marek.ca Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit 2f1d5595c0e293aa858677a26cc17af7f8da2ae5 Author: Jonathan Marek Date: Thu Jul 9 09:52:32 2020 -0400 clk: qcom: gcc: fix sm8150 GPU and NPU clocks [ Upstream commit 667f39b59b494d96ae70f4217637db2ebbee3df0 ] Fix the parents and set BRANCH_HALT_SKIP. From the downstream driver it should be a 500us delay and not skip, however this matches what was done for other clocks that had 500us delay in downstream. Fixes: f73a4230d5bb ("clk: qcom: gcc: Add GPU and NPU clocks for SM8150") Signed-off-by: Jonathan Marek Tested-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20200709135251.643-2-jonathan@marek.ca Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit 75273ceac453e333d66ae6e8f4db3fcd0879945c Author: Colin Ian King Date: Tue Jul 14 20:22:11 2020 +0100 iommu/omap: Check for failure of a call to omap_iommu_dump_ctx [ Upstream commit dee9d154f40c58d02f69acdaa5cfd1eae6ebc28b ] It is possible for the call to omap_iommu_dump_ctx to return a negative error number, so check for the failure and return the error number rather than pass the negative value to simple_read_from_buffer. Fixes: 14e0e6796a0d ("OMAP: iommu: add initial debugfs support") Signed-off-by: Colin Ian King Link: https://lore.kernel.org/r/20200714192211.744776-1-colin.king@canonical.com Addresses-Coverity: ("Improper use of negative value") Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin commit e50f9ec8240ac847f061ae266252c1ebb4f89573 Author: Aneesh Kumar K.V Date: Thu Jul 9 08:59:45 2020 +0530 selftests/powerpc: ptrace-pkey: Don't update expected UAMOR value [ Upstream commit 3563b9bea0ca7f53e4218b5e268550341a49f333 ] With commit 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not change pkey registers") we are not updating UAMOR on key allocation. So don't update the expected uamor value in the test. Fixes: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not change pkey registers") Signed-off-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200709032946.881753-23-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin commit 2a8e6dedf4d29c6a2d3d3e0a5d647f21eb83664c Author: Aneesh Kumar K.V Date: Thu Jul 9 08:59:44 2020 +0530 selftests/powerpc: ptrace-pkey: Update the test to mark an invalid pkey correctly [ Upstream commit 0eaa3b5ca7b5a76e3783639c828498343be66a01 ] Signed-off-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200709032946.881753-22-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin commit 8488b63248aebd490b204a89f211d482039d64c4 Author: Aneesh Kumar K.V Date: Thu Jul 9 08:59:43 2020 +0530 selftests/powerpc: ptrace-pkey: Rename variables to make it easier to follow code [ Upstream commit 9a11f12e0a6c374b3ef1ce81e32ce477d28eb1b8 ] Rename variable to indicate that they are invalid values which we will use to test ptrace update of pkeys. Signed-off-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200709032946.881753-21-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin commit 6cac6020f87526f32a4c685fad2fb52ab0eddfd9 Author: Cristian Ciocaltea Date: Fri Jul 3 20:05:07 2020 +0300 clk: actions: Fix h_clk for Actions S500 SoC [ Upstream commit f47ee279d25fb0e010cae5d6e758e39b40eb6378 ] The h_clk clock in the Actions Semi S500 SoC clock driver has an invalid parent. Replace with the correct one. Fixes: ed6b4795ece4 ("clk: actions: Add clock driver for S500 SoC") Signed-off-by: Cristian Ciocaltea Reviewed-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/c57e7ebabfa970014f073b92fe95b47d3e5a70b1.1593788312.git.cristian.ciocaltea@gmail.com Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit f1c3c5b78c3e3f9cd5c9816b0e5518e55a37fd64 Author: Chao Yu Date: Mon Jul 20 16:52:50 2020 +0800 f2fs: compress: fix to avoid memory leak on cc->cpages [ Upstream commit 02772fbfcba8597eef9d5c5f7f94087132d0c1d4 ] Memory allocated for storing compressed pages' poitner should be released after f2fs_write_compressed_pages(), otherwise it will cause memory leak issue. Signed-off-by: Chao Yu Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin commit 36b61bf978b755e46ae8aefbe92cb7903621e938 Author: Tyler Hicks Date: Thu Jul 9 01:19:06 2020 -0500 ima: Fail rule parsing when appraise_flag=blacklist is unsupportable [ Upstream commit 5f3e92657bbfb63ad3109433d843c89996114b03 ] Verifying that a file hash is not blacklisted is currently only supported for files with appended signatures (modsig). In the future, this might change. For now, the "appraise_flag" option is only appropriate for appraise actions and its "blacklist" value is only appropriate when CONFIG_IMA_APPRAISE_MODSIG is enabled and "appraise_flag=blacklist" is only appropriate when "appraise_type=imasig|modsig" is also present. Make this clear at policy load so that IMA policy authors don't assume that other uses of "appraise_flag=blacklist" are supported. Fixes: 273df864cf74 ("ima: Check against blacklisted hashes for files with modsig") Signed-off-by: Tyler Hicks Reivewed-by: Nayna Jain Tested-by: Nayna Jain Signed-off-by: Mimi Zohar Signed-off-by: Sasha Levin commit 4e60d717d60ce1f4007eea9a8f8f68cec6bab24d Author: Ming Lei Date: Fri Jun 19 16:42:14 2020 +0800 dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue() [ Upstream commit e766668c6cd49d741cfb49eaeb38998ba34d27bc ] dm_stop_queue() only uses blk_mq_quiesce_queue() so it doesn't formally stop the blk-mq queue; therefore there is no point making the blk_mq_queue_stopped() check -- it will never be stopped. In addition, even though dm_stop_queue() actually tries to quiesce hw queues via blk_mq_quiesce_queue(), checking with blk_queue_quiesced() to avoid unnecessary queue quiesce isn't reliable because: the QUEUE_FLAG_QUIESCED flag is set before synchronize_rcu() and dm_stop_queue() may be called when synchronize_rcu() from another blk_mq_quiesce_queue() is in-progress. Fixes: 7b17c2f7292ba ("dm: Fix a race condition related to stopping and starting queues") Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer Signed-off-by: Sasha Levin commit 5d010d2fab4adc3ebc9abf9c582345dacbaf052f Author: Steve Longerbeam Date: Thu Jun 25 11:13:37 2020 -0700 gpu: ipu-v3: image-convert: Wait for all EOFs before completing a tile [ Upstream commit dd81d821d0b3f77d949d0cac5c05c1f05b921d46 ] Use a bit-mask of EOF irqs to determine when all required idmac channel EOFs have been received for a tile conversion, and only do tile completion processing after all EOFs have been received. Otherwise it was found that a conversion would stall after the completion of a tile and the start of the next tile, because the input/read idmac channel had not completed and entered idle state, thus locking up the channel when attempting to re-start it for the next tile. Fixes: 0537db801bb01 ("gpu: ipu-v3: image-convert: reconfigure IC per tile") Signed-off-by: Steve Longerbeam Signed-off-by: Philipp Zabel Signed-off-by: Sasha Levin commit 0399740536803d9ee2cdd681fe249a89f857fd05 Author: Steve Longerbeam Date: Wed Jun 17 15:40:37 2020 -0700 gpu: ipu-v3: image-convert: Combine rotate/no-rotate irq handlers [ Upstream commit 0f6245f42ce9b7e4d20f2cda8d5f12b55a44d7d1 ] Combine the rotate_irq() and norotate_irq() handlers into a single eof_irq() handler. Signed-off-by: Steve Longerbeam Signed-off-by: Philipp Zabel Signed-off-by: Sasha Levin commit 0c46592fe69c195d1cce4834c7449ce3e7d6efdd Author: Herbert Xu Date: Thu Jul 16 21:45:03 2020 +1000 crypto: caam - Remove broken arc4 support [ Upstream commit eeedb618378f8a09779546a3eeac16b000447d62 ] The arc4 algorithm requires storing state in the request context in order to allow more than one encrypt/decrypt operation. As this driver does not seem to do that, it means that using it for more than one operation is broken. Fixes: eaed71a44ad9 ("crypto: caam - add ecb(*) support") Link: https://lore.kernel.org/linux-crypto/CAMj1kXGvMe_A_iQ43Pmygg9xaAM-RLy=_M=v+eg--8xNmv9P+w@mail.gmail.com Link: https://lore.kernel.org/linux-crypto/20200702101947.682-1-ardb@kernel.org Signed-off-by: Herbert Xu Acked-by: Ard Biesheuvel Acked-by: Horia Geantă Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin commit 09a4844eca9e7a4d12a73b77c042b1c972b228c2 Author: Sudeep Holla Date: Tue Jul 14 13:45:56 2020 +0100 rtc: pl031: fix set_alarm by adding back call to alarm_irq_enable [ Upstream commit 4df2ef85f0efe44505f511ca5e4455585f53a2da ] Commit c8ff5841a90b ("rtc: pl031: switch to rtc_time64_to_tm/rtc_tm_to_time64") seemed to have accidentally removed the call to pl031_alarm_irq_enable from pl031_set_alarm while switching to 64-bit apis. Let us add back the same to get the set alarm functionality back. Fixes: c8ff5841a90b ("rtc: pl031: switch to rtc_time64_to_tm/rtc_tm_to_time64") Signed-off-by: Sudeep Holla Signed-off-by: Alexandre Belloni Tested-by: Valentin Schneider Cc: Linus Walleij Cc: Alexandre Belloni Link: https://lore.kernel.org/r/20200714124556.20294-1-sudeep.holla@arm.com Signed-off-by: Sasha Levin commit 6616b75394ac63630cd2b61051685202868c417f Author: Yan-Hsuan Chuang Date: Fri Jun 5 15:47:03 2020 +0800 rtw88: pci: disable aspm for platform inter-op with module parameter [ Upstream commit 68aa716b7dd36f55e080da9e27bc594346334c41 ] Some platforms cannot read the DBI register successfully for the ASPM settings. After the read failed, the bus could be unstable, and the device just became unavailable [1]. For those platforms, the ASPM should be disabled. But as the ASPM can help the driver to save the power consumption in power save mode, the ASPM is still needed. So, add a module parameter for them to disable it, then the device can still work, while others can benefit from the less power consumption that brings by ASPM enabled. [1] https://bugzilla.kernel.org/show_bug.cgi?id=206411 [2] Note that my lenovo T430 is the same. Fixes: 3dff7c6e3749 ("rtw88: allows to enable/disable HCI link PS mechanism") Signed-off-by: Yan-Hsuan Chuang Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20200605074703.32726-1-yhchuang@realtek.com Signed-off-by: Sasha Levin commit 691f4dbe891c8762885de41b1e69fa68cc94031f Author: Yoshihiro Shimoda Date: Thu May 21 16:01:05 2020 +0900 mmc: renesas_sdhi_internal_dmac: clean up the code for dma complete [ Upstream commit 2b26e34e9af3fa24fa1266e9ea2d66a1f7d62dc0 ] To add end() operation in the future, clean the code of renesas_sdhi_internal_dmac_complete_tasklet_fn(). No behavior change. Signed-off-by: Yoshihiro Shimoda Link: https://lore.kernel.org/r/1590044466-28372-3-git-send-email-yoshihiro.shimoda.uh@renesas.com Tested-by: Wolfram Sang Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin commit 55e965979f708856763ef897ed1d2fc848003eb1 Author: Mark Zhang Date: Thu Jul 2 11:29:33 2020 +0300 RDMA/counter: Allow manually bind QPs with different pids to same counter [ Upstream commit cbeb7d896c0f296451ffa7b67e7706786b8364c8 ] In manual mode allow bind user QPs with different pids to same counter, since this is allowed in auto mode. Bind kernel QPs and user QPs to the same counter are not allowed. Fixes: 1bd8e0a9d0fd ("RDMA/counter: Allow manual mode configuration support") Link: https://lore.kernel.org/r/20200702082933.424537-4-leon@kernel.org Signed-off-by: Mark Zhang Reviewed-by: Maor Gottlieb Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit abda874cc136a98ca66ce7f3d155a03281922263 Author: Mark Zhang Date: Thu Jul 2 11:29:32 2020 +0300 RDMA/counter: Only bind user QPs in auto mode [ Upstream commit c9f557421e505f75da4234a6af8eff46bc08614b ] In auto mode only bind user QPs to a dynamic counter, since this feature is mainly used for system statistic and diagnostic purpose, while there's no need to counter kernel QPs so far. Fixes: 99fa331dc862 ("RDMA/counter: Add "auto" configuration mode support") Link: https://lore.kernel.org/r/20200702082933.424537-3-leon@kernel.org Signed-off-by: Mark Zhang Reviewed-by: Maor Gottlieb Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit ceb3dc00c8c378ec4ea82c7fc45a69d2093e06f1 Author: Vladimir Oltean Date: Mon Jun 1 12:58:26 2020 +0300 devres: keep both device name and resource name in pretty name [ Upstream commit 35bd8c07db2ce8fd2834ef866240613a4ef982e7 ] Sometimes debugging a device is easiest using devmem on its register map, and that can be seen with /proc/iomem. But some device drivers have many memory regions. Take for example a networking switch. Its memory map used to look like this in /proc/iomem: 1fc000000-1fc3fffff : pcie@1f0000000 1fc000000-1fc3fffff : 0000:00:00.5 1fc010000-1fc01ffff : sys 1fc030000-1fc03ffff : rew 1fc060000-1fc0603ff : s2 1fc070000-1fc0701ff : devcpu_gcb 1fc080000-1fc0800ff : qs 1fc090000-1fc0900cb : ptp 1fc100000-1fc10ffff : port0 1fc110000-1fc11ffff : port1 1fc120000-1fc12ffff : port2 1fc130000-1fc13ffff : port3 1fc140000-1fc14ffff : port4 1fc150000-1fc15ffff : port5 1fc200000-1fc21ffff : qsys 1fc280000-1fc28ffff : ana But after the patch in Fixes: was applied, the information is now presented in a much more opaque way: 1fc000000-1fc3fffff : pcie@1f0000000 1fc000000-1fc3fffff : 0000:00:00.5 1fc010000-1fc01ffff : 0000:00:00.5 1fc030000-1fc03ffff : 0000:00:00.5 1fc060000-1fc0603ff : 0000:00:00.5 1fc070000-1fc0701ff : 0000:00:00.5 1fc080000-1fc0800ff : 0000:00:00.5 1fc090000-1fc0900cb : 0000:00:00.5 1fc100000-1fc10ffff : 0000:00:00.5 1fc110000-1fc11ffff : 0000:00:00.5 1fc120000-1fc12ffff : 0000:00:00.5 1fc130000-1fc13ffff : 0000:00:00.5 1fc140000-1fc14ffff : 0000:00:00.5 1fc150000-1fc15ffff : 0000:00:00.5 1fc200000-1fc21ffff : 0000:00:00.5 1fc280000-1fc28ffff : 0000:00:00.5 That patch made a fair comment that /proc/iomem might be confusing when it shows resources without an associated device, but we can do better than just hide the resource name altogether. Namely, we can print the device name _and_ the resource name. Like this: 1fc000000-1fc3fffff : pcie@1f0000000 1fc000000-1fc3fffff : 0000:00:00.5 1fc010000-1fc01ffff : 0000:00:00.5 sys 1fc030000-1fc03ffff : 0000:00:00.5 rew 1fc060000-1fc0603ff : 0000:00:00.5 s2 1fc070000-1fc0701ff : 0000:00:00.5 devcpu_gcb 1fc080000-1fc0800ff : 0000:00:00.5 qs 1fc090000-1fc0900cb : 0000:00:00.5 ptp 1fc100000-1fc10ffff : 0000:00:00.5 port0 1fc110000-1fc11ffff : 0000:00:00.5 port1 1fc120000-1fc12ffff : 0000:00:00.5 port2 1fc130000-1fc13ffff : 0000:00:00.5 port3 1fc140000-1fc14ffff : 0000:00:00.5 port4 1fc150000-1fc15ffff : 0000:00:00.5 port5 1fc200000-1fc21ffff : 0000:00:00.5 qsys 1fc280000-1fc28ffff : 0000:00:00.5 ana Fixes: 8d84b18f5678 ("devres: always use dev_name() in devm_ioremap_resource()") Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20200601095826.1757621-1-olteanv@gmail.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit 01c963daaea71791e3ea6a0b1ee594f5f078f528 Author: Herbert Xu Date: Thu Jul 2 13:32:21 2020 +1000 crypto: af_alg - Fix regression on empty requests [ Upstream commit 662bb52f50bca16a74fe92b487a14d7dccb85e1a ] Some user-space programs rely on crypto requests that have no control metadata. This broke when a check was added to require the presence of control metadata with the ctx->init flag. This patch fixes the regression by setting ctx->init as long as one sendmsg(2) has been made, with or without a control message. Reported-by: Sachin Sant Reported-by: Naresh Kamboju Fixes: f3c802a1f300 ("crypto: algif_aead - Only wake up when...") Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin commit ef73e8f85594ed034ee714b319904219b60c94b8 Author: Johan Hovold Date: Wed Jul 8 14:49:52 2020 +0200 USB: serial: ftdi_sio: clean up receive processing [ Upstream commit ce054039ba5e47b75a3be02a00274e52b06a6456 ] Clean up receive processing by dropping the character pointer and keeping the length argument unchanged throughout the function. Also make it more apparent that sysrq processing can consume a characters by adding an explicit continue. Reviewed-by: Greg Kroah-Hartman Signed-off-by: Johan Hovold Signed-off-by: Sasha Levin commit 37c24e3f601aef1fb8593d53a5d8ec913e066475 Author: Johan Hovold Date: Wed Jul 8 14:49:51 2020 +0200 USB: serial: ftdi_sio: make process-packet buffer unsigned [ Upstream commit ab4cc4ef6724ea588e835fc1e764c4b4407a70b7 ] Use an unsigned type for the process-packet buffer argument and give it a more apt name. Reviewed-by: Greg Kroah-Hartman Signed-off-by: Johan Hovold Signed-off-by: Sasha Levin commit 8b6aebd9aff8ef36430fad4f98ec1d64ea12b63f Author: Jesper Dangaard Brouer Date: Tue Jul 7 09:12:25 2020 +0200 selftests/bpf: test_progs avoid minus shell exit codes [ Upstream commit b8c50df0cb3eb9008f8372e4ff0317eee993b8d1 ] There are a number of places in test_progs that use minus-1 as the argument to exit(). This is confusing as a process exit status is masked to be a number between 0 and 255 as defined in man exit(3). Thus, users will see status 255 instead of minus-1. This patch use positive exit code 3 instead of minus-1. These cases are put in the same group of infrastructure setup errors. Fixes: fd27b1835e70 ("selftests/bpf: Reset process and thread affinity after each test/sub-test") Fixes: 811d7e375d08 ("bpf: selftests: Restore netns after each test") Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Daniel Borkmann Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/159410594499.1093222.11080787853132708654.stgit@firesoul Signed-off-by: Sasha Levin commit 44e8963f50ebf6472dad487f6ae352a0159fef0d Author: Jesper Dangaard Brouer Date: Tue Jul 7 09:12:19 2020 +0200 selftests/bpf: test_progs use another shell exit on non-actions [ Upstream commit 3220fb667842a9725cbb71656f406eadb03c094b ] This is a follow up adjustment to commit 6c92bd5cd465 ("selftests/bpf: Test_progs indicate to shell on non-actions"), that returns shell exit indication EXIT_FAILURE (value 1) when user selects a non-existing test. The problem with using EXIT_FAILURE is that a shell script cannot tell the difference between a non-existing test and the test failing. This patch uses value 2 as shell exit indication. (Aside note unrecognized option parameters use value 64). Fixes: 6c92bd5cd465 ("selftests/bpf: Test_progs indicate to shell on non-actions") Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Daniel Borkmann Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/159410593992.1093222.90072558386094370.stgit@firesoul Signed-off-by: Sasha Levin commit 56129e44124f913c6f97b28977bcb447ea36bfdb Author: Martin KaFai Lau Date: Wed Jul 1 17:48:58 2020 -0700 bpf: selftests: Restore netns after each test [ Upstream commit 811d7e375d08312dba23f3b6bf7e58ec14aa5dcb ] It is common for networking tests creating its netns and making its own setting under this new netns (e.g. changing tcp sysctl). If the test forgot to restore to the original netns, it would affect the result of other tests. This patch saves the original netns at the beginning and then restores it after every test. Since the restore "setns()" is not expensive, it does it on all tests without tracking if a test has created a new netns or not. The new restore_netns() could also be done in test__end_subtest() such that each subtest will get an automatic netns reset. However, the individual test would lose flexibility to have total control on netns for its own subtests. In some cases, forcing a test to do unnecessary netns re-configure for each subtest is time consuming. e.g. In my vm, forcing netns re-configure on each subtest in sk_assign.c increased the runtime from 1s to 8s. On top of that, test_progs.c is also doing per-test (instead of per-subtest) cleanup for cgroup. Thus, this patch also does per-test restore_netns(). The only existing per-subtest cleanup is reset_affinity() and no test is depending on this. Thus, it is removed from test__end_subtest() to give a consistent expectation to the individual tests. test_progs.c only ensures any affinity/netns/cgroup change made by an earlier test does not affect the following tests. Signed-off-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20200702004858.2103728-1-kafai@fb.com Signed-off-by: Sasha Levin commit fe4d303cd63d3a93aaa698c9200acec91c32b69b Author: Jesper Dangaard Brouer Date: Wed Jul 1 23:44:07 2020 +0200 selftests/bpf: Test_progs indicate to shell on non-actions [ Upstream commit 6c92bd5cd4650c39dd929565ee172984c680fead ] When a user selects a non-existing test the summary is printed with indication 0 for all info types, and shell "success" (EXIT_SUCCESS) is indicated. This can be understood by a human end-user, but for shell scripting is it useful to indicate a shell failure (EXIT_FAILURE). Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Alexei Starovoitov Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/159363984736.930467.17956007131403952343.stgit@firesoul Signed-off-by: Sasha Levin commit 786910682009b5ea19c7429548187bf78827a14a Author: Qais Yousef Date: Tue Jun 30 12:21:23 2020 +0100 sched/uclamp: Protect uclamp fast path code with static key [ Upstream commit 46609ce227039fd192e0ecc7d940bed587fd2c78 ] There is a report that when uclamp is enabled, a netperf UDP test regresses compared to a kernel compiled without uclamp. https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/ While investigating the root cause, there were no sign that the uclamp code is doing anything particularly expensive but could suffer from bad cache behavior under certain circumstances that are yet to be understood. https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/ To reduce the pressure on the fast path anyway, add a static key that is by default will skip executing uclamp logic in the enqueue/dequeue_task() fast path until it's needed. As soon as the user start using util clamp by: 1. Changing uclamp value of a task with sched_setattr() 2. Modifying the default sysctl_sched_util_clamp_{min, max} 3. Modifying the default cpu.uclamp.{min, max} value in cgroup We flip the static key now that the user has opted to use util clamp. Effectively re-introducing uclamp logic in the enqueue/dequeue_task() fast path. It stays on from that point forward until the next reboot. This should help minimize the effect of util clamp on workloads that don't need it but still allow distros to ship their kernels with uclamp compiled in by default. SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since now we can end up with unbalanced call to uclamp_rq_dec_id() if we flip the key while a task is running in the rq. Since we know it is harmless we just quietly return if we attempt a uclamp_rq_dec_id() when rq->uclamp[].bucket[].tasks is 0. In schedutil, we introduce a new uclamp_is_enabled() helper which takes the static key into account to ensure RT boosting behavior is retained. The following results demonstrates how this helps on 2 Sockets Xeon E5 2x10-Cores system. nouclamp uclamp uclamp-static-key Hmean send-64 162.43 ( 0.00%) 157.84 * -2.82%* 163.39 * 0.59%* Hmean send-128 324.71 ( 0.00%) 314.78 * -3.06%* 326.18 * 0.45%* Hmean send-256 641.55 ( 0.00%) 628.67 * -2.01%* 648.12 * 1.02%* Hmean send-1024 2525.28 ( 0.00%) 2448.26 * -3.05%* 2543.73 * 0.73%* Hmean send-2048 4836.14 ( 0.00%) 4712.08 * -2.57%* 4867.69 * 0.65%* Hmean send-3312 7540.83 ( 0.00%) 7425.45 * -1.53%* 7621.06 * 1.06%* Hmean send-4096 9124.53 ( 0.00%) 8948.82 * -1.93%* 9276.25 * 1.66%* Hmean send-8192 15589.67 ( 0.00%) 15486.35 * -0.66%* 15819.98 * 1.48%* Hmean send-16384 26386.47 ( 0.00%) 25752.25 * -2.40%* 26773.74 * 1.47%* The perf diff between nouclamp and uclamp-static-key when uclamp is disabled in the fast path: 8.73% -1.55% [kernel.kallsyms] [k] try_to_wake_up 0.07% +0.04% [kernel.kallsyms] [k] deactivate_task 0.13% -0.02% [kernel.kallsyms] [k] activate_task The diff between nouclamp and uclamp-static-key when uclamp is enabled in the fast path: 8.73% -0.72% [kernel.kallsyms] [k] try_to_wake_up 0.13% +0.39% [kernel.kallsyms] [k] activate_task 0.07% +0.38% [kernel.kallsyms] [k] deactivate_task Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting") Reported-by: Mel Gorman Signed-off-by: Qais Yousef Signed-off-by: Peter Zijlstra (Intel) Tested-by: Lukasz Luba Link: https://lkml.kernel.org/r/20200630112123.12076-3-qais.yousef@arm.com Signed-off-by: Sasha Levin commit 41cc370aa1981b7bbb19b7dfde69a945ce8d770e Author: Yishai Hadas Date: Tue Jun 30 12:39:11 2020 +0300 IB/uverbs: Set IOVA on IB MR in uverbs layer [ Upstream commit 04c0a5fcfcf65aade2fb238b6336445f1a99b646 ] Set IOVA on IB MR in uverbs layer to let all drivers have it, this includes both reg/rereg MR flows. As part of this change cleaned-up this setting from the drivers that already did it by themselves in their user flows. Fixes: e6f0330106f4 ("mlx4_ib: set user mr attributes in struct ib_mr") Link: https://lore.kernel.org/r/20200630093916.332097-3-leon@kernel.org Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit fcb5d11dd87fe5559e2428e362112e4e462f9216 Author: Paul Kocialkowski Date: Thu Apr 30 18:42:45 2020 +0200 media: rockchip: rga: Only set output CSC mode for RGB input [ Upstream commit 0f879bab72f47e8ba2421a984e7acfa763d3e84e ] Setting the output CSC mode is required for a YUV output, but must not be set when the input is also YUV. Doing this (as tested with a YUV420P to YUV420P conversion) results in wrong colors. Adapt the logic to only set the output CSC mode when the output is YUV and the input is RGB. Also add a comment to clarify the rationale. Fixes: f7e7b48e6d79 ("[media] rockchip/rga: v4l2 m2m support") Signed-off-by: Paul Kocialkowski Reviewed-by: Ezequiel Garcia Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin commit f44d97c58c2c85f24ab7ab996101040a13315f10 Author: Paul Kocialkowski Date: Thu Apr 30 18:42:44 2020 +0200 media: rockchip: rga: Introduce color fmt macros and refactor CSC mode logic [ Upstream commit ded874ece29d3fe2abd3775810a06056067eb68c ] This introduces two macros: RGA_COLOR_FMT_IS_YUV and RGA_COLOR_FMT_IS_RGB which allow quick checking of the colorspace familily of a RGA color format. These macros are then used to refactor the logic for CSC mode selection. The two nested tests for input colorspace are simplified into a single one, with a logical and, making the whole more readable. Signed-off-by: Paul Kocialkowski Reviewed-by: Ezequiel Garcia Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin commit 7f1434f265219a9c218cea66decefdeaf996004c Author: Dafna Hirschfeld Date: Thu Jun 18 13:35:16 2020 +0200 media: staging: rkisp1: remove macro RKISP1_DIR_SINK_SRC [ Upstream commit b861d139a36a4593498932bfec957bdcc7d98eb3 ] The macro RKISP1_DIR_SINK_SRC is a mask of two flags. The macro hides the fact that it's a mask and the code is actually more clear if we replace it the with bitwise-or explicitly. Signed-off-by: Dafna Hirschfeld Acked-by: Helen Koike Reviewed-by: Tomasz Figa Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin commit 44cc21f6ad922874249b73fbe91d739c4683d605 Author: Sebastian Reichel Date: Mon Jun 29 13:41:23 2020 +0200 rtc: cpcap: fix range [ Upstream commit 3180cfabf6fbf982ca6d1a6eb56334647cc1416b ] Unbreak CPCAP driver, which has one more bit in the day counter increasing the max. range from 2014 to 2058. The original commit introducing the range limit was obviously wrong, since the driver has only been written in 2017 (3 years after 14 bits would have run out). Fixes: d2377f8cc5a7 ("rtc: cpcap: set range") Reported-by: Sicelo A. Mhlongo Reported-by: Dev Null Signed-off-by: Sebastian Reichel Signed-off-by: Alexandre Belloni Tested-by: Merlijn Wajer Acked-by: Tony Lindgren Acked-by: Merlijn Wajer Link: https://lore.kernel.org/r/20200629114123.27956-1-sebastian.reichel@collabora.com Signed-off-by: Sasha Levin commit 8a9f5c1541a17120b7e8c91eba5f2ffc95f0f894 Author: Jason Gunthorpe Date: Thu Jun 25 20:42:19 2020 +0300 RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah() [ Upstream commit 65936bf25f90fe440bb2d11624c7d10fab266639 ] ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that the only time flush_workqueue() can be called is if it also clears IPOIB_FLAG_OPER_UP. Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky enough, and lockdep doesn't help us to find it early: CPU0 CPU1 CPU2 __ipoib_ib_dev_flush() down_read(vlan_rwsem) ipoib_vlan_add() rtnl_trylock() down_write(vlan_rwsem) ipoib_mcast_carrier_on_task() while (!rtnl_trylock()) msleep(20); ipoib_flush_ah() flush_workqueue(priv->wq) Clean up the ah_reaper related functions and lifecycle to make sense: - Start/Stop of the reaper should only be done in open/stop NDOs, not in any other places - cancel and flush of the reaper should only happen in the stop NDO. cancel is only functional when combined with IPOIB_STOP_REAPER. - Non-stop places were flushing the AH's just need to flush out dead AH's synchronously and ignore the background task completely. It is fully locked and harmless to leave running. Which ultimately fixes the ABBA deadlock by removing the unnecessary flush_workqueue() from the problematic place under the vlan_rwsem. Fixes: efc82eeeae4e ("IB/ipoib: No longer use flush as a parameter") Link: https://lore.kernel.org/r/20200625174219.290842-1-kamalheib1@gmail.com Reported-by: Kamal Heib Tested-by: Kamal Heib Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit 36d1628c8ddad69aafc227d613d1e8fae382ceb1 Author: Kamal Heib Date: Tue Jun 23 13:52:36 2020 +0300 RDMA/ipoib: Return void from ipoib_ib_dev_stop() [ Upstream commit 95a5631f6c9f3045f26245e6045244652204dfdb ] The return value from ipoib_ib_dev_stop() is always 0 - change it to be void. Link: https://lore.kernel.org/r/20200623105236.18683-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit 538576a05cf8d0da16ea2bf194e997891adf82ef Author: Qiushi Wu Date: Fri May 22 22:16:08 2020 -0500 platform/chrome: cros_ec_ishtp: Fix a double-unlock issue [ Upstream commit aaa3cbbac326c95308e315f1ab964a3369c4d07d ] In function cros_ec_ishtp_probe(), "up_write" is already called before function "cros_ec_dev_init". But "up_write" will be called again after the calling of the function "cros_ec_dev_init" failed. Thus add a call of the function “down_write” in this if branch for the completion of the exception handling. Fixes: 26a14267aff2 ("platform/chrome: Add ChromeOS EC ISHTP driver") Signed-off-by: Qiushi Wu Tested-by: Mathew King Signed-off-by: Enric Balletbo i Serra Signed-off-by: Sasha Levin commit 76629a6d86a7956279d38eb3686027d3a89eaabf Author: Kamal Dasu Date: Fri Jun 12 17:29:02 2020 -0400 mtd: rawnand: brcmnand: ECC error handling on EDU transfers [ Upstream commit 4551e78ad98add1f16b70cf286d5aad3ce7bcd4c ] Implement ECC correctable and uncorrectable error handling for EDU reads. If ECC correctable bitflips are encountered on EDU transfer, read page again using PIO. This is needed due to a NAND controller limitation where corrected data is not transferred to the DMA buffer on ECC error. This applies to ECC correctable errors that are reported by the controller hardware based on set number of bitflips threshold in the controller threshold register, bitflips below the threshold are corrected silently and are not reported by the controller hardware. Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers") Signed-off-by: Kamal Dasu Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20200612212902.21347-3-kdasu.kdev@gmail.com Signed-off-by: Sasha Levin commit 0b36d8e1db13f4ac1ef6ea057ae43fa88160c725 Author: Boris Brezillon Date: Wed Jun 3 15:49:13 2020 +0200 mtd: rawnand: fsl_upm: Remove unused mtd var [ Upstream commit ccc49eff77bee2885447a032948959a134029fe3 ] The mtd var in fun_wait_rnb() is now unused, let's get rid of it and fix the warning resulting from this unused var. Fixes: 50a487e7719c ("mtd: rawnand: Pass a nand_chip object to chip->dev_ready()") Signed-off-by: Boris Brezillon Reviewed-by: Miquel Raynal Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20200603134922.1352340-2-boris.brezillon@collabora.com Signed-off-by: Sasha Levin commit 6469e715bc576eda0ea65fd5dcdbe2c31518d76c Author: Eric Dumazet Date: Wed Jun 17 20:53:21 2020 -0700 octeontx2-af: change (struct qmem)->entry_sz from u8 to u16 [ Upstream commit 393415203f5c916b5907e0a7c89f4c2c5a9c5505 ] We need to increase TSO_HEADER_SIZE from 128 to 256. Since otx2_sq_init() calls qmem_alloc() with TSO_HEADER_SIZE, we need to change (struct qmem)->entry_sz to avoid truncation to 0. Fixes: 7a37245ef23f ("octeontx2-af: NPA block admin queue init") Signed-off-by: Eric Dumazet Cc: Sunil Goutham Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit cbcad0a477f474086a89258080841dd0b88bc367 Author: Charles Keepax Date: Mon Jun 15 14:53:21 2020 +0100 mfd: arizona: Ensure 32k clock is put on driver unbind and error [ Upstream commit ddff6c45b21d0437ce0c85f8ac35d7b5480513d7 ] Whilst it doesn't matter if the internal 32k clock register settings are cleaned up on exit, as the part will be turned off losing any settings, hence the driver hasn't historially bothered. The external clock should however be cleaned up, as it could cause clocks to be left on, and will at best generate a warning on unbind. Add clean up on both the probe error path and unbind for the 32k clock. Fixes: cdd8da8cc66b ("mfd: arizona: Add gating of external MCLKn clocks") Signed-off-by: Charles Keepax Signed-off-by: Lee Jones Signed-off-by: Sasha Levin commit e26cdc36de7b1bb0c75e9e46df2fb55c2b24e713 Author: Herbert Xu Date: Sat May 30 00:23:49 2020 +1000 crypto: algif_aead - Only wake up when ctx->more is zero [ Upstream commit f3c802a1f30013f8f723b62d7fa49eb9e991da23 ] AEAD does not support partial requests so we must not wake up while ctx->more is set. In order to distinguish between the case of no data sent yet and a zero-length request, a new init flag has been added to ctx. SKCIPHER has also been modified to ensure that at least a block of data is available if there is more data to come. Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of...") Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin commit 9f0a3ced6e25c125a00677845b86bed164c02434 Author: Paul Cercueil Date: Thu Jul 16 18:38:35 2020 +0200 drm/ingenic: Fix incorrect assumption about plane->index commit ca43f274e03f91c533643299ae4984965ce03205 upstream. plane->index is NOT the index of the color plane in a YUV frame. Actually, a YUV frame is represented by a single drm_plane, even though it contains three Y, U, V planes. v2-v3: No change Cc: stable@vger.kernel.org # v5.3 Fixes: 90b86fcc47b4 ("DRM: Add KMS driver for the Ingenic JZ47xx SoCs") Signed-off-by: Paul Cercueil Reviewed-by: Sam Ravnborg Link: https://patchwork.freedesktop.org/patch/msgid/20200716163846.174790-1-paul@crapouillou.net Signed-off-by: Greg Kroah-Hartman commit 6a9e066ba31bebb481180295c1f87a576f987938 Author: Liu Ying Date: Thu Jul 9 10:28:52 2020 +0800 drm/imx: imx-ldb: Disable both channels for split mode in enc->disable() commit 3b2a999582c467d1883716b37ffcc00178a13713 upstream. Both of the two LVDS channels should be disabled for split mode in the encoder's ->disable() callback, because they are enabled in the encoder's ->enable() callback. Fixes: 6556f7f82b9c ("drm: imx: Move imx-drm driver out of staging") Cc: Philipp Zabel Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: NXP Linux Team Cc: Signed-off-by: Liu Ying Signed-off-by: Philipp Zabel Signed-off-by: Greg Kroah-Hartman commit bbe21b6e6a7551157adc0dd7a3bdee08e549163a Author: Sibi Sankar Date: Thu Jul 23 01:40:46 2020 +0530 remoteproc: qcom_q6v5_mss: Validate modem blob firmware size before load commit 135b9e8d1cd8ba5ac9ad9bcf24b464b7b052e5b8 upstream. The following mem abort is observed when one of the modem blob firmware size exceeds the allocated mpss region. Fix this by restricting the copy size to segment size using request_firmware_into_buf before load. Err Logs: Unable to handle kernel paging request at virtual address Mem abort info: ... Call trace: __memcpy+0x110/0x180 rproc_start+0xd0/0x190 rproc_boot+0x404/0x550 state_store+0x54/0xf8 dev_attr_store+0x44/0x60 sysfs_kf_write+0x58/0x80 kernfs_fop_write+0x140/0x230 vfs_write+0xc4/0x208 ksys_write+0x74/0xf8 ... Reviewed-by: Bjorn Andersson Fixes: 051fb70fd4ea4 ("remoteproc: qcom: Driver for the self-authenticating Hexagon v5") Cc: stable@vger.kernel.org Signed-off-by: Sibi Sankar Link: https://lore.kernel.org/r/20200722201047.12975-3-sibis@codeaurora.org Signed-off-by: Bjorn Andersson Signed-off-by: Greg Kroah-Hartman commit af61da7510fb4e69970b6a2d1e02e4edc4154537 Author: Sibi Sankar Date: Thu Jul 23 01:40:45 2020 +0530 remoteproc: qcom_q6v5_mss: Validate MBA firmware size before load commit e013f455d95add874f310dc47c608e8c70692ae5 upstream. The following mem abort is observed when the mba firmware size exceeds the allocated mba region. MBA firmware size is restricted to a maximum size of 1M and remaining memory region is used by modem debug policy firmware when available. Hence verify whether the MBA firmware size lies within the allocated memory region and is not greater than 1M before loading. Err Logs: Unable to handle kernel paging request at virtual address Mem abort info: ... Call trace: __memcpy+0x110/0x180 rproc_start+0x40/0x218 rproc_boot+0x5b4/0x608 state_store+0x54/0xf8 dev_attr_store+0x44/0x60 sysfs_kf_write+0x58/0x80 kernfs_fop_write+0x140/0x230 vfs_write+0xc4/0x208 ksys_write+0x74/0xf8 __arm64_sys_write+0x24/0x30 ... Reviewed-by: Bjorn Andersson Fixes: 051fb70fd4ea4 ("remoteproc: qcom: Driver for the self-authenticating Hexagon v5") Cc: stable@vger.kernel.org Signed-off-by: Sibi Sankar Link: https://lore.kernel.org/r/20200722201047.12975-2-sibis@codeaurora.org Signed-off-by: Bjorn Andersson Signed-off-by: Greg Kroah-Hartman commit bf43b5d5713e4297b363855dd0696341ddbc1908 Author: Sibi Sankar Date: Tue Jun 2 22:02:56 2020 +0530 remoteproc: qcom: q6v5: Update running state before requesting stop commit 5b7be880074c73540948f8fc597e0407b98fabfa upstream. Sometimes the stop triggers a watchdog rather than a stop-ack. Update the running state to false on requesting stop to skip the watchdog instead. Error Logs: $ echo stop > /sys/class/remoteproc/remoteproc0/state ipa 1e40000.ipa: received modem stopping event remoteproc-modem: watchdog received: sys_m_smsm_mpss.c:291:APPS force stop qcom-q6v5-mss 4080000.remoteproc-modem: port failed halt ipa 1e40000.ipa: received modem offline event remoteproc0: stopped remote processor 4080000.remoteproc-modem Reviewed-by: Evan Green Fixes: 3b415c8fb263 ("remoteproc: q6v5: Extract common resource handling") Cc: stable@vger.kernel.org Signed-off-by: Sibi Sankar Link: https://lore.kernel.org/r/20200602163257.26978-1-sibis@codeaurora.org Signed-off-by: Bjorn Andersson Signed-off-by: Greg Kroah-Hartman commit 282a17f64650fe240a2294c1c46b2689386c366b Author: Bob Peterson Date: Fri Jul 24 12:06:31 2020 -0500 gfs2: Never call gfs2_block_zero_range with an open transaction commit 70499cdfeb3625c87eebe4f7a7ea06fa7447e5df upstream. Before this patch, some functions started transactions then they called gfs2_block_zero_range. However, gfs2_block_zero_range, like writes, can start transactions, which results in a recursive transaction error. For example: do_shrink trunc_start gfs2_trans_begin <------------------------------------------------ gfs2_block_zero_range iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops); iomap_apply ... iomap_zero_range_actor iomap_begin gfs2_iomap_begin gfs2_iomap_begin_write actor (iomap_zero_range_actor) iomap_zero iomap_write_begin gfs2_iomap_page_prepare gfs2_trans_begin <------------------------ This patch reorders the callers of gfs2_block_zero_range so that they only start their transactions after the call. It also adds a BUG_ON to ensure this doesn't happen again. Fixes: 2257e468a63b ("gfs2: implement gfs2_block_zero_range using iomap_zero_range") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher Signed-off-by: Greg Kroah-Hartman commit 6a2125ecca2d3bea72be53da0533459874425115 Author: Adrian Hunter Date: Fri Jul 10 18:10:54 2020 +0300 perf intel-pt: Fix duplicate branch after CBR commit a58a057ce65b52125dd355b7d8b0d540ea267a5f upstream. CBR events can result in a duplicate branch event, because the state type defaults to a branch. Fix by clearing the state type. Example: trace 'sleep' and hope for a frequency change Before: $ perf record -e intel_pt//u sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.034 MB perf.data ] $ perf script --itrace=bpe > before.txt After: $ perf script --itrace=bpe > after.txt $ diff -u before.txt after.txt # --- before.txt 2020-07-07 14:42:18.191508098 +0300 # +++ after.txt 2020-07-07 14:42:36.587891753 +0300 @@ -29673,7 +29673,6 @@ sleep 93431 [007] 15411.619905: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb2e0 clock_nanosleep@@GLIBC_2.17+0x0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) sleep 93431 [007] 15411.619905: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown]) sleep 93431 [007] 15411.720069: cbr: cbr: 15 freq: 1507 MHz ( 56%) 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) - sleep 93431 [007] 15411.720069: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown]) sleep 93431 [007] 15411.720076: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb30e clock_nanosleep@@GLIBC_2.17+0x2e (/usr/lib/x86_64-linux-gnu/libc-2.31.so) sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818abb323 clock_nanosleep@@GLIBC_2.17+0x43 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 7f0818ac0eb7 __nanosleep+0x17 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818ac0ebf __nanosleep+0x1f (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 55cb7e4c2827 rpl_nanosleep+0x97 (/usr/bin/sleep) Fixes: 91de8684f1cff ("perf intel-pt: Cater for CBR change in PSB+") Fixes: abe5a1d3e4bee ("perf intel-pt: Decoder to output CBR changes immediately") Signed-off-by: Adrian Hunter Reviewed-by: Andi Kleen Tested-by: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200710151104.15137-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit b89a927d0744ed0b532c41d3331c86cbc5fa7178 Author: Adrian Hunter Date: Fri Jul 10 18:10:53 2020 +0300 perf intel-pt: Fix FUP packet state commit 401136bb084fd021acd9f8c51b52fe0a25e326b2 upstream. While walking code towards a FUP ip, the packet state is INTEL_PT_STATE_FUP or INTEL_PT_STATE_FUP_NO_TIP. That was mishandled resulting in the state becoming INTEL_PT_STATE_IN_SYNC prematurely. The result was an occasional lost EXSTOP event. Signed-off-by: Adrian Hunter Reviewed-by: Andi Kleen Cc: Jiri Olsa Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200710151104.15137-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 3a951d034a56ec45da83d4207b78816c84d4cc8f Author: Masami Hiramatsu Date: Fri Jul 10 22:11:23 2020 +0900 perf probe: Fix memory leakage when the probe point is not found commit 12d572e785b15bc764e956caaa8a4c846fd15694 upstream. Fix the memory leakage in debuginfo__find_trace_events() when the probe point is not found in the debuginfo. If there is no probe point found in the debuginfo, debuginfo__find_probes() will NOT return -ENOENT, but 0. Thus the caller of debuginfo__find_probes() must check the tf.ntevs and release the allocated memory for the array of struct probe_trace_event. The current code releases the memory only if the debuginfo__find_probes() hits an error but not checks tf.ntevs. In the result, the memory allocated on *tevs are not released if tf.ntevs == 0. This fixes the memory leakage by checking tf.ntevs == 0 in addition to ret < 0. Fixes: ff741783506c ("perf probe: Introduce debuginfo to encapsulate dwarf information") Signed-off-by: Masami Hiramatsu Reviewed-by: Srikar Dronamraju Cc: Andi Kleen Cc: Oleg Nesterov Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/159438668346.62703.10887420400718492503.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit edf00578914fb1ac5979ef23e93705396e75c605 Author: Masami Hiramatsu Date: Fri Jul 10 22:11:13 2020 +0900 perf probe: Fix wrong variable warning when the probe point is not found commit 11fd3eb874e73ee8069bcfd54e3c16fa7ce56fe6 upstream. Fix a wrong "variable not found" warning when the probe point is not found in the debuginfo. Since the debuginfo__find_probes() can return 0 even if it does not find given probe point in the debuginfo, fill_empty_trace_arg() can be called with tf.ntevs == 0 and it can emit a wrong warning. To fix this, reject ntevs == 0 in fill_empty_trace_arg(). E.g. without this patch; # perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di" Failed to find the location of the '%di' variable at this address. Perhaps it has been optimized out. Use -V with the --range option to show '%di' location range. Added new events: probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di) probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di) You can now use it in all perf tools, such as: perf record -e probe_libc:memcpy -aR sleep 1 With this; # perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di" Added new events: probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di) probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di) You can now use it in all perf tools, such as: perf record -e probe_libc:memcpy -aR sleep 1 Fixes: cb4027308570 ("perf probe: Trace a magic number if variable is not found") Reported-by: Andi Kleen Signed-off-by: Masami Hiramatsu Reviewed-by: Srikar Dronamraju Tested-by: Andi Kleen Cc: Oleg Nesterov Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/159438667364.62703.2200642186798763202.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 541758de4fdc7f2d0b78fb25b934398ab35711a2 Author: Masami Hiramatsu Date: Tue Aug 4 11:52:13 2020 +0900 bootconfig: Fix to find the initargs correctly commit 477d08478170469d10b533624342d13701e24b34 upstream. Since the parse_args() stops parsing at '--', bootconfig_params() will never get the '--' as param and initargs_found never be true. In the result, if we pass some init arguments via the bootconfig, those are always appended to the kernel command line with '--' even if the kernel command line already has '--'. To fix this correctly, check the return value of parse_args() and set initargs_found true if the return value is not an error but a valid address. Link: https://lkml.kernel.org/r/159650953285.270383.14822353843556363851.stgit@devnote2 Fixes: f61872bb58a1 ("bootconfig: Use parse_args() to find bootconfig and '--'") Cc: stable@vger.kernel.org Reported-by: Arvind Sankar Suggested-by: Arvind Sankar Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit a57c682a1676e9ef03052a7c2c992103fc528203 Author: Kees Cook Date: Thu Aug 6 14:15:23 2020 -0700 module: Correctly truncate sysfs sections output commit 11990a5bd7e558e9203c1070fc52fb6f0488e75b upstream. The only-root-readable /sys/module/$module/sections/$section files did not truncate their output to the available buffer size. While most paths into the kernfs read handlers end up using PAGE_SIZE buffers, it's possible to get there through other paths (e.g. splice, sendfile). Actually limit the output to the "count" passed into the read function, and report it back correctly. *sigh* Reported-by: kernel test robot Link: https://lore.kernel.org/lkml/20200805002015.GE23458@shao2-debian Fixes: ed66f991bb19 ("module: Refactor section attr into bin attribute") Cc: stable@vger.kernel.org Reviewed-by: Greg Kroah-Hartman Acked-by: Jessica Yu Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman commit eb7ad9a06715cede4273075bb73b2f7c40558a3f Author: Johannes Thumshirn Date: Tue Aug 4 18:25:01 2020 +0900 dm: don't call report zones for more than the user requested commit a9cb9f4148ef6bb8fabbdaa85c42b2171fbd5a0d upstream. Don't call report zones for more zones than the user actually requested, otherwise this can lead to out-of-bounds accesses in the callback functions. Such a situation can happen if the target's ->report_zones() callback function returns 0 because we've reached the end of the target and then restart the report zones on the second target. We're again calling into ->report_zones() and ultimately into the user supplied callback function but when we're not subtracting the number of zones already processed this may lead to out-of-bounds accesses in the user callbacks. Signed-off-by: Johannes Thumshirn Reviewed-by: Damien Le Moal Fixes: d41003513e61 ("block: rework zone reporting") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 7bd13ac62979762f46a81b633ad2732aa764d577 Author: Anton Blanchard Date: Wed Jul 15 10:08:20 2020 +1000 pseries: Fix 64 bit logical memory block panic commit 89c140bbaeee7a55ed0360a88f294ead2b95201b upstream. Booting with a 4GB LMB size causes us to panic: qemu-system-ppc64: OS terminated: OS panic: Memory block size not suitable: 0x0 Fix pseries_memory_block_size() to handle 64 bit LMBs. Cc: stable@vger.kernel.org Signed-off-by: Anton Blanchard Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200715000820.1255764-1-anton@ozlabs.org Signed-off-by: Greg Kroah-Hartman commit 5b0051e74a1507647ec10e6b0f88648ea92eb863 Author: Jeff Layton Date: Tue Aug 4 12:31:56 2020 -0400 ceph: handle zero-length feature mask in session messages commit 02e37571f9e79022498fd0525c073b07e9d9ac69 upstream. Most session messages contain a feature mask, but the MDS will routinely send a REJECT message with one that is zero-length. Commit 0fa8263367db ("ceph: fix endianness bug when handling MDS session feature bits") fixed the decoding of the feature mask, but failed to account for the MDS sending a zero-length feature mask. This causes REJECT message decoding to fail. Skip trying to decode a feature mask if the word count is zero. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/46823 Fixes: 0fa8263367db ("ceph: fix endianness bug when handling MDS session feature bits") Signed-off-by: Jeff Layton Reviewed-by: Ilya Dryomov Tested-by: Patrick Donnelly Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman commit 6dff911a8dbacc61cc1340a9a317fe817d310cab Author: Jeff Layton Date: Tue Jul 28 10:34:20 2020 -0400 ceph: set sec_context xattr on symlink creation commit b748fc7a8763a5b3f8149f12c45711cd73ef8176 upstream. Symlink inodes should have the security context set in their xattrs on creation. We already set the context on creation, but we don't attach the pagelist. The effect is that symlink inodes don't get an SELinux context set on them at creation, so they end up unlabeled instead of inheriting the proper context. Make it do so. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman commit edd26e0d21b71f50090040aab3c72c5c1b8ac24e Author: Ahmad Fatoum Date: Thu Jun 11 21:17:45 2020 +0200 watchdog: f71808e_wdt: clear watchdog timeout occurred flag commit 4f39d575844148fbf3081571a1f3b4ae04150958 upstream. The flag indicating a watchdog timeout having occurred normally persists till Power-On Reset of the Fintek Super I/O chip. The user can clear it by writing a `1' to the bit. The driver doesn't offer a restart method, so regular system reboot might not reset the Super I/O and if the watchdog isn't enabled, we won't touch the register containing the bit on the next boot. In this case all subsequent regular reboots will be wrongly flagged by the driver as being caused by the watchdog. Fix this by having the flag cleared after read. This is also done by other drivers like those for the i6300esb and mpc8xxx_wdt. Fixes: b97cb21a4634 ("watchdog: f71808e_wdt: Fix WDTMOUT_STS register read") Cc: stable@vger.kernel.org Signed-off-by: Ahmad Fatoum Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20200611191750.28096-5-a.fatoum@pengutronix.de Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Greg Kroah-Hartman commit 02a943c1b83030836e8e474217cdf7ac103b1c9f Author: Ahmad Fatoum Date: Thu Jun 11 21:17:44 2020 +0200 watchdog: f71808e_wdt: remove use of wrong watchdog_info option commit 802141462d844f2e6a4d63a12260d79b7afc4c34 upstream. The flags that should be or-ed into the watchdog_info.options by drivers all start with WDIOF_, e.g. WDIOF_SETTIMEOUT, which indicates that the driver's watchdog_ops has a usable set_timeout. WDIOC_SETTIMEOUT was used instead, which expands to 0xc0045706, which equals: WDIOF_FANFAULT | WDIOF_EXTERN1 | WDIOF_PRETIMEOUT | WDIOF_ALARMONLY | WDIOF_MAGICCLOSE | 0xc0045000 These were so far indicated to userspace on WDIOC_GETSUPPORT. As the driver has not yet been migrated to the new watchdog kernel API, the constant can just be dropped without substitute. Fixes: 96cb4eb019ce ("watchdog: f71808e_wdt: new watchdog driver for Fintek F71808E and F71882FG") Cc: stable@vger.kernel.org Signed-off-by: Ahmad Fatoum Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20200611191750.28096-4-a.fatoum@pengutronix.de Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Greg Kroah-Hartman commit cc3012a915e8797864960fa312601e711332d031 Author: Ahmad Fatoum Date: Thu Jun 11 21:17:43 2020 +0200 watchdog: f71808e_wdt: indicate WDIOF_CARDRESET support in watchdog_info.options commit e871e93fb08a619dfc015974a05768ed6880fd82 upstream. The driver supports populating bootstatus with WDIOF_CARDRESET, but so far userspace couldn't portably determine whether absence of this flag meant no watchdog reset or no driver support. Or-in the bit to fix this. Fixes: b97cb21a4634 ("watchdog: f71808e_wdt: Fix WDTMOUT_STS register read") Cc: stable@vger.kernel.org Signed-off-by: Ahmad Fatoum Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20200611191750.28096-3-a.fatoum@pengutronix.de Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Greg Kroah-Hartman commit 493df1df6c3157314839609c457e07afb716a506 Author: Steven Rostedt (VMware) Date: Tue Aug 4 20:00:02 2020 -0400 tracing: Use trace_sched_process_free() instead of exit() for pid tracing commit afcab636657421f7ebfa0783a91f90256bba0091 upstream. On exit, if a process is preempted after the trace_sched_process_exit() tracepoint but before the process is done exiting, then when it gets scheduled in, the function tracers will not filter it properly against the function tracing pid filters. That is because the function tracing pid filters hooks to the sched_process_exit() tracepoint to remove the exiting task's pid from the filter list. Because the filtering happens at the sched_switch tracepoint, when the exiting task schedules back in to finish up the exit, it will no longer be in the function pid filtering tables. This was noticeable in the notrace self tests on a preemptable kernel, as the tests would fail as it exits and preempted after being taken off the notrace filter table and on scheduling back in it would not be in the notrace list, and then the ending of the exit function would trace. The test detected this and would fail. Cc: stable@vger.kernel.org Cc: Namhyung Kim Fixes: 1e10486ffee0a ("ftrace: Add 'function-fork' trace option") Fixes: c37775d57830a ("tracing: Add infrastructure to allow set_event_pid to follow children" Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 82175a806016c25f0461b221fb747c508952d90a Author: Kevin Hao Date: Thu Jul 30 16:23:18 2020 +0800 tracing/hwlat: Honor the tracing_cpumask commit 96b4833b6827a62c295b149213c68b559514c929 upstream. In calculation of the cpu mask for the hwlat kernel thread, the wrong cpu mask is used instead of the tracing_cpumask, this causes the tracing/tracing_cpumask useless for hwlat tracer. Fixes it. Link: https://lkml.kernel.org/r/20200730082318.42584-2-haokexin@gmail.com Cc: Ingo Molnar Cc: stable@vger.kernel.org Fixes: 0330f7aa8ee6 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs") Signed-off-by: Kevin Hao Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 101d5df5248a40dddc99147cb819ea18e7d90700 Author: Muchun Song Date: Tue Jul 28 14:45:36 2020 +0800 kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler commit 0cb2f1372baa60af8456388a574af6133edd7d80 upstream. We found a case of kernel panic on our server. The stack trace is as follows(omit some irrelevant information): BUG: kernel NULL pointer dereference, address: 0000000000000080 RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0 RSP: 0018:ffffb512c6550998 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000 RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0 RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ftrace_ops_assist_func+0x56/0xe0 ftrace_call+0x5/0x34 tcpa_statistic_send+0x5/0x130 [ttcp_engine] The tcpa_statistic_send is the function being kprobed. After analysis, the root cause is that the fourth parameter regs of kprobe_ftrace_handler is NULL. Why regs is NULL? We use the crash tool to analyze the kdump. crash> dis tcpa_statistic_send -r : callq 0xffffffffbd8018c0 The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller. So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler is NULL. In theory, we should call the ftrace_regs_caller instead of the ftrace_caller. After in-depth analysis, we found a reproducible path. Writing a simple kernel module which starts a periodic timer. The timer's handler is named 'kprobe_test_timer_handler'. The module name is kprobe_test.ko. 1) insmod kprobe_test.ko 2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}' 3) echo 0 > /proc/sys/kernel/ftrace_enabled 4) rmmod kprobe_test 5) stop step 2) kprobe 6) insmod kprobe_test.ko 7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}' We mark the kprobe as GONE but not disarm the kprobe in the step 4). The step 5) also do not disarm the kprobe when unregister kprobe. So we do not remove the ip from the filter. In this case, when the module loads again in the step 6), we will replace the code to ftrace_caller via the ftrace_module_enable(). When we register kprobe again, we will not replace ftrace_caller to ftrace_regs_caller because the ftrace is disabled in the step 3). So the step 7) will trigger kernel panic. Fix this problem by disarming the kprobe when the module is going away. Link: https://lkml.kernel.org/r/20200728064536.24405-1-songmuchun@bytedance.com Cc: stable@vger.kernel.org Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization") Acked-by: Masami Hiramatsu Signed-off-by: Muchun Song Co-developed-by: Chengming Zhou Signed-off-by: Chengming Zhou Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 8b62d4553209b069f799676cf9bda36ad12982f6 Author: Chengming Zhou Date: Wed Jul 29 02:05:53 2020 +0800 ftrace: Setup correct FTRACE_FL_REGS flags for module commit 8a224ffb3f52b0027f6b7279854c71a31c48fc97 upstream. When module loaded and enabled, we will use __ftrace_replace_code for module if any ftrace_ops referenced it found. But we will get wrong ftrace_addr for module rec in ftrace_get_addr_new, because rec->flags has not been setup correctly. It can cause the callback function of a ftrace_ops has FTRACE_OPS_FL_SAVE_REGS to be called with pt_regs set to NULL. So setup correct FTRACE_FL_REGS flags for rec when we call referenced_filters to find ftrace_ops references it. Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com Cc: stable@vger.kernel.org Fixes: 8c4f3c3fa9681 ("ftrace: Check module functions being traced on reload") Signed-off-by: Chengming Zhou Signed-off-by: Muchun Song Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman commit 37dd888355b43a0499e96adce9afa1e8c180a729 Author: Jia He Date: Tue Aug 11 18:32:20 2020 -0700 mm/memory_hotplug: fix unpaired mem_hotplug_begin/done commit b4223a510e2ab1bf0f971d50af7c1431014b25ad upstream. When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is online at that time), mem_hotplug_begin/done is unpaired in such case. Therefore a warning: Call Trace: percpu_up_write+0x33/0x40 try_remove_memory+0x66/0x120 ? _cond_resched+0x19/0x30 remove_memory+0x2b/0x40 dev_dax_kmem_remove+0x36/0x72 [kmem] device_release_driver_internal+0xf0/0x1c0 device_release_driver+0x12/0x20 bus_remove_device+0xe1/0x150 device_del+0x17b/0x3e0 unregister_dev_dax+0x29/0x60 devm_action_release+0x15/0x20 release_nodes+0x19a/0x1e0 devres_release_all+0x3f/0x50 device_release_driver_internal+0x100/0x1c0 driver_detach+0x4c/0x8f bus_remove_driver+0x5c/0xd0 driver_unregister+0x31/0x50 dax_pmem_exit+0x10/0xfe0 [dax_pmem] Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat") Signed-off-by: Jia He Signed-off-by: Andrew Morton Reviewed-by: David Hildenbrand Acked-by: Michal Hocko Acked-by: Dan Williams Cc: [5.6+] Cc: Andy Lutomirski Cc: Baoquan He Cc: Borislav Petkov Cc: Catalin Marinas Cc: Chuhong Yuan Cc: Dave Hansen Cc: Dave Jiang Cc: Fenghua Yu Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jonathan Cameron Cc: Kaly Xin Cc: Logan Gunthorpe Cc: Masahiro Yamada Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Rich Felker Cc: Thomas Gleixner Cc: Tony Luck Cc: Vishal Verma Cc: Will Deacon Cc: Yoshinori Sato Link: http://lkml.kernel.org/r/20200710031619.18762-3-justin.he@arm.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 23294321c9b5d337469a1bbd4d927c7f5d545da4 Author: Mike Kravetz Date: Tue Aug 11 18:32:03 2020 -0700 cma: don't quit at first error when activating reserved areas commit 3a5139f1c5bb76d69756fb8f13fffa173e261153 upstream. The routine cma_init_reserved_areas is designed to activate all reserved cma areas. It quits when it first encounters an error. This can leave some areas in a state where they are reserved but not activated. There is no feedback to code which performed the reservation. Attempting to allocate memory from areas in such a state will result in a BUG. Modify cma_init_reserved_areas to always attempt to activate all areas. The called routine, cma_activate_area is responsible for leaving the area in a valid state. No one is making active use of returned error codes, so change the routine to void. How to reproduce: This example uses kernelcore, hugetlb and cma as an easy way to reproduce. However, this is a more general cma issue. Two node x86 VM 16GB total, 8GB per node Kernel command line parameters, kernelcore=4G hugetlb_cma=8G Related boot time messages, hugetlb_cma: reserve 8192 MiB, up to 4096 MiB per node cma: Reserved 4096 MiB at 0x0000000100000000 hugetlb_cma: reserved 4096 MiB on node 0 cma: Reserved 4096 MiB at 0x0000000300000000 hugetlb_cma: reserved 4096 MiB on node 1 cma: CMA area hugetlb could not be activated # echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... Call Trace: bitmap_find_next_zero_area_off+0x51/0x90 cma_alloc+0x1a5/0x310 alloc_fresh_huge_page+0x78/0x1a0 alloc_pool_huge_page+0x6f/0xf0 set_max_huge_pages+0x10c/0x250 nr_hugepages_store_common+0x92/0x120 ? __kmalloc+0x171/0x270 kernfs_fop_write+0xc1/0x1a0 vfs_write+0xc7/0x1f0 ksys_write+0x5f/0xe0 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator") Signed-off-by: Mike Kravetz Signed-off-by: Andrew Morton Reviewed-by: Roman Gushchin Acked-by: Barry Song Cc: Marek Szyprowski Cc: Michal Nazarewicz Cc: Kyungmin Park Cc: Joonsoo Kim Cc: Link: http://lkml.kernel.org/r/20200730163123.6451-1-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 7d2c48d9a54be308f433a528eea68b393fb05ded Author: Michal Koutný Date: Thu Aug 6 23:22:18 2020 -0700 mm/page_counter.c: fix protection usage propagation commit a6f23d14ec7d7d02220ad8bb2774be3322b9aeec upstream. When workload runs in cgroups that aren't directly below root cgroup and their parent specifies reclaim protection, it may end up ineffective. The reason is that propagate_protected_usage() is not called in all hierarchy up. All the protected usage is incorrectly accumulated in the workload's parent. This means that siblings_low_usage is overestimated and effective protection underestimated. Even though it is transitional phenomenon (uncharge path does correct propagation and fixes the wrong children_low_usage), it can undermine the intended protection unexpectedly. We have noticed this problem while seeing a swap out in a descendant of a protected memcg (intermediate node) while the parent was conveniently under its protection limit and the memory pressure was external to that hierarchy. Michal has pinpointed this down to the wrong siblings_low_usage which led to the unwanted reclaim. The fix is simply updating children_low_usage in respective ancestors also in the charging path. Fixes: 230671533d64 ("mm: memory.low hierarchical behavior") Signed-off-by: Michal Koutný Signed-off-by: Michal Hocko Signed-off-by: Andrew Morton Acked-by: Michal Hocko Acked-by: Roman Gushchin Cc: Johannes Weiner Cc: Tejun Heo Cc: [4.18+] Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 3203103c7262e7a910f57abd834dec1b73b5fc0a Author: Junxiao Bi Date: Thu Aug 6 23:18:02 2020 -0700 ocfs2: change slot number type s16 to u16 commit 38d51b2dd171ad973afc1f5faab825ed05a2d5e9 upstream. Dan Carpenter reported the following static checker warning. fs/ocfs2/super.c:1269 ocfs2_parse_options() warn: '(-1)' 65535 can't fit into 32767 'mopt->slot' fs/ocfs2/suballoc.c:859 ocfs2_init_inode_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_inode_steal_slot' fs/ocfs2/suballoc.c:867 ocfs2_init_meta_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_meta_steal_slot' That's because OCFS2_INVALID_SLOT is (u16)-1. Slot number in ocfs2 can be never negative, so change s16 to u16. Fixes: 9277f8334ffc ("ocfs2: fix value of OCFS2_INVALID_SLOT") Reported-by: Dan Carpenter Signed-off-by: Junxiao Bi Signed-off-by: Andrew Morton Reviewed-by: Joseph Qi Reviewed-by: Gang He Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Jun Piao Cc: Link: http://lkml.kernel.org/r/20200627001259.19757-1-junxiao.bi@oracle.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 1fc1fa9d4e778e619dd0b6ef4a0959f0d570f49e Author: David Hildenbrand Date: Thu Aug 6 23:17:13 2020 -0700 mm/shuffle: don't move pages between zones and don't read garbage memmaps commit 4a93025cbe4a0b19d1a25a2d763a3d2018bad0d9 upstream. Especially with memory hotplug, we can have offline sections (with a garbage memmap) and overlapping zones. We have to make sure to only touch initialized memmaps (online sections managed by the buddy) and that the zone matches, to not move pages between zones. To test if this can actually happen, I added a simple BUG_ON(page_zone(page_i) != page_zone(page_j)); right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and onlining the first memory block "online_movable" and the second memory block "online_kernel", it will trigger the BUG, as both zones (NORMAL and MOVABLE) overlap. This might result in all kinds of weird situations (e.g., double allocations, list corruptions, unmovable allocations ending up in the movable zone). Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization") Signed-off-by: David Hildenbrand Signed-off-by: Andrew Morton Reviewed-by: Wei Yang Acked-by: Michal Hocko Acked-by: Dan Williams Cc: Andrew Morton Cc: Johannes Weiner Cc: Michal Hocko Cc: Minchan Kim Cc: Huang Ying Cc: Wei Yang Cc: Mel Gorman Cc: [5.2+] Link: http://lkml.kernel.org/r/20200624094741.9918-2-david@redhat.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 216944c62eacec737a260829e4ba8de3dac3c6e3 Author: Mike Kravetz Date: Tue Aug 11 18:31:38 2020 -0700 hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem commit 34ae204f18519f0920bd50a644abd6fefc8dbfcf upstream. Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem in at least read mode. This is because the explicit locking in huge_pmd_share (called by huge_pte_alloc) was removed. When restructuring the code, the call to huge_pte_alloc in the else block at the beginning of hugetlb_fault was missed. Unfortunately, that else clause is exercised when there is no page table entry. This will likely lead to a call to huge_pmd_share. If huge_pmd_share thinks pmd sharing is possible, it will traverse the mapping tree (i_mmap) without holding i_mmap_rwsem. If someone else is modifying the tree, bad things such as addressing exceptions or worse could happen. Simply remove the else clause. It should have been removed previously. The code following the else will call huge_pte_alloc with the appropriate locking. To prevent this type of issue in the future, add routines to assert that i_mmap_rwsem is held, and call these routines in huge pmd sharing routines. Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Suggested-by: Matthew Wilcox Signed-off-by: Mike Kravetz Signed-off-by: Andrew Morton Cc: Michal Hocko Cc: Hugh Dickins Cc: Naoya Horiguchi Cc: "Aneesh Kumar K.V" Cc: Andrea Arcangeli Cc: "Kirill A.Shutemov" Cc: Davidlohr Bueso Cc: Prakash Sangappa Cc: Link: http://lkml.kernel.org/r/e670f327-5cf9-1959-96e4-6dc7cc30d3d5@oracle.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0c9e2cfb548da7e9e5d14fe463d2a87004726425 Author: Hugh Dickins Date: Thu Aug 6 23:26:18 2020 -0700 khugepaged: collapse_pte_mapped_thp() protect the pmd lock commit 119a5fc16105b2b9383a6e2a7800b2ef861b2975 upstream. When retract_page_tables() removes a page table to make way for a huge pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the case when the original mmap_write_trylock had failed), only mmap_write_trylock and pmd lock are held. That's not enough. One machine has twice crashed under load, with "BUG: spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving page_referenced() on a file THP, that had found a page table at *pmd) discovers that the page table page and its lock have already been freed by the time it comes to unlock. Follow the example of retract_page_tables(), but we only need one of huge page lock or i_mmap_lock_write to secure against this: because it's the narrower lock, and because it simplifies collapse_pte_mapped_thp() to know the hpage earlier, choose to rely on huge page lock here. Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP") Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Mike Kravetz Cc: Song Liu Cc: [5.4+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 53d366ea63309995c2840af6f0bdbbbb3c50ab1d Author: Peter Xu Date: Thu Aug 6 23:26:11 2020 -0700 mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible commit 75802ca66354a39ab8e35822747cd08b3384a99a upstream. This is found by code observation only. Firstly, the worst case scenario should assume the whole range was covered by pmd sharing. The old algorithm might not work as expected for ranges like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the expected range should be (0, 2g). Since at it, remove the loop since it should not be required. With that, the new code should be faster too when the invalidating range is huge. Mike said: : With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only : adjust to (0, 1g+2m) which is incorrect. : : We should cc stable. The original reason for adjusting the range was to : prevent data corruption (getting wrong page). Since the range is not : always adjusted correctly, the potential for corruption still exists. : : However, I am fairly confident that adjust_range_if_pmd_sharing_possible : is only gong to be called in two cases: : : 1) for a single page : 2) for range == entire vma : : In those cases, the current code should produce the correct results. : : To be safe, let's just cc stable. Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages") Signed-off-by: Peter Xu Signed-off-by: Andrew Morton Reviewed-by: Mike Kravetz Cc: Andrea Arcangeli Cc: Matthew Wilcox Cc: Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 8dd14f2f2b9108dd1f1df10ddf4c65527a047f63 Author: Hugh Dickins Date: Thu Aug 6 23:26:15 2020 -0700 khugepaged: collapse_pte_mapped_thp() flush the right range commit 723a80dafed5c95889d48baab9aa433a6ffa0b4e upstream. pmdp_collapse_flush() should be given the start address at which the huge page is mapped, haddr: it was given addr, which at that point has been used as a local variable, incremented to the end address of the extent. Found by source inspection while chasing a hugepage locking bug, which I then could not explain by this. At first I thought this was very bad; then saw that all of the page translations that were not flushed would actually still point to the right pages afterwards, so harmless; then realized that I know nothing of how different architectures and models cache intermediate paging structures, so maybe it matters after all - particularly since the page table concerned is immediately freed. Much easier to fix than to think about. Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP") Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Mike Kravetz Cc: Song Liu Cc: [5.4+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit cdfe493e2a26daa70d855465745bd7792c06e7dc Author: Mikulas Patocka Date: Mon Apr 20 16:02:21 2020 -0400 ext2: fix missing percpu_counter_inc commit bc2fbaa4d3808aef82dd1064a8e61c16549fe956 upstream. sbi->s_freeinodes_counter is only decreased by the ext2 code, it is never increased. This patch fixes it. Note that sbi->s_freeinodes_counter is only used in the algorithm that tries to find the group for new allocations, so this bug is not easily visible (the only visibility is that the group finding algorithm selects inoptinal result). Link: https://lore.kernel.org/r/alpine.LRH.2.02.2004201538300.19436@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit e5eed40643535faaf9f4a787b688324dd6efa97f Author: Mike Rapoport Date: Wed Aug 5 15:51:41 2020 +0300 MIPS: SGI-IP27: always enable NUMA in Kconfig commit 6c86a3029ce3b44597526909f2e39a77a497f640 upstream. When a configuration has NUMA disabled and SGI_IP27 enabled, the build fails: CC kernel/bounds.s CC arch/mips/kernel/asm-offsets.s In file included from arch/mips/include/asm/topology.h:11, from include/linux/topology.h:36, from include/linux/gfp.h:9, from include/linux/slab.h:15, from include/linux/crypto.h:19, from include/crypto/hash.h:11, from include/linux/uio.h:10, from include/linux/socket.h:8, from include/linux/compat.h:15, from arch/mips/kernel/asm-offsets.c:12: include/linux/topology.h: In function 'numa_node_id': arch/mips/include/asm/mach-ip27/topology.h:16:27: error: implicit declaration of function 'cputonasid'; did you mean 'cpu_vpe_id'? [-Werror=implicit-function-declaration] #define cpu_to_node(cpu) (cputonasid(cpu)) ^~~~~~~~~~ include/linux/topology.h:119:9: note: in expansion of macro 'cpu_to_node' return cpu_to_node(raw_smp_processor_id()); ^~~~~~~~~~~ include/linux/topology.h: In function 'cpu_cpu_mask': arch/mips/include/asm/mach-ip27/topology.h:19:7: error: implicit declaration of function 'hub_data' [-Werror=implicit-function-declaration] &hub_data(node)->h_cpus) ^~~~~~~~ include/linux/topology.h:210:9: note: in expansion of macro 'cpumask_of_node' return cpumask_of_node(cpu_to_node(cpu)); ^~~~~~~~~~~~~~~ arch/mips/include/asm/mach-ip27/topology.h:19:21: error: invalid type argument of '->' (have 'int') &hub_data(node)->h_cpus) ^~ include/linux/topology.h:210:9: note: in expansion of macro 'cpumask_of_node' return cpumask_of_node(cpu_to_node(cpu)); ^~~~~~~~~~~~~~~ Before switch from discontigmem to sparsemem, there always was CONFIG_NEED_MULTIPLE_NODES=y because it was selected by DISCONTIGMEM. Without DISCONTIGMEM it is possible to have SPARSEMEM without NUMA for SGI_IP27 and as many things there rely on custom node definition, the build breaks. As Thomas noted "... there are right now too many places in IP27 code, which assumes NUMA enabled", the simplest solution would be to always enable NUMA for SGI-IP27 builds. Reported-by: kernel test robot Fixes: 397dc00e249e ("mips: sgi-ip27: switch from DISCONTIGMEM to SPARSEMEM") Cc: stable@vger.kernel.org Signed-off-by: Mike Rapoport Signed-off-by: Thomas Bogendoerfer Signed-off-by: Greg Kroah-Hartman commit 1d5ce407f4e509a52a6707d74a4b467a196c66c9 Author: Paul Cercueil Date: Mon Jul 27 20:11:28 2020 +0200 MIPS: qi_lb60: Fix routing to audio amplifier commit 0889a67a9e7a56ba39af223d536630b20b877fda upstream. The ROUT (right channel output of audio codec) was connected to INL (left channel of audio amplifier) instead of INR (right channel of audio amplifier). Fixes: 8ddebad15e9b ("MIPS: qi_lb60: Migrate to devicetree") Cc: stable@vger.kernel.org # v5.3 Signed-off-by: Paul Cercueil Signed-off-by: Thomas Bogendoerfer Signed-off-by: Greg Kroah-Hartman commit 26fa5d7ec02dbdc6a2bd4275e5bbf0382689f4a5 Author: Huacai Chen Date: Thu Jul 16 18:40:23 2020 +0800 MIPS: CPU#0 is not hotpluggable commit 9cce844abf07b683cff5f0273977d5f8d0af94c7 upstream. Now CPU#0 is not hotpluggable on MIPS, so prevent to create /sys/devices /system/cpu/cpu0/online which confuses some user-space tools. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen Signed-off-by: Thomas Bogendoerfer Signed-off-by: Greg Kroah-Hartman commit 45274806f28da5d87dae2a056325bab97d164351 Author: Lukas Wunner Date: Wed Jul 8 15:27:01 2020 +0200 driver core: Avoid binding drivers to dead devices commit 654888327e9f655a9d55ad477a9583e90e8c9b5c upstream. Commit 3451a495ef24 ("driver core: Establish order of operations for device_add and device_del via bitflag") sought to prevent asynchronous driver binding to a device which is being removed. It added a per-device "dead" flag which is checked in the following code paths: * asynchronous binding in __driver_attach_async_helper() * synchronous binding in device_driver_attach() * asynchronous binding in __device_attach_async_helper() It did *not* check the flag upon: * synchronous binding in __device_attach() However __device_attach() may also be called asynchronously from: deferred_probe_work_func() bus_probe_device() device_initial_probe() __device_attach() So if the commit's intention was to check the "dead" flag in all asynchronous code paths, then a check is also necessary in __device_attach(). Add the missing check. Fixes: 3451a495ef24 ("driver core: Establish order of operations for device_add and device_del via bitflag") Signed-off-by: Lukas Wunner Cc: stable@vger.kernel.org # v5.1+ Cc: Alexander Duyck Link: https://lore.kernel.org/r/de88a23a6fe0ef70f7cfd13c8aea9ab51b4edab6.1594214103.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman commit 7b897c98b23ebf3258492c6f61a4ea22bd5939f6 Author: Johannes Berg Date: Mon Aug 3 11:02:10 2020 +0200 mac80211: fix misplaced while instead of if commit 5981fe5b0529ba25d95f37d7faa434183ad618c5 upstream. This never was intended to be a 'while' loop, it should've just been an 'if' instead of 'while'. Fix this. I noticed this while applying another patch from Ben that intended to fix a busy loop at this spot. Cc: stable@vger.kernel.org Fixes: b16798f5b907 ("mac80211: mark station unauthorized before key removal") Reported-by: Ben Greear Link: https://lore.kernel.org/r/20200803110209.253009ae41ff.I3522aad099392b31d5cf2dcca34cbac7e5832dde@changeid Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit d9eb83144bb60c8d6087d4a51186659399e99a72 Author: Coly Li Date: Sat Jul 25 20:00:22 2020 +0800 bcache: fix overflow in offset_to_stripe() commit 7a1481267999c02abf4a624515c1b5c7c1fccbd6 upstream. offset_to_stripe() returns the stripe number (in type unsigned int) from an offset (in type uint64_t) by the following calculation, do_div(offset, d->stripe_size); For large capacity backing device (e.g. 18TB) with small stripe size (e.g. 4KB), the result is 4831838208 and exceeds UINT_MAX. The actual returned value which caller receives is 536870912, due to the overflow. Indeed in bcache_device_init(), bcache_device->nr_stripes is limited in range [1, INT_MAX]. Therefore all valid stripe numbers in bcache are in range [0, bcache_dev->nr_stripes - 1]. This patch adds a upper limition check in offset_to_stripe(): the max valid stripe number should be less than bcache_device->nr_stripes. If the calculated stripe number from do_div() is equal to or larger than bcache_device->nr_stripe, -EINVAL will be returned. (Normally nr_stripes is less than INT_MAX, exceeding upper limitation doesn't mean overflow, therefore -EOVERFLOW is not used as error code.) This patch also changes nr_stripes' type of struct bcache_device from 'unsigned int' to 'int', and return value type of offset_to_stripe() from 'unsigned int' to 'int', to match their exact data ranges. All locations where bcache_device->nr_stripes and offset_to_stripe() are referenced also get updated for the above type change. Reported-and-tested-by: Ken Raeburn Signed-off-by: Coly Li Cc: stable@vger.kernel.org Link: https://bugzilla.redhat.com/show_bug.cgi?id=1783075 Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit f95748877d0c55fabc4a8a797e965cb8dc6431e7 Author: Coly Li Date: Sat Jul 25 20:00:16 2020 +0800 bcache: allocate meta data pages as compound pages commit 5fe48867856367142d91a82f2cbf7a57a24cbb70 upstream. There are some meta data of bcache are allocated by multiple pages, and they are used as bio bv_page for I/Os to the cache device. for example cache_set->uuids, cache->disk_buckets, journal_write->data, bset_tree->data. For such meta data memory, all the allocated pages should be treated as a single memory block. Then the memory management and underlying I/O code can treat them more clearly. This patch adds __GFP_COMP flag to all the location allocating >0 order pages for the above mentioned meta data. Then their pages are treated as compound pages now. Signed-off-by: Coly Li Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 84d2efba233042ea4d35eb66e7850983728d4c17 Author: ChangSyun Peng Date: Fri Jul 31 17:50:17 2020 +0800 md/raid5: Fix Force reconstruct-write io stuck in degraded raid5 commit a1c6ae3d9f3dd6aa5981a332a6f700cf1c25edef upstream. In degraded raid5, we need to read parity to do reconstruct-write when data disks fail. However, we can not read parity from handle_stripe_dirtying() in force reconstruct-write mode. Reproducible Steps: 1. Create degraded raid5 mdadm -C /dev/md2 --assume-clean -l5 -n3 /dev/sda2 /dev/sdb2 missing 2. Set rmw_level to 0 echo 0 > /sys/block/md2/md/rmw_level 3. IO to raid5 Now some io may be stuck in raid5. We can use handle_stripe_fill() to read the parity in this situation. Cc: # v4.4+ Reviewed-by: Alex Wu Reviewed-by: BingJing Chang Reviewed-by: Danny Shih Signed-off-by: ChangSyun Peng Signed-off-by: Song Liu Signed-off-by: Greg Kroah-Hartman commit 05cbe38ed514d62b6241fd893ec1319d7549b544 Author: Kees Cook Date: Fri Jul 10 10:29:41 2020 -0700 selftests/seccomp: Set NNP for TSYNC ESRCH flag test commit e4d05028a07f505a08802a6d1b11674c149df2b3 upstream. The TSYNC ESRCH flag test will fail for regular users because NNP was not set yet. Add NNP setting. Fixes: 51891498f2da ("seccomp: allow TSYNC and USER_NOTIF together") Cc: stable@vger.kernel.org Reviewed-by: Tycho Andersen Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman commit 342a9b03f7944c0164bb4352ba7544d22af07654 Author: Kees Cook Date: Tue Jun 9 16:11:29 2020 -0700 net/compat: Add missing sock updates for SCM_RIGHTS commit d9539752d23283db4692384a634034f451261e29 upstream. Add missed sock updates to compat path via a new helper, which will be used more in coming patches. (The net/core/scm.c code is left as-is here to assist with -stable backports for the compat path.) Cc: Christoph Hellwig Cc: Sargun Dhillon Cc: Jakub Kicinski Cc: stable@vger.kernel.org Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly") Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly") Acked-by: Christian Brauner Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman commit 08d41d0451e6a9e0a6132b0149d3edd731234fc6 Author: Kees Cook Date: Tue Jun 9 16:21:38 2020 -0700 pidfd: Add missing sock updates for pidfd_getfd() commit 4969f8a073977123504609d7310b42a588297aa4 upstream. The sock counting (sock_update_netprioidx() and sock_update_classid()) was missing from pidfd's implementation of received fd installation. Add a call to the new __receive_sock() helper. Cc: Christian Brauner Cc: Christoph Hellwig Cc: Sargun Dhillon Cc: Jakub Kicinski Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 8649c322f75c ("pid: Implement pidfd_getfd syscall") Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman commit 2c0ac43a7eb2525d683a2167b6b0c9a49611638d Author: Zenghui Yu Date: Mon Jul 20 17:23:28 2020 +0800 irqchip/gic-v4.1: Ensure accessing the correct RD when writing INVALLR commit 3af9571cd585efafc2facbd8dbd407317ff898cf upstream. The GICv4.1 spec tells us that it's CONSTRAINED UNPREDICTABLE to issue a register-based invalidation operation for a vPEID not mapped to that RD, or another RD within the same CommonLPIAff group. To follow this rule, commit f3a059219bc7 ("irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access") tried to address the race between the RD accesses and the vPE affinity change, but somehow forgot to take GICR_INVALLR into account. Let's take the vpe_lock before evaluating vpe->col_idx to fix it. Fixes: f3a059219bc7 ("irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access") Signed-off-by: Zenghui Yu Signed-off-by: Marc Zyngier Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200720092328.708-1-yuzenghui@huawei.com Signed-off-by: Greg Kroah-Hartman commit 79b265a75c69be632033802298eac3ea46fc541f Author: Huacai Chen Date: Thu Jul 30 16:51:28 2020 +0800 irqchip/loongson-liointc: Fix misuse of gc->mask_cache commit c9c73a05413ea4a465cae1cb3593b01b190a233f upstream. In gc->mask_cache bits, 1 means enabled and 0 means disabled, but in the loongson-liointc driver mask_cache is misused by reverting its meaning. This patch fix the bug and update the comments as well. Fixes: dbb152267908c4b2c3639492a ("irqchip: Add driver for Loongson I/O Local Interrupt Controller") Signed-off-by: Huacai Chen Signed-off-by: Marc Zyngier Reviewed-by: Jiaxun Yang Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1596099090-23516-4-git-send-email-chenhc@lemote.com Signed-off-by: Greg Kroah-Hartman commit 52c19dd9eedc43671f52bfb7bfacf00bda70fcb5 Author: Jonathan McDowell Date: Wed Aug 12 20:37:01 2020 +0100 net: stmmac: dwmac1000: provide multicast filter fallback commit 592d751c1e174df5ff219946908b005eb48934b3 upstream. If we don't have a hardware multicast filter available then instead of silently failing to listen for the requested ethernet broadcast addresses fall back to receiving all multicast packets, in a similar fashion to other drivers with no multicast filter. Cc: stable@vger.kernel.org Signed-off-by: Jonathan McDowell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ee776e21112596fb2785de81c2b5d4d01329ef1f Author: Jonathan McDowell Date: Wed Aug 12 20:37:23 2020 +0100 net: ethernet: stmmac: Disable hardware multicast filter commit df43dd526e6609769ae513a81443c7aa727c8ca3 upstream. The IPQ806x does not appear to have a functional multicast ethernet address filter. This was observed as a failure to correctly receive IPv6 packets on a LAN to the all stations address. Checking the vendor driver shows that it does not attempt to enable the multicast filter and instead falls back to receiving all multicast packets, internally setting ALLMULTI. Use the new fallback support in the dwmac1000 driver to correctly achieve the same with the mainline IPQ806x driver. Confirmed to fix IPv6 functionality on an RB3011 router. Cc: stable@vger.kernel.org Signed-off-by: Jonathan McDowell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0515f53776fee3f6109c065c911c83e16234d23b Author: Eugeniu Rosca Date: Tue Jun 2 21:50:16 2020 +0200 media: vsp1: dl: Fix NULL pointer dereference on unbind commit c92d30e4b78dc331909f8c6056c2792aa14e2166 upstream. In commit f3b98e3c4d2e16 ("media: vsp1: Provide support for extended command pools"), the vsp pointer used for referencing the VSP1 device structure from a command pool during vsp1_dl_ext_cmd_pool_destroy() was not populated. Correctly assign the pointer to prevent the following null-pointer-dereference when removing the device: [*] h3ulcb-kf #> echo fea28000.vsp > /sys/bus/platform/devices/fea28000.vsp/driver/unbind Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000007318be000 [0000000000000028] pgd=00000007333a1003, pud=00000007333a6003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 486 Comm: sh Not tainted 5.7.0-rc6-arm64-renesas-00118-ge644645abf47 #185 Hardware name: Renesas H3ULCB Kingfisher board based on r8a77951 (DT) pstate: 40000005 (nZcv daif -PAN -UAO) pc : vsp1_dlm_destroy+0xe4/0x11c lr : vsp1_dlm_destroy+0xc8/0x11c sp : ffff800012963b60 x29: ffff800012963b60 x28: ffff0006f83fc440 x27: 0000000000000000 x26: ffff0006f5e13e80 x25: ffff0006f5e13ed0 x24: ffff0006f5e13ed0 x23: ffff0006f5e13ed0 x22: dead000000000122 x21: ffff0006f5e3a080 x20: ffff0006f5df2938 x19: ffff0006f5df2980 x18: 0000000000000003 x17: 0000000000000000 x16: 0000000000000016 x15: 0000000000000003 x14: 00000000000393c0 x13: ffff800011a5ec18 x12: ffff800011d8d000 x11: ffff0006f83fcc68 x10: ffff800011a53d70 x9 : ffff8000111f3000 x8 : 0000000000000000 x7 : 0000000000210d00 x6 : 0000000000000000 x5 : ffff800010872e60 x4 : 0000000000000004 x3 : 0000000078068000 x2 : ffff800012781000 x1 : 0000000000002c00 x0 : 0000000000000000 Call trace: vsp1_dlm_destroy+0xe4/0x11c vsp1_wpf_destroy+0x10/0x20 vsp1_entity_destroy+0x24/0x4c vsp1_destroy_entities+0x54/0x130 vsp1_remove+0x1c/0x40 platform_drv_remove+0x28/0x50 __device_release_driver+0x178/0x220 device_driver_detach+0x44/0xc0 unbind_store+0xe0/0x104 drv_attr_store+0x20/0x30 sysfs_kf_write+0x48/0x70 kernfs_fop_write+0x148/0x230 __vfs_write+0x18/0x40 vfs_write+0xdc/0x1c4 ksys_write+0x68/0xf0 __arm64_sys_write+0x18/0x20 el0_svc_common.constprop.0+0x70/0x170 do_el0_svc+0x20/0x80 el0_sync_handler+0x134/0x1b0 el0_sync+0x140/0x180 Code: b40000c2 f9403a60 d2800084 a9400663 (f9401400) ---[ end trace 3875369841fb288a ]--- Fixes: f3b98e3c4d2e16 ("media: vsp1: Provide support for extended command pools") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Eugeniu Rosca Reviewed-by: Kieran Bingham Tested-by: Kieran Bingham Reviewed-by: Laurent Pinchart Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 68bc36fc3f8907be647513444d62c6fff7aa474f Author: Mansur Alisha Shaik Date: Fri May 1 08:28:00 2020 +0200 media: venus: fix multiple encoder crash commit e0eb34810113dbbf1ace57440cf48d514312a373 upstream. Currently we are considering the instances which are available in core->inst list for load calculation in min_loaded_core() function, but this is incorrect because by the time we call decide_core() for second instance, the third instance not filled yet codec_freq_data pointer. Solve this by considering the instances whose session has started. Cc: stable@vger.kernel.org # v5.7+ Fixes: 4ebf969375bc ("media: venus: introduce core selection") Tested-by: Douglas Anderson Signed-off-by: Mansur Alisha Shaik Signed-off-by: Stanimir Varbanov Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 55a397051f577917722252cf49cfa5c1fd0d379f Author: Paul Cercueil Date: Mon Jun 22 23:45:48 2020 +0200 pinctrl: ingenic: Properly detect GPIO direction when configured for IRQ commit 84e7a946da71f678affacea301f6d5cb4d9784e8 upstream. The PAT1 register contains information about the IRQ type (edge/level) for input GPIOs with IRQ enabled, and the direction for non-IRQ GPIOs. So it makes sense to read it only if the GPIO has no interrupt configured, otherwise input GPIOs configured for level IRQs are misdetected as output GPIOs. Fixes: ebd6651418b6 ("pinctrl: ingenic: Implement .get_direction for GPIO chips") Reported-by: João Henrique Signed-off-by: Paul Cercueil Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200622214548.265417-2-paul@crapouillou.net Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit 8370ea3297d597ff586aa07bf6409a65bc1964da Author: Paul Cercueil Date: Mon Jun 22 23:45:47 2020 +0200 pinctrl: ingenic: Enhance support for IRQ_TYPE_EDGE_BOTH commit 1c95348ba327fe8621d3680890c2341523d3524a upstream. Ingenic SoCs don't natively support registering an interrupt for both rising and falling edges. This has to be emulated in software. Until now, this was emulated by switching back and forth between IRQ_TYPE_EDGE_RISING and IRQ_TYPE_EDGE_FALLING according to the level of the GPIO. While this worked most of the time, when used with GPIOs that need debouncing, some events would be lost. For instance, between the time a falling-edge interrupt happens and the interrupt handler configures the hardware for rising-edge, the level of the pin may have already risen, and the rising-edge event is lost. To address that issue, instead of switching back and forth between IRQ_TYPE_EDGE_RISING and IRQ_TYPE_EDGE_FALLING, we now switch back and forth between IRQ_TYPE_LEVEL_LOW and IRQ_TYPE_LEVEL_HIGH. Since we always switch in the interrupt handler, they actually permit to detect level changes. In the example above, if the pin level rises before switching the IRQ type from IRQ_TYPE_LEVEL_LOW to IRQ_TYPE_LEVEL_HIGH, a new interrupt will raise as soon as the handler exits, and the rising-edge event will be properly detected. Fixes: e72394e2ea19 ("pinctrl: ingenic: Merge GPIO functionality") Reported-by: João Henrique Signed-off-by: Paul Cercueil Tested-by: João Henrique Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200622214548.265417-1-paul@crapouillou.net Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit c0a6bc2b4f980e75964092492b34c3c4147d80db Author: Michael Ellerman Date: Tue Aug 4 22:44:06 2020 +1000 powerpc: Fix circular dependency between percpu.h and mmu.h commit 0c83b277ada72b585e6a3e52b067669df15bcedb upstream. Recently random.h started including percpu.h (see commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity")), which broke corenet64_smp_defconfig: In file included from /linux/arch/powerpc/include/asm/paca.h:18, from /linux/arch/powerpc/include/asm/percpu.h:13, from /linux/include/linux/random.h:14, from /linux/lib/uuid.c:14: /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx' 139 | DECLARE_PER_CPU(int, next_tlbcam_idx); This is due to a circular header dependency: asm/mmu.h includes asm/percpu.h, which includes asm/paca.h, which includes asm/mmu.h Which means DECLARE_PER_CPU() isn't defined when mmu.h needs it. We can fix it by moving the include of paca.h below the include of asm-generic/percpu.h. This moves the include of paca.h out of the #ifdef __powerpc64__, but that is OK because paca.h is almost entirely inside #ifdef CONFIG_PPC64 anyway. It also moves the include of paca.h out of the #ifdef CONFIG_SMP, which could possibly break something, but seems to have no ill effects. Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity") Cc: stable@vger.kernel.org # v5.8 Reported-by: Stephen Rothwell Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200804130558.292328-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman commit 42bce5c5f78e639073f71e0152c083832b1837b9 Author: Michael Ellerman Date: Fri Jul 24 19:25:25 2020 +1000 powerpc: Allow 4224 bytes of stack expansion for the signal frame commit 63dee5df43a31f3844efabc58972f0a206ca4534 upstream. We have powerpc specific logic in our page fault handling to decide if an access to an unmapped address below the stack pointer should expand the stack VMA. The code was originally added in 2004 "ported from 2.4". The rough logic is that the stack is allowed to grow to 1MB with no extra checking. Over 1MB the access must be within 2048 bytes of the stack pointer, or be from a user instruction that updates the stack pointer. The 2048 byte allowance below the stack pointer is there to cover the 288 byte "red zone" as well as the "about 1.5kB" needed by the signal delivery code. Unfortunately since then the signal frame has expanded, and is now 4224 bytes on 64-bit kernels with transactional memory enabled. This means if a process has consumed more than 1MB of stack, and its stack pointer lies less than 4224 bytes from the next page boundary, signal delivery will fault when trying to expand the stack and the process will see a SEGV. The total size of the signal frame is the size of struct rt_sigframe (which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on 64-bit). The 2048 byte allowance was correct until 2008 as the signal frame was: struct rt_sigframe { struct ucontext uc; /* 0 1440 */ /* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */ long unsigned int _unused[2]; /* 1440 16 */ unsigned int tramp[6]; /* 1456 24 */ struct siginfo * pinfo; /* 1480 8 */ void * puc; /* 1488 8 */ struct siginfo info; /* 1496 128 */ /* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */ char abigap[288]; /* 1624 288 */ /* size: 1920, cachelines: 15, members: 7 */ /* padding: 8 */ }; 1920 + 128 = 2048 Then in commit ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support") (Jul 2008) the signal frame expanded to 2304 bytes: struct rt_sigframe { struct ucontext uc; /* 0 1696 */ <-- /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */ long unsigned int _unused[2]; /* 1696 16 */ unsigned int tramp[6]; /* 1712 24 */ struct siginfo * pinfo; /* 1736 8 */ void * puc; /* 1744 8 */ struct siginfo info; /* 1752 128 */ /* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */ char abigap[288]; /* 1880 288 */ /* size: 2176, cachelines: 17, members: 7 */ /* padding: 8 */ }; 2176 + 128 = 2304 At this point we should have been exposed to the bug, though as far as I know it was never reported. I no longer have a system old enough to easily test on. Then in 2010 commit 320b2b8de126 ("mm: keep a guard page below a grow-down stack segment") caused our stack expansion code to never trigger, as there was always a VMA found for a write up to PAGE_SIZE below r1. That meant the bug was hidden as we continued to expand the signal frame in commit 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") (Feb 2013): struct rt_sigframe { struct ucontext uc; /* 0 1696 */ /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */ struct ucontext uc_transact; /* 1696 1696 */ <-- /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */ long unsigned int _unused[2]; /* 3392 16 */ unsigned int tramp[6]; /* 3408 24 */ struct siginfo * pinfo; /* 3432 8 */ void * puc; /* 3440 8 */ struct siginfo info; /* 3448 128 */ /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */ char abigap[288]; /* 3576 288 */ /* size: 3872, cachelines: 31, members: 8 */ /* padding: 8 */ /* last cacheline: 32 bytes */ }; 3872 + 128 = 4000 And commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit userspace to 512 bytes") (Feb 2014): struct rt_sigframe { struct ucontext uc; /* 0 1696 */ /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */ struct ucontext uc_transact; /* 1696 1696 */ /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */ long unsigned int _unused[2]; /* 3392 16 */ unsigned int tramp[6]; /* 3408 24 */ struct siginfo * pinfo; /* 3432 8 */ void * puc; /* 3440 8 */ struct siginfo info; /* 3448 128 */ /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */ char abigap[512]; /* 3576 512 */ <-- /* size: 4096, cachelines: 32, members: 8 */ /* padding: 8 */ }; 4096 + 128 = 4224 Then finally in 2017, commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") exposed us to the existing bug, because it changed the stack VMA to be the correct/real size, meaning our stack expansion code is now triggered. Fix it by increasing the allowance to 4224 bytes. Hard-coding 4224 is obviously unsafe against future expansions of the signal frame in the same way as the existing code. We can't easily use sizeof() because the signal frame structure is not in a header. We will either fix that, or rip out all the custom stack expansion checking logic entirely. Fixes: ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support") Cc: stable@vger.kernel.org # v2.6.27+ Reported-by: Tom Lane Tested-by: Daniel Axtens Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200724092528.1578671-2-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman commit b88bb42598e32f4b6ef4294fe6e44dae3254f632 Author: Christophe Leroy Date: Mon Jun 15 13:18:39 2020 +0000 powerpc/ptdump: Fix build failure in hashpagetable.c commit 7c466b0807960edc13e4b855be85ea765df9a6cd upstream. H_SUCCESS is only defined when CONFIG_PPC_PSERIES is defined. != H_SUCCESS means != 0. Modify the test accordingly. Fixes: 65e701b2d2a8 ("powerpc/ptdump: drop non vital #ifdefs") Cc: stable@vger.kernel.org Reported-by: kernel test robot Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/795158fc1d2b3dff3bf7347881947a887ea9391a.1592227105.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman commit a0c538b3349e9e3e0066a3c80e600a435ccd5d32 Author: Paul Aurich Date: Thu Jul 9 22:01:16 2020 -0700 cifs: Fix leak when handling lease break for cached root fid commit baf57b56d3604880ccb3956ec6c62ea894f5de99 upstream. Handling a lease break for the cached root didn't free the smb2_lease_break_work allocation, resulting in a leak: unreferenced object 0xffff98383a5af480 (size 128): comm "cifsd", pid 684, jiffies 4294936606 (age 534.868s) hex dump (first 32 bytes): c0 ff ff ff 1f 00 00 00 88 f4 5a 3a 38 98 ff ff ..........Z:8... 88 f4 5a 3a 38 98 ff ff 80 88 d6 8a ff ff ff ff ..Z:8........... backtrace: [<0000000068957336>] smb2_is_valid_oplock_break+0x1fa/0x8c0 [<0000000073b70b9e>] cifs_demultiplex_thread+0x73d/0xcc0 [<00000000905fa372>] kthread+0x11c/0x150 [<0000000079378e4e>] ret_from_fork+0x22/0x30 Avoid this leak by only allocating when necessary. Fixes: a93864d93977 ("cifs: add lease tracking to the cached root fid") Signed-off-by: Paul Aurich CC: Stable # v4.18+ Reviewed-by: Aurelien Aptel Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 12f14dc355398a3da00a881efa10cba4af479142 Author: Max Filippov Date: Fri Jul 31 12:37:32 2020 -0700 xtensa: fix xtensa_pmu_setup prototype commit 6d65d3769d1910379e1cfa61ebf387efc6bfb22c upstream. Fix the following build error in configurations with CONFIG_XTENSA_VARIANT_HAVE_PERF_EVENTS=y: arch/xtensa/kernel/perf_event.c:420:29: error: passing argument 3 of ‘cpuhp_setup_state’ from incompatible pointer type Cc: stable@vger.kernel.org Fixes: 25a77b55e74c ("xtensa/perf: Convert the hotplug notifier to state machine callbacks") Signed-off-by: Max Filippov Signed-off-by: Greg Kroah-Hartman commit 84affd7847d66644944989aadf7ea3fc2534b291 Author: Max Filippov Date: Fri Jul 31 12:38:05 2020 -0700 xtensa: add missing exclusive access state management commit a0fc1436f1f4f84e93144480bf30e0c958d135b6 upstream. The result of the s32ex opcode is recorded in the ATOMCTL special register and must be retrieved with the getex opcode. Context switch between s32ex and getex may trash the ATOMCTL register and result in duplicate update or missing update of the atomic variable. Add atomctl8 field to the struct thread_info and use getex to swap ATOMCTL bit 8 as a part of context switch. Clear exclusive access monitor on kernel entry. Cc: stable@vger.kernel.org Fixes: f7c34874f04a ("xtensa: add exclusive atomics support") Signed-off-by: Max Filippov Signed-off-by: Greg Kroah-Hartman commit d7ccfcd8445971bd2fce1ee2b8ba4e566e76754a Author: Lorenzo Bianconi Date: Mon Jul 13 13:40:19 2020 +0200 iio: imu: st_lsm6dsx: reset hw ts after resume commit a1bab9396c2d98c601ce81c27567159dfbc10c19 upstream. Reset hw time samples generator after system resume in order to avoid disalignment between system and device time reference since FIFO batching and time samples generator are disabled during suspend. Fixes: 213451076bd3 ("iio: imu: st_lsm6dsx: add hw timestamp support") Tested-by: Sean Nyekjaer Signed-off-by: Lorenzo Bianconi Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 13d49c2043a3c9dc40ece310b57962f5afaeaefa Author: Alexandru Ardelean Date: Mon Jul 6 14:02:57 2020 +0300 iio: dac: ad5592r: fix unbalanced mutex unlocks in ad5592r_read_raw() commit 65afb0932a81c1de719ceee0db0b276094b10ac8 upstream. There are 2 exit paths where the lock isn't held, but try to unlock the mutex when exiting. In these places we should just return from the function. A neater approach would be to cleanup the ad5592r_read_raw(), but that would make this patch more difficult to backport to stable versions. Fixes 56ca9db862bf3: ("iio: dac: Add support for the AD5592R/AD5593R ADCs/DACs") Reported-by: Charles Stanhope Signed-off-by: Alexandru Ardelean Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 75a04a370d4339a7d430ef3fcd89fea854689dda Author: Christian Eggers Date: Mon Jul 27 12:16:05 2020 +0200 dt-bindings: iio: io-channel-mux: Fix compatible string in example code commit add48ba425192c6e04ce70549129cacd01e2a09e upstream. The correct compatible string is "gpio-mux" (see bindings/mux/gpio-mux.txt). Cc: stable@vger.kernel.org # v4.13+ Reviewed-by: Peter Rosin Signed-off-by: Christian Eggers Link: https://lore.kernel.org/r/20200727101605.24384-1-ceggers@arri.de Signed-off-by: Rob Herring Signed-off-by: Greg Kroah-Hartman commit 3b60dbf7256391b249fd415f8370d43d97508fa3 Author: Shaokun Zhang Date: Thu Jun 18 21:35:44 2020 +0800 arm64: perf: Correct the event index in sysfs commit 539707caa1a89ee4efc57b4e4231c20c46575ccc upstream. When PMU event ID is equal or greater than 0x4000, it will be reduced by 0x4000 and it is not the raw number in the sysfs. Let's correct it and obtain the raw event ID. Before this patch: cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed event=0x001 After this patch: cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed event=0x4001 Signed-off-by: Shaokun Zhang Cc: Will Deacon Cc: Mark Rutland Cc: Link: https://lore.kernel.org/r/1592487344-30555-3-git-send-email-zhangshaokun@hisilicon.com [will: fixed formatting of 'if' condition] Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 38c8b29edf4a9099f764f7cad9048b6e32d2394a Author: Pavel Machek Date: Mon Aug 3 11:35:06 2020 +0200 btrfs: fix return value mixup in btrfs_get_extent commit 881a3a11c2b858fe9b69ef79ac5ee9978a266dc9 upstream. btrfs_get_extent() sets variable ret, but out: error path expect error to be in variable err so the error code is lost. Fixes: 6bf9e4bd6a27 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Nikolay Borisov Signed-off-by: Pavel Machek (CIP) Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 30fb5166dfa736c15e015732f8559bf127d4bcd6 Author: Josef Bacik Date: Thu Jul 30 11:18:09 2020 -0400 btrfs: make sure SB_I_VERSION doesn't get unset by remount commit faa008899a4db21a2df99833cb4ff6fa67009a20 upstream. There's some inconsistency around SB_I_VERSION handling with mount and remount. Since we don't really want it to be off ever just work around this by making sure we don't get the flag cleared on remount. There's a tiny cpu cost of setting the bit, otherwise all changes to i_version also change some of the times (ctime/mtime) so the inode needs to be synced. We wouldn't save anything by disabling it. Reported-by: Eric Sandeen CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik Reviewed-by: David Sterba [ add perf impact analysis ] Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 12274420151323b7d4d763a844ae9efaff915d5f Author: Qu Wenruo Date: Fri Jul 31 19:29:11 2020 +0800 btrfs: trim: fix underflow in trim length to prevent access beyond device boundary commit c57dd1f2f6a7cd1bb61802344f59ccdc5278c983 upstream. [BUG] The following script can lead to tons of beyond device boundary access: mkfs.btrfs -f $dev -b 10G mount $dev $mnt trimfs $mnt btrfs filesystem resize 1:-1G $mnt trimfs $mnt [CAUSE] Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit"), we try to avoid trimming ranges that's already trimmed. So we check device->alloc_state by finding the first range which doesn't have CHUNK_TRIMMED and CHUNK_ALLOCATED not set. But if we shrunk the device, that bits are not cleared, thus we could easily got a range starts beyond the shrunk device size. This results the returned @start and @end are all beyond device size, then we call "end = min(end, device->total_bytes -1);" making @end smaller than device size. Then finally we goes "len = end - start + 1", totally underflow the result, and lead to the beyond-device-boundary access. [FIX] This patch will fix the problem in two ways: - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device This is the root fix - Add extra safety check when trimming free device extents We check and warn if the returned range is already beyond current device. Link: https://github.com/kdave/btrfs-progs/issues/282 Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit cde8857e168c6efedec93c79dedb0cb70d7bce1a Author: Filipe Manana Date: Wed Jul 29 10:17:50 2020 +0100 btrfs: fix memory leaks after failure to lookup checksums during inode logging commit 4f26433e9b3eb7a55ed70d8f882ae9cd48ba448b upstream. While logging an inode, at copy_items(), if we fail to lookup the checksums for an extent we release the destination path, free the ins_data array and then return immediately. However a previous iteration of the for loop may have added checksums to the ordered_sums list, in which case we leak the memory used by them. So fix this by making sure we iterate the ordered_sums list and free all its checksums before returning. Fixes: 3650860b90cc2a ("Btrfs: remove almost all of the BUG()'s from tree-log.c") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Johannes Thumshirn Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit dee6e1135c21326a7730446178157ada88d67d74 Author: Qu Wenruo Date: Tue Jul 28 16:39:26 2020 +0800 btrfs: inode: fix NULL pointer dereference if inode doesn't need compression commit 1e6e238c3002ea3611465ce5f32777ddd6a40126 upstream. [BUG] There is a bug report of NULL pointer dereference caused in compress_file_extent(): Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs] NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs] LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] Call Trace: [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable) [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs] [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs] [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0 [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660 [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0 [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c ---[ end trace f16954aa20d822f6 ]--- [CAUSE] For the following execution route of compress_file_range(), it's possible to hit NULL pointer dereference: compress_file_extent() |- pages = NULL; |- start = async_chunk->start = 0; |- end = async_chunk = 4095; |- nr_pages = 1; |- inode_need_compress() == false; <<< Possible, see later explanation | Now, we have nr_pages = 1, pages = NULL |- cont: |- ret = cow_file_range_inline(); |- if (ret <= 0) { |- for (i = 0; i < nr_pages; i++) { |- WARN_ON(pages[i]->mapping); <<< Crash To enter above call execution branch, we need the following race: Thread 1 (chattr) | Thread 2 (writeback) --------------------------+------------------------------ | btrfs_run_delalloc_range | |- inode_need_compress = true | |- cow_file_range_async() btrfs_ioctl_set_flag() | |- binode_flags |= | BTRFS_INODE_NOCOMPRESS | | compress_file_range() | |- inode_need_compress = false | |- nr_page = 1 while pages = NULL | | Then hit the crash [FIX] This patch will fix it by checking @pages before doing accessing it. This patch is only designed as a hot fix and easy to backport. More elegant fix may make btrfs only check inode_need_compress() once to avoid such race, but that would be another story. Reported-by: Luciano Chavez Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages") CC: stable@vger.kernel.org # 4.14.x: cecc8d9038d16: btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Qu Wenruo Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 7726619a51873ac0ac73d31f7852e0eb01a0833b Author: Josef Bacik Date: Mon Jul 27 10:28:05 2020 -0400 btrfs: only search for left_info if there is no right_info in try_merge_free_space commit bf53d4687b8f3f6b752f091eb85f62369a515dfd upstream. In try_to_merge_free_space we attempt to find entries to the left and right of the entry we are adding to see if they can be merged. We search for an entry past our current info (saved into right_info), and then if right_info exists and it has a rb_prev() we save the rb_prev() into left_info. However there's a slight problem in the case that we have a right_info, but no entry previous to that entry. At that point we will search for an entry just before the info we're attempting to insert. This will simply find right_info again, and assign it to left_info, making them both the same pointer. Now if right_info _can_ be merged with the range we're inserting, we'll add it to the info and free right_info. However further down we'll access left_info, which was right_info, and thus get a use-after-free. Fix this by only searching for the left entry if we don't find a right entry at all. The CVE referenced had a specially crafted file system that could trigger this use-after-free. However with the tree checker improvements we no longer trigger the conditions for the UAF. But the original conditions still apply, hence this fix. Reference: CVE-2019-19448 Fixes: 963030817060 ("Btrfs: use hybrid extents+bitmap rb tree for free space") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 70b523613288519bf4b1a305805d626144e59c0f Author: David Sterba Date: Thu Jul 23 19:08:55 2020 +0200 btrfs: fix messages after changing compression level by remount commit 27942c9971cc405c60432eca9395e514a2ae9f5e upstream. Reported by Forza on IRC that remounting with compression options does not reflect the change in level, or at least it does not appear to do so according to the messages: mount -o compress=zstd:1 /dev/sda /mnt mount -o remount,compress=zstd:15 /mnt does not print the change to the level to syslog: [ 41.366060] BTRFS info (device vda): use zstd compression, level 1 [ 41.368254] BTRFS info (device vda): disk space caching is enabled [ 41.390429] BTRFS info (device vda): disk space caching is enabled What really happens is that the message is lost but the level is actualy changed. There's another weird output, if compression is reset to 'no': [ 45.413776] BTRFS info (device vda): use no compression, level 4 To fix that, save the previous compression level and print the message in that case too and use separate message for 'no' compression. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 0f320f3728b888417ca6e5de80a4dd21fbe2384d Author: Josef Bacik Date: Wed Jul 22 11:12:46 2020 -0400 btrfs: don't show full path of bind mounts in subvol= commit 3ef3959b29c4a5bd65526ab310a1a18ae533172a upstream. Chris Murphy reported a problem where rpm ostree will bind mount a bunch of things for whatever voodoo it's doing. But when it does this /proc/mounts shows something like /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0 /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo/bar 0 0 Despite subvolid=256 being subvol=/foo. This is because we're just spitting out the dentry of the mount point, which in the case of bind mounts is the source path for the mountpoint. Instead we should spit out the path to the actual subvol. Fix this by looking up the name for the subvolid we have mounted. With this fix the same test looks like this /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0 /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo 0 0 Reported-by: Chris Murphy CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 3505fbfa588ddd9ebe848e548284b46e5612d742 Author: Filipe Manana Date: Wed Jul 22 12:28:37 2020 +0100 btrfs: fix race between page release and a fast fsync commit 3d6448e631591756da36efb3ea6355ff6f383c3a upstream. When releasing an extent map, done through the page release callback, we can race with an ongoing fast fsync and cause the fsync to miss a new extent and not log it. The steps for this to happen are the following: 1) A page is dirtied for some inode I; 2) Writeback for that page is triggered by a path other than fsync, for example by the system due to memory pressure; 3) When the ordered extent for the extent (a single 4K page) finishes, we unpin the corresponding extent map and set its generation to N, the current transaction's generation; 4) The btrfs_releasepage() callback is invoked by the system due to memory pressure for that no longer dirty page of inode I; 5) At the same time, some task calls fsync on inode I, joins transaction N, and at btrfs_log_inode() it sees that the inode does not have the full sync flag set, so we proceed with a fast fsync. But before we get into btrfs_log_changed_extents() and lock the inode's extent map tree: 6) Through btrfs_releasepage() we end up at try_release_extent_mapping() and we remove the extent map for the new 4Kb extent, because it is neither pinned anymore nor locked. By calling remove_extent_mapping(), we remove the extent map from the list of modified extents, since the extent map does not have the logging flag set. We unlock the inode's extent map tree; 7) The task doing the fast fsync now enters btrfs_log_changed_extents(), locks the inode's extent map tree and iterates its list of modified extents, which no longer has the 4Kb extent in it, so it does not log the extent; 8) The fsync finishes; 9) Before transaction N is committed, a power failure happens. After replaying the log, the 4K extent of inode I will be missing, since it was not logged due to the race with try_release_extent_mapping(). So fix this by teaching try_release_extent_mapping() to not remove an extent map if it's still in the list of modified extents. Fixes: ff44c6e36dc9dc ("Btrfs: do not hold the write_lock on the extent tree while logging") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit b75c439472a808d552f603b61d8fa5d90dd7de6f Author: Josef Bacik Date: Tue Jul 21 11:24:27 2020 -0400 btrfs: don't WARN if we abort a transaction with EROFS commit f95ebdbed46a4d8b9fdb7bff109fdbb6fc9a6dc8 upstream. If we got some sort of corruption via a read and call btrfs_handle_fs_error() we'll set BTRFS_FS_STATE_ERROR on the fs and complain. If a subsequent trans handle trips over this it'll get EROFS and then abort. However at that point we're not aborting for the original reason, we're aborting because we've been flipped read only. We do not need to WARN_ON() here. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 758afd58b62b967d0f6dd3aee4d5e64625894b9f Author: Josef Bacik Date: Tue Jul 21 10:17:50 2020 -0400 btrfs: sysfs: use NOFS for device creation commit a47bd78d0c44621efb98b525d04d60dc4d1a79b0 upstream. Dave hit this splat during testing btrfs/078: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc6-default+ #1191 Not tainted ------------------------------------------------------ kswapd0/75 is trying to acquire lock: ffffa040e9d04ff8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] but task is already holding lock: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 fs_reclaim_acquire.part.0+0x25/0x30 __kmalloc_track_caller+0x49/0x330 kstrdup+0x2e/0x60 __kernfs_new_node.constprop.0+0x44/0x250 kernfs_new_node+0x25/0x50 kernfs_create_link+0x34/0xa0 sysfs_do_create_link_sd+0x5e/0xd0 btrfs_sysfs_add_devices_dir+0x65/0x100 [btrfs] btrfs_init_new_device+0x44c/0x12b0 [btrfs] btrfs_ioctl+0xc3c/0x25c0 [btrfs] ksys_ioctl+0x68/0xa0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 __mutex_lock+0xa0/0xaf0 btrfs_chunk_alloc+0x137/0x3e0 [btrfs] find_free_extent+0xb44/0xfb0 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x143/0x7a0 [btrfs] btrfs_cow_block+0x15f/0x310 [btrfs] push_leaf_right+0x150/0x240 [btrfs] split_leaf+0x3cd/0x6d0 [btrfs] btrfs_search_slot+0xd14/0xf70 [btrfs] btrfs_insert_empty_items+0x64/0xc0 [btrfs] __btrfs_commit_inode_delayed_items+0xb2/0x840 [btrfs] btrfs_async_run_delayed_root+0x10e/0x1d0 [btrfs] btrfs_work_helper+0x2f9/0x650 [btrfs] process_one_work+0x22c/0x600 worker_thread+0x50/0x3b0 kthread+0x137/0x150 ret_from_fork+0x1f/0x30 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: check_prev_add+0x98/0xa20 validate_chain+0xa8c/0x2a00 __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 __mutex_lock+0xa0/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] btrfs_evict_inode+0x3bf/0x560 [btrfs] evict+0xd6/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x121/0x1a0 do_shrink_slab+0x175/0x420 shrink_slab+0xb1/0x2e0 shrink_node+0x192/0x600 balance_pgdat+0x31f/0x750 kswapd+0x206/0x510 kthread+0x137/0x150 ret_from_fork+0x1f/0x30 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(&fs_info->chunk_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/75: #0: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffff8b0b50b8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0 #2: ffffa040e057c0e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50 stack backtrace: CPU: 2 PID: 75 Comm: kswapd0 Not tainted 5.8.0-rc6-default+ #1191 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0x78/0xa0 check_noncircular+0x16f/0x190 check_prev_add+0x98/0xa20 validate_chain+0xa8c/0x2a00 __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] __mutex_lock+0xa0/0xaf0 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] ? __lock_acquire+0x56f/0xaa0 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] ? lock_acquire+0xa3/0x440 ? btrfs_evict_inode+0x138/0x560 [btrfs] ? btrfs_evict_inode+0x2fe/0x560 [btrfs] ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] btrfs_evict_inode+0x3bf/0x560 [btrfs] evict+0xd6/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x121/0x1a0 do_shrink_slab+0x175/0x420 shrink_slab+0xb1/0x2e0 shrink_node+0x192/0x600 balance_pgdat+0x31f/0x750 kswapd+0x206/0x510 ? _raw_spin_unlock_irqrestore+0x3e/0x50 ? finish_wait+0x90/0x90 ? balance_pgdat+0x750/0x750 kthread+0x137/0x150 ? kthread_stop+0x2a0/0x2a0 ret_from_fork+0x1f/0x30 This is because we're holding the chunk_mutex while adding this device and adding its sysfs entries. We actually hold different locks in different places when calling this function, the dev_replace semaphore for instance in dev replace, so instead of moving this call around simply wrap it's operations in NOFS. CC: stable@vger.kernel.org # 4.14+ Reported-by: David Sterba Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit eb69b5ac0263a44d867d38b903fb46c492470c94 Author: Josef Bacik Date: Tue Jul 21 10:38:37 2020 -0400 btrfs: return EROFS for BTRFS_FS_STATE_ERROR cases commit fbabd4a36faaf74c83142d0b3d950c11ec14fda1 upstream. Eric reported seeing this message while running generic/475 BTRFS: error (device dm-3) in btrfs_sync_log:3084: errno=-117 Filesystem corrupted Full stack trace: BTRFS: error (device dm-0) in btrfs_commit_transaction:2323: errno=-5 IO failure (Error while writing out transaction) BTRFS info (device dm-0): forced readonly BTRFS warning (device dm-0): Skipping commit of aborted transaction. ------------[ cut here ]------------ BTRFS: error (device dm-0) in cleanup_transaction:1894: errno=-5 IO failure BTRFS: Transaction aborted (error -117) BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c6480 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c6488 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c6490 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c6498 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c64a0 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c64a8 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c64b0 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c64b8 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3555 rw 0,0 sector 0x1c64c0 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3572 rw 0,0 sector 0x1b85e8 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3572 rw 0,0 sector 0x1b85f0 len 4096 err no 10 WARNING: CPU: 3 PID: 23985 at fs/btrfs/tree-log.c:3084 btrfs_sync_log+0xbc8/0xd60 [btrfs] BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d4288 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d4290 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d4298 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d42a0 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d42a8 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d42b0 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d42b8 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d42c0 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d42c8 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 3548 rw 0,0 sector 0x1d42d0 len 4096 err no 10 CPU: 3 PID: 23985 Comm: fsstress Tainted: G W L 5.8.0-rc4-default+ #1181 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 RIP: 0010:btrfs_sync_log+0xbc8/0xd60 [btrfs] RSP: 0018:ffff909a44d17bd0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001 RDX: ffff8f3be41cb940 RSI: ffffffffb0108d2b RDI: ffffffffb0108ff7 RBP: ffff909a44d17e70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000037988 R12: ffff8f3bd20e4000 R13: ffff8f3bd20e4428 R14: 00000000ffffff8b R15: ffff909a44d17c70 FS: 00007f6a6ed3fb80(0000) GS:ffff8f3c3dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6a6ed3e000 CR3: 00000000525c0003 CR4: 0000000000160ee0 Call Trace: ? finish_wait+0x90/0x90 ? __mutex_unlock_slowpath+0x45/0x2a0 ? lock_acquire+0xa3/0x440 ? lockref_put_or_lock+0x9/0x30 ? dput+0x20/0x4a0 ? dput+0x20/0x4a0 ? do_raw_spin_unlock+0x4b/0xc0 ? _raw_spin_unlock+0x1f/0x30 btrfs_sync_file+0x335/0x490 [btrfs] do_fsync+0x38/0x70 __x64_sys_fsync+0x10/0x20 do_syscall_64+0x50/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f6a6ef1b6e3 Code: Bad RIP value. RSP: 002b:00007ffd01e20038 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 000000000007a120 RCX: 00007f6a6ef1b6e3 RDX: 00007ffd01e1ffa0 RSI: 00007ffd01e1ffa0 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000001 R09: 00007ffd01e2004c R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000009f R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [] copy_process+0x67b/0x1b00 softirqs last enabled at (0): [] copy_process+0x67b/0x1b00 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace af146e0e38433456 ]--- BTRFS: error (device dm-0) in btrfs_sync_log:3084: errno=-117 Filesystem corrupted This ret came from btrfs_write_marked_extents(). If we get an aborted transaction via EIO before, we'll see it in btree_write_cache_pages() and return EUCLEAN, which gets printed as "Filesystem corrupted". Except we shouldn't be returning EUCLEAN here, we need to be returning EROFS because EUCLEAN is reserved for actual corruption, not IO errors. We are inconsistent about our handling of BTRFS_FS_STATE_ERROR elsewhere, but we want to use EROFS for this particular case. The original transaction abort has the real error code for why we ended up with an aborted transaction, all subsequent actions just need to return EROFS because they may not have a trans handle and have no idea about the original cause of the abort. After patch "btrfs: don't WARN if we abort a transaction with EROFS" the stacktrace will not be dumped either. Reported-by: Eric Sandeen CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik Reviewed-by: David Sterba [ add full test stacktrace ] Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit aa4140edb69a1bebd57661fcebb98c782afdd307 Author: Qu Wenruo Date: Mon Jul 13 09:03:20 2020 +0800 btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree commit f3e3d9cc35252a70a2fd698762c9687718268ec6 upstream. [BUG] There is a bug report about bad signal timing could lead to read-only fs during balance: BTRFS info (device xvdb): balance: start -d -m -s BTRFS info (device xvdb): relocating block group 73001861120 flags metadata BTRFS info (device xvdb): found 12236 extents, stage: move data extents BTRFS info (device xvdb): relocating block group 71928119296 flags data BTRFS info (device xvdb): found 3 extents, stage: move data extents BTRFS info (device xvdb): found 3 extents, stage: update data pointers BTRFS info (device xvdb): relocating block group 60922265600 flags metadata BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown BTRFS info (device xvdb): forced readonly BTRFS info (device xvdb): balance: ended with status: -4 [CAUSE] The direct cause is the -EINTR from the following call chain when a fatal signal is pending: relocate_block_group() |- clean_dirty_subvols() |- btrfs_drop_snapshot() |- btrfs_start_transaction() |- btrfs_delayed_refs_rsv_refill() |- btrfs_reserve_metadata_bytes() |- __reserve_metadata_bytes() |- wait_reserve_ticket() |- prepare_to_wait_event(); |- ticket->error = -EINTR; Normally this behavior is fine for most btrfs_start_transaction() callers, as they need to catch any other error, same for the signal, and exit ASAP. However for balance, especially for the clean_dirty_subvols() case, we're already doing cleanup works, getting -EINTR from btrfs_drop_snapshot() could cause a lot of unexpected problems. From the mentioned forced read-only report, to later balance error due to half dropped reloc trees. [FIX] Fix this problem by using btrfs_join_transaction() if btrfs_drop_snapshot() is called from relocation context. Since btrfs_join_transaction() won't get interrupted by signal, we can continue the cleanup. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba 3 Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 74161225797fbf59c6eb31a4dad1f4b6cf3e147d Author: David Sterba Date: Fri Jul 10 09:49:56 2020 +0200 btrfs: add missing check for nocow and compression inode flags commit f37c563bab4297024c300b05c8f48430e323809d upstream. User Forza reported on IRC that some invalid combinations of file attributes are accepted by chattr. The NODATACOW and compression file flags/attributes are mutually exclusive, but they could be set by 'chattr +c +C' on an empty file. The nodatacow will be in effect because it's checked first in btrfs_run_delalloc_range. Extend the flag validation to catch the following cases: - input flags are conflicting - old and new flags are conflicting - initialize the local variable with inode flags after inode ls locked Inode attributes take precedence over mount options and are an independent setting. Nocompress would be a no-op with nodatacow, but we don't want to mix any compression-related options with nodatacow. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit d28c2d0b64511401ea9f7530c1aac1af27701e24 Author: Qu Wenruo Date: Mon Jul 13 09:03:21 2020 +0800 btrfs: relocation: review the call sites which can be interrupted by signal commit 44d354abf33e92a5e73b965c84caf5a5d5e58a0b upstream. Since most metadata reservation calls can return -EINTR when get interrupted by fatal signal, we need to review the all the metadata reservation call sites. In relocation code, the metadata reservation happens in the following sites: - btrfs_block_rsv_refill() in merge_reloc_root() merge_reloc_root() is a pretty critical section, we don't want to be interrupted by signal, so change the flush status to BTRFS_RESERVE_FLUSH_LIMIT, so it won't get interrupted by signal. Since such change can be ENPSPC-prone, also shrink the amount of metadata to reserve least amount avoid deadly ENOSPC there. - btrfs_block_rsv_refill() in reserve_metadata_space() It calls with BTRFS_RESERVE_FLUSH_LIMIT, which won't get interrupted by signal. - btrfs_block_rsv_refill() in prepare_to_relocate() - btrfs_block_rsv_add() in prepare_to_relocate() - btrfs_block_rsv_refill() in relocate_block_group() - btrfs_delalloc_reserve_metadata() in relocate_file_extent_cluster() - btrfs_start_transaction() in relocate_block_group() - btrfs_start_transaction() in create_reloc_inode() Can be interrupted by fatal signal and we can handle it easily. For these call sites, just catch the -EINTR value in btrfs_balance() and count them as canceled. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 2af6a37b5ef85b3a8e1680a65c390811e74c86b7 Author: Josef Bacik Date: Fri Jul 17 15:12:28 2020 -0400 btrfs: move the chunk_mutex in btrfs_read_chunk_tree commit 01d01caf19ff7c537527d352d169c4368375c0a1 upstream. We are currently getting this lockdep splat in btrfs/161: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ #20 Tainted: G E ------------------------------------------------------ mount/678048 is trying to acquire lock: ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs] but task is already holding lock: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __mutex_lock+0x8b/0x8f0 btrfs_init_new_device+0x2d2/0x1240 [btrfs] btrfs_ioctl+0x1de/0x2d20 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 __mutex_lock+0x8b/0x8f0 clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] btrfs_mount_root.cold+0x13/0xfa [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); *** DEADLOCK *** 3 locks held by mount/678048: #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs] #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] stack backtrace: CPU: 2 PID: 678048 Comm: mount Tainted: G E 5.8.0-rc5+ #20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 lock_acquire+0xab/0x360 ? clone_fs_devices+0x4d/0x170 [btrfs] __mutex_lock+0x8b/0x8f0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? rcu_read_lock_sched_held+0x52/0x60 ? cpumask_next+0x16/0x20 ? module_assert_mutex_or_preempt+0x14/0x40 ? __module_address+0x28/0xf0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? static_obj+0x4f/0x60 ? lockdep_init_map_waits+0x43/0x200 ? clone_fs_devices+0x4d/0x170 [btrfs] clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] ? super_setup_bdi_name+0x79/0xd0 btrfs_mount_root.cold+0x13/0xfa [btrfs] ? vfs_parse_fs_string+0x84/0xb0 ? rcu_read_lock_sched_held+0x52/0x60 ? kfree+0x2b5/0x310 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] ? cred_has_capability+0x7c/0x120 ? rcu_read_lock_sched_held+0x52/0x60 ? legacy_get_tree+0x30/0x50 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 ? memdup_user+0x4e/0x90 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because btrfs_read_chunk_tree() can come upon DEV_EXTENT's and then read the device, which takes the device_list_mutex. The device_list_mutex needs to be taken before the chunk_mutex, so this is a problem. We only really need the chunk mutex around adding the chunk, so move the mutex around read_one_chunk. An argument could be made that we don't even need the chunk_mutex here as it's during mount, and we are protected by various other locks. However we already have special rules for ->device_list_mutex, and I'd rather not have another special case for ->chunk_mutex. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Anand Jain Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 8e62bbdda933cbb3f8fbd92cb37fac8eba566c26 Author: Josef Bacik Date: Fri Jul 17 15:12:27 2020 -0400 btrfs: open device without device_list_mutex commit 18c850fdc5a801bad4977b0f1723761d42267e45 upstream. There's long existed a lockdep splat because we open our bdev's under the ->device_list_mutex at mount time, which acquires the bd_mutex. Usually this goes unnoticed, but if you do loopback devices at all suddenly the bd_mutex comes with a whole host of other dependencies, which results in the splat when you mount a btrfs file system. ====================================================== WARNING: possible circular locking dependency detected 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted ------------------------------------------------------ systemd-journal/509 is trying to acquire lock: ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs] but task is already holding lock: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (sb_pagefaults){.+.+}-{0:0}: __sb_start_write+0x13e/0x220 btrfs_page_mkwrite+0x59/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 -> #5 (&mm->mmap_lock#2){++++}-{3:3}: __might_fault+0x60/0x80 _copy_from_user+0x20/0xb0 get_sg_io_hdr+0x9a/0xb0 scsi_cmd_ioctl+0x1ea/0x2f0 cdrom_ioctl+0x3c/0x12b4 sr_block_ioctl+0xa4/0xd0 block_ioctl+0x3f/0x50 ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #4 (&cd->lock){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 sr_block_open+0xa2/0x180 __blkdev_get+0xdd/0x550 blkdev_get+0x38/0x150 do_dentry_open+0x16b/0x3e0 path_openat+0x3c9/0xa00 do_filp_open+0x75/0x100 do_sys_openat2+0x8a/0x140 __x64_sys_openat+0x46/0x70 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 __blkdev_get+0x6a/0x550 blkdev_get+0x85/0x150 blkdev_get_by_path+0x2c/0x70 btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] open_fs_devices+0x88/0x240 [btrfs] btrfs_open_devices+0x92/0xa0 [btrfs] btrfs_mount_root+0x250/0x490 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x119/0x380 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x8c6/0xca0 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_run_dev_stats+0x36/0x420 [btrfs] commit_cowonly_roots+0x91/0x2d0 [btrfs] btrfs_commit_transaction+0x4e6/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_commit_transaction+0x48e/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}: __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 __mutex_lock+0x7b/0x820 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_pagefaults); lock(&mm->mmap_lock#2); lock(sb_pagefaults); lock(&fs_info->reloc_mutex); *** DEADLOCK *** 3 locks held by systemd-journal/509: #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs] stack backtrace: CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x92/0xc8 check_noncircular+0x134/0x150 __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] __mutex_lock+0x7b/0x820 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? kvm_sched_clock_read+0x14/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0xc/0xb0 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] ? sched_clock+0x5/0x10 do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x7fa3972fdbfe Code: Bad RIP value. Fix this by not holding the ->device_list_mutex at this point. The device_list_mutex exists to protect us from modifying the device list while the file system is running. However it can also be modified by doing a scan on a device. But this action is specifically protected by the uuid_mutex, which we are holding here. We cannot race with opening at this point because we have the ->s_mount lock held during the mount. Not having the ->device_list_mutex here is perfectly safe as we're not going to change the devices at this point. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik Reviewed-by: David Sterba [ add some comments ] Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 71d24eb39872d4b285fbf69f2a7282d81951a223 Author: Johannes Thumshirn Date: Mon Jul 13 21:28:58 2020 +0900 btrfs: pass checksum type via BTRFS_IOC_FS_INFO ioctl commit 137c541821a83debb63b3fa8abdd1cbc41bdf3a1 upstream. With the recent addition of filesystem checksum types other than CRC32c, it is not anymore hard-coded which checksum type a btrfs filesystem uses. Up to now there is no good way to read the filesystem checksum, apart from reading the filesystem UUID and then query sysfs for the checksum type. Add a new csum_type and csum_size fields to the BTRFS_IOC_FS_INFO ioctl command which usually is used to query filesystem features. Also add a flags member indicating that the kernel responded with a set csum_type and csum_size field. For compatibility reasons, only return the csum_type and csum_size if the BTRFS_FS_INFO_FLAG_CSUM_INFO flag was passed to the kernel. Also clear any unknown flags so we don't pass false positives to user-space newer than the kernel. To simplify further additions to the ioctl, also switch the padding to a u8 array. Pahole was used to verify the result of this switch: The csum members are added before flags, which might look odd, but this is to keep the alignment requirements and not to introduce holes in the structure. $ pahole -C btrfs_ioctl_fs_info_args fs/btrfs/btrfs.ko struct btrfs_ioctl_fs_info_args { __u64 max_id; /* 0 8 */ __u64 num_devices; /* 8 8 */ __u8 fsid[16]; /* 16 16 */ __u32 nodesize; /* 32 4 */ __u32 sectorsize; /* 36 4 */ __u32 clone_alignment; /* 40 4 */ __u16 csum_type; /* 44 2 */ __u16 csum_size; /* 46 2 */ __u64 flags; /* 48 8 */ __u8 reserved[968]; /* 56 968 */ /* size: 1024, cachelines: 16, members: 10 */ }; Fixes: 3951e7f050ac ("btrfs: add xxhash64 to checksumming algorithms") Fixes: 3831bf0094ab ("btrfs: add sha256 to checksumming algorithm") CC: stable@vger.kernel.org # 5.5+ Signed-off-by: Johannes Thumshirn Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit ca21728e18d34fd5f449bb0581160e0eaee498a6 Author: Anand Jain Date: Fri Jul 10 14:37:38 2020 +0800 btrfs: don't traverse into the seed devices in show_devname commit 4faf55b03823e96c44dc4e364520000ed3b12fdb upstream. ->show_devname currently shows the lowest devid in the list. As the seed devices have the lowest devid in the sprouted filesystem, the userland tool such as findmnt end up seeing seed device instead of the device from the read-writable sprouted filesystem. As shown below. mount /dev/sda /btrfs mount: /btrfs: WARNING: device write-protected, mounted read-only. findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 btrfs dev add -f /dev/sdb /btrfs umount /btrfs mount /dev/sdb /btrfs findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 All sprouts from a single seed will show the same seed device and the same fsid. That's confusing. This is causing problems in our prototype as there isn't any reference to the sprout file-system(s) which is being used for actual read and write. This was added in the patch which implemented the show_devname in btrfs commit 9c5085c14798 ("Btrfs: implement ->show_devname"). I tried to look for any particular reason that we need to show the seed device, there isn't any. So instead, do not traverse through the seed devices, just show the lowest devid in the sprouted fsid. After the patch: mount /dev/sda /btrfs mount: /btrfs: WARNING: device write-protected, mounted read-only. findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 btrfs dev add -f /dev/sdb /btrfs mount -o rw,remount /dev/sdb /btrfs findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sdb /btrfs 595ca0e6-b82e-46b5-b9e2-c72a6928be48 mount /dev/sda /btrfs1 mount: /btrfs1: WARNING: device write-protected, mounted read-only. btrfs dev add -f /dev/sdc /btrfs1 findmnt --output SOURCE,TARGET,UUID /btrfs1 SOURCE TARGET UUID /dev/sdc /btrfs1 ca1dbb7a-8446-4f95-853c-a20f3f82bdbb cat /proc/self/mounts | grep btrfs /dev/sdb /btrfs btrfs rw,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0 /dev/sdc /btrfs1 btrfs ro,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0 Reported-by: Martin K. Petersen CC: stable@vger.kernel.org # 4.19+ Tested-by: Martin K. Petersen Signed-off-by: Anand Jain Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 2965f12627217ae2922382eb3c95906c221028c9 Author: Filipe Manana Date: Thu Jul 2 12:32:40 2020 +0100 btrfs: remove no longer needed use of log_writers for the log root tree commit a93e01682e283f6de09d6ce8f805dc52a2e942fb upstream. When syncing the log, we used to update the log root tree without holding neither the log_mutex of the subvolume root nor the log_mutex of log root tree. We used to have two critical sections delimited by the log_mutex of the log root tree, so in the first one we incremented the log_writers of the log root tree and on the second one we decremented it and waited for the log_writers counter to go down to zero. This was because the update of the log root tree happened between the two critical sections. The use of two critical sections allowed a little bit more of parallelism and required the use of the log_writers counter, necessary to make sure we didn't miss any log root tree update when we have multiple tasks trying to sync the log in parallel. However after commit 06989c799f0481 ("Btrfs: fix race updating log root item during fsync") the log root tree update was moved into a critical section delimited by the subvolume's log_mutex. Later another commit moved the log tree update from that critical section into the second critical section delimited by the log_mutex of the log root tree. Both commits addressed different bugs. The end result is that the first critical section delimited by the log_mutex of the log root tree became pointless, since there's nothing done between it and the second critical section, we just have an unlock of the log_mutex followed by a lock operation. This means we can merge both critical sections, as the first one does almost nothing now, and we can stop using the log_writers counter of the log root tree, which was incremented in the first critical section and decremented in the second criticial section, used to make sure no one in the second critical section started writeback of the log root tree before some other task updated it. So just remove the mutex_unlock() followed by mutex_lock() of the log root tree, as well as the use of the log_writers counter for the log root tree. This patch is part of a series that has the following patches: 1/4 btrfs: only commit the delayed inode when doing a full fsync 2/4 btrfs: only commit delayed items at fsync if we are logging a directory 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log 4/4 btrfs: remove no longer needed use of log_writers for the log root tree After the entire patchset applied I saw about 12% decrease on max latency reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of ram, using kvm and using a raw NVMe device directly (no intermediary fs on the host). The test was invoked like the following: mkfs.btrfs -f /dev/sdk mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk dbench -D /mnt/sdk -t 300 8 umount /mnt/dsk CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit df306bf056a6023ddef9cf3a24908be316e1c21a Author: Filipe Manana Date: Thu Jul 2 12:32:20 2020 +0100 btrfs: only commit delayed items at fsync if we are logging a directory commit 5aa7d1a7f4a2f8ca6be1f32415e9365d026e8fa7 upstream. When logging an inode we are committing its delayed items if either the inode is a directory or if it is a new inode, created in the current transaction. We need to do it for directories, since new directory indexes are stored as delayed items of the inode and when logging a directory we need to be able to access all indexes from the fs/subvolume tree in order to figure out which index ranges need to be logged. However for new inodes that are not directories, we do not need to do it because the only type of delayed item they can have is the inode item, and we are guaranteed to always log an up to date version of the inode item: *) for a full fsync we do it by committing the delayed inode and then copying the item from the fs/subvolume tree with copy_inode_items_to_log(); *) for a fast fsync we always log the inode item based on the contents of the in-memory struct btrfs_inode. We guarantee this is always done since commit e4545de5b035c7 ("Btrfs: fix fsync data loss after append write"). So stop running delayed items for a new inodes that are not directories, since that forces committing the delayed inode into the fs/subvolume tree, wasting time and adding contention to the tree when a full fsync is not required. We will only do it in case a fast fsync is needed. This patch is part of a series that has the following patches: 1/4 btrfs: only commit the delayed inode when doing a full fsync 2/4 btrfs: only commit delayed items at fsync if we are logging a directory 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log 4/4 btrfs: remove no longer needed use of log_writers for the log root tree After the entire patchset applied I saw about 12% decrease on max latency reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of ram, using kvm and using a raw NVMe device directly (no intermediary fs on the host). The test was invoked like the following: mkfs.btrfs -f /dev/sdk mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk dbench -D /mnt/sdk -t 300 8 umount /mnt/dsk CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit e3c6a394389aa5a617093d41b239990fa0f5f998 Author: Filipe Manana Date: Thu Jul 2 12:32:31 2020 +0100 btrfs: stop incremening log_batch for the log root tree when syncing log commit 28a9579561bcb9082715e720eac93012e708ab94 upstream. We are incrementing the log_batch atomic counter of the root log tree but we never use that counter, it's used only for the log trees of subvolume roots. We started doing it when we moved the log_batch and log_write counters from the global, per fs, btrfs_fs_info structure, into the btrfs_root structure in commit 7237f1833601dc ("Btrfs: fix tree logs parallel sync"). So just stop doing it for the log root tree and add a comment over the field declaration so inform it's used only for log trees of subvolume roots. This patch is part of a series that has the following patches: 1/4 btrfs: only commit the delayed inode when doing a full fsync 2/4 btrfs: only commit delayed items at fsync if we are logging a directory 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log 4/4 btrfs: remove no longer needed use of log_writers for the log root tree After the entire patchset applied I saw about 12% decrease on max latency reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of ram, using kvm and using a raw NVMe device directly (no intermediary fs on the host). The test was invoked like the following: mkfs.btrfs -f /dev/sdk mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk dbench -D /mnt/sdk -t 300 8 umount /mnt/dsk CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 7632468bc9a3023490157cab175a524a75427026 Author: Filipe Manana Date: Thu Jul 2 12:31:59 2020 +0100 btrfs: only commit the delayed inode when doing a full fsync commit 8c8648dd1f6d62aeb912deeb788b6ac33cb782e7 upstream. Commit 2c2c452b0cafdc ("Btrfs: fix fsync when extend references are added to an inode") forced a commit of the delayed inode when logging an inode in order to ensure we would end up logging the inode item during a full fsync. By committing the delayed inode, we updated the inode item in the fs/subvolume tree and then later when copying items from leafs modified in the current transaction into the log tree (with copy_inode_items_to_log()) we ended up copying the inode item from the fs/subvolume tree into the log tree. Logging an up to date version of the inode item is required to make sure at log replay time we get the link count fixup triggered among other things (replay xattr deletes, etc). The test case generic/040 from fstests exercises the bug which that commit fixed. However for a fast fsync we don't need to commit the delayed inode because we always log an up to date version of the inode item based on the struct btrfs_inode we have in-memory. We started doing this for fast fsyncs since commit e4545de5b035c7 ("Btrfs: fix fsync data loss after append write"). So just stop committing the delayed inode if we are doing a fast fsync, we are only wasting time and adding contention on fs/subvolume tree. This patch is part of a series that has the following patches: 1/4 btrfs: only commit the delayed inode when doing a full fsync 2/4 btrfs: only commit delayed items at fsync if we are logging a directory 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log 4/4 btrfs: remove no longer needed use of log_writers for the log root tree After the entire patchset applied I saw about 12% decrease on max latency reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of ram, using kvm and using a raw NVMe device directly (no intermediary fs on the host). The test was invoked like the following: mkfs.btrfs -f /dev/sdk mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk dbench -D /mnt/sdk -t 300 8 umount /mnt/dsk CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 70fb164d5263b62f7cb2b13ab75ba90bd8b5f5ff Author: Tom Rix Date: Tue Jul 7 06:29:08 2020 -0700 btrfs: ref-verify: fix memory leak in add_block_entry commit d60ba8de1164e1b42e296ff270c622a070ef8fe7 upstream. clang static analysis flags this error fs/btrfs/ref-verify.c:290:3: warning: Potential leak of memory pointed to by 're' [unix.Malloc] kfree(be); ^~~~~ The problem is in this block of code: if (root_objectid) { struct root_entry *exist_re; exist_re = insert_root_entry(&exist->roots, re); if (exist_re) kfree(re); } There is no 'else' block freeing when root_objectid is 0. Add the missing kfree to the else branch. Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Tom Rix Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 54e9870d75df4816ad8cee677d8e4fd116cad040 Author: Qu Wenruo Date: Tue Jun 16 10:17:34 2020 +0800 btrfs: don't allocate anonymous block device for user invisible roots commit 851fd730a743e072badaf67caf39883e32439431 upstream. [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted: BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] The error is EMFILE (Too many files open) and comes from the anonymous block device allocation. The ids are in a shared pool of size 1<<20. The ids are assigned to live subvolumes, ie. the root structure exists in memory (eg. after creation or after the root appears in some path). The pool could be exhausted if the numbers are not reclaimed fast enough, after subvolume deletion or if other system component uses the anon block devices. [WORKAROUND] Since it's not possible to completely solve the problem, we can only minimize the time the id is allocated to a subvolume root. Firstly, we can reduce the use of anon_dev by trees that are not subvolume roots, like data reloc tree. This patch will do extra check on root objectid, to skip roots that don't need anon_dev. Currently it's only data reloc tree and orphan roots. Reported-by: Greed Rong Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit 3950596063ffaaa7b7c65188c84ce16b3a39f92d Author: Qu Wenruo Date: Tue Jun 16 10:17:37 2020 +0800 btrfs: free anon block device right after subvolume deletion commit 082b6c970f02fefd278c7833880cda29691a5f34 upstream. [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted caused by slow anonymous block device reclaim: BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] The anonymous device pool is shared and its size is 1M. It's possible to hit that limit if the subvolume deletion is not fast enough and the subvolumes to be cleaned keep the ids allocated. [WORKAROUND] We can't avoid the anon device pool exhaustion but we can shorten the time the id is attached to the subvolume root once the subvolume becomes invisible to the user. Reported-by: Greed Rong Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit e823a80e1fb35f3146857802951fd769b26376e8 Author: David Sterba Date: Thu Jun 25 12:35:28 2020 +0200 btrfs: allow use of global block reserve for balance item deletion commit 3502a8c0dc1bd4b4970b59b06e348f22a1c05581 upstream. On a filesystem with exhausted metadata, but still enough to start balance, it's possible to hit this error: [324402.053842] BTRFS info (device loop0): 1 enospc errors during balance [324402.060769] BTRFS info (device loop0): balance: ended with status: -28 [324402.172295] BTRFS: error (device loop0) in reset_balance_state:3321: errno=-28 No space left It fails inside reset_balance_state and turns the filesystem to read-only, which is unnecessary and should be fixed too, but the problem is caused by lack for space when the balance item is deleted. This is a one-time operation and from the same rank as unlink that is allowed to use the global block reserve. So do the same for the balance item. Status of the filesystem (100GiB) just after the balance fails: $ btrfs fi df mnt Data, single: total=80.01GiB, used=38.58GiB System, single: total=4.00MiB, used=16.00KiB Metadata, single: total=19.99GiB, used=19.48GiB GlobalReserve, single: total=512.00MiB, used=50.11MiB CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Johannes Thumshirn Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit d9167e629c5284e7ad153738857a1ba10f30af34 Author: Ansuel Smith Date: Mon Jun 15 23:06:04 2020 +0200 PCI: qcom: Add support for tx term offset for rev 2.1.0 commit de3c4bf648975ea0b1d344d811e9b0748907b47c upstream. Add tx term offset support to pcie qcom driver need in some revision of the ipq806x SoC. Ipq8064 needs tx term offset set to 7. Link: https://lore.kernel.org/r/20200615210608.21469-9-ansuelsmth@gmail.com Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Sham Muthayyan Signed-off-by: Ansuel Smith Signed-off-by: Lorenzo Pieralisi Acked-by: Stanimir Varbanov Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Greg Kroah-Hartman commit e9229c75fae5252bb2281e5674392b8170d02558 Author: Ansuel Smith Date: Mon Jun 15 23:06:03 2020 +0200 PCI: qcom: Define some PARF params needed for ipq8064 SoC commit 5149901e9e6deca487c01cc434a3ac4125c7b00b upstream. Set some specific value for Tx De-Emphasis, Tx Swing and Rx equalization needed on some ipq8064 based device (Netgear R7800 for example). Without this the system locks on kernel load. Link: https://lore.kernel.org/r/20200615210608.21469-8-ansuelsmth@gmail.com Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Ansuel Smith Signed-off-by: Lorenzo Pieralisi Reviewed-by: Rob Herring Acked-by: Stanimir Varbanov Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Greg Kroah-Hartman commit 2b8a66bbc991674186438a91b52aefe5480d7dfa Author: Rajat Jain Date: Mon Jul 6 16:32:40 2020 -0700 PCI: Add device even if driver attach failed commit 2194bc7c39610be7cabe7456c5f63a570604f015 upstream. device_attach() returning failure indicates a driver error while trying to probe the device. In such a scenario, the PCI device should still be added in the system and be visible to the user. When device_attach() fails, merely warn about it and keep the PCI device in the system. This partially reverts ab1a187bba5c ("PCI: Check device_attach() return value always"). Link: https://lore.kernel.org/r/20200706233240.3245512-1-rajatja@google.com Signed-off-by: Rajat Jain Signed-off-by: Bjorn Helgaas Reviewed-by: Greg Kroah-Hartman Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Greg Kroah-Hartman commit c71c41cdaeabc1bd120426daf6bb0f6e840eb66c Author: Kai-Heng Feng Date: Tue Jul 28 18:45:53 2020 +0800 PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken commit 45beb31d3afb651bb5c41897e46bd4fa9980c51c upstream. We are seeing AMD Radeon Pro W5700 doesn't work when IOMMU is enabled: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01a0] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01c0] The error also makes graphics driver fail to probe the device. It appears to be the same issue as commit 5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken") addresses, and indeed the same ATS quirk can workaround the issue. See-also: 5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken") See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken") See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208725 Link: https://lore.kernel.org/r/20200728104554.28927-1-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng Signed-off-by: Bjorn Helgaas Acked-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 91944d34c5179616e799685b5af31bca721f48df Author: Ashok Raj Date: Thu Jul 23 15:37:29 2020 -0700 PCI/ATS: Add pci_pri_supported() to check device or associated PF commit 3f9a7a13fe4cb6e119e4e4745fbf975d30bfac9b upstream. For SR-IOV, the PF PRI is shared between the PF and any associated VFs, and the PRI Capability is allowed for PFs but not for VFs. Searching for the PRI Capability on a VF always fails, even if its associated PF supports PRI. Add pci_pri_supported() to check whether device or its associated PF supports PRI. [bhelgaas: commit log, avoid "!!"] Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS") Link: https://lore.kernel.org/r/1595543849-19692-1-git-send-email-ashok.raj@intel.com Signed-off-by: Ashok Raj Signed-off-by: Bjorn Helgaas Reviewed-by: Lu Baolu Acked-by: Joerg Roedel Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman commit bd6416bc16ecbf6548ffb41da45fde94537cd59c Author: Rafael J. Wysocki Date: Fri Jun 26 19:42:34 2020 +0200 PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context() commit dae68d7fd4930315389117e9da35b763f12238f9 upstream. If context is not NULL in acpiphp_grab_context(), but the is_going_away flag is set for the device's parent, the reference counter of the context needs to be decremented before returning NULL or the context will never be freed, so make that happen. Fixes: edf5bf34d408 ("ACPI / dock: Use callback pointers from devices' ACPI hotplug contexts") Reported-by: Vasily Averin Cc: 3.15+ # 3.15+ Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 2b9f81f7dbbe9a7ae2b52140fd65058df8b33d6a Author: Guenter Roeck Date: Tue Aug 11 11:00:01 2020 -0700 genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq() commit e27b1636e9337d1a1d174b191e53d0f86421a822 upstream. rearm_wake_irq() does not unlock the irq descriptor if the interrupt is not suspended or if wakeup is not enabled on it. Restucture the exit conditions so the unlock is always ensured. Fixes: 3a79bc63d9075 ("PCI: irq: Introduce rearm_wake_irq()") Signed-off-by: Guenter Roeck Signed-off-by: Thomas Gleixner Acked-by: Rafael J. Wysocki Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200811180001.80203-1-linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman commit af0506d7904eb9530da48dd0b60aa58d0d857ac8 Author: Thomas Gleixner Date: Fri Jul 24 22:44:41 2020 +0200 genirq/affinity: Make affinity setting if activated opt-in commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream. John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping Signed-off-by: Thomas Gleixner Tested-by: Marc Zyngier Acked-by: Marc Zyngier Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman commit dcdc0cd115745f282db6aa077380699424377777 Author: Steve French Date: Thu Jul 16 00:34:21 2020 -0500 smb3: warn on confusing error scenario with sec=krb5 commit 0a018944eee913962bce8ffebbb121960d5125d9 upstream. When mounting with Kerberos, users have been confused about the default error returned in scenarios in which either keyutils is not installed or the user did not properly acquire a krb5 ticket. Log a warning message in the case that "ENOKEY" is returned from the get_spnego_key upcall so that users can better understand why mount failed in those two cases. CC: Stable Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman