- 10 Jun, 2021 1 commit
-
-
RSX24 user authored
-
- 03 Jun, 2021 1 commit
-
-
Anthony Castaldo authored
LIBPFM4 update per Stephane Eranian commits on 05-27-2021
-
- 28 May, 2021 1 commit
-
-
Anthony Castaldo authored
Tested on Histamine (Zen2), Dopamine (Zen2), Morphine (Zen1), and XSDK (Intel).

commit 74b79969f2f752df3be404d9c23f9709d738062f
Author: Stephane Eranian <eranian@gmail.com>
Date: Thu May 27 14:30:54 2021 -0700

    fix buffer overrun in Intel IcelakeX model table

    The following commit introduced a bug:

    12aeb9f69438 enable Intel IcelakeX core PMU support

    By forgetting a NULL termination to the icx_models[] table.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit e2bd6b5b573b124d5c07670cfc9f0923b6223288
Author: Stephane Eranian <eranian@gmail.com>
Date: Wed May 26 22:15:12 2021 -0700

    fix Intel Icelake man page date

    No Icelake in 2015!

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit 12aeb9f694382bbf82061ac0b28abb5d2178fe8d
Author: Stephane Eranian <eranian@gmail.com>
Date: Sun May 9 16:30:54 2021 -0700

    enable Intel IcelakeX core PMU support

    This patch adds Intel IcelakeX (Icelake for servers) core PMU support. This is the same core PMU as for the client Icelake with the addition of events to cover remote and PMM accesses.

    Based on Intel's icelakex_core_v1.04.json from 01.org.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit 9c3e9c025efc06f4ac4422d5e87a05d9776cbb94
Author: Vince Weaver <vincent.weaver@maine.edu>
Date: Wed May 26 22:00:27 2021 -0700

    fix detection of AMD64 Zen1 vs. Zen2

    This patch fixes the test checking the model number for AMD64 Fam17h processors. There was a bug where it would detect some Zen1 processors as Zen2. Zen2 processors start at model number 48 and up.

    Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>

commit dee24f6323023573f22dc68882cea44859c0b7ac
Author: Stephane Eranian <eranian@gmail.com>
Date: Wed May 12 11:08:35 2021 -0700

    add ARM SPE events for Neoverse N1 core PMU

    This patch adds the four Statistical Profiling Extension (SPE) related core PMU events:
    - SAMPLE_POP
    - SAMPLE_FEED
    - SAMPLE_FILTRATE
    - SAMPLE_COLLISON

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit 21787c7cca3b8b4d02e5608bfef9bdfa7acd7d8e
Author: Stephane Eranian <eranian@gmail.com>
Date: Sun May 9 15:45:18 2021 -0700

    fix pfm_get_os_event_encoding man page typos

    There is no PERF_OS_EVENT enum, should be PFM_OS_PERF_EVENT.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>
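Two of the fixes above are easy to picture in code. The following is a minimal sketch, not the libpfm4 source: the helper names, the table contents, and the use of 0 as the table sentinel are assumptions made for illustration. Only the Fam17h model threshold of 48 comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder model list (values are illustrative, not the real
 * icx_models[] contents). The trailing 0 is the sentinel the lookup
 * loop depends on. */
static const int example_models[] = { 106, 108, 0 };

/* Walk a 0-terminated model table. If the sentinel were missing, this
 * loop would keep reading past the end of the array, which is the kind
 * of overrun the IcelakeX fix addresses. */
static bool model_in_table(const int *models, int model)
{
    for (size_t i = 0; models[i] != 0; i++) {
        if (models[i] == model)
            return true;
    }
    return false;
}

/* Hypothetical Fam17h classification: per the commit message, Zen2
 * parts start at model number 48; lower model numbers are Zen1. */
enum { AMD64_FAM17H = 0x17, ZEN2_FIRST_MODEL = 48 };

static bool is_fam17h_zen2(unsigned family, unsigned model)
{
    return family == AMD64_FAM17H && model >= ZEN2_FIRST_MODEL;
}

static bool is_fam17h_zen1(unsigned family, unsigned model)
{
    return family == AMD64_FAM17H && model < ZEN2_FIRST_MODEL;
}
```

A caller would pass the family and model values decoded from CPUID; how those are obtained is outside this sketch.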
-
- 06 May, 2021 1 commit
-
-
Anthony Castaldo authored
libpfm4 update from Eranian on 05-03-2021.

Update libpfm4 to be current with the following commits. The ZEN3 modification cannot be tested; we have no ZEN3 machine. The other changes are not machine specific; we did a smoke test (compile and execute papi_component_avail, papi_native_avail) on ICL's xsdk machine.

commit 06197c0543476d40fad1c94d240e46a5d114f887
Author: Stephane Eranian <eranian@gmail.com>
Date: Mon May 3 21:45:59 2021 -0700

    enable RAPL for AMD64 Fam19h Zen3 processor

    As per AMD64 PPR for Fam19h model 01h, RAPL Package is supported, so enable it.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit be0dd1e0f63cb3d0915bc368baebe778792b6955
Author: Namhyung Kim <namhyung@google.com>
Date: Mon May 3 21:43:21 2021 -0700

    Add cgroup-switches software event

    Linux v5.13 added the 'cgroup-switches' event so it should be supported by libpfm4 as well.

    Signed-off-by: Namhyung Kim <namhyung@google.com>

commit d624a97b8e2143e1b890ac1a892b4620acb736f5
Author: Stephane Eranian <eranian@gmail.com>
Date: Sun May 2 23:43:17 2021 -0700

    fix arg type in pfm_get_os_event_encoding() man page

    This patch replaces references to pfm_raw_pmu_encode_t with pfm_pmu_encode_t to reflect the actual data type used in the code.

    Thanks to Claudio Parra for reporting the issue.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

Approved-by: Damien Genet
-
- 05 May, 2021 1 commit
-
-
Anthony Castaldo authored
Yet another cuda context fix Approved-by: Damien Genet
-
- 04 May, 2021 3 commits
-
-
Anthony Castaldo authored
retained context when it is identical to the current context causes an error. Also updated all error exits to properly restore user context.
-
Anthony Castaldo authored
cannot be tested; we have no ZEN3 machine. The other changes are not machine specific; we did a smoke test (compile and execute papi_component_avail, papi_native_avail) on ICL's xsdk machine.

commit 06197c0543476d40fad1c94d240e46a5d114f887
Author: Stephane Eranian <eranian@gmail.com>
Date: Mon May 3 21:45:59 2021 -0700

    enable RAPL for AMD64 Fam19h Zen3 processor

    As per AMD64 PPR for Fam19h model 01h, RAPL Package is supported, so enable it.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit be0dd1e0f63cb3d0915bc368baebe778792b6955
Author: Namhyung Kim <namhyung@google.com>
Date: Mon May 3 21:43:21 2021 -0700

    Add cgroup-switches software event

    Linux v5.13 added the 'cgroup-switches' event so it should be supported by libpfm4 as well.

    Signed-off-by: Namhyung Kim <namhyung@google.com>

commit d624a97b8e2143e1b890ac1a892b4620acb736f5
Author: Stephane Eranian <eranian@gmail.com>
Date: Sun May 2 23:43:17 2021 -0700

    fix arg type in pfm_get_os_event_encoding() man page

    This patch replaces references to pfm_raw_pmu_encode_t with pfm_pmu_encode_t to reflect the actual data type used in the code.

    Thanks to Claudio Parra for reporting the issue.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>
-
Anthony Castaldo authored
Improved auto-detect ROCM root directory Approved-by: Damien Genet
-
- 03 May, 2021 5 commits
-
-
Anthony Castaldo authored
One line change: Correcting a typo that can cause a segfault. Approved-by: Damien Genet
-
Anthony Castaldo authored
-
Anthony Castaldo authored
-
Anthony Castaldo authored
-
Anthony Castaldo authored
Non-primary context issues fixed. Approved-by: Damien Genet
-
- 29 Apr, 2021 2 commits
-
-
Anthony Castaldo authored
-
Anthony Castaldo authored
-
- 28 Apr, 2021 2 commits
-
-
Anthony Castaldo authored
accommodate issues with non-primary contexts.
-
Anthony Castaldo authored
-
- 23 Apr, 2021 2 commits
-
-
Anthony Castaldo authored
-
Will Cohen authored
Check to ensure that mallocs allocated memory in papi_multiplex_cost.c Approved-by: Damien Genet
-
- 22 Apr, 2021 5 commits
-
-
Anthony Michael Castaldo authored
is not always necessary on systems that load modules. We recognize environment variables ROCM_PATH, ROCM_DIR, and ROCMDIR. At compile time, we have code in Rules.rocm that can examine the LD_LIBRARY_PATH variable and extract possible -Iinclude_paths for the compile. This uses 'awk', but if 'awk' is not present on the system it won't cause an error message. We will also still use PAPI_ROCM_ROOT at compile time, preferentially, when specified. README.md has been updated to reflect these changes.
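The environment-variable lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the component's code: the function name find_rocm_root is hypothetical, and checking PAPI_ROCM_ROOT first at run time is an assumption (the message only says it is preferred at compile time).

```c
#include <stdlib.h>

/* Return the first non-empty ROCm root found in the environment, or
 * NULL if none of the variables is set. The variable names come from
 * the commit message; the lookup order is assumed for illustration. */
static const char *find_rocm_root(void)
{
    static const char *const vars[] = {
        "PAPI_ROCM_ROOT", "ROCM_PATH", "ROCM_DIR", "ROCMDIR"
    };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *val = getenv(vars[i]);
        if (val != NULL && val[0] != '\0')
            return val;
    }
    return NULL;
}
```

A NULL result would mean the caller has to fall back to some other discovery mechanism, such as the library search path.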
-
Will Cohen authored
Makefile to generate papi-x.y.z.tar.gz directly from git repo Approved-by: Damien Genet
-
Anthony Castaldo authored
-
William Cohen authored
-
- 20 Apr, 2021 2 commits
-
-
William Cohen authored
The malloc function returns NULL if it is unable to allocate memory. papi_multiplex_cost.c needs checks like those in papi_command_line.c, and must exit the program with an error if any of the malloc operations fail.
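A minimal sketch of the kind of check described, assuming a small wrapper is acceptable. The name checked_malloc and the error text are hypothetical and are not taken from papi_multiplex_cost.c or papi_command_line.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate size bytes or terminate the program with an error message.
 * Intended for size > 0; malloc(0) is allowed to return NULL, so a
 * zero-sized request is not treated as a failure here. */
static void *checked_malloc(size_t size, const char *what)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        fprintf(stderr, "Failed to allocate %zu bytes for %s\n", size, what);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

Each bare malloc call in a utility could then be replaced by a call such as checked_malloc(n * sizeof *values, "values"), so that a failed allocation stops the program before the pointer is dereferenced.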
-
Anthony Castaldo authored
This code corrects an oversight and works if the application has already created a non-primary context before calling PAPI_library_init(). A modification of HelloWorld.cu, named HelloWorld_NP_Ctx.cu, tests whether the code works with a non-primary context created; HelloWorld.cu tests without creating a non-primary context. This was tested on XSDK with two Titan V GPUs. Approved-by: Damien Genet
-
- 13 Apr, 2021 5 commits
-
-
Anthony Castaldo authored
-
Anthony Castaldo authored
already created a non-primary context before calling PAPI_library_init(). A modification of HelloWorld.cu, HelloWorld_NP_Ctx.cu, tests whether the code works with a non-primary context created; HelloWorld.cu tests without creating a non-primary context. This was tested on XSDK with two Titan V GPUs.
-
Peinan Zhang authored
Adding intel_gpu component Approved-by: adanalis Approved-by: Damien Genet
-
Peinan Zhang authored
-
adanalis authored
Automatically add "-lstdc++" to Makefiles if and only if the intel_gpu component is selected. Approved-by: Damien Genet
-
- 12 Apr, 2021 1 commit
-
-
Anthony authored
-
- 09 Apr, 2021 1 commit
-
-
adanalis authored
-
- 07 Apr, 2021 2 commits
-
-
Anthony Castaldo authored
libpfm4 update incorporating latest fixes received 04-06-2021.

Update libpfm4 to be current with the following commits. The following fixes are for AMD Zen3 CPUs and are untested by ICL; we have no access to Zen3 processors at this time.

commit 6864dad7cf85fac9fff04bd814026e2fbc160175
Author: Stephane Eranian <eranian@gmail.com>
Date: Tue Apr 6 22:37:51 2021 -0700

    Fix AMD64 Fam19h L3 PMU support

    The PMU perf_events type was not correctly encoded because the .perf_name field was not initialized and therefore it defaulted to using the core PMU. The correct perf_name is "amd_l3". With that in place, the library now picks up the correct PMU type and associated programming restrictions, e.g., per-cpu mode only, and code such as perf_examples/self should not be allowed to succeed at perf_event_open().

    Reported-by: Steve Kaufmann <steven.kaufmann@hpe.com>
    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit 99975b4738cf7f2550922f0761f2776159842c00
Author: Stephane Eranian <eranian@google.com>
Date: Fri Apr 2 12:38:56 2021 -0700

    fix grpid handling for Intel X86 uncore

    On SkylakeX the umask grpid field is overloaded to contain two subfields: the actual grpid and the required grpid (at offset 8). The encoding code has a bug where it would not use the accessor function get_grpid() to extract the group id from the field. Given that the grpid is used in statements such as:

        u = 1 << pe[e->event].umasks[a->idx].grpid;

    The code could run the risk of exceeding the max shift for a 16-bit value.

    The fix is to use the accessor function to extract the grpid.

    The patch also adds a validation test to ensure events which would cause a large grpid are properly encoded.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

Approved-by: Damien Genet
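The grpid problem can be illustrated with a short sketch. The name get_grpid() and the offset of 8 for the required grpid come from the commit message; the 8-bit width of each subfield, the helper get_req_grpid(), and group_mask() are assumptions for illustration and do not reproduce the libpfm4 code.

```c
#include <stdint.h>

/* Assumed layout of the overloaded 16-bit field:
 *   bits 0-7  : actual group id
 *   bits 8-15 : required group id */
static inline unsigned get_grpid(uint16_t field)
{
    return field & 0xffu;
}

static inline unsigned get_req_grpid(uint16_t field)
{
    return (field >> 8) & 0xffu;
}

/* Build the one-bit group mask from the actual group id only. Shifting
 * by the raw field instead would use a count such as 0x0201 (513),
 * far beyond the width of the type. This sketch assumes the extracted
 * group id itself is small (less than 16). */
static unsigned group_mask(uint16_t field)
{
    return 1u << get_grpid(field);
}
```

For a field value of 0x0201, the actual group id is 1 and the required group id is 2, so the mask is computed from a shift of 1 rather than from the whole field.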
-
Anthony Castaldo authored
we have no access to Zen3 processors at this time.

Update libpfm4 to be current with the following commits:

commit 6864dad7cf85fac9fff04bd814026e2fbc160175
Author: Stephane Eranian <eranian@gmail.com>
Date: Tue Apr 6 22:37:51 2021 -0700

    Fix AMD64 Fam19h L3 PMU support

    The PMU perf_events type was not correctly encoded because the .perf_name field was not initialized and therefore it defaulted to using the core PMU. The correct perf_name is "amd_l3". With that in place, the library now picks up the correct PMU type and associated programming restrictions, e.g., per-cpu mode only, and code such as perf_examples/self should not be allowed to succeed at perf_event_open().

    Reported-by: Steve Kaufmann <steven.kaufmann@hpe.com>
    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit 99975b4738cf7f2550922f0761f2776159842c00
Author: Stephane Eranian <eranian@google.com>
Date: Fri Apr 2 12:38:56 2021 -0700

    fix grpid handling for Intel X86 uncore

    On SkylakeX the umask grpid field is overloaded to contain two subfields: the actual grpid and the required grpid (at offset 8). The encoding code has a bug where it would not use the accessor function get_grpid() to extract the group id from the field. Given that the grpid is used in statements such as:

        u = 1 << pe[e->event].umasks[a->idx].grpid;

    The code could run the risk of exceeding the max shift for a 16-bit value.

    The fix is to use the accessor function to extract the grpid.

    The patch also adds a validation test to ensure events which would cause a large grpid are properly encoded.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>
-
- 05 Apr, 2021 2 commits
-
-
adanalis authored
-
Anthony Castaldo authored
Update libpfm4 for changes on 03-23-2021. Approved-by: Damien Genet Approved-by: Heike Jagode
-
- 26 Mar, 2021 1 commit
-
-
Anthony Castaldo authored
It affects the following processors, which we do not have available to test on: A64FX (Fujitsu ARM), AMD Zen3, Intel TigerLake and RocketLake.

Update libpfm4 to be current with the following commits:

commit c132ab4948a828334a8fef00303a4b47f59bb4d9
Author: Stephane Eranian <eranian@gmail.com>
Date: Tue Mar 23 10:11:40 2021 -0700

    Add prefix to AMD Fam19h Zen3 L3 events

    To avoid potential conflict with other core PMU events and make it more explicit these are uncore L3 events following the model of Intel uncore PMUs.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit a97908e8e6b6a28ae369dfbc9af97b52fe932273
Author: Stephane Eranian <eranian@gmail.com>
Date: Tue Mar 23 00:31:40 2021 -0700

    Enable Intel Tigerlake and Rocketlake core PMU support

    They are equivalent to Intel Icelake, so reuse the same event table.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit 315941fc05f5a487e4eb5efd36ea10438336944b
Author: Stephane Eranian <eranian@gmail.com>
Date: Thu Mar 18 23:13:57 2021 -0700

    add AMD64 Fam19h Zen3 L3 PMU support

    This patch adds the AMD Fam19h (Zen3) L3 PMU support consisting of 3 published events.

    new PMU model: amd64_fam19h_zen3_l3

    Based on the public specifications PPR (#55898) Rev 0.35 - Feb 5, 2021.
    Available at: https://www.amd.com/system/files/TechDocs/55898_pub.zip

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit e2afb6186dab2419a4b6f79a6adf7cd9bb0f2340
Author: Stephane Eranian <eranian@gmail.com>
Date: Mon Mar 15 12:04:48 2021 -0700

    Add AMD64 Fam17h Zen2 RAPL support

    This patch adds RAPL support for AMD64 Fam17h Zen2 processors. On Zen2, only the RAPL_ENERGY_PKGS event is supported.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit cc4ba27e55440f87359bee5176380db1ba4ef8af
Author: Swarup Sahoo <swarup-chandra.sahoo@amd.com>
Date: Tue Mar 2 01:49:51 2021 +0530

    Add AMD64 Fam19h Zen3 core PMU support

    The patch adds core PMU support for AMD Fam19h Zen3.

    new PMU model: amd64_fam19h_zen3

    Based on the public specifications PPR (#55898) Rev 0.35 - Feb 5, 2021.
    Available at: https://www.amd.com/system/files/TechDocs/55898_pub.zip

    Signed-off-by: Swarup Sahoo <swarup-chandra.sahoo@amd.com>

commit 5333f3245954b038100530a17675bbbafdae3061
Author: Stephane Eranian <eranian@gmail.com>
Date: Sun Jan 31 00:01:36 2021 -0800

    Fix casting issues reported by PGI compiler

    The PGI compiler does not like:

        struct { unsigned long field; };
        struct.field = -1,

    So clean this up and various other casting issues reported by Carl Ponder on the bugs.

    Signed-off-by: Stephane Eranian <eranian@gmail.com>

commit f6500e77563e606c8510ff26f57d321328bd8157
Author: Masahiko, Yamada <yamada.masahiko@fujitsu.com>
Date: Wed Jan 27 20:12:59 2021 +0900

    Changing the number of PMU counters and deleting the ARM(32-bit) mode for A64FX

    The current libpfm4 implementation treats PMCR_EL0.N = 0x6 like other ARM Reference processors. On an A64FX, PMCR_EL0.N = 0x8 (the number of PMU counters is 8). Therefore, only 6 counters are available in the current implementation.

    The A64FX core also supports the AArch64 state and the A64 Instruction set. The AArch32 state and the A32, T32 Instruction set are not supported and cannot be transitioned to this Execution state. Currently, the libpfm manual (docs/man3/libpfm_arm_a64fx.3) states that A32/A64 can be used, but A32 cannot be used.

    I have created a patch with the above fixes, so please review and merge it.

    Originally, the specification of the A64FX which Fujitsu published should have described the above two points, but the description was omitted. A64FX Specification HPC Extension v1.1 will add:
    - On an A64FX, PMCR_EL0.N = 0x8 (the number of PMU counters is 8).
    - A64FX does not support the AArch32 state and the A32, T32 Instruction set and cannot transition to this Execution state.
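For readers unfamiliar with PMCR_EL0.N, the sketch below decodes the counter count from a register value that has already been read. It assumes the Armv8-A layout in which N occupies bits [15:11] of PMCR_EL0; the function name is hypothetical, and reading the register itself (which needs suitable privilege and inline assembly) is left out.

```c
#include <stdint.h>

/* Extract the N field (number of implemented event counters) from a
 * PMCR_EL0 value. Assumes N is the 5-bit field at bits [15:11]. */
static unsigned pmcr_num_counters(uint64_t pmcr)
{
    return (unsigned)((pmcr >> 11) & 0x1fu);
}
```

With this decoding, a value whose N field is 0x8 reports 8 counters, as the message describes for the A64FX, while a field of 0x6 reports 6.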
-
- 18 Mar, 2021 1 commit
-
-
Frank Winkler authored
Modified recording of hl regions and improved hl performance report Approved-by: Damien Genet
-
- 11 Mar, 2021 1 commit
-
-
Frank Winkler authored
-