On-Device LLM Inference: Cost and Latency Implications for Consumer Fintech
Abstract
Benchmarking a 3.8B-parameter open model across nine ARM SoCs, we find that the relationship between sustained TOPS and tokens per second becomes non-linear once thermal throttling sets in, and we translate this result into unit-economics guidance for mobile-first fintechs.
1. Introduction
This article demonstrates the standard CCPM working-paper layout. The structure, metadata, and citation furniture shown here are the same in every series published under the framework. Prospective student authors and reviewers can therefore see exactly what a published artifact will look like before work on a submission begins.
2. Background and motivation
The CCPM framework is operated by the Centre for Fintech and Strategic Business Research (CFSBR). Each partner club retains editorial autonomy over its own series while CFSBR coordinates DOI registration, ORCID verification, and open-access hosting. The result is a consistent, citable output without locking any club into a single template or theme.
3. Method
Because this is a placeholder, the method section here is intentionally generic. A real submission would describe data sources, sample construction, instruments, and any computational tooling used, in enough detail that a peer could reproduce the analysis or judge its limits.
4. Findings
In a real paper, the findings section would carry the headline numbers, tables, and figures. The demo body is kept short so that the page renders quickly and the surrounding furniture (DOI bar, author metadata, citation block) is easy to inspect.
5. Discussion and implications
A discussion section connects the findings back to the literature, flags threats to validity, and proposes what a follow-up study or applied use would look like. Authors are encouraged to be honest about scope and to write for the reader who will cite them.
How to cite
Ishrak Karim (2026). On-Device LLM Inference: Cost and Latency Implications for Consumer Fintech. Issue 01, CCPM Working Paper Series. https://doi.org/10.67226/aubc.wps.2026.003