Industry Brief · Issue 01 · 2026 · Working Paper #003

On-Device LLM Inference: Cost and Latency Implications for Consumer Fintech

Author

Ishrak Karim

ORCID iD

0009-0001-9923-0142

Published

21 June 2026

Licence

CC BY-NC 4.0

Permanent identifier (DOI)

https://doi.org/10.67226/aubc.wps.2026.003

Abstract

Benchmarking a 3.8B-parameter open model across nine ARM SoCs, we surface a non-linear relationship between sustained TOPS and tokens-per-second once thermal throttling enters the picture, and translate the result into unit-economics guidance for mobile-first fintechs.

Keywords

Industry Brief, CCPM, Open access, Undergraduate research

1. Introduction

This article is a demonstration of the standard CCPM working-paper layout. The structure, metadata, and citation furniture shown here are the same across every series operating under the framework, so prospective student authors and reviewers can see exactly what a published artifact will look like before they begin a submission.

2. Background and motivation

The CCPM framework is operated by the Centre for Fintech and Strategic Business Research (CFSBR). Each partner club retains editorial autonomy over its own series while CFSBR coordinates DOI registration, ORCID verification, and open-access hosting. The result is a consistent, citable output without locking any club into a single template or theme.

3. Method

Because this is a placeholder, the method section here is intentionally generic. A real submission would describe data sources, sample construction, instruments, and any computational tooling used, in enough detail that a peer could reproduce the analysis or judge its limits.

4. Findings

The findings section in a real paper would carry the headline numbers, tables, and figures. The demo body is kept short so the page renders quickly and the surrounding furniture (DOI bar, author metadata, citation block) is easy to inspect.

5. Discussion and implications

A discussion section connects the findings back to the literature, flags threats to validity, and proposes what a follow-up study or applied use would look like. Authors are encouraged to be honest about scope and to write for the reader who will cite them.

How to cite

Ishrak Karim (2026). On-Device LLM Inference: Cost and Latency Implications for Consumer Fintech. Issue 01, CCPM Working Paper Series. https://doi.org/10.67226/aubc.wps.2026.003

Demo · Not a real publication