Scores: Accuracy / F1 Score / BERTScore (F1).
| Model Grouping | Model Name | Tag Extraction | Value Extraction | Formula Construction | Formula Calculation | FinanceBench | Financial Math |
|---|---|---|---|---|---|---|---|
| Financial Models | Model | - | - | - | - | - | - |
| Base Models | Llama 3.1 8B | 69.16 / 0.739 | 52.46 / 0.565 | 12.92 / 0.201 | 27.27 / 0.317 | 0.443 | 11.00 / 0.136 |
| Base Models | Llama 3.1 70B | 69.64 / 0.782 | 88.19 / 0.904 | 59.28 / 0.665 | 77.49 / 0.783 | 0.528 | 10.50 / 0.134 |
| Base Models | DeepSeek V3 | 85.03 / 0.849 | 98.01 / 0.982 | 22.75 / 0.315 | 85.99 / 0.868 | 0.573 | 21.50 / 0.255 |
| Base Models | GPT-4o | 81.60 / 0.864 | 97.01 / 0.974 | 79.76 / 0.820 | 83.59 / 0.857 | 0.564 | 27.00 / 0.296 |
| Base Models | Gemini 2.0 FL | 80.27 / 0.811 | 98.02 / 0.980 | 61.90 / 0.644 | 53.57 / 0.536 | 0.552 | 19.00 / 0.204 |
| Fine-tuned Models | Llama 3.1 8B LoRA | 89.13 / 0.886 | 98.49 / 0.986 | 77.61 / 0.876 | 98.68 / 0.990 | 0.511 | 30.00 / 0.332 |
| Fine-tuned Models | Llama 3.1 8B QLoRA | 86.89 / 0.872 | 97.14 / 0.974 | 89.34 / 0.898 | 92.81 / 0.947 | 0.542 | 26.50 / 0.307 |
| Fine-tuned Models | Llama 3.1 8B DoRA | 80.44 / 0.896 | 98.57 / 0.988 | 88.02 / 0.882 | 98.92 / 0.993 | 0.477 | 28.50 / 0.317 |
| Fine-tuned Models | Llama 3.1 8B rsLoRA | 85.26 / 0.879 | 99.13 / 0.992 | 89.46 / 0.893 | 98.80 / 0.988 | 0.575 | 34.50 / 0.370 |
| Fine-tuned Models | Gemini 2.0 FL | 85.03 / 0.907 | 99.20 / 0.992 | 67.85 / 0.786 | 54.76 / 0.548 | 0.544 | 66.00 / 0.785 |