How to parse text tables
Parsing text tables is fairly simple as long as they are regular, meaning that repetitive patterns can be found in the text. For instance, this text:
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.12.13.1 98 0950.5785.5cd1 ARPA FastEthernet2.13
Internet 10.12.13.3 131 0150.7685.14d5 ARPA GigabitEthernet2.13
Internet 10.12.13.4 198 0950.5C8A.5c41 ARPA GigabitEthernet2.17
is a table and is easy to parse with TTP using this single pattern:
Internet {{ ip | IP }} {{ age | DIGIT }} {{ mac }} ARPA {{ interface }}
IP and DIGIT are regular expression formatters, indicating that special regexes need to be used to match the ip and age variables. If we add entries to the above text that differ from the existing ones, we will have to add more patterns to the template and combine them in a group. For instance, this text:
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.12.13.1 98 0950.5785.5cd1 ARPA FastEthernet2.13
Internet 10.12.13.3 131 0150.7685.14d5 ARPA GigabitEthernet2.13
Internet 10.12.13.4 198 0950.5C8A.5c41 ARPA GigabitEthernet2.17
Internet 10.12.14.5 - 0950.5C8A.5d42 ARPA GigabitEthernet3
Internet 10.12.15.6 164 0950.5C8A.5e43 ARPA GigabitEthernet4.21 *
would require two additional patterns to match all the lines:
<group name="table_data">
Internet {{ ip | IP | _start_ }} {{ age | DIGIT }} {{ mac }} ARPA {{ interface }}
Internet {{ ip | IP | _start_ }} - {{ mac }} ARPA {{ interface }}
Internet {{ ip | IP | _start_ }} {{ age | DIGIT }} {{ mac }} ARPA {{ interface }} *
</group>
We also have to use the _start_ indicator, because each line is a complete match and on each subsequent match the previous matches need to be saved in the results. However, the above template can be simplified a bit:
<group name="table_data" method="table">
Internet {{ ip | IP }} {{ age }} {{ mac }} ARPA {{ interface }}
Internet {{ ip | IP }} {{ age }} {{ mac }} ARPA {{ interface }} *
</group>
Excluding the DIGIT regex formatter still allows age to match digits, but it will match the hyphen symbol as well. In addition, the TTP group tag has a method attribute; setting it to table makes every pattern in the group a group start regex without the need to specify _start_ explicitly. Parsing the text table data with the above template produces these results:
[ [ { 'table_data': [ { 'age': '98',
                        'interface': 'FastEthernet2.13',
                        'ip': '10.12.13.1',
                        'mac': '0950.5785.5cd1'},
                      { 'age': '131',
                        'interface': 'GigabitEthernet2.13',
                        'ip': '10.12.13.3',
                        'mac': '0150.7685.14d5'},
                      { 'age': '198',
                        'interface': 'GigabitEthernet2.17',
                        'ip': '10.12.13.4',
                        'mac': '0950.5C8A.5c41'},
                      { 'age': '-',
                        'interface': 'GigabitEthernet3',
                        'ip': '10.12.14.5',
                        'mac': '0950.5C8A.5d42'},
                      { 'age': '164',
                        'interface': 'GigabitEthernet4.21',
                        'ip': '10.12.15.6',
                        'mac': '0950.5C8A.5e43'}]}]]
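To make the effect of the two table patterns more concrete, here is a rough plain-Python approximation built on the standard re module. It is only a sketch and not how TTP works internally: the regexes, the TABLE_PATTERNS name and the parse_arp_table helper are assumptions made for illustration. Every line that matches either pattern becomes a new record, which mirrors each pattern acting as a group start.

```python
import re

# Hypothetical stand-ins for the two template lines; these are not TTP internals.
BASE = (
    r"Internet +(?P<ip>\d+(?:\.\d+){3}) +(?P<age>\S+) +(?P<mac>\S+)"
    r" +ARPA +(?P<interface>\S+)"
)
TABLE_PATTERNS = [
    re.compile(BASE + r"[ \t]*$"),       # plain row
    re.compile(BASE + r" +\*[ \t]*$"),   # row with a trailing asterisk
]

def parse_arp_table(text):
    """Return one dict per matched line; every match starts a new record."""
    records = []
    for line in text.splitlines():
        for pattern in TABLE_PATTERNS:
            match = pattern.match(line)
            if match:
                records.append(match.groupdict())
                break  # the first matching pattern wins for this line
    return records

# A shortened copy of the sample table, including the header line.
data = """\
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.12.13.1 98 0950.5785.5cd1 ARPA FastEthernet2.13
Internet 10.12.14.5 - 0950.5C8A.5d42 ARPA GigabitEthernet3
Internet 10.12.15.6 164 0950.5C8A.5e43 ARPA GigabitEthernet4.21 *
"""

for record in parse_arp_table(data):
    print(record)
```

The header line is skipped because it matches neither pattern, and the age group accepts any non-space token, so the hyphen is captured just like a number.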
TTP can also help with one more specific text table use case. For example, this data:
VRF VRF-CUST-1 (VRF Id = 4); default RD 12345:241;
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Te0/3/0.401 Te0/3/0.302 Te0/3/0.315
Te0/3/0.316 Te0/3/0.327
has a text table embedded in it. If we want to extract all the interfaces that belong to this particular VRF, we can use this template:
<group name="vrf.{{ vrf_name }}">
VRF {{ vrf_name }} (VRF Id = {{ vrf_id}}); default RD {{ vrf_rd }};
<group name="interfaces">
Interfaces: {{ _start_ }}
{{ interfaces | ROW | joinmatches(",") }}
</group>
</group>
In the above template, the ROW regex formatter helps to match all lines with words separated by 2 or more spaces, producing these results:
[
    [
        {
            "vrf": {
                "VRF-CUST-1": {
                    "interfaces": {
                        "interfaces": "Te0/3/0.401 Te0/3/0.302 Te0/3/0.315 Te0/3/0.316 Te0/3/0.327"
                    },
                    "vrf_id": "4",
                    "vrf_rd": "12345:241"
                }
            }
        }
    ]
]
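As an illustration of why only the interface lines are picked up, the sketch below filters lines with a regex that approximates the description of a row: words separated by two or more spaces. The ROW_LIKE pattern and the table_rows helper are assumptions made for this example, and the regex that TTP actually ships for ROW may differ. The column spacing in the sample data is also made up.

```python
import re

# Assumed approximation of a "row": words separated by runs of 2+ spaces.
# The real ROW regex shipped with TTP may differ.
ROW_LIKE = re.compile(r"\s*\S+(?: {2,}\S+)+\s*")

def table_rows(text):
    """Return the stripped lines that look like text-table rows."""
    return [line.strip() for line in text.splitlines() if ROW_LIKE.fullmatch(line)]

# Column spacing here is invented for the example.
data = """\
VRF VRF-CUST-1 (VRF Id = 4); default RD 12345:241;
  Old CLI format, supports IPv4 only
  Flags: 0xC
  Interfaces:
    Te0/3/0.401    Te0/3/0.302    Te0/3/0.315
    Te0/3/0.316    Te0/3/0.327
"""

for row in table_rows(data):
    print(row)
```

Lines whose words are separated by single spaces, such as the VRF header or the Flags line, do not qualify, so only the two interface rows remain.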
While TTP extracted all interfaces, they are combined in a single string. The template below can be used to produce a list of interfaces instead:
<group name="vrf.{{ vrf_name }}">
VRF {{ vrf_name }} (VRF Id = {{ vrf_id}}); default RD {{ vrf_rd }};
<group name="interfaces">
Interfaces: {{ _start_ }}
{{ interfaces | ROW | resub(" +", ",", 20) | split(',') | joinmatches }}
</group>
</group>
In this template, the same match result is processed inline: the resub function replaces each run of consecutive spaces with a single comma character; after substitution, processing continues through the split function, which splits the string into a list of items using the comma character; finally, the joinmatches function tells TTP to join all matches into a single list, producing these results:
[
    [
        {
            "vrf": {
                "VRF-CUST-1": {
                    "interfaces": {
                        "interfaces": [
                            "Te0/3/0.401",
                            "Te0/3/0.302",
                            "Te0/3/0.315",
                            "Te0/3/0.316",
                            "Te0/3/0.327"
                        ]
                    },
                    "vrf_id": "4",
                    "vrf_rd": "12345:241"
                }
            }
        }
    ]
]
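The resub, split and joinmatches chain can be mimicked in plain Python to show what each step contributes. This is a sketch under stated assumptions rather than TTP's own code: the to_interface_list helper is hypothetical, the third resub argument is read here as a maximum replacement count, and the row strings use invented column spacing.

```python
import re

def to_interface_list(rows):
    """Mimic resub -> split -> joinmatches on rows that were already matched."""
    interfaces = []
    for row in rows:
        # resub(" +", ",", 20): replace up to 20 runs of spaces with a comma
        with_commas = re.sub(r" +", ",", row, count=20)
        # split(','): turn the comma-separated string into a list of items
        items = with_commas.split(",")
        # joinmatches: merge the items of every row into one list
        interfaces.extend(items)
    return interfaces

# Two matched rows; the spacing between columns is invented for the example.
rows = [
    "Te0/3/0.401    Te0/3/0.302    Te0/3/0.315",
    "Te0/3/0.316    Te0/3/0.327",
]

print(to_interface_list(rows))
```

Doing the substitution before the split is what keeps empty strings out of the list, since splitting the raw row on single spaces would yield an empty item for every extra space.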