When the British APL Association met in August, Dan Baronet gave us a taste of the sort of thing that can be done with RegEx in Dyalog. I have a recording that I am cleaning up, but in the meantime here is a sample. First, create a formatting function that pushes a string through ⎕XML twice, just to make it easier to see what's going on.
f←(⎕xml⍣2)
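As a rough sketch of why two applications tidy the text: the first ⎕XML parses the string into a matrix with one row per element, and the second converts that matrix back into text, this time laid out over several lines. The sample string below is made up for illustration.

```apl
m←⎕XML '<a><b>x</b></a>'   ⍝ text to matrix: one row per element
⍴m                         ⍝ 2 rows (a and b) by 5 columns
⎕XML m                     ⍝ matrix back to text, now laid out line by line
```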
Set up a text string with some HTML:
str←'<a id="some-name" class="autoheadlink"><h2>some text...</h2></a><a id="another-name" class="autoheadlink"><h2>some text...</h2></a>'
View the formatted string:
f str
<a id="some-name" class="autoheadlink">
<h2>some text...</h2>
</a>
<a id="another-name" class="autoheadlink">
<h2>some text...</h2>
</a>
We need to invert the nesting of the header and the anchor while keeping the text in place. So create a search VTV (vector of text vectors) holding the RegEx patterns, and a replacement pattern that says what to do with each match.
s← '(<a.*auto.*>)(<h\d>)' '(</h\d>)(</a>)'
⍝ find an opening anchor-header pairing then the closing header-anchor pairing
r←'\2\1'
⍝ swap the two captured groups in each match: the 2nd group first, then the 1st
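To see the replacement syntax in isolation, here is a minimal sketch on a made-up string that is not part of the HTML example. In the replacement pattern, \1 and \2 refer to the first and second parenthesised groups of the match, so writing them in reverse order swaps the two pieces of text.

```apl
⍝ two words are captured as groups 1 and 2, then written back in reverse order
('(\w+) (\w+)' ⎕R '\2 \1') 'hello world'      ⍝ → world hello
```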
Now do the search and replace on the string and format the result:
f(s ⎕R r )str
<h2>
<a id="some-name" class="autoheadlink">
<h2>some text...</h2>
</a>
<a id="another-name" class="autoheadlink">some text...</a>
</h2>
Close but no cigar. The greedy .* in the first pattern swallowed everything from the first anchor up to the last opening header tag, and the text was treated as one big match. So we tell it not to be so greedy.
f(s ⎕R r ⍠ 'Greedy' 0)str
<h2>
<a id="some-name" class="autoheadlink">some text...</a>
</h2>
<h2>
<a id="another-name" class="autoheadlink">some text...</a>
</h2>
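The difference is easier to see on a smaller, made-up string, here using ⎕S with '&' to return the matched text and ⍴ to count the matches. The 'Greedy' 0 option makes every quantifier in the pattern lazy; the usual PCRE alternative, sketched on the last line, is to mark a single quantifier as lazy by writing .*? instead of .*.

```apl
t←'<b>one</b><b>two</b>'
⍴('<b>.*</b>' ⎕S '&') t                ⍝ → 1  greedy: one match spanning the whole string
⍴('<b>.*</b>' ⎕S '&' ⍠ 'Greedy' 0) t   ⍝ → 2  option makes all quantifiers lazy
⍴('<b>.*?</b>' ⎕S '&') t               ⍝ → 2  only this .* is made lazy
```

The option is convenient when every quantifier should be lazy, as in the header example; the .*? form gives finer control when only some of them should be.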
RegEx can also be used with the ]locate user command to search the workspace. Fix some functions:
⎕fx'(a n)←foo' 'a n←⎕a ⎕d'
⎕fx'n←num' 'n←⎕d'
⎕fx'c←char' 'c←⎕a'
foo num char
ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 0123456789 ABCDEFGHIJKLMNOPQRSTUVWXYZ
With a simple ]locate we can find every use of a quad name:
]locate ⎕
∇ #.char (1 found)
[1] c←⎕A
∧
∇ #.f (1 found)
[0] ⎕XML ⍣ 2
∧
∇ #.foo (2 found)
[1] a n←⎕A ⎕D
∧ ∧
∇ #.num (1 found)
[1] n←⎕D
∧
But we can use ]locate with RegEx to restrict the search to ⎕A and ⎕D only.
]locate '⎕[AD]' -pattern
∇ #.char (1 found)
[1] c←⎕A
∧∧
∇ #.foo (2 found)
[1] a n←⎕A ⎕D
∧∧ ∧∧
∇ #.num (1 found)
[1] n←⎕D
∧∧
Or we can ask where a pair of APL names (in this case variables) has been assigned values from ⎕A or ⎕D.
]locate '⍺ ⍺←.*⎕[AD]' -pattern
∇ #.foo (1 found)
[1] a n←⎕A ⎕D
∧∧∧∧∧∧∧∧∧
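The same kind of search can be sketched without the user command by running ⎕S over the source lines of a function, here taken from ⎕NR. This is only an illustration of the idea, not a description of how ]locate works internally. The 'IC' 1 (ignore case) option is added on the assumption that the case of system names in the stored source may vary between versions.

```apl
⍝ return every ⎕A or ⎕D found in the source lines of foo
('⎕[AD]' ⎕S '&' ⍠ 'IC' 1) ⎕NR 'foo'
```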
Obviously there is much more to RegEx itself, but even some simple wildcard searching can give you a big leg up.