RegEx with ⎕R and ]locate

John 'Jake' Jacob

When the British APL Association met in August, Dan Baronet gave us a taster of the sort of thing that can be done with RegEx in Dyalog. I have a recording that I am cleaning up, but in the meantime here is a sample. First create a formatting function that pushes a string through ⎕xml twice (text to matrix, then matrix back to text), just to make it easier to see what's going on.

f←(⎕xml⍣2)
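As a rough sketch of what the two passes do (the sample string and the name m are made up for illustration): the first ⎕XML turns the text into a matrix, and the second turns that matrix back into text, now laid out one element per line.

```apl
m←⎕XML '<a><b>x</b></a>'   ⍝ first pass: text → matrix, one row per element
⍴m                          ⍝ 2 5: two elements, five columns
⎕XML m                      ⍝ second pass: matrix → text with line breaks and indentation
```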

Set up a text string with some HTML:


str←'<a id="some-name" class="autoheadlink"><h2>some text...</h2></a><a id="another-name" class="autoheadlink"><h2>some text...</h2></a>'

View the formatted string:

f str
<a id="some-name" class="autoheadlink">
<h2>some text...</h2>
</a>
<a id="another-name" class="autoheadlink">
<h2>some text...</h2>
</a>

We need to invert the nesting of the header and the anchor while keeping the text in place. So create a search VTV (vector of text vectors) holding two RegEx patterns, and a replacement pattern to specify what to do with each match.

s←'(<a.*auto.*>)(<h\d>)' '(</h\d>)(</a>)'
⍝ two patterns: an opening anchor-header pair, then a closing header-anchor pair;
⍝ each pattern captures its two tags as groups 1 and 2
r←'\2\1'
⍝ emit group 2 followed by group 1, i.e. swap the two tags in each match
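To see the group swap in isolation, here is a minimal sketch on a made-up string (not from Dan's talk): each parenthesised part of the search pattern is captured as a numbered group, and \1 and \2 in the replacement refer back to those groups.

```apl
('(\w+) (\w+)' ⎕R '\2 \1') 'hello world'
⍝ world hello
```

With two search patterns and a single replacement, as in s and r, the same replacement is applied to whichever pattern matched.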

Now do the search and replace on the string and format the result:

f(s ⎕R r )str
<h2> 
   <a id="some-name" class="autoheadlink"> 
   <h2>some text...</h2> 
   </a> 
   <a id="another-name" class="autoheadlink">some text...</a> 
</h2>

Close but no cigar. The greedy .* ran on from the first anchor all the way to the last header tag, engulfing everything in between. So we tell it not to be so greedy.

f(s ⎕R r ⍠ 'Greedy' 0)str
<h2>
<a id="some-name" class="autoheadlink">some text...</a>
</h2>
<h2>
<a id="another-name" class="autoheadlink">some text...</a>
</h2>
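The difference is easier to see on a tiny example. This sketch uses ⎕S with '&' to return the matched text rather than replace it; the input string is made up for illustration. The last line uses the PCRE lazy quantifier .*? as an alternative to setting the Greedy option.

```apl
('<.*>' ⎕S '&') '<a><b>'                  ⍝ greedy: one match, <a><b>
('<.*>' ⎕S '&' ⍠ 'Greedy' 0) '<a><b>'     ⍝ not greedy: two matches, <a> and <b>
('<.*?>' ⎕S '&') '<a><b>'                 ⍝ lazy quantifier in the pattern: also <a> and <b>
```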

RegEx can also be used with the ]locate user command to search the workspace. Fix some functions:

⎕fx'(a n)←foo' 'a n←⎕a ⎕d'
⎕fx'n←num' 'n←⎕d'
⎕fx'c←char' 'c←⎕a'

 foo num char
 ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 0123456789 ABCDEFGHIJKLMNOPQRSTUVWXYZ

With a simple ]locate we can find every reference to a quad name:

]locate ⎕

∇ #.char (1 found) 
[1] c←⎕A 
      ∧

 ∇ #.f (1 found) 
[0] ⎕XML ⍣ 2 
    ∧

 ∇ #.foo (2 found) 
[1] a n←⎕A ⎕D 
        ∧   ∧

 ∇ #.num (1 found) 
[1] n←⎕D 
      ∧

But we can use ]locate with RegEx to restrict the search to ⎕A and ⎕D only.

]locate '⎕[AD]' -pattern

∇ #.char (1 found) 
[1] c←⎕A 
      ∧∧

 ∇ #.foo (2 found) 
[1] a n←⎕A ⎕D 
        ∧∧  ∧∧

 ∇ #.num (1 found) 
[1] n←⎕D 
      ∧∧
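For use under program control, a similar search can be sketched with ⎕S over the source returned by ⎕NR. This is only a rough analogue of what ]locate reports, not how the user command is implemented. It assumes foo, num and char are fixed as above, and it sets the IC (ignore case) option in case the stored source holds the quad names in lower case. The name find is a hypothetical helper.

```apl
find←{('⎕[AD]' ⎕S '&' ⍠ 'IC' 1) ⎕NR ⍵}   ⍝ hypothetical helper: matched text within one function
find¨'foo' 'num' 'char'                   ⍝ one list of matches per function
```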

Or we can find where a pair of APL names (in this case variables) has been assigned values from ⎕A or ⎕D.

]locate '⍺ ⍺←.*⎕[AD]' -pattern
∇ #.foo (1 found) 
[1] a n←⎕A ⎕D 
    ∧∧∧∧∧∧∧∧∧

Obviously there is much more to RegEx itself, but even some simple wildcard searching can give you a big leg up.