
Digital Education team blog


Ideas and reflections from UCL's Digital Education team


Archive for the 'Tyson’s Tyrades' Category

Half the struggle with digital knowledge is knowing what it’s for

By Jim R Tyson, on 10 April 2024

Sometimes I hear an announcement about an update or improvement to some technology or ‘app’ (as the youngsters say) that sounds exciting and eminently worth investigating.  When I start sifting through Google results to find out more, I can sometimes spend a day or two working out what’s going on.  That’s OK: it’s part of my job to do this and then, if what I’ve learned is useful, to find ways to communicate it to other people.

For example, Excel now allows users to create ad hoc and custom functions using lambda().  Now, if you are a computer scientist, mathematician, philosopher or linguist, you will probably have heard of the lambda calculus, an important mathematical invention of the twentieth century that influenced all those disciplines.  It provided a way to formally characterise computation as function application and abstraction (roughly, don’t quote me on this – it was a long time ago).  Now, even students of computer science may sometimes encounter the calculus and end up wondering ‘OK, but what’s it for?’

Well, one way to demonstrate its practical use is by introducing the world of the lambda() function in Excel: it allows you to formally define new Excel functions.  I immediately spotted some uses for this.

Descriptive statistics for sub-populations

Excel provides all the most common and useful statistical calculations as basic functions – count(), average(), var(), stdev() – and for some of them there are conditional versions, countif() and averageif(), which allow for subsetting your data.  So, it might be that you have two columns of data: the first is some interesting measure (temperature? height? resting bpm?) and the second some characteristic such as ethnicity or gender.

You might want to know, for example, the average resting heart rate of the male participants in your study.  You can do this using averageif(): =averageif(B1:B100, “male”, C1:C100), assuming that the sex data are in the range B1:B100 and the resting heart rate data are in C1:C100.  In fact, Excel has an averageifs() to help with cases with multiple selection criteria.  So that’s good for data analysts using Excel because it’s a common analytical approach.
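For the record, averageifs() takes the range to be averaged first, then pairs of criteria ranges and criteria.  Just as a sketch – the age data in column D here are hypothetical, purely to show the shape:

=AVERAGEIFS(C1:C100, B1:B100, "male", D1:D100, ">=18")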

However, this trick only works where a conditional version exists – averageif(), countif(), sumif() and the like – there is no conditional var(), stdev(), skew() or kurt().  Well, for variance and standard deviation, we can construct a pivot table to get the subpopulation analysis we want, and that’s great.  But occasionally (probably not that often) we want to calculate the skew in resting heart rate for all male participants, and maybe even the kurtosis.  And here, after that long lead-in, is where the lambda() function proves its worth.  First, let’s look at how we would calculate the kurtosis in some measure as a function of gender, using a combination of built-in Excel formulas.  Given a data table where column A is the categorical (eg gender) variable and B is the measure of interest (row 1 holds the column labels):

 1   A   B
 2   1   56
 3   1   62
 4   1   48
 5   1   58
 6   1   58
 7   1   55
 8   1   42
 9   1   54
10   1   47
11   2   52
12   2   59
13   2   56
14   2   45
15   2   63
16   2   52
17   2   44

The formula we want is =KURT(IF(A1:A17=1, B1:B17))

So there are three parameters: the first is the range to which we apply the second, the selection criterion value (here, 1 = male); the third is the range from which the data will be selected.  I tested this formula and it works fine; it’s just a bit clunky to use compared to the built-in averageif() and friends.  So, I decided to reconstruct it using the lambda() function, to produce my own kurtif() function.  In an empty cell, we put the lambda expression:

=LAMBDA(a,n,b, KURT(IF(a = n, b)))

with the three parameters represented (arbitrarily) by a, n and b.  A moment’s reflection and we see the relation between the lambda expression and our previous Excel formula.  From a practical point of view, the lambda expression tells us where in the calculation to plug in the values a, n and b to get our result.

When you enter this expression in a blank cell and hit return you will see the error value

#CALC!

which is Excel recognising that the cell contains a lambda expression with nothing yet to apply it to.  The next step is to name the new function kurtif().  Copy the lambda expression to the clipboard (highlight it and press control-c), then from the Formulas tab on the ribbon open the Name Manager.  In the dialog that opens, press the New button, give kurtif as the name for the new function, paste the lambda expression from the clipboard into the ‘Refers to’ box, then press OK and close the Name Manager.

Now you can carry out the computation as:

=kurtif(A1:A17,1,B1:B17)

which is simpler and has the advantage of looking very like averageif() and the other, similar functions.
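Incidentally, if you want to try out a lambda expression before naming it, you can invoke it inline by appending the arguments in parentheses straight after the expression:

=LAMBDA(a,n,b, KURT(IF(a = n, b)))(A1:A17, 1, B1:B17)

This is handy for checking the logic before committing a name in the Name Manager.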

A missing significance calculation in Excel

Excel has a very simple-to-use function that will calculate a correlation coefficient (R) from two arrays of data: =correl(Array1,Array2).  It is a minor annoyance that this calculation doesn’t return a p value for R, allowing us to test the null hypothesis that the true correlation between Array1 and Array2 is zero.  So, let’s assume that we have calculated R for two arrays of data and put the result in cell H1 (for no particular reason); n is just the count of either array.

Now, it’s a fairly simple trick to calculate a t value based on the correlation coefficient.  The formula is

t = r × √(n − 2) / √(1 − r²)

and since we have just calculated r, it is simple to calculate t with the formula (and put the result in H3):

=H1*SQRT(COUNT(Array1)-2)/SQRT(1-H1^2)

The last (and for now separate) step is to find the significance of this t score using the two-tailed t distribution function with n − 2 degrees of freedom:

=t.dist.2t(H3, count(Array1)-2)

And there we have it.  So, it would be useful to have a little helper function we could apply simply to calculate t from r.  Here is the lambda function code:

=LAMBDA(r,n,(r*SQRT(n-2))/SQRT(1-(r^2)))

and we can name it and use it as before.  I named mine ‘convertRtoT’ and used it like this with the correlation coefficient in H1 and n = 30:

=convertRtoT(H1,30)
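If what you really want is the p value, you could go one step further and wrap the conversion and the significance test into a single lambda.  This is just a sketch of my own (the name RtoP is arbitrary); the abs() is there because t.dist.2t() requires a non-negative argument, which matters when the correlation is negative:

=LAMBDA(r,n, T.DIST.2T(ABS(r)*SQRT(n-2)/SQRT(1-r^2), n-2))

Named RtoP, it would be used as =RtoP(H1,30).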

So now we know what the lambda() function is for.  The example is perhaps a little obscure, but the principle – that half the struggle with digital knowledge is knowing what it’s for – holds for far more mundane cases: I’ve been learning Power BI, and while there are simple answers – Power BI is for visualisation – it’s only after a few hours of tutorials that I really understand what it’s about.

Homework

If you want to check out the example calculations, please go ahead.  I checked them all in Excel (and for the r to t conversion, I checked my result against R), but it’s always possible to make a blunder when copying and pasting.  Finding and fixing errors is good practice.  If you want more practice, then I would suggest creating a function skewif() that works like kurtif(), taking a selection range, a criterion value and a data range, and giving the skewness for the cases selected by the criterion.  Good luck.
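(If you want to check your answer once you have tried it yourself, one definition that simply mirrors kurtif() is

=LAMBDA(a,n,b, SKEW(IF(a = n, b)))

named skewif in the Name Manager.)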

(this blog post was supported by the music of Iannis Xenakis, “Six Chansons No 1, ça sent le musc”)

Starting up Stata with personalised options

By Jim R Tyson, on 4 March 2024

There are often things one can do to personalise and improve one’s experience with software that involve some customisation; that may be easy on your own machine, but less easy if your machine is managed by the organisation (in this case UCL).  My laptop is managed by UCL (although I do have some elevated rights).

In an effort to improve my Stata workflow and output, I have several graphing options that I want to apply to all graphs I produce in Stata.  Typically, I want the title left-justified, in black, to take up the whole width of the graph (rather than the plot region) and to appear top left (at 11 o’clock).  The graph region colour should be white, with no axis lines for the x or y axes and with no fill colour or border colour for the legend region.

To simplify this I put these options in a global macro, graph_opts, and add the macro to the options of any graphing command as $graph_opts.  Anyone who knows how lazy and inconsistent I am will already be guessing that while I may aspire to do this, I more often just hack away at my graphs until they look (more or less) as I want.  This is the worst kind of laziness, because a little effort in setting this up would make for less work.

So I decided to investigate – could I automate this?  And I can.  At first, my heart sank slightly when I realised I would have to deal with the system paths on my managed machine, but it turned out to be very straightforward.  You can use sysdir on the Stata console to find your Stata program files folder.  When you navigate to this folder, use dir *.do to check for the presence of the file sysprofile.do – this confirms you are in the right directory.  Now, create a new do file called profile.do.  Any code you add to this file is executed on Stata start up.  Knowing that, I added these lines to my own profile.do (I used the Stata do file editor, but any plain text editor such as Windows Notepad would do as well):

// For -twoway- graphs
global graph_opts ///
  title(, justification(left) color(black) span pos(11)) ///
  graphregion(color(white)) ///
  xscale(noline) xtit(,placement(left) justification(left)) ///
  yscale(noline) ylab(,angle(0) nogrid) ///
  legend(region(lc(none) fc(none)))

// For -graph- graphs
global graph_opts_1 ///
  title(, justification(left) color(black) span pos(11)) ///
  graphregion(color(white)) ///
  yscale(noline) ylab(,angle(0) nogrid) ///
  legend(region(lc(none) fc(none)))
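With the globals defined at start up, any graph command can pick them up directly.  For example, using the auto dataset that ships with Stata:

sysuse auto, clear
twoway scatter price mpg, $graph_opts

The macro simply expands in place, exactly as if you had typed the options out in full.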

Of course you will want to change these to meet your own preferences – which may mean a deep dive into the Stata documentation.  It is, however, worth it, given the time and effort you will save in hacking at graph code (or [shudder] gph files) to ensure that your graphs are all consistently presented in your reports.

Software for Success

By Jim R Tyson, on 2 February 2023

What does it take to succeed in a student research project – or any research project, for that matter?

Well, there’s a whole lot of stuff that Digital Skills Development can’t help with, and anyway, you’re all really good at that stuff: the scholarship, the domain knowledge, the research skills.  But, there’s an awful lot that we can offer.

Getting on top of the choices that face you now and planning what tools you will use will allow you to work out what skills you need to acquire and how you are going to acquire them.  And beefing up your digital capability will not only improve your chances of research success, but will add to your capital in an area that employers rate among top desirable job skills.

When people plan research projects, they often forget to work out what software tools and techniques they will use, what skills those tools require, and where they are going to get those skills.  Often, we think it will all just be obvious and somehow it will come together.  Well, in a way it usually does, but with a little planning and foreknowledge, we can transform these decisions from afterthought to opportunity.

Digital Skills Development has six demonstration sessions to put you on the road to software success.  Each session introduces tools to tackle specific tasks for your research project.  We look at:

  1. writing: is there life beyond Word?  Is there any reason to go there?  How do I cope with fussy formatting requirements?
    Upcoming session: DSD: Software for success: Writing tools Fri 17-Feb-2023 12-1pm
  2. using survey tools: which is the best one for your research project?
    Upcoming session: DSD: Software for success: Survey tools Tues 21-Feb-2023 11-12noon
  3. winning with charts: which is the best chart type for your data?
    Upcoming session: DSD: Software for Success: Winning with charts Wed 15-Feb-2023 12-1pm
  4. data visualisation: what tools are available for visually presenting your data?
    Upcoming session: DSD: Software for success: Data visualisation Thu 16-Feb-2023 10-12 pm
  5. data analysis: is it worth learning to code, or can I cope by wrestling with my data in Excel?  I don’t do numbers, how can software help me?
    No upcoming sessions: DSD: Software for success: Data analysis & statistical tools join the interest list to be told about future dates.
  6. managing literature: imagine a world where your library and database searches link seamlessly  with your citation system and a database of annotated PDFs.  That world can be yours.
    No upcoming sessions: DSD: Software for success: Working with Bibliography and Citation Apps join the interest list to be told about future dates.

If you haven’t thought about what tools you will use for each of these tasks, or if you have thought about it but you’re just not sure what to do, these sessions are for you.  There will be demonstrations of different tools and approaches with guidance and discussion of what tool is best for the job.  If you think you know what software you are going to use, then we invite you to come along and  be challenged: there may be tools on offer that could smooth the way to a successful research project.

Now is the time to move beyond those good old coping strategies and tame the software beast.  These sessions will help you do it.

RStudio v1.4 – new stuff

By Jim R Tyson, on 16 June 2021

I am a massive fan of RStudio – not just for R development and data analysis.  I use RStudio a lot in writing learning materials, recently for R, but also for Python and Stata, using literate programming techniques and the learnr package (yes, you can include Stata code in markdown documents with a little work!).

There are a whole bunch of (no doubt wonderful) things in this preview release that I haven’t yet bothered to look at, but some things have got my immediate attention.

The visual markdown editor

I have mixed feelings about this. I know that visual editing – that is, something partway towards WYSIWYG, a la Word – is appreciated by lots of people, but I loathe it. I took up LaTeX a long time ago to get away from Microsoft Word (and, not to boast, I am a very proficient Word user). But, even I found that 90 per cent of the time, LaTeX was too complicated for what I needed. Hoorah for Markdown.

RStudio actually provided my first introduction to Markdown and I revelled in it from the beginning, especially combined with Pandoc: one source, many outputs!  At last the world was beginning to understand.  Write in one simple lightweight format and get HTML, PDF, DOCX and other formats automatically.  And of course it put literate programming within easy reach of all R programmers and learners.  With the learnr package, writing R study materials is a breeze.

But still, some people don’t like plain text editing.  Well, the 1.4 preview shows off the new visual editor.  It’s not a complete WYSIWYG offer like Word, but it does show you a live, close-to-end-result preview and has menus for formatting, layout, tables, images and citations.  If you really don’t like typing text, this may be just what you are looking for to push you that last step into literate data analysis with R and RMarkdown.

Inserting citations with Zotero

Yes, Zotero users can now use the visual editor to insert citations with point and click – just like Word users.  There is no need to export the references to a BibTeX file first – RStudio handles that for you.  Using BibTeX is another thing that people have sometimes mentioned when talking about the difficulty of writing in Rmarkdown.

New Python functionality

And then, oh joy, the new Python functionality.  I find that very few people are aware that it’s a breeze to combine Python and R code using Rmarkdown documents, although it may take some effort to understand all the set-up requirements for Python chunks at first: it took me 15 minutes the first time I tried to run import numpy as np!
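For anyone who hasn’t tried it, here is a minimal R Markdown sketch of mixing the two (it assumes the reticulate package, which RStudio uses to run Python chunks, is installed):

```{r setup}
# reticulate bridges R and Python in Rmarkdown
library(reticulate)
```

```{python}
import numpy as np
print(np.arange(10).mean())
```

Knit the document and the Python chunk runs just like an R chunk.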

Now, this new release adds tools for configuring Python, conda and virtual environments.  For me the real advance though is somewhat simpler: now you can see Python data objects in the RStudio environment pane and view Python dataframes in the normal way.

Rainbows!

The last of the new features I know I will use is the introduction of ‘rainbow’ parentheses.  Nothing to do with Pride month apparently, just colour coded bracketing to help you balance your parentheses.

Time to give R (and Python) with RStudio another look

If the user interface has put you off moving to R and RStudio, then now is definitely the time to have another look.  Especially for Stata users: complexity and ease of use really aren’t a reason to prefer Stata any more, and the move to R coding really isn’t that difficult.

Once more: Accessible documents from LaTeX

By Jim R Tyson, on 7 March 2021

This blog post outlines some changes to the advice I gave previously on how to produce accessible documents using LaTeX.  The changes concern the production of PDFs for digital use, and conversion from LaTeX to HTML.

ISD general guidance on producing accessible materials on its Accessibility Fundamentals pages still holds.

In that previous blog entry, I included as an aim to ‘get as close as possible to producing “tagged PDF” or PDF/UA documents using LaTeX’.  This is not currently doable.  I replace it with the aim to ‘get as close as possible to producing reasonably accessible documents using LaTeX’.  Given the long-standing difficulties in meeting accessibility requirements from LaTeX source in PDF, the advice must be to produce HTML documents when accessibility is required.

In particular, I no longer recommend using the LaTeX package accessibility.sty to create tagged documents.  Development of the package has been halted and the author no longer supports its use.  If you are interested in the effort to produce tagged PDF from LaTeX source, then you should read this article from the TeX User Group newsletter, TUGboat.  The author of the package mentioned in the article himself believes it is not yet ready for use in production.  But, he writes, “with the tagpdf package it is already possible for adventurous users with a bit of knowledge in TeX programming to tag quite large documents”.  I am not adventurous or knowledgeable enough to rise to that challenge.

With respect to mathematical content, I had previously recommended Pandoc, which can convert to HTML with machine readable mathematical content.  I have since looked more closely at this issue and I now prefer to use tex4ht, which has some useful features, including the ability to include the LaTeX code for mathematical content in a page.  It is also the package recommended by TUG, and there is good documentation on the TUG website.  However, tex4ht does not produce Microsoft Word documents from LaTeX, so Pandoc is still the best tool if that is required.  And Pandoc still does the job if you don’t need the extra features.
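By way of a sketch (the file name is illustrative; the options are those documented on the TUG site), tex4ht conversion to HTML with MathML is a one-liner:

htlatex notes.tex "xhtml,mathml"

and, where a Word document is needed, Pandoc does that part:

pandoc -s notes.tex -o notes.docx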

In the light of these and other issues, I have switched completely to using RMarkdown.  This allows me to mix lightweight markup, LaTeX mathematical code and HTML in one document.  Using HTML to insert graphics allows me to include alt text, which is not otherwise possible.

There is still, to my knowledge, no solution for presentations made with Beamer or similar packages.  Whereas I previously suggested using the package pdfcomment to annotate images on slides made with LaTeX, I no longer do, since I have discovered that the comments are not well understood by screenreader software.

The current situation means that, beyond the advice we have already provided, we can do very little to support colleagues with accessibility issues in LaTeX workflows, especially with respect to presentations and alternative text for images.

Accessible documents from LaTeX

By Jim R Tyson, on 22 July 2020

Some advice and information in this blog is superseded by this post.

Note: in this piece, many of the specific LaTeX examples are taken from package vignettes or documentation.  All packages mentioned have CTAN links.

ISD has published good general guidance on producing accessible materials on its Accessibility Fundamentals pages, with links to a host of useful resources.

While it is relatively straightforward to follow the guidelines and meet the standards set for users of Microsoft Office and for web developers, it is still not clear to many of us LaTeX (and I include markdown) users what we should and can (and maybe cannot) do.

I want to make a few points about the what, and then outline a few essential hows.

There are three aims here:

  1. get as close as possible to producing ‘tagged PDF’ or PDF/UA documents using LaTeX;
  2. produce HTML from LaTeX for screenreader software;
  3. produce Microsoft Word from LaTeX for consumers who need to modify a document themselves for accessibility purposes.

These aims are met by using the LaTeX package accessibility and the open source document conversion utility pandoc.

The package accessibility is found on this CTAN page.  To produce structured, tagged PDF include

\usepackage[tagged, highstructure]{accessibility}

in the document preamble.

I am not addressing Beamer here: the same general considerations apply and ordinary LaTeX techniques can be used.  Beamer cannot currently be used with the accessibility package mentioned above.  To add alternative text to a Beamer presentation you can use the package pdfcomment from this CTAN archive page.  At the moment, I can offer no good solution for dealing with existing Beamer presentations, but ISD is looking into what might be doable, including working on the compatibility of the accessibility package and Beamer.  I am now using markdown to produce HTML presentations rather than PDF.

Before I get into specifics, I want to emphasise that, where possible, we should try to provide people with documents that they can modify to suit their needs; in many cases, therefore, a Microsoft Office or HTML document is more usefully accessible than a PDF.  Pandoc makes conversion from LaTeX to HTML very simple, and you can use LaTeX mathematics in your documents to be converted to HTML with either MathML or MathJax options.

Text

LaTeX users producing text documents are probably already covering the need for clearly structured text with headings by using the \section{} family of commands. It is worth considering use of the package hyperref.sty so that you can create clickable cross-references in your documents and a clickable table of contents. Screenreader users will find hyperlinked sections and a clickable table of contents very useful.  Hyperref can also take care of the language metadata of your PDF. Programs that access your PDF should be able to determine the language (or main language) of the document.  One way to do this is to include this hyperref command in the preamble:

\usepackage[pdflang={en-GB}]{hyperref}

Using a sans serif font, like Computer Modern Sans Serif, will help make your document easier to read.  LaTeX users who are typesetting mathematics should note that such research as has been published has not – to my knowledge – addressed the issue of font choice for mathematics (or logic, or linguistics, chemistry and so on).  Just as important is to use a good size for text: I try to use 12pt body text and 14pt and 16pt headers.  The code to switch to the sans serif family is as follows and should go in your preamble:

\renewcommand{\familydefault}{\sfdefault}
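The body text size itself is most simply set as a class option, for example:

\documentclass[12pt]{article}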

You can use the package setspace (loaded in the preamble with \usepackage{setspace}) to change line spacing in a document to 1.5, with the command

\onehalfspace

Many readers benefit from slightly wider than normal margins.  The default margins for LaTeX documents are already quite generous leading to a line-scan length that is comfortable for most readers.  If you do wish to change the margins, you should use the package geometry from this CTAN archive. There are a number of ways to use geometry.  You can use it with options in your preamble like this:

\usepackage[margin=1.5in]{geometry}

or

\usepackage[total={6.5in,8.75in},top=1.2in, left=0.9in, includefoot]{geometry}

Or you can use the command \geometry{} in your preamble like this:

\geometry{a4paper, margin=2in}

Use bold \textbf{} for emphasis and avoid italic. If you wish to modify an existing document that uses \emph{} (which we have conscientiously preferred for decades) you can include the following code

\makeatletter
 \DeclareRobustCommand{\em}{%
    \@nomath\em \if b\expandafter\@car\f@series\@nil
    \normalfont \else \bfseries \fi}
\makeatother

in the preamble of your document to change the default appearance of emphasis.

You can use the package xcolor, available from this TeX archive page (loaded with the svgnames option so that names like Ivory are defined), and the command \pagecolor{Ivory} (for example) to change the background colour of a PDF for electronic use.

Hyperlinks

Make your hyperlinks clearly distinguishable from text; make them meaningful (don’t use the URL itself or text like ‘click here’); and make sure that any colour contrast complies with the guidelines on this WCAG colour contrast guidance page.  If your document is likely to be disseminated in print form, then it is useful to add a short and easily typable URL for print format readers, eg https://tinyurl.com/contrastguidance.

To get a properly presented URL use code like this:

\href{http://www.ucl.ac.uk/isd}{ISD home page.}

To control the colours used with hyperlinks you can include something like the following in your preamble after calling the hyperref package:

\hypersetup{
    colorlinks=true,
    linkcolor=blue,
    filecolor=magenta,
    urlcolor=cyan
}

Images and tables

The package accessibility mentioned earlier provides a LaTeX command \alt{} which can be used to add alternative text in any float environment.  Unfortunately, \alt{} from the accessibility package cannot be used with the Beamer presentation package.

While good captioning for images and tables will enhance accessibility, where necessary alternative text should describe not just what data is in a table (for example, ‘EU GDP by country, 2010 to 2018’) but what its relevance is: ‘EU GDP by country, 2010 to 2018, showing the trend of reduced growth over time’.  The reader may choose to skip the data table if the alternative text is clear enough.  Also be sure to use \label{} and \ref{} to enable screenreader software to quickly locate relevant data or images.
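As I read the package documentation, \alt{} sits inside the float alongside the caption.  A minimal sketch (the file name is illustrative):

\begin{figure}
  \centering
  \includegraphics{gdp-by-country.png}
  \alt{Line chart of EU GDP by country, 2010 to 2018, showing growth slowing over time.}
  \caption{EU GDP by country, 2010 to 2018.}
\end{figure}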

If you use images which are essentially decorative, then use \alt{} to let the screenreader software know that.

Mathematical content

LaTeX source code including mathematical content can produce screenreader friendly HTML via pandoc. The best result with most modern browsers (including Edge, Safari, Chrome and Firefox) is achieved using the MathJax option on conversion. The instructions to do this are on the pandoc demo page. In all examples so far tested (and we will test more, and more fully) the mathematical content was read semantically rather than typographically so that a fraction is read “fraction with denominator X and numerator y” (with some minor variation, ie sometimes reading “ratio” rather than “fraction”).
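The conversion itself is a one-liner (file names illustrative):

pandoc -s --mathjax maths-notes.tex -o maths-notes.html

Swap --mathjax for --mathml if you prefer the mathematics embedded as MathML.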

Links

Matthew Towers has written a useful page about accessibility and pdf files, although it has been overtaken by events with respect to useful LaTeX packages.

The TeX User Group (TUG) web page on PDF accessibility and PDF standards.