jbclements / csu-fad-parser

Parses FAD reports generated as part of CSU reporting into a usable database

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This library contains code that parses the FAD reports ("Faculty Assignment
by Department") generated by
the California State University systems. These reports are sent from
each CSU campus to the Chancellor's office, and they include information
on each class; who taught it, how many students took it, when and where
it was taught, and many many esoteric details (the a-ccu, the classification,
the D-WTU, the I-WTU, and much much more!). It also includes details on
which department each faculty member is assigned to, and how their time
is officially divided between instructional and administrative tasks.

Parsing this report is moderately horrible. Parts of the report appear
to be assembled by hand, which is never good for consistency. More
significantly, it appears that there was a major format change between
the 2142 and 2144 quarters; much of the parser is the same between
the two reports, but there are also many many differences. I have
*no idea* whether this format change is specific to the Cal Poly
campus, where all of my reports come from.

This parser is not based on *any* formal specification of the format
of the report. Instead, it's built from many many hours of visual inspection
and analysis of these reports. It's written fairly conservatively, in
the sense that when the parser sees something unexpected, it generally
chokes, on the principle that surprising is bad, and that the user should
probably intervene at that point.

The best use for this software is to generate a database using it, and
that's what I've done. Currently, this software generates text files
that can be imported by PostgreSQL. It appears to me that PostgreSQL is
a much better database than MySQL.

This software is incomplete; it doesn't currently build tables for the
sequences inside of each section, and it doesn't build a table for the
"special" assignments. Maybe I'll write that, and maybe I won't. More 
generally, I wrote this code so that I'd be able to understand and use
it later, but there are no guarantees that you'll be able to use it,
unless you == me.

It also has no documentation whatsoever.

It is, however, a package, and it probably represents the most complete
specification of the FAD outside of the program (COBOL? who knows) that
actually generates the FAD.

History for the development of this code preceding this date is only
available as part of a private repo.

This parser does not contain any actual FAD data; it just knows how to
read and parse them.

-- JBC, 2016-12-10

Note for myself. Start by opening update-db.rkt, follow the instructions.

-- JBC, 2020-08-05

Okay, I can do better than that.

- Make sure you have the MS Word version of the FAD, not the PDF.
- Use MS Word "Save As" to export as text. Choose "LF Only" as line
  ending. I have no idea why it's so hard to get standard line endings
  in MS word, I swear they're just trying to make life hard.
  Note: why isn't the FAD just available as a text file? I have no idea.
  Well, my assumption is that it *is* a text file, but some standard
  pipeline converts it to MS Word and a pdf along the way. It turns out
  that extracting text from a PDF is hard and unreliable, exporting
  it from MS Word is much more reliable.
- move resulting text file to (e.g.) ~/onedrive/datasets/FAD/fad-2218.txt
- open ./fad-parser/update-db.rkt, follow instructions

Also, note that installing this package currently triggers some compilation
warnings; these don't appear to affect the operation of the package, it looks
like they may be breakage that occurred during some refactoring:

raco setup: error: during making for <pkgs>/csu-fad-parser/fad-parser/scripts
raco setup:   csu-fad-parser/fad-parser/scripts/spreadsheet-tmp.rkt:16:21: Instructor-header: unbound identifier
raco setup:     in: Instructor-header
raco setup:     compiling: <pkgs>/csu-fad-parser/fad-parser/scripts/spreadsheet-tmp.rkt
raco setup: error: during making for <pkgs>/csu-fad-parser/fad-parser
raco setup:   csu-fad-parser/fad-parser/export-to-spreadsheet.rkt:168:24: Faculty-Offering-subject: unbound identifier
raco setup:     in: Faculty-Offering-subject
raco setup:     compiling: <pkgs>/csu-fad-parser/fad-parser/export-to-spreadsheet.rkt



-- JBC, 2021-11-29

About

Parses FAD reports generated as part of CSU reporting into a usable database

License:GNU General Public License v3.0


Languages

Language:Racket 100.0%