Single Project Launch Script - main.van.run#
uv run -m main.van.run
Upon launching this script, you will be prompted to select which operation you would like to perform and which
project you wish to run it on. The available operations are listed in mode_def.yaml, and typing
help module_name at the prompt will print a short description of that operation.
walk_scrape#
This operation launches bots for each VAN committee associated with the project in proj_conf.yaml.
These bots iterate through every list for today's date in the Minivan Activity Report in that committee. Unless the project
is configured to force-download every list in the committee, they will only download lists associated with the project in the
van.turf table in BigQuery. Before running this operation, it is advisable to log in to the VAN account the bots will use and
clear out any notifications or pop-ups on the home screen or the Minivan Activity Report, as these will cause the script
to crash.
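As a rough sketch of the selective-download check described above, the snippet below assumes hypothetical list_name and project_code columns in van.turf; the script's actual columns and matching rules may differ.

```python
from google.cloud import bigquery


def lists_to_download(project_code: str, force_download: bool, report_lists: list[str]) -> list[str]:
    """Decide which lists from the Minivan Activity Report to download.

    If the project is set to force-download, take everything; otherwise keep
    only lists that appear for the project in van.turf. The column names used
    below (list_name, project_code) are assumptions for illustration.
    """
    if force_download:
        return report_lists

    client = bigquery.Client()
    job = client.query(
        "SELECT DISTINCT list_name FROM `van.turf` WHERE project_code = @code",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("code", "STRING", project_code)]
        ),
    )
    assigned = {row.list_name.lower() for row in job.result()}
    return [name for name in report_lists if name.lower() in assigned]
```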
When the bots launch, they will use existing Chrome profiles from the chrome_profiles folder if available, and create new ones as necessary. If the profile in use is not already logged into VAN, the bot will use the login information in the van_token.yaml file to attempt to log in. Once logged in, the profile will remain logged in for up to 30 days.
Because we currently don't have a way to differentiate between the 2FA texts sent for different VAN committees, it is important to stagger launches if you know you will need to log in, so that the script uses the correct text for the correct login. The script automatically staggers logins for projects with multiple instances, but if you are launching multiple projects in parallel, make sure that no two scripts are logging in at the same time.
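The profile-reuse pattern roughly amounts to the sketch below; the Selenium options are standard, but the van_token.yaml key layout shown here is an assumption, not the script's actual structure.

```python
from pathlib import Path

import yaml
from selenium import webdriver


def launch_with_profile(profile_name: str, profiles_dir: Path = Path("chrome_profiles")) -> webdriver.Chrome:
    """Start Chrome against a persistent profile so a previous VAN login is reused."""
    profile_path = profiles_dir / profile_name
    profile_path.mkdir(parents=True, exist_ok=True)  # a new profile is created if none exists

    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-data-dir={profile_path.resolve()}")
    return webdriver.Chrome(options=options)


def load_credentials(committee: str, token_file: str = "van_token.yaml") -> dict:
    """Read login details for a committee from van_token.yaml (key layout is an assumption)."""
    with open(token_file) as f:
        return yaml.safe_load(f)[committee]
```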
The list files downloaded from VAN will appear in the /main/van/data_files/[project code]/[VAN instance name] folder, and they
will be renamed in the format [state]_[VAN instance name]_[office]_[paycom ID]_[first name]_[last name]_[ActionID name].xls.
If the script could not find a Paycom ID to match the ActionID, the Paycom ID field will be filled with the ActionID until corrected, and
if a canvasser synced multiple lists that day, a suffix of the form --(x), where x is a number, will be appended to the file name before
the file extension.
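As an illustration of the naming convention (field order taken from the description above; this parser is only a sketch, not the script's own code):

```python
import re

# Expected pattern: [state]_[VAN instance name]_[office]_[paycom ID]_[first name]_[last name]_[ActionID name].xls
# with an optional --(x) suffix when a canvasser synced more than one list that day.
FILENAME_RE = re.compile(
    r"^(?P<state>[^_]+)_(?P<instance>[^_]+)_(?P<office>[^_]+)_(?P<paycom_id>[^_]+)_"
    r"(?P<first>[^_]+)_(?P<last>[^_]+)_(?P<actionid>[^_]+?)(--\((?P<n>\d+)\))?\.xls$",
    re.IGNORECASE,
)


def parse_list_filename(name: str) -> dict | None:
    """Return the fields encoded in a downloaded list file name, or None if it doesn't match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


# Hypothetical example file name, for illustration only.
print(parse_list_filename("GA_DemParty_Atlanta_12345_Jane_Doe_janedoe01--(2).xls"))
```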
If you need to download lists manually for any reason (usually because a project is selectively downloading lists based on
van.turf and it didn't find a match), you can place them in the folder above, using the naming convention above, before running the script, and it will ingest that file with the rest of them. If there are underscores in the ActionID, they will need to be removed; the file name must contain exactly the number of underscores shown above. If there are multiple lists from the same canvasser, be sure to include the --(x) suffix. All fields are case insensitive.
If the script ends early while downloading lists, it will compile the list files it has downloaded so far into a "part file" saved in
/main/van/part_files/ and archive the list files. When you relaunch the script, it will check this folder for part files and skip any lists it has already downloaded when it checks the Minivan Activity Report. It's good practice to make sure at the beginning of the night that this folder does not contain part files from previous runs. The script will normally archive any part files in this folder at the end of a successful run, and it prunes duplicates from the table at the end of the process, but stale part files still introduce a possibility for error.
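A small, illustrative pre-run check along these lines can help catch leftover part files and badly named manual files; the helper names and the part-file directory listing below are hypothetical, not part of the script.

```python
from pathlib import Path

PART_DIR = Path("main/van/part_files")


def warn_about_stale_part_files() -> None:
    """Print any part files left over from previous runs so they can be reviewed before launching."""
    leftovers = sorted(PART_DIR.glob("*")) if PART_DIR.exists() else []
    for path in leftovers:
        print(f"Leftover part file: {path.name}")


def looks_like_valid_manual_file(name: str) -> bool:
    """Check that a manually placed list file has exactly six underscores and an .xls extension."""
    stem, _, ext = name.rpartition(".")
    return ext.lower() == "xls" and stem.count("_") == 6
```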
Once all lists have been downloaded, the list files will be ingested, and any corrections to Paycom IDs made on previous runs will be
applied to the associated ActionIDs. At this point the script will check all of the Paycom IDs in the table and cross-reference them
with data from Paycom. Any IDs that do not match an ID associated with the project in Paycom will be flagged as invalid, and any IDs
where the first and last names in Paycom don't match what was found from the ActionID will be flagged with a warning.
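The validation rules roughly correspond to the following sketch; the DataFrame column names are assumptions for illustration, not the script's actual schema.

```python
import pandas as pd


def flag_paycom_ids(doors: pd.DataFrame, paycom: pd.DataFrame) -> pd.DataFrame:
    """Flag each row 'invalid' if the Paycom ID isn't assigned to the project,
    'warning' if it is but the names don't match, otherwise 'ok'.
    """
    merged = doors.merge(
        paycom, on="paycom_id", how="left", suffixes=("", "_paycom"), indicator=True
    )

    def classify(row) -> str:
        if row["_merge"] == "left_only":
            return "invalid"
        names_match = (
            str(row["first_name"]).lower() == str(row["first_name_paycom"]).lower()
            and str(row["last_name"]).lower() == str(row["last_name_paycom"]).lower()
        )
        return "ok" if names_match else "warning"

    merged["flag"] = merged.apply(classify, axis=1)
    return merged
```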
After the scan is complete, the script will generate a table of all detected discrepancies and display it in the terminal. You will
be prompted to state whether you need to make corrections for each type of error present (invalids and warnings). Invalid IDs almost
always require correction; warnings must be verified to make sure, for example, that the canvasser does not have another canvasser's
Paycom ID in their ActionID. If you confirm that you need to make corrections to a type of discrepancy, you will be asked if you have
made changes to the paycom.employee table (if, for instance, you have made a temporary change to a canvasser's project, you would want
to answer yes). The script will then iterate through every discrepancy of that type and prompt you to enter the correct
ID for that canvasser; if an ID does not require correction or you are unable to make a correction (because the canvasser's ID is
associated with another project, or for some other reason), you can enter NA to skip it. Once you have cycled through all of the
discrepancies, the table will be scanned again, and if any discrepancies remain you will go through the same cycle of prompts.
If you answer no to all prompts, the script will proceed to upload the table, remove any doors already present in van.doors from the
table, and generate backup files.
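A simplified sketch of the correction loop described above; the prompt wording and field names are placeholders.

```python
def prompt_for_corrections(discrepancies: list[dict]) -> dict[str, str]:
    """Ask for a corrected Paycom ID for each flagged ActionID; entering NA skips a row."""
    corrections: dict[str, str] = {}
    for row in discrepancies:
        answer = input(
            f"{row['flag'].upper()}: {row['first_name']} {row['last_name']} "
            f"({row['actionid']}) currently {row['paycom_id']} - correct ID (or NA): "
        ).strip()
        if answer and answer.upper() != "NA":
            corrections[row["actionid"]] = answer
    return corrections
```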
If the script does not complete its run and archive all of the backup files, they will be located in the same folder as part files. If the script successfully archives its files, they can be found in
/main/van/logs/file_archives/. The script generates two backups during its run, one before duplicates from van.doors are removed and one after. The number at the end of the file name is a timestamp, so the backup with the later timestamp is the one with duplicates removed.
Output from the process can be found in the van_raw.doors_[project code]_live table in BigQuery.
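To spot-check the output outside the script, something along these lines works (table name from above, with a placeholder project code):

```python
from google.cloud import bigquery

client = bigquery.Client()
# Replace "xx01" with the actual project code.
rows = client.query("SELECT * FROM `van_raw.doors_xx01_live` LIMIT 10").result()
for row in rows:
    print(dict(row))
```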
walk_backup#
This operation is identical to walk_scrape, except that it does not download any data from VAN, and instead ingests a backup file
from /main/van/part_files/ for the selected project before scanning Paycom IDs and prompting you for corrections.
qc#
This operation pulls data from the van_raw.doors_[project code]_live table generated by the walk_scrape or walk_backup operation
and compiles it into list-level data. It also generates metrics based on the list data and performs an initial QC process
based on the QC standards specified for the project. Output can be found in the van_raw.lists_qc_[project code]_live table in BigQuery.
It is good practice to review this data for errors before continuing as it becomes more difficult to alter once it is merged
into van.lists_qc.
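For a rough sense of what compiling door-level data into list-level data means, a toy aggregation might look like the following; the column names and metrics are assumptions, not the project's actual QC standards.

```python
import pandas as pd


def compile_list_level(doors: pd.DataFrame) -> pd.DataFrame:
    """Collapse door-level rows into one row per list with simple example metrics."""
    grouped = (
        doors.groupby("actionid")
        .agg(
            doors_attempted=("door_id", "count"),  # hypothetical columns
            doors_contacted=("contacted", "sum"),
        )
        .reset_index()
    )
    grouped["contact_rate"] = grouped["doors_contacted"] / grouped["doors_attempted"]
    return grouped
```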
merge#
This operation merges the data from van_raw.doors_[project code]_live and van_raw.lists_qc_[project code]_live into the van.doors
and van.lists_qc tables, respectively. Once merged, it will also update tables containing data derived from QC data.
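Conceptually, the merge step is equivalent to something like the statement below; the join key and project code are placeholders, and the real script's merge logic may differ.

```python
from google.cloud import bigquery

client = bigquery.Client()
merge_sql = """
MERGE `van.doors` AS target
USING `van_raw.doors_xx01_live` AS source   -- replace xx01 with the project code
ON target.door_key = source.door_key        -- hypothetical unique key
WHEN NOT MATCHED THEN
  INSERT ROW
"""
client.query(merge_sql).result()
```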
drop#
This operation drops all temporary BigQuery tables created for the project from the van_raw dataset, and generates a backup of the
van.lists_qc table in the van_snapshots dataset.
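A hedged sketch of what the drop step amounts to; the snapshot table naming is a placeholder, not the script's actual convention.

```python
from datetime import datetime

from google.cloud import bigquery

client = bigquery.Client()
project_code = "xx01"  # placeholder project code

# Back up van.lists_qc into the van_snapshots dataset before dropping temporary tables.
stamp = datetime.now().strftime("%Y%m%d%H%M%S")
client.query(
    f"CREATE TABLE `van_snapshots.lists_qc_{stamp}` AS SELECT * FROM `van.lists_qc`"
).result()

# Drop the project's temporary tables from the van_raw dataset.
for table in (f"doors_{project_code}_live", f"lists_qc_{project_code}_live"):
    client.query(f"DROP TABLE IF EXISTS `van_raw.{table}`").result()
```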
Compound Operations#
These operations combine the other functions for ease of use:
- full_scrape executes walk_scrape and qc
- full_backup executes walk_backup and qc
- merge_d executes merge and drop